On Thu, Oct 11, 2012 at 11:03 AM, Pablo N. Mendes <pablomen...@gmail.com> wrote:
>
> Good point, Tom.
>
>> That article is in Category:Hindi_films, not Category:Hindi_songs and it's
>> a Film, not a song, so it's not going to meet the requirements of your
>> query.
>
>
> But maybe the class hierarchy comes to the rescue (Work is a supertype of
> Song and Film)?
>
The main point is that the extracted triples are semantic nonsense because
they conflate multiple subjects under a single URI.
You've got

<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/runtime> "9300.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/length> "276.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/length> "1908.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/length> "221.0"^^<http://dbpedia.org/datatype/second> .

Which is the length of what? They all refer to the same subject.
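
To make the ambiguity concrete, here is a small Python sketch over the four triples quoted above. It only groups values by (subject, predicate); the helper name multi_valued is made up for this example, and several values for one property are a hint of conflation rather than proof of it.

```python
from collections import defaultdict

# Subject and property namespace exactly as they appear in the thread.
S = "http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na"
P = "http://dbpedia.org/property/"

# The four triples quoted in this message (datatype suffixes dropped for brevity).
triples = [
    (S, P + "runtime", "9300.0"),
    (S, P + "length", "276.0"),
    (S, P + "length", "1908.0"),
    (S, P + "length", "221.0"),
]

def multi_valued(triples):
    """Hypothetical helper: return (subject, predicate) pairs with more than one value."""
    values = defaultdict(list)
    for s, p, o in triples:
        values[(s, p)].append(o)
    return {key: objs for key, objs in values.items() if len(objs) > 1}

conflicts = multi_valued(triples)
for (s, p), objs in conflicts.items():
    print(p, "has", len(objs), "values on one subject:", objs)
```

Nothing in the data says which of the three lengths belongs to which track or album, which is exactly the problem.
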
Similarly "Pappu Can't Dance" isn't a song. It's an (alternate?) title for
the film according to the RDF. A human knows it's a song because of the
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/title> "Pappu Can't Dance"@en .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/ontology/Work/runtime> "155.0"^^<http://dbpedia.org/datatype/minute> .
To make what Venkatesh wants work, you'd need to teach the extractor to
figure out what the "main" subject of a page is and then have it mint new
subject URIs for all related concepts represented on the page which are
different (and don't have their own Wikipedia page), such as the soundtrack
album, the songs on the soundtrack album, etc. Then you'd also need to
teach it that the physical proximity of the track listing and the
soundtrack infobox implies that they refer to the same subject. Finally,
you'd have to make this robust in the face of different editing &
structuring styles by different Wikipedians.
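
As a rough illustration of the URI-minting step only (this is not how the DBpedia extractor works), here is a Python sketch. The mint_uri helper, the "__" separator and the input layout are all assumptions invented for the example, and the hard parts described above (finding the main subject, using proximity, coping with editing styles) are not attempted.

```python
from urllib.parse import quote

PAGE_URI = "http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na"
PROP = "http://dbpedia.org/property/"

def mint_uri(page_uri, label):
    """Hypothetical scheme: page URI + '__' + percent-encoded label."""
    return page_uri + "__" + quote(label.replace(" ", "_"), safe="_")

def split_subjects(page_uri, main_facts, sub_entities):
    """Attach main_facts to the page URI and each sub-entity's facts to a minted URI.

    main_facts:   list of (predicate, value) pairs about the page's main subject
    sub_entities: list of (label, [(predicate, value), ...]) for related concepts
    """
    triples = [(page_uri, p, o) for p, o in main_facts]
    for label, facts in sub_entities:
        uri = mint_uri(page_uri, label)
        triples.append((uri, PROP + "title", label))
        triples.extend((uri, p, o) for p, o in facts)
    return triples

# ASSUMPTION: the extracted data does not say which length belongs to which
# track; 276.0 is paired with this title purely for illustration.
result = split_subjects(
    PAGE_URI,
    [(PROP + "runtime", "9300.0")],
    [("Pappu Can't Dance", [(PROP + "length", "276.0")])],
)
for triple in result:
    print(triple)
```

With separate subjects, the film keeps its runtime and the song gets its own title and length, so a query for songs no longer has to guess.
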
I'd love to see the extractor get this smart, but I'm not holding my breath.
Tom
> On Thu, Oct 11, 2012 at 4:15 PM, Tom Morris <tfmor...@gmail.com> wrote:
>
>>
>>
>> On Thu, Oct 11, 2012 at 9:48 AM, Venkatesh Channal <
>> venkateshchan...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I think I am still missing how to query the information.
>>>
>>> On executing the query:
>>> SELECT * WHERE {
>>>   ?s <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Hindi-language_films> .
>>>   ?s rdf:type <http://dbpedia.org/ontology/Film>
>>> } LIMIT 100
>>>
>>> One of the values got was -
>>> http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na.
>>>
>>> I then tried to find all triples that have the film as subject. The
>>> idea was to find the songs and the singers of those songs. The query
>>> executed:
>>>
>>> select * where { <http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
>>> ?p ?o . }
>>>
>>> One of the values returned is -
>>> http://dbpedia.org/property/title "Pappu Can't Dance"@en
>>>
>>> Here "Pappu Can't Dance"@en is a song.
>>>
>>> I executed the following query to find the artists associated with
>>> songs that begin with the character 'P'. The song "Pappu Can't Dance"
>>> was not returned.
>>>
>>> SELECT DISTINCT * WHERE {
>>>   ?song <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Hindi_songs> .
>>>   ?song rdf:type <http://dbpedia.org/ontology/Song> .
>>>   ?song <http://dbpedia.org/ontology/artist> ?artist .
>>>   ?song <http://dbpedia.org/ontology/runtime> ?runtime .
>>>   FILTER (regex(str(?song), 'P'))
>>> } LIMIT 100
>>>
>>> The corresponding Wikipedia link is -
>>> http://en.wikipedia.org/wiki/Jaane_Tu..._Ya_Jaane_Na
>>>
>>>
>> That article is in Category:Hindi_films, not Category:Hindi_songs and
>> it's a Film, not a song, so it's not going to meet the requirements of your
>> query. It looks like the DBpedia extractor attempted to extract as much
>> information as possible from the page, but that strategy, combined with the
>> way Wikipedians edit, is causing confusion.
>>
>> Even though the article contains infoboxes for both the film and the
>> soundtrack album, the subject is principally, in my opinion, the film.
>> Including triples related to the soundtrack album associated with the same
>> URI is just going to cause confusion.
>>
>> Tom
>>
>>
>>
>>> Appreciate your feedback and help.
>>>
>>> Thanks and regards,
>>> Venkatesh
>>>
>>> On Thu, Oct 11, 2012 at 6:20 PM, Julien Cojan <julien.co...@inria.fr> wrote:
>>>
>>>> Hi Mohamed,
>>>>
>>>> ------------------------------
>>>>
>>>> *From: *"Mohamed Morsey" <mor...@informatik.uni-leipzig.de>
>>>> *To: *"Julien Cojan" <julien.co...@inria.fr>
>>>> *Cc: *"Venkatesh Channal" <venkateshchan...@gmail.com>,
>>>> dbpedia-discussion@lists.sourceforge.net
>>>> *Sent: *Thursday, 11 October 2012 10:49:45
>>>>
>>>> *Subject: *Re: [Dbpedia-discussion] Fetching song and movie related
>>>> information
>>>>
>>>> Hi Julien,
>>>>
>>>> On 10/10/2012 05:34 PM, Julien Cojan wrote:
>>>>
>>>> Hi,
>>>>
>>>> Are you running the query on http://live.dbpedia.org/sparql ?
>>>> I don't get any artist associated with
>>>> http://dbpedia.org/resource/Piyu_Bole on this endpoint.
>>>> You must have got it from http://dbpedia.org/sparql, which is not
>>>> synced with Wikipedia.
>>>>
>>>> As Mohamed said, the Wikipedia page was redirected to
>>>> http://en.wikipedia.org/wiki/Parineeta_(2005_film).
>>>> There is something weird though: there is a triple
>>>>
>>>> <http://dbpedia.org/resource/Piyu_Bole>
>>>> <http://dbpedia.org/ontology/wikiPageRedirects>
>>>> <http://dbpedia.org/resource/Parineeta_(2005_film)> .
>>>>
>>>> but describe <http://dbpedia.org/resource/Parineeta_(2005_film)> gives
>>>> only this triple
>>>> and describe <http://dbpedia.org/resource/Parineeta_%282005_film%29> very
>>>> little.
>>>>
>>>>
>>>> Sorry, I didn't get what you mean by "very little".
>>>> I tried the query "describe
>>>> <http://dbpedia.org/resource/Parineeta_%282005_film%29>",
>>>> and it gives 109 triples.
>>>>
>>>>
>>>> Same for me now; when I tried, I had about 10. Maybe the page was being
>>>> extracted after a change.
>>>>
>>>> There is a bug, though, in the use of two different URIs/IRIs for
>>>> Parineeta_(2005_film).
>>>>
>>>> Does anyone know why?
>>>>
>>>> Julien
>>>>
>>>>
>>>>
>>>> --
>>>> Kind Regards
>>>> Mohamed Morsey
>>>> Department of Computer Science
>>>> University of Leipzig
>>>>
>>>>
>>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Don't let slow site performance ruin your business. Deploy New Relic APM
>>> Deploy New Relic app performance management and know exactly
>>> what is happening inside your Ruby, Python, PHP, Java, and .NET app
>>> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
>>> http://p.sf.net/sfu/newrelic-dev2dev
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> Dbpedia-discussion@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>
>>>
>>
>>
>
>
> --
> ---
> Pablo N. Mendes
> http://pablomendes.com
> Events: http://wole2012.eurecom.fr
>
>