I think what Marco meant was: the mapping says it's an object
property, so we should extract a URI, even if the property value is
just a string.

In the case of the musician infoboxes on the Italian Wikipedia, that
would work, but in many other cases, it wouldn't. For example:
http://en.wikipedia.org/wiki/Glenn_Danzig contains "label = Plan 9,
Evilive". Just that string, no links. We could use a heuristic to
split the string into multiple links etc, but I don't think there's a
good, clean solution. With a naive approach we would extract
<http://en.dbpedia.org/resource/Plan_9,_Evilive>, which would be
wrong.
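
To make the failure concrete, here is a minimal Python sketch of the
naive approach and of a comma-splitting heuristic. The function names
and the title normalization rule are assumptions for illustration only;
this is not the extraction framework's code.

```python
from typing import List

# Assumed base URI, taken from the example URI above.
BASE = "http://en.dbpedia.org/resource/"

def naive_resource_uri(value: str, base: str = BASE) -> str:
    """Hypothetical helper: treat the whole string as one page title."""
    title = value.strip().replace(" ", "_")
    # Upper-case the first letter, as wiki page titles do.
    title = title[:1].upper() + title[1:]
    return base + title

def split_heuristic(value: str) -> List[str]:
    """Hypothetical heuristic: split on commas, one URI per part."""
    return [naive_resource_uri(part) for part in value.split(",") if part.strip()]

# The naive approach yields a single, wrong resource:
print(naive_resource_uri("Plan 9, Evilive"))

# Splitting fixes this case but breaks names that contain commas:
print(split_heuristic("Plan 9, Evilive"))
print(split_heuristic("Crosby, Stills, Nash & Young"))
```

The last call splits one band name into three parts, which is why I
don't think a string heuristic can be made clean.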

There is a simple rule though: If the Wikipedia template renders
strings as links, then we should extract strings as URIs. Otherwise we
shouldn't.

The problem is that our code can't find out what the template does
(well, it could, but that would be almost as hard as rendering
templates). But humans can. So to implement that rule, we have to add
a feature to the mappings wiki, as I described in my previous mail, so
users can add a flag saying "yes, plain string values in this property
should be extracted as URIs".
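
Roughly, the rule plus the flag could look like the following Python
sketch. Everything here is hypothetical: the PropertyMapping class, the
strings_as_uris flag name, the base URI and the function names are
placeholders, and the mappings wiki has no such feature yet.

```python
from dataclasses import dataclass
from typing import List

# Assumed base URI for the Italian chapter, for illustration only.
BASE = "http://it.dbpedia.org/resource/"

@dataclass
class PropertyMapping:
    template_property: str         # e.g. "genere"
    ontology_property: str         # e.g. "genre"
    strings_as_uris: bool = False  # hypothetical flag set by mapping editors

def title_to_uri(title: str, base: str = BASE) -> str:
    """Turn a page title into a resource URI (simplified)."""
    title = title.strip().replace(" ", "_")
    return base + title[:1].upper() + title[1:]

def extract_object_values(mapping: PropertyMapping,
                          links: List[str],
                          text: str) -> List[str]:
    """Return URIs for an object property value.

    links: link targets found in the property value
    text:  the plain text of the property value
    """
    if links:
        # Links provided by the editor are always used as they are.
        return [title_to_uri(target) for target in links]
    if mapping.strings_as_uris and text.strip():
        # No links, but the mapping says the template renders
        # plain strings as links, so the string becomes a URI.
        return [title_to_uri(text)]
    # No links and no flag: don't guess.
    return []

genere = PropertyMapping("genere", "genre", strings_as_uris=True)
label = PropertyMapping("label", "recordLabel")

print(extract_object_values(genere, [], "Heavy metal"))
print(extract_object_values(label, [], "Plan 9, Evilive"))
```

With the flag set, "Heavy metal" becomes a URI; without it, "Plan 9,
Evilive" produces nothing instead of a wrong resource.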

It seems that Italian Wikipedia templates often work like this, while
English templates rarely do. To make that behavior possible, the
Italian templates use multiple properties like genere, genere2,
genere3 etc, while the English templates use one property which the
editors can fill with links or strings as they like.

Cheers,
JC

On Thu, May 10, 2012 at 6:18 PM, Pablo Mendes <pablomen...@gmail.com> wrote:
>
> Hi Marco, Jona,
>
>> Shouldn't the mapping be the king or this could be wrong in other ways?
>
>
> Absolutely!!! User-generated content is our "business".
>
> Let me repeat, with added emphasis:
>
> "When there is a link, then the object property MUST be generated directly
> FROM THE LINK PROVIDED BY THE USER.
> If there isn't a link FOR AN OBJECT PROPERTY, then we should try to find one
> in the page."
>
> More details...
>
> Case 1: a link is provided as value. Solution: keep it as is
> Case insensitivity in URIs is nonsense. We should treat URIs as if they were
> numbers or any other kind of opaque id. Even if in Wikipedia we know that
> they play around with "readable IDs".
>
> Case 2: a link is not found, only a string.
> Now, for strings, case insensitivity starts to make sense. But we should be
> aware that it can cause errors. Perhaps having a separate dataset for
> "guessed property values" would be the safest.
>
> One exception where tampering with user-provided values is acceptable: when
> another user-provided value corrects it. That's the case for redirects, for
> example. So if the value of a property is a redirect page, we should point
> it to the end of the redirect chain.
>
> That's my opinion.
>
> Cheers,
> Pablo
>
> On Thu, May 10, 2012 at 5:45 PM, Marco Amadori <marco.amad...@gmail.com>
> wrote:
>>
>> 2012/5/10 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>:
>> > I just made that little change. I had looked at the code before, so it
>> > was very simple. We now also get the triple for "Heavy metal", but
>> > that's it:
>> >
>> >
>> > http://mappings.dbpedia.org/server/extraction/it/extract?title=Glenn+Danzig
>>
>> That seems like good news, but...
>>
>> > Let's hope that this doesn't introduce too many extraction errors.
>> > It's quite unlikely though. I looked around a little and finally found
>> > a case that we would treat differently after this change.
>> > Potentially wrong, but it's highly unlikely. The pros will certainly
>> > outweigh the cons.
>>
>> > http://en.wikipedia.org/wiki/Neocon
>> > http://en.wikipedia.org/wiki/NeoCon
>>
>> Nice example of why case insensitivity is bad. :-)
>>
>> I think I'm changing my mind about that issue. The code should not try
>> to be smart at all.
>>
>> My question is: if the mapping is hand-made and trusted over the page,
>> why does the code trust the page (links) more than the mapping?
>>
>> E.g. in the Artist mapping, if I say that 'genere' is genre in the
>> DBpedia ontology, I should be trusted more than the page, which might
>> not have the links.
>>
>> Shouldn't the mapping be the king or this could be wrong in other ways?
>>
>> --
>> ESC:wq
>
>
