Pablo,

I wrote that we could use a heuristic, but I also meant that we should
not do this. Apparently the rest of my sentence was not clear enough.
:-)

Cheers,
JC

On Fri, May 11, 2012 at 1:43 AM, Pablo Mendes <pablomen...@gmail.com> wrote:
>
> In the context of...
>>
>> "When there is a link, then the object property MUST be generated directly
>> FROM THE LINK PROVIDED BY THE USER.
>> If there isn't a link FOR AN OBJECT PROPERTY, then we should try to find
>> one in the page."
>
>
> Jona said...
>>
>>  We could use a heuristic to
>> split the string into multiple links etc, but I don't think there's a
>> good, clean solution. With a naive approach we would extract
>> <http://en.dbpedia.org/resource/Plan_9,_Evilive>, which would be
>> wrong.
>
>
>
> Sorry, I don't understand this. When I said we can *find* a link, I didn't
> mean that we should *invent* one. I really mean going into the text and
> checking whether the page links to a URI whose anchor text exactly matches
> the string, much like the current coreference code seems to do. Or did I
> misunderstand something?
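
The lookup Pablo describes could be sketched roughly as follows. This is only an illustration, not the extraction framework's actual code: the function name, the simplified wikilink regex, and the sample wikitext (including the link target) are all assumptions made up for the example.

```python
import re

# Simplified pattern for [[Target]] and [[Target|anchor text]] in wikitext.
# Real wikitext has more cases (nested templates, section links, etc.).
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def find_link_target(page_wikitext, value):
    """Return the target of the first link on the page whose anchor text
    is exactly `value` (case-sensitive), or None if there is no such link."""
    wanted = value.strip()
    for match in WIKILINK.finditer(page_wikitext):
        target = match.group(1).strip()
        # A link without a pipe uses its target as anchor text.
        anchor = (match.group(2) or target).strip()
        if anchor == wanted:
            return target
    return None


# Hypothetical page text, for illustration only.
page = "He plays [[Heavy metal music|Heavy metal]] and [[Punk rock]]."
print(find_link_target(page, "Heavy metal"))
print(find_link_target(page, "Plan 9, Evilive"))
```

Nothing is invented here: if no link on the page carries the string as its anchor text, the function returns None and no URI is produced.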
>
> cheers
> Pablo
>
> On Thu, May 10, 2012 at 8:11 PM, Jona Christopher Sahnwaldt
> <j...@sahnwaldt.de> wrote:
>>
>> I think what Marco meant was: the mapping says it's an object
>> property, so we should extract a URI, even if the property value is
>> just a string.
>>
>> In the case of the musician infoboxes on it wiki, that would work, but
>> in many other cases, it wouldn't. For example:
>> http://en.wikipedia.org/wiki/Glenn_Danzig contains "label = Plan 9,
>> Evilive". Just that string, no links. We could use a heuristic to
>> split the string into multiple links etc, but I don't think there's a
>> good, clean solution. With a naive approach we would extract
>> <http://en.dbpedia.org/resource/Plan_9,_Evilive>, which would be
>> wrong.
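
To make the problem concrete, here is a small sketch of the naive approach and of a comma-splitting heuristic. The helper names are hypothetical and the URI building is deliberately simplified; it is not how the framework builds URIs.

```python
from urllib.parse import quote

BASE = "http://en.dbpedia.org/resource/"


def naive_uri(value):
    """Turn a plain string into a resource URI without any interpretation."""
    return BASE + quote(value.strip().replace(" ", "_"), safe=",_()")


def split_uris(value):
    """Heuristic: treat every comma-separated part as its own resource."""
    return [naive_uri(part) for part in value.split(",") if part.strip()]


print(naive_uri("Plan 9, Evilive"))
print(split_uris("Plan 9, Evilive"))
# The heuristic breaks as soon as a name legitimately contains a comma
# ("Foo, Inc." is a made-up name used only to show the failure):
print(split_uris("Foo, Inc."))
```

The naive version yields the single wrong URI mentioned above, and the splitting version fixes this case only by breaking others, which is why neither looks like a clean solution.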
>>
>> There is a simple rule though: If the Wikipedia template renders
>> strings as links, then we should extract strings as URIs. Otherwise we
>> shouldn't.
>>
>> The problem is that our code can't find out what the template does
>> (well, it could, but that would be almost as hard as rendering
>> templates). But humans can. So to implement that rule, we have to add
>> a feature to the mappings wiki, as I described in my previous mail, so
>> users can add a flag saying "yes, plain string values in this property
>> should be extracted as URIs".
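
A per-property flag like that might look roughly like this. It is a sketch under assumptions: the class, the field name `strings_as_uris`, and the choice to extract nothing when the flag is off are all hypothetical, not an existing feature of the mappings wiki.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyMapping:
    template_property: str
    ontology_property: str
    # Hypothetical flag set by a human who knows that the template
    # renders plain strings in this property as links.
    strings_as_uris: bool = False


def extract_object(mapping: PropertyMapping, raw_value: str,
                   link_target: Optional[str] = None) -> Optional[str]:
    """Return a resource name for an object property, or None.

    A link provided by the user always wins. A plain string becomes a
    resource only if the mapping explicitly allows it.
    """
    if link_target is not None:
        return link_target.strip().replace(" ", "_")
    if mapping.strings_as_uris:
        return raw_value.strip().replace(" ", "_")
    return None  # assumption: extract nothing rather than guess


genere = PropertyMapping("genere", "genre", strings_as_uris=True)
label = PropertyMapping("label", "recordLabel")
print(extract_object(genere, "Heavy metal"))
print(extract_object(label, "Plan 9, Evilive"))
```

The decision about what the template does stays with the mapping author; the code only follows the flag.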
>>
>> It seems that Italian Wikipedia templates often work like this, while
>> English templates rarely do. To make that behavior possible, the
>> Italian templates use multiple properties like genere, genere2,
>> genere3 etc, while the English templates use one property which the
>> editors can fill with links or strings as they like.
>>
>> Cheers,
>> JC
>>
>> On Thu, May 10, 2012 at 6:18 PM, Pablo Mendes <pablomen...@gmail.com>
>> wrote:
>> >
>> > Hi Marco, Jona,
>> >
>> >> Shouldn't the mapping be the king or this could be wrong in other ways?
>> >
>> >
>> > Absolutely!!! User-generated content is our "business".
>> >
>> > Let me repeat, with added emphasis:
>> >
>> > "When there is a link, then the object property MUST be generated
>> > directly FROM THE LINK PROVIDED BY THE USER.
>> > If there isn't a link FOR AN OBJECT PROPERTY, then we should try to find
>> > one in the page."
>> >
>> > More details...
>> >
>> > Case 1: a link is provided as value. Solution: keep it as is.
>> > Case insensitivity in URIs is nonsense. We should treat URIs as if they
>> > were numbers or any other kind of opaque id, even if we know that
>> > Wikipedia plays around with "readable IDs".
>> >
>> > Case 2: a link is not found, only a string.
>> > Now, for strings, case insensitivity starts to make sense. But we should
>> > be aware that it can cause errors. Perhaps having a separate dataset for
>> > "guessed property values" would be the safest.
>> >
>> > One exception where tampering with user-provided values is acceptable:
>> > when another user-provided value corrects it. That's the case for
>> > redirects, for example. So if the value of a property is a redirect page,
>> > we should point it to the end of the redirect chain.
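
Following a redirect chain to its end could be sketched like this. The `redirects` dictionary and the page titles are placeholders, and the hop limit and cycle check are defensive assumptions rather than anything specified in this thread.

```python
def resolve_redirect(title, redirects, max_hops=10):
    """Follow `redirects` (a dict mapping redirect page -> target page)
    from `title` to the end of the chain.

    Stops on a cycle or after `max_hops` hops and returns the title
    reached at that point.
    """
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


# Placeholder titles, for illustration only.
redirects = {"A": "B", "B": "C"}
print(resolve_redirect("A", redirects))  # follows A -> B -> C
print(resolve_redirect("C", redirects))  # not a redirect, returned unchanged
```

Only redirects that editors created themselves are followed, so the corrected value is still user-provided.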
>> >
>> > That's my opinion.
>> >
>> > Cheers,
>> > Pablo
>> >
>> > On Thu, May 10, 2012 at 5:45 PM, Marco Amadori <marco.amad...@gmail.com>
>> > wrote:
>> >>
>> >> 2012/5/10 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>:
>> >> > I just made that little change. I had looked at the code before, so
>> >> > it was very simple. We now also get the triple for "Heavy metal", but
>> >> > that's it:
>> >> >
>> >> >
>> >> >
>> >> > http://mappings.dbpedia.org/server/extraction/it/extract?title=Glenn+Danzig
>> >>
>> >> That seems like good news, but...
>> >>
>> >> > Let's hope that this doesn't introduce too many extraction errors.
>> >> > It's quite unlikely though. I looked around a little and finally
>> >> > found a case that we would treat differently after this change.
>> >> > Potentially wrong, but it's highly unlikely. The pros will certainly
>> >> > outweigh the cons.
>> >>
>> >> > http://en.wikipedia.org/wiki/Neocon
>> >> > http://en.wikipedia.org/wiki/NeoCon
>> >>
>> >> Nice example of why case insensitivity is bad. :-)
>> >>
>> >> I think I'm changing my mind about that issue. The code should not try
>> >> to be smart at all.
>> >>
>> >> My question is: if the mapping is hand-made and trusted over the page,
>> >> why does the code trust the page (links) more than the mapping?
>> >>
>> >> E.g. in the Artist mapping, if I say that 'genere' is genre in the
>> >> DBpedia OWL, I should be trusted more than the page, which might not
>> >> have the links.
>> >>
>> >> Shouldn't the mapping be the king or this could be wrong in other ways?
>> >>
>> >> --
>> >> ESC:wq
>> >
>> >
>>
>>
>

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
