Hi,

I am pasting an old developers-list thread between Max and me (I could not
find it with the archive search, so I cannot link to it).
I think it is more or less the same bug. I don't remember whether it was
fixed or not.

Cheers,
Dimitris

-----------------------------------------------
Thanks for pointing to this thread. The case reported by Roberto was
actually a bug in NodeUtil.splitPropertyNode.
It always produced one element too many, and that last element had no
children.
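Here is a minimal Scala sketch of the intended split behaviour. It assumes a property's children can be treated as a flat sequence in which some elements act as separators. `splitOn` is a hypothetical helper, not the real NodeUtil.splitPropertyNode signature. What matters is that empty segments, such as the one a trailing separator leaves behind, are dropped instead of being emitted as a childless element.

```scala
object SplitSketch {
  /** Splits `items` at every element for which `isSeparator` is true.
    * Separators are discarded, and empty segments (for example the one a
    * trailing separator would otherwise produce) are dropped. */
  def splitOn[A](items: Seq[A])(isSeparator: A => Boolean): List[List[A]] = {
    // Fold left to right: `open` is the segment being built (in reverse order),
    // `finished` holds the completed segments (also in reverse order).
    val (current, done) = items.foldLeft((List.empty[A], List.empty[List[A]])) {
      case ((open, finished), item) =>
        if (isSeparator(item)) (Nil, open.reverse :: finished)
        else (item :: open, finished)
    }
    // Close the last segment, restore the original order, drop empty segments.
    (current.reverse :: done).reverse.filter(_.nonEmpty)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical flattened children of a value like "{{GRE}}<br>{{CYP}}<br>".
    val children = List("{{GRE}}", "<br>", "{{CYP}}", "<br>")
    println(splitOn(children)(_ == "<br>"))
  }
}
```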

For the ObjectParser we still need to fix the regex though, because a
whitespace TextNode is still a node.
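The next sketch shows the regex side at the string level. It widens the DataParser pattern quoted further down in this thread so that whitespace around a separator is consumed by the match itself. The widened pattern is only a suggestion, not necessarily the fix that was committed. It also works on a plain string rather than on the node tree, so it only approximates what happens to whitespace TextNodes.

```scala
object SplitRegexSketch {
  // The DataParser split pattern quoted in the thread.
  val original: String = """<br\s*\/?>|\n| and | or | in |/|;|,"""

  // Suggested variant (an assumption): the same alternatives, wrapped so that
  // any whitespace before or after a separator becomes part of the match.
  val tolerant: String = """\s*(?:<br\s*/?>|\n| and | or | in |/|;|,)\s*"""

  // Splits `text` on the `separator` regex and drops empty pieces.
  def values(text: String, separator: String): List[String] =
    java.util.regex.Pattern.compile(separator).split(text).toList.filter(_.nonEmpty)

  // The infobox value from the thread, with its line break kept as "\n".
  val example: String =
    "{{GRE}}<br>{{CYP}} <br> {{ALB}}<br >{{MKD}}<br />{{SRB}}<br>{{UKR}}<br>\n" +
      "{{RUS}},{{TUR}},{{EGY}} , {{USA}}<br>{{CAN}}"

  def main(args: Array[String]): Unit = {
    println(values(example, original)) // pieces like " {{ALB}}" keep their spaces
    println(values(example, tolerant)) // every piece is a bare template call
  }
}
```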

Cheers,
Max


On Tue, Jul 5, 2011 at 02:32, Dimitris Kontokostas <jimk...@gmail.com>
wrote:
> Hi,
>
> I came across an old thread from Roberto Mirizzi [1] that also involves
> whitespace between links.
> Fixing the regex could apply to both, so I agree...
>
> Cheers
> Dimitris
>
> [1] http://sourceforge.net/mailarchive/message.php?msg_id=26916982
>
> On Mon, Jul 4, 2011 at 3:11 PM, Max Jakob <max.ja...@fu-berlin.de> wrote:
>>
>> Hi,
>>
>> On Wed, Apr 6, 2011 at 22:12, Dimitris Kontokostas <jimk...@gmail.com>
>> wrote:
>> > I am working on the FlagTemplateParser and I noticed something strange
>> > in the Mapping Extraction results.
>> > When there are multiple values separated by the DataParser split regex
>> > ("""<br\s*\/?>|\n| and | or | in |/|;|,""")
>> > the values must not have a leading or trailing space; otherwise they
>> > are not parsed.
>> >
>> > For example, from the following string (from an infobox property):
>> > {{GRE}}<br>{{CYP}} <br> {{ALB}}<br >{{MKD}}<br />{{SRB}}<br>{{UKR}}<br>
>> > {{RUS}},{{TUR}},{{EGY}} , {{USA}}<br>{{CAN}}
>> >
>> > only the values {{GRE}} {{MKD}} {{SRB}} {{UKR}} {{TUR}} {{CAN}} are
>> > extracted.
>> > I haven't tested it, but it could affect other parsers as well.
>>
>> Line 36 in ObjectParser.scala is responsible, more specifically the if
>> clause:
>>
>> case templateNode : TemplateNode if(node.children.length == 1) =>
>>    resolveTemplate(templateNode) match ...
>>
>> The rationale behind it is that templates should only be extracted if
>> there is no other data around them. Only then can we be certain that the
>> property links to the country information.
>> As a counter-example, in the case of the flag template, there might be
>> a person's name after the flag icon. The property is then most probably
>> about the person and not about the person's nationality. I found this
>> on the page of the American Revolutionary War [1]:
>>
>> {{Infobox military conflict
>> ...
>> |commander1={{flagicon|United States|1777}} [[George Washington]]
>> ...
>> }}
>>
>> Clearly, the extraction of the following triple should be avoided:
>> res:American_Revolutionary_War ont:hasCommander res:United_States
>>
>> That is why the ObjectParser first checks whether there are other nodes
>> under the property node. In the commander example, there is another
>> InternalLinkNode(George_Washington) under the
>> PropertyNode(commander1). In your example, there are TextNodes with
>> "left-over whitespace" after splitting.
>> So I think that adjusting the regex may actually be the way to go.
>> What do you think?
>>
>> Sidenote: the triple for Greece is not extracted when running with
>> English because it does not use an ISO code [2] but one of the other
>> alias names for flag templates (IOC or FIFA) [3]. Maybe we need to
>> extend the map for English in FlagTemplateParserConfig...
>>
>> Cheers,
>> Max
>>
>> [1] http://en.wikipedia.org/w/index.php?title=American_Revolutionary_War&oldid=437678888
>> [2] http://download.oracle.com/javase/1.4.2/docs/api/java/util/Locale.html#getISO3Country%28%29
>> [3] http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Flag_Template#Alias_names
>
>
>
> --
> Kontokostas Dimitris
>
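The guard Max quotes counts every child of the property. A whitespace-only TextNode left over after splitting is therefore enough to make `node.children.length == 1` fail. Below is a minimal sketch of a whitespace-tolerant variant. The node classes are simplified stand-ins invented for this example, not DBpedia's actual TemplateNode, TextNode and related classes, and `soleTemplate` is a hypothetical helper name. It keeps Max's rule that a template is only used when nothing else surrounds it, but ignores blank text.

```scala
object GuardSketch {
  // Simplified, hypothetical stand-ins for the parser's node types.
  sealed trait Node
  final case class TextNode(text: String) extends Node
  final case class TemplateNode(title: String) extends Node
  final case class InternalLinkNode(target: String) extends Node
  final case class PropertyNode(key: String, children: List[Node]) extends Node

  // A TextNode that holds only whitespace carries no data.
  private def isBlank(node: Node): Boolean = node match {
    case TextNode(text) => text.trim.isEmpty
    case _              => false
  }

  // Returns the template only if it is the single non-blank child.
  def soleTemplate(property: PropertyNode): Option[TemplateNode] =
    property.children.filterNot(isBlank) match {
      case List(t: TemplateNode) => Some(t)
      case _                     => None
    }

  def main(args: Array[String]): Unit = {
    val commander = PropertyNode("commander1",
      List(TemplateNode("flagicon"), TextNode(" "), InternalLinkNode("George_Washington")))
    val country = PropertyNode("country", List(TextNode(" "), TemplateNode("CYP"), TextNode(" ")))
    println(soleTemplate(commander)) // still rejected: there is a link next to the flag
    println(soleTemplate(country))   // accepted: only whitespace surrounds the template
  }
}
```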

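On Max's side note about Greece: GRE is Greece's IOC code, while its ISO 3166-1 alpha-3 code is GRC. Below is a hypothetical sketch of an alias-aware lookup. It uses java.util.Locale, the class behind the getISO3Country link in Max's references, for the ISO codes, plus a small alias table. The thread does not show how FlagTemplateParserConfig is laid out, so `FlagCodeSketch`, `aliases` and `toIso3` are invented names. The table holds only the single entry discussed here.

```scala
import java.util.Locale
import scala.util.Try

object FlagCodeSketch {
  // ISO 3166-1 alpha-3 codes known to the JDK, derived from the two-letter
  // region codes via Locale#getISO3Country. Codes the JDK cannot map are skipped.
  val iso3Codes: Set[String] =
    Locale.getISOCountries.toSet.flatMap { (region: String) =>
      Try(new Locale.Builder().setRegion(region).build().getISO3Country).toOption
        .filter(_.nonEmpty)
    }

  // Hypothetical alias table for English flag templates (IOC/FIFA code -> ISO code).
  // Only Greece is listed; further entries would come from the alias-name list.
  val aliases: Map[String, String] = Map("GRE" -> "GRC")

  // Accepts a code as-is if it is already an ISO alpha-3 code,
  // otherwise falls back to the alias table.
  def toIso3(code: String): Option[String] =
    if (iso3Codes.contains(code)) Some(code) else aliases.get(code)

  def main(args: Array[String]): Unit =
    println(List("GRE", "CYP", "CAN").map(toIso3))
}
```

Checking the ISO set first assumes that an alias never coincides with a different country's ISO code. If that cannot be guaranteed, the alias table should be consulted first.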

On Tue, May 8, 2012 at 6:56 PM, Jona Christopher Sahnwaldt
<j...@sahnwaldt.de> wrote:

> Hi Marco,
>
> yes, this is a bug. I don't know what's going on.
>
> http://en.wikipedia.org/wiki/Glenn_Danzig contains:
>
> genre = [[Heavy metal music|Heavy metal]], [[blues rock]],
> [[horror punk]], [[deathrock]], [[Classical music|classical]]
>
> All of them are extracted.
>
> http://it.wikipedia.org/wiki/Glenn_Danzig contains:
>
> |genere = Heavy Metal
> |genere2 = Alternative Metal
> |genere3 = Punk rock
> |genere4 = Hardcore punk
>
> But only Punk rock is extracted:
>
> http://mappings.dbpedia.org/server/extraction/it/extract?title=Glenn+Danzig
>
> > Same thing for 'dbprop-it:nome', which maps to 'foaf:name'.
> Works for me - the sample extraction page contains
>
> http://it.dbpedia.org/resource/Glenn_Danzig__lenn__1
> http://xmlns.com/foaf/0.1/name  Glenn
> http://it.dbpedia.org/resource/Glenn_Danzig__lenn__1
> http://xmlns.com/foaf/0.1/surname       Danzig
> http://it.dbpedia.org/resource/Glenn_Danzig__lenn__1
> http://xmlns.com/foaf/0.1/surname       all'anagrafe Glenn Allen Anzalone
>
> But that 'genere' thing is strange. Maybe template properties that end
> with numbers are not mapped correctly? I looked through the code but
> didn't find an obvious problem. We'll have to start a debugger, I
> guess.
>
> Cheers,
> JC
>
>
> On Mon, May 7, 2012 at 12:58 PM, Marco Fossati <hell.j....@gmail.com>
> wrote:
> > Hi Jona,
> >
> > We have just generated fresh dumps for the Italian DBpedia with the
> > latest extractors code version and found that some data is lost in the
> > mapping-based dataset.
> > If you have a look at this example [1], the 'dbprop-it:genere' property
> > has 4 objects, while 'dbpedia-owl:genre' only has 1.
> > Same thing for 'dbprop-it:nome', which maps to 'foaf:name'.
> > I checked the same resource in the English version [2] (the property is
> > dbprop:genre) and the data is there.
> > Is it a mapping extractor bug?
> > Cheers,
> >
> > Marco
> >
> > [1] http://it.dbpedia.org/page/Glenn_Danzig
> > [2] http://dbpedia.org/page/Glenn_Danzig
> >
> >
> ------------------------------------------------------------------------------
> > Live Security Virtual Conference
> > Exclusive live event will cover all the ways today's security and
> > threat landscape has changed and how IT managers can respond. Discussions
> > will include endpoint security, mobile security and the latest in malware
> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> > _______________________________________________
> > Dbpedia-discussion mailing list
> > Dbpedia-discussion@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
>



-- 
Kontokostas Dimitris
