There are several other pages with the same problem, e.g. the ones returned by this query:

http://live.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fres+%3Ftopic+where+%7B+%0D%0A%3Fres+dbpedia-owl%3Aindustry+%3Font_ind+.+%0D%0AOPTIONAL+%7B+%3Fres+dbpprop%3Aindustry+%3Fprop_ind+%7D+.+%0D%0AFILTER+%28+%21bound%28%3Fprop_ind%29+%29+.%0D%0A%3Ftopic+foaf%3AprimaryTopic+%3Fres+.+%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on

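PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Resources with a mapping-based (ontology) industry but no raw infobox one
select distinct ?res ?topic where {
  ?res dbpedia-owl:industry ?ont_ind .
  OPTIONAL { ?res dbpprop:industry ?prop_ind } .
  FILTER ( !bound(?prop_ind) ) .
  ?topic foaf:primaryTopic ?res .
}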
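For reference, the link above is just the query form-encoded into a GET request against the endpoint. A minimal sketch of building it, with the endpoint and parameter names copied from that link and the prefixes assumed to be predefined on the DBpedia endpoint (nothing is actually fetched):

```scala
import java.net.URLEncoder

// Endpoint and parameter names copied from the link above; nothing is fetched.
val endpoint = "http://live.dbpedia.org/sparql"

// The dbpedia-owl:, dbpprop: and foaf: prefixes are assumed to be predefined
// on the DBpedia endpoint, so no PREFIX lines are sent here.
val query =
  """select distinct ?res ?topic where {
    |?res dbpedia-owl:industry ?ont_ind .
    |OPTIONAL { ?res dbpprop:industry ?prop_ind } .
    |FILTER ( !bound(?prop_ind) ) .
    |?topic foaf:primaryTopic ?res .
    |}""".stripMargin

// URLEncoder uses form encoding (space -> '+'), matching the link above.
def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

val params = Seq(
  "default-graph-uri" -> "http://dbpedia.org",
  "query"             -> query,
  "format"            -> "text/html",
  "timeout"           -> "0",
  "debug"             -> "on")

val url = endpoint + "?" + params.map { case (k, v) => s"${enc(k)}=${enc(v)}" }.mkString("&")
println(url)
```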

It would be good if we could identify and fix the faulty pages, or try to find
another heuristic rule for the Infobox extractor.
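One possible direction for such a rule, sketched here purely as a hypothetical (it is not the current InfoboxExtractor behaviour, and the names below are made up for illustration): drop positional parameters whose value is blank before computing the explicit-key ratio, since stray trailing pipes produce exactly such parameters.

```scala
// HYPOTHETICAL heuristic, not the current InfoboxExtractor behaviour:
// blank positional parameters (e.g. produced by stray trailing pipes)
// are dropped before the explicit-key ratio is computed.
val MinPercentageOfExplicitPropertyKeys = 0.75

def isPositional(key: String): Boolean = key.forall(_.isDigit)

def explicitRatio(props: Seq[(String, String)], ignoreBlankPositional: Boolean): Double = {
  val considered =
    if (ignoreBlankPositional) props.filterNot { case (k, v) => isPositional(k) && v.trim.isEmpty }
    else props
  if (considered.isEmpty) 0.0
  else considered.count { case (k, _) => !isPositional(k) }.toDouble / considered.size
}

// Parameters as parsed from a template with trailing pipes (illustrative values).
val props = Seq(
  "name"     -> "International Speedway Corporation",
  "1"        -> "",
  "industry" -> "[[Auto racing|Motorsports]]",
  "2"        -> "",
  "products" -> "Sporting events",
  "3"        -> "")

println(explicitRatio(props, ignoreBlankPositional = false) > MinPercentageOfExplicitPropertyKeys) // false
println(explicitRatio(props, ignoreBlankPositional = true) > MinPercentageOfExplicitPropertyKeys)  // true
```

The trade-off is that a template whose positional parameters are legitimately empty would then be judged on its named keys alone.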

WDYT?

Cheers
Andrea




2013/12/19 Andrea Di Menna <[email protected]>

> Hi Amit,
>
> thanks for posting your question :)
> The rule you mention considers a key valid when it is not a plain number
> (i.e. it does not consist only of digits), that is, when the parameter is
> explicit. This is because templates can have either explicit or implicit
> parameters:
> - an explicit parameter has a name
> - implicit parameters are identified by their position, so they have no
> name but only an index
>
> E.g.
> {{Template|name=...|surname=...}} => properties are { "name" => ...; "surname" => ...}
> and
> {{Template|...|...}} => properties are { "1" => ...; "2" => ...}
>
> The MinPercentageOfExplicitPropertyKeys is used to skip useless templates.
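A minimal sketch of the explicit/implicit rule and the resulting ratio, assuming the parsed parameters are plain (key, value) string pairs rather than the extractor's own property objects:

```scala
// A key made only of digits ("1", "2", ...) is treated as implicit
// (positional); anything else counts as explicit, as in the quoted rule.
def isExplicit(key: String): Boolean = !key.forall(_.isDigit)

// Fraction of explicit keys among all parsed (key, value) pairs.
def explicitRatio(props: Seq[(String, String)]): Double =
  if (props.isEmpty) 0.0
  else props.count { case (k, _) => isExplicit(k) }.toDouble / props.size

// {{Template|name=...|surname=...}}
val named = Seq("name" -> "...", "surname" -> "...")
// {{Template|...|...}}
val positional = Seq("1" -> "...", "2" -> "...")

println(explicitRatio(named))      // 1.0
println(explicitRatio(positional)) // 0.0
```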
>
> The real problem with the page you mention is not the percentage we are
> using, but how the template is filled in with data:
>
> {{Infobox company
> | name      =  International Speedway Corporation|
> | logo      =  [[Image:Iscmotorsportslogo.png]]
> | type      =  [[Public company|Public]]  |
> | traded_as  = {{NASDAQ|ISCA}}<br />{{OTCQB|ISCB}}
> | foundation        =  1953 (as Bill France Racing, Inc.)|
> | location          =  1 Daytona Boulevard<br />[[Daytona Beach, Florida]] 32114-1243|
> | key_people        =  [[Bill France, Sr.]], founder<br/>[[Jim France]], CEO<br/>[[Lesa Kennedy]], president|
> | industry          =  [[Auto racing|Motorsports]]|
> | products          =  Sporting events|
> | revenue           =  {{decrease}} $633.91 million [[United States dollar|USD]] (2010, November)|
> | operating_income  =  {{decrease}} $115.64 million [[United States dollar|USD]] (2010, November)|
> | net_income        =  {{decrease}} $54.53 million [[United States dollar|USD]] (2010, November)|
> | num_employees     =   1,000 (full time) |
> | homepage          =  [http://www.iscmotorsports.com/ www.iscmotorsports.com]|
> }}
>
> There are a bunch of useless, misleading trailing pipes ("|") in the
> template properties.
> The effect is that the parser sees a number of (empty) implicit
> parameters, which are counted in the list of params, so the template
> falls below the 75% threshold.
> Can you fix the Wikipedia article?
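The trailing-pipe effect can be reproduced with a deliberately simplified parser. This is a hypothetical sketch, not DBpedia's wikitext parser: it splits the template body on pipes that are not nested inside [[...]] or {{...}}, and numbers unnamed parts 1, 2, 3, ... the way MediaWiki numbers positional parameters. The function names are made up for illustration.

```scala
import scala.collection.mutable.ListBuffer

// Simplified model, NOT DBpedia's wikitext parser: split a template body on
// pipes that are not nested inside [[...]] or {{...}}.
def splitTopLevel(body: String): List[String] = {
  val parts = ListBuffer.empty[String]
  val current = new StringBuilder
  var depth = 0
  var i = 0
  while (i < body.length) {
    val two = body.slice(i, i + 2)
    if (two == "[[" || two == "{{") { depth += 1; current.append(two); i += 2 }
    else if (two == "]]" || two == "}}") { depth = math.max(0, depth - 1); current.append(two); i += 2 }
    else if (body(i) == '|' && depth == 0) { parts += current.toString; current.clear(); i += 1 }
    else { current.append(body(i)); i += 1 }
  }
  parts += current.toString
  parts.toList
}

// The first part is the template name. A part containing "=" becomes a named
// (explicit) param; any other part, even pure whitespace, gets the next index.
def parseParams(body: String): List[(String, String)] = {
  var position = 0
  splitTopLevel(body).drop(1).map { part =>
    val eq = part.indexOf('=')
    if (eq >= 0) (part.take(eq).trim, part.drop(eq + 1).trim)
    else { position += 1; (position.toString, part.trim) }
  }
}

// Body of a shortened {{Infobox company ...}} with the same trailing pipes.
val body =
  "Infobox company\n" +
  "| name     = International Speedway Corporation|\n" +
  "| industry = [[Auto racing|Motorsports]]|\n" +
  "| products = Sporting events|\n"

val params = parseParams(body)
val explicitCount = params.count { case (k, _) => !k.forall(_.isDigit) }

println(params.map(_._1))                     // List(name, 1, industry, 2, products, 3)
println(explicitCount.toDouble / params.size) // 0.5
```

Every trailing pipe opens an empty, unnamed parameter, so three named keys end up diluted by three positional ones.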
>
> A more general question: which characters are allowed in an implicit
> template param?
>
> Cheers
> Andrea
>
>
> 2013/12/19 Amit Kumar <[email protected]>
>
>> Hi,
>> Today, while looking at the extracted dataset, we found that we are not
>> getting any infobox property output for some pages, for example
>> http://en.wikipedia.org/wiki/International_Speedway_Corporation
>>
>> Debugging told me that the problem lies in the Infobox Extractor:
>>
>> val MinPercentageOfExplicitPropertyKeys = 0.75
>> …
>>
>> val countExplicitPropertyKeys = propertyList.count(property =>
>>   !property.key.forall(_.isDigit))
>> if ((countExplicitPropertyKeys >= MinPropertyCount) &&
>>     (countExplicitPropertyKeys.toDouble / propertyList.size) >
>>       MinPercentageOfExplicitPropertyKeys)
>> {
>>   ..
>>   ..
>> }
>>
>> What I think it says is that we should only parse templates in which more
>> than 75% of the keys in the (key, value) pairs are valid keys. The
>> above-mentioned wiki page doesn't make the cut. Can someone tell me about
>> this 75% cutoff? With a 50% limit I get the desired output, but I know
>> lowering it will start giving more data, some of which might be of bad
>> quality.
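Putting numbers on this: the {{Infobox company}} template quoted above in Andrea's reply has 14 named parameters and 12 trailing pipes. Assuming each trailing pipe is parsed as exactly one blank positional parameter, the ratio is 14/26, roughly 0.54, which fails the 75% test but passes at 50%, consistent with the observation above. A quick check (the MinPropertyCount condition is assumed to pass and is left out):

```scala
// Counts read off the {{Infobox company}} template quoted earlier in this
// thread, ASSUMING each trailing "|" yields exactly one blank positional param.
val explicitKeys = 14 // name, logo, type, traded_as, ..., homepage
val implicitKeys = 12 // one per trailing pipe
val ratio = explicitKeys.toDouble / (explicitKeys + implicitKeys)

// Same strict ">" comparison as the extractor snippet; the MinPropertyCount
// check is assumed to pass and is omitted.
def passes(threshold: Double): Boolean = ratio > threshold

println(ratio)        // roughly 0.538
println(passes(0.75)) // false
println(passes(0.50)) // true
```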
>>
>>
>>
>> Regards
>> Amit
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Rapidly troubleshoot problems before they affect your business. Most IT
>> organizations don't have a clear picture of how application performance
>> affects their revenue. With AppDynamics, you get 100% visibility into your
>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
>> Pro!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Dbpedia-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>
>
>