The next release (2015-10) is underway; we will announce a beta release soon.
Regarding article coverage, we never had 100% coverage of all Wikipedia
articles:

2014: 3.02M (typed main articles) out of 4.58M articles
2015-04:  2.95M (typed main articles) out of 5.03M articles

So, due to the bug, we provide roughly the same number of typed resources as
the previous release, although we should have provided more (possibly a lot
more), since the number of articles increased.
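The coverage drop is easier to see as ratios. Below is a small worked check using only the rounded counts above. The "at 2014 ratio" figure is an illustrative extrapolation, not a number from either release.

```python
# Rounded counts (in millions) quoted above.
typed = {"2014": 3.02, "2015-04": 2.95}
articles = {"2014": 4.58, "2015-04": 5.03}

# Fraction of all articles that carry at least one type.
coverage = {rel: typed[rel] / articles[rel] for rel in typed}

# Illustrative only: typed articles expected in 2015-04 had the 2014 ratio held.
expected_2015 = coverage["2014"] * articles["2015-04"]

for rel, cov in coverage.items():
    print(f"{rel}: {cov:.1%} of articles typed")
print(f"2015-04 at 2014 ratio: ~{expected_2015:.2f}M typed articles")
```

This prints about 65.9% for 2014, 58.6% for 2015-04, and roughly 3.32M typed articles at the 2014 ratio.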

Best,
Dimitris

On Tue, Dec 15, 2015 at 12:27 PM, Vihari Piratla <viharipira...@gmail.com>
wrote:

> Thanks, Dimitris, for the detailed response.
> I see 2,945,956 unique titles in instance-types_en.nt.bz2 and 2,716,774
> unique titles in instance-types-transitive_en.nt.bz2. The number of unique
> titles in the two files together is 2,945,956.
> Currently, Wikipedia contains 5,031,836 articles in English. I am assuming
> the dump is missing 2 million or so titles because of the bug in the
> extraction framework.
>
> When can we expect the 2016 release?
>
> Thanks
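Counts like the ones above can be reproduced by streaming each dump and collecting the distinct subjects. This is a minimal sketch. It assumes each data line starts with a <http://dbpedia.org/resource/...> subject IRI followed by a space, as in the sample triple quoted further down this thread. The names unique_titles and titles_in_dump are hypothetical helpers and are not part of any DBpedia tooling.

```python
import bz2
from typing import Iterable, Set

# Assumed subject prefix for English DBpedia resources.
RESOURCE_PREFIX = "<http://dbpedia.org/resource/"


def unique_titles(lines: Iterable[str]) -> Set[str]:
    """Collect distinct resource titles from the subject position of N-Triples lines."""
    titles = set()
    for line in lines:
        # Skip blank lines and comment lines such as "# started ...".
        if not line.startswith(RESOURCE_PREFIX):
            continue
        subject = line.split(" ", 1)[0]  # e.g. "<http://dbpedia.org/resource/Taj_Mahal>"
        titles.add(subject[len(RESOURCE_PREFIX):-1])  # e.g. "Taj_Mahal"
    return titles


def titles_in_dump(path: str) -> Set[str]:
    """Stream a .nt.bz2 file line by line instead of decompressing it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return unique_titles(fh)
```

For example, assuming both files are in the working directory, len(titles_in_dump("instance-types_en.nt.bz2") | titles_in_dump("instance-types-transitive_en.nt.bz2")) gives the combined count. Suffixed resources such as Abraham_Lincoln__1 are counted as separate titles.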
>
> On Mon, Dec 14, 2015 at 8:53 PM, Dimitris Kontokostas <jimk...@gmail.com>
> wrote:
>
>> Hi Vihari,
>>
>> The main reason for the size reduction is the split between direct
>> & transitive types [1].
>> There was a bug [2] that indirectly affected some type assignments; it is
>> now fixed, and the next release will not have this problem.
>> Also note that, besides SD-Types, in this release we published two
>> additional type datasets, dbatx and LHD [3].
>>
>> Regarding your second question ('__'): these resources are extracted from
>> additional infoboxes on the same page; when they cannot be merged into the
>> main resource, we create additional resources.
>> This is also a way to create intermediate node mappings
>> <http://mappings.dbpedia.org/index.php/Template:IntermediateNodeMapping> through
>> the mappings wiki, e.g. in [4].
>>
>> [1] http://downloads.dbpedia.org/2015-04/core-i18n/en/instance-types-transitive_en.nt.bz2
>> [2] https://github.com/dbpedia/extraction-framework/issues/404
>> [3] http://wiki.dbpedia.org/dbpedia-data-set-2015-04
>> [4] http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_officeholder
>>
>> On Mon, Dec 14, 2015 at 1:12 PM, Vihari Piratla <viharipira...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I am a software developer; we use the DBpedia instance-type or mapping-based
>>> type files in a pipeline to recognize entities.
>>> We found that the latest instance-types file, available at
>>> http://downloads.dbpedia.org/2015-04/core-i18n/en/instance-types_en.nt.bz2
>>> is much smaller than the corresponding 2014 release:
>>> http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/instance_types_en.nt.bz2
>>> As a result, the latest file is missing many entries that exist on
>>> Wikipedia, such as Taj_Mahal, J._Paul_Getty_Museum, and Grand_Canyon.
>>> What is the reason for the reduced size (110MB -> 35MB)?
>>> Is this a bug?
>>> Are there other files that we have to consider along with this one?
>>>
>>> We also sometimes see entries with '__', such as "Abraham_Lincoln__1" in
>>> the line
>>> <http://dbpedia.org/resource/Abraham_Lincoln__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/TimePeriod>
>>> What does '__' mean? Where can I find more information about these
>>> things?
>>>
>>> Thanks
>>> --
>>> Vihari Piratla
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> Dbpedia-discussion@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>
>>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>
>
>
> --
> V
>



-- 
Kontokostas Dimitris
