Thanks, Nicolas! :)

1. Scraping rendered Wikipedia HTML pages seems likely to produce noisy,
low-quality data. Isn't that so?
2. If we delegate to the MediaWiki API, would that scale if we had to
parse the Wikipedia dump on a daily basis?

thanks,
Mandar

On Wed, Feb 18, 2015 at 1:11 PM, Nicolas Torzec <torz...@yahoo-inc.com>
wrote:

> Hi Mandar :)
>
> DBpedia does not handle nested templates. It may work for some specific
> (simple-enough) templates but it is in no way generalized.
>
> That's why consumer-grade projects consuming Wikipedia data either:
> 1) Scrape Wikipedia HTML pages directly: i.e. template interpretation is
> done by MediaWiki, on wikipedia.org or on dedicated Wikipedia mirrors.
> 2) Set up their own Wikipedia extraction framework, which may interpret
> templates directly or delegate to MediaWiki using its API.
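
A minimal sketch of the "delegate to MediaWiki" variant of option 2, assuming
the standard `action=expandtemplates` API module on the English Wikipedia
endpoint. The helper names `expand_url` and `expand` are made up for
illustration, and the JSON shape read in `expand` is what I would expect
from `prop=wikitext`, so treat it as an assumption. This is one HTTP request
per snippet, so for bulk work it would presumably have to point at a local
mirror rather than the public site.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: public English Wikipedia endpoint; a local mirror URL works the same way.
API = "https://en.wikipedia.org/w/api.php"


def expand_url(wikitext: str, title: str, api: str = API) -> str:
    """Build an expandtemplates request URL for a wikitext snippet."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "prop": "wikitext",   # ask for the expanded wikitext only
        "title": title,       # page context used when expanding
        "text": wikitext,
    }
    return api + "?" + urlencode(params)


def expand(wikitext: str, title: str, api: str = API) -> str:
    """Let MediaWiki expand the templates (performs a network call)."""
    request = Request(
        expand_url(wikitext, title, api),
        headers={"User-Agent": "template-expansion-sketch/0.1"},
    )
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    # Assumed response shape: {"expandtemplates": {"wikitext": "..."}}
    return data["expandtemplates"]["wikitext"]
```

Calling `expand("{{Film date|1996|||}}", "Actrius")` would return whatever
MediaWiki renders for that template on that page.
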
>
> Nicolas.
>
>   On Wednesday, February 18, 2015 10:56 AM, Mandar Rahurkar <
> rahur...@gmail.com> wrote:
>
>
> Thanks, guys, for your comments!
> Release date information for April Love (film) is available at
> http://dbpedia.org/page/April_Love_(film)
>
> but not for http://dbpedia.org/page/Actrius
>
> And if you examine the Wikipedia pages, they both seem to use a nested
> template:
> http://en.wikipedia.org/w/index.php?title=April_Love_(film)&action=edit
>
> So maybe this is more than one issue?
>
> thanks,
> Mandar
>
> On Wed, Feb 18, 2015 at 9:22 AM, Alexandru Todor <to...@inf.fu-berlin.de>
> wrote:
>
> Hi Vladimir, Mandar,
>
> The mappings extractor can't handle nested templates:
> http://sourceforge.net/p/dbpedia/mailman/message/32867924/ .
> @Dimitris: I know this is on your to-do list; any progress so far?
>
> Cheers,
> Alexandru
>
> On Wed, Feb 18, 2015 at 5:49 PM, Vladimir Alexiev <
> vladimir.alex...@ontotext.com> wrote:
>
> Hi Mandar!
>
> Run these queries on http://yasgui.org/, selecting
> http://dbpedia.org/sparql as endpoint.
>
> First check the raw property dbp:released:
>
> PREFIX dbo: <http://dbpedia.org/ontology/>
> PREFIX dbp: <http://dbpedia.org/property/>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> SELECT * WHERE {
>   ?x a dbo:Film ;
>      dbp:released ?rel .
>   FILTER EXISTS { ?x rdfs:label ?lab FILTER(STRSTARTS(?lab, "Act")) }
> }
> ORDER BY ?x LIMIT 100
>
> As you can see, many movies have it, but not Actrius.
> So Volha is right: the problem is that for that movie the value is not a
> plain date.
>
> > How were you able to extract that information?
>
> It's in https://en.wikipedia.org/w/index.php?title=Actrius&action=edit:
>    | release = {{Film date|1996|||}}
>
> I tried to make a mapping:
> http://mappings.dbpedia.org/index.php/Mapping_en:Film_date
> to extract release year and location (there can be several).
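
To make the "several dates" case concrete, here is a rough sketch of pulling
release entries out of a `{{Film date}}` call. It assumes the template takes
positional arguments in repeating groups of four (year, month, day,
location) with numeric months, which is my reading of the template and not
something the extraction framework does. The names `Release` and
`parse_film_date` are hypothetical, and locations containing "|" (piped
links, nested templates) are not handled.

```python
import re
from typing import List, NamedTuple, Optional


class Release(NamedTuple):
    year: int
    month: Optional[int]
    day: Optional[int]
    location: Optional[str]


# Matches the first {{Film date|...}} call in a piece of wikitext.
FILM_DATE = re.compile(r"\{\{\s*Film date\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


def _to_int(value: str) -> Optional[int]:
    return int(value) if value.isdigit() else None


def parse_film_date(wikitext: str) -> List[Release]:
    """Return one Release per (year, month, day, location) group."""
    match = FILM_DATE.search(wikitext)
    if match is None:
        return []
    # Keep positional arguments only; named ones (e.g. df=yes) contain "=".
    positional = [p.strip() for p in match.group(1).split("|") if "=" not in p]
    releases = []
    for start in range(0, len(positional), 4):
        # Pad so that a short trailing group still unpacks into four fields.
        group = (positional[start:start + 4] + [""] * 4)[:4]
        year, month, day, location = group
        if not year.isdigit():
            continue  # skip empty or malformed groups
        releases.append(
            Release(int(year), _to_int(month), _to_int(day), location or None)
        )
    return releases
```

For the Actrius infobox line, `parse_film_date("| release = {{Film date|1996|||}}")`
yields a single entry with year 1996 and no month, day, or location.
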
>
> But it doesn't extract anything. Maybe templates INSIDE template fields
> are not used for extraction?
> Issue: https://github.com/dbpedia/mappings-tracker/issues/46
> Test cases:
> http://mappings.dbpedia.org/index.php/Mapping_en_talk:Film_date
>
> If that's the case, we could map it to another date template here:
>
> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/config/dataparser/DateTimeParserConfig.scala#L97
> But Volha, can it extract SEVERAL dates from one template?
>
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> Get technology previously reserved for billion-dollar corporations, FREE
>
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion