One of the goals of the Infovore project is to develop something that
targets this latency problem.
https://github.com/paulhoule/infovore/wiki
I’ve talked with a number of organizations that use DBpedia and Freebase
data and almost all of them have either no solution or an incomplete solution
for dealing with changes over time, something that’s absolutely necessary for
sustainable social-semantic systems. Many of them have considered building
such a solution but decided against developing it in house.
When Freebase changed the format of the RDF dump I was able to adapt in less
than a week (most of the time delay was that no official dump came out that
week and I didn’t know what was going on); after fixing my code I was able to
run against it interactively.
Infovore is not using Hadoop so much for “big data”, but rather for “low
latency”. Not extremely low latency, but once I trust the system enough it
ought to have Freebase processed before I wake up on Sunday. The files are
smaller than the official dump and will load faster, both things that will
lower latency for the consumer.
Right now the process is limited by the not-so-parallel process of
ungzipping and re-gzipping the Freebase dump, but I believe a processing
pipeline much more complex than the current one could still be run in less than
an hour if you throw enough AWS instances at it.
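
As an illustration of why the gzip step is the serial part: a single gzip
stream has to be read front to back, so one common workaround is to pay that
cost once and write several smaller gzip files that later stages can read
independently. The sketch below assumes a line-oriented dump; the function
name, file naming scheme, and chunk size are hypothetical and are not taken
from Infovore.

```python
import gzip

def split_gzip(src_path, out_prefix, lines_per_part=1_000_000):
    """Split one gzip file into several smaller gzip files on line boundaries.

    Hypothetical helper: reading src_path is still sequential, but each
    output part can afterwards be decompressed by a separate worker.
    """
    part_paths = []
    out = None
    with gzip.open(src_path, "rb") as src:
        for i, line in enumerate(src):
            # Start a new part every lines_per_part lines.
            if i % lines_per_part == 0:
                if out is not None:
                    out.close()
                path = f"{out_prefix}-{len(part_paths):05d}.gz"
                out = gzip.open(path, "wb")
                part_paths.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return part_paths
```

Splitting on line boundaries keeps every triple intact, provided the dump
holds one statement per line.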
The framework ought to work for any RDF data, including DBpedia (for which
it has been tested), and I have a lot of stuff planned, including something
that could “smush” DBpedia identifiers to Freebase identifiers or the other way
around to create a merged data set.
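
To make the “smush” idea concrete, here is a minimal sketch under stated
assumptions: the identifier links arrive as owl:sameAs triples in simple
one-statement-per-line N-Triples, and smushing means rewriting each DBpedia
IRI to its linked Freebase IRI. The function names and the example
identifiers are hypothetical; this is not the planned Infovore
implementation, and the line splitting is far cruder than a real N-Triples
parser.

```python
SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"

def parse(line):
    """Split a simple N-Triples line into (subject, predicate, object)."""
    parts = line.strip().split(None, 2)
    if len(parts) != 3:
        return None
    s, p, rest = parts
    rest = rest.rstrip()
    if not rest.endswith("."):
        return None
    # Drop the terminating dot; the object may be an IRI or a literal.
    return s, p, rest[:-1].rstrip()

def build_mapping(link_lines):
    """Map each sameAs subject IRI to its linked object IRI."""
    mapping = {}
    for line in link_lines:
        triple = parse(line)
        if triple and triple[1] == SAME_AS and triple[2].startswith("<"):
            mapping[triple[0]] = triple[2]
    return mapping

def smush(triple_lines, mapping):
    """Rewrite subjects and objects through the mapping; skip bad lines."""
    for line in triple_lines:
        triple = parse(line)
        if triple is None:
            continue
        s, p, o = triple
        yield f"{mapping.get(s, s)} {p} {mapping.get(o, o)} ."

# Example with made-up identifiers:
links = [
    "<http://dbpedia.org/resource/Berlin> " + SAME_AS +
    " <http://rdf.freebase.com/ns/m.0example> ."
]
triples = [
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .'
]
merged = list(smush(triples, build_mapping(links)))
```

Going “the other way around” would only require building the mapping from
object to subject instead.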
Yes, what I am doing today is much simpler than what DBpedia is doing, but
I’m taking a multi-pronged approach that focuses on process as much as
technology. I’m keeping a notebook of how much time it takes me to do
everything and learning how to squeeze out the errors and wasted time with a
battery of methods that are being documented. It is possible to run clusters
in Amazon EMR by simply providing a credential pair – you don’t need to know
much at all about AWS or Hadoop.
I invite all of you to follow this project on GitHub and also follow
the Google Group
https://groups.google.com/forum/#!forum/infovore-basekb
where you’ll get roughly two status reports a week and where people with
questions get quick answers.
I can definitely use contributions too, because the list of things I’d
like to see is long and my own work will be focused on my own needs. Even if
you don’t contribute, I welcome feature requests on the issue tracker.
From: Kingsley Idehen
Sent: Monday, September 23, 2013 1:37 PM
To: dbpedia-discussion@lists.sourceforge.net
Subject: Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider
infobox coverage, additional type statements, and new YAGO and Wikidata links
On 9/23/13 1:00 PM, Tom Morris wrote:
Congratulations on the new release!
On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer <ch...@bizer.de> wrote:
1. the new release is based on updated Wikipedia dumps dating from March /
April 2013 (the 3.8 release was based on dumps from June 2012), leading to
an overall increase in the number of concepts in the English edition from
3.7 to 4.0 million things.
What accounts for the long latency between the date of the dumps and the date
of the release?
Tom
A number of things:
1. Dataset QA -- the datasets are generated from mapping efforts
2. Dataset Loading & QA
-- Linked Data Deployment (i.e., new URIs resolve to the new data)
-- SPARQL Endpoint (new data is accessible via SPARQL endpoint).
Kingsley
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen