Hi guys,
Thank you for brainstorming with me and giving me some guidance. I'll look
into the HDT file format, and the logic in Server.js working with the HDT
file to see what seems to be the most reasonable way of implementing this.
I like the idea of keeping a cache while regenerating the HDT file
index/contents, but I'll need to do a bit more research first.
I'll write back again soon, and maybe start preparing my proposal.

Also, since contributing to the project before the SoC is desirable, is
there an issue in the bug tracker that you think I should look at to get
started?

Thanks again,
Pablo.

On Fri, Mar 20, 2015, 10:42 PM Ruben Verborgh <ruben.verbo...@ugent.be>
wrote:

> Hi Pablo,
>
> > 1. Update the HDT file (According to What is HDT, HDT is a read-only
> > format, so it might not be feasible).
>
> …which is why we have this project :-)
>
> Can you find a way to update HDT files?
> Can you improve HDT so that it allows updates?
> Could you perhaps make a combination
> of different HDT files to give the live result?
>
> It's definitely feasible—just not within the current limits.
> The challenge in this project is to find out!
>
> > 2. Possibly, keep an in-memory cache of triples, where we would keep
> > modified triples permanently (This could potentially raise the memory
> > requirements of the server...)
>
> Maybe have a pipeline so that new triples
> are slowly migrated from cache to disk?
>
> > 3. Keep a file on disk, in a format that can be written to/read from
> > efficiently, and keep information about updated triples here (This seems
> > like a reasonable option...)
>
> Possible too!
>
> > If that's the case, then to 'start up' a 'Live' TPFS, we need to know
> > the time when the HDT file was generated, and then we need to run the
> > 'triple update' function over all the triples that have been changed since
> > then. Correct? (This would make startup potentially quite slow, but I guess
> > that's okay).
>
> Sure, but in the meantime, the server should remain active.
> I.e., the question is: how can we keep a server running,
> while updating the triples in the meantime/background,
> while still staying easy on server resources (RAM / CPU).
>
> Note that we don't need to find the right answer here and now;
> thinking about multiple directions is part of the project.
>
> Best,
>
> Ruben
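
The options discussed above (a cache of modified triples layered over a
read-only file, with changes migrated to disk later) could be sketched
roughly as follows. This is only an illustration, not the server's code:
it is written in Python rather than the server's JavaScript, it does not
use any real HDT library, and every name in it (OverlayStore, add, remove,
query, compact, pending) is hypothetical. A frozenset stands in for the
read-only HDT snapshot, and compact() stands in for regenerating the file.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A pattern is a triple in which None acts as a wildcard.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """Return True if every non-wildcard position of the pattern agrees."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


class OverlayStore:
    """Read-only base snapshot plus in-memory sets of pending changes."""

    def __init__(self, base: Iterable[Triple]) -> None:
        # Assumption: the frozenset stands in for a read-only HDT file.
        self._base = frozenset(base)
        self._added: Set[Triple] = set()
        self._removed: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        """Record an insertion without touching the base snapshot."""
        self._removed.discard(triple)
        if triple not in self._base:
            self._added.add(triple)

    def remove(self, triple: Triple) -> None:
        """Record a deletion without touching the base snapshot."""
        self._added.discard(triple)
        if triple in self._base:
            self._removed.add(triple)

    def query(self, pattern: Pattern) -> Iterator[Triple]:
        """Yield live matches: base minus removals, plus additions."""
        for triple in self._base:
            if triple not in self._removed and matches(triple, pattern):
                yield triple
        for triple in self._added:
            if matches(triple, pattern):
                yield triple

    def pending(self) -> int:
        """Number of changes not yet folded into the snapshot."""
        return len(self._added) + len(self._removed)

    def compact(self) -> None:
        """Fold pending changes into a new snapshot and clear the cache.

        In a real server this step would run in the background while
        query() keeps answering from the old snapshot plus the cache.
        """
        self._base = frozenset((self._base - self._removed) | self._added)
        self._added.clear()
        self._removed.clear()
```

With this layout, memory use grows only with the number of pending
changes, and calling compact() periodically keeps that number small. How
to do the equivalent against real HDT files is exactly the open question
of the project.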