Re: [Dbpedia-gsoc] GSoC '15 - Interest in 5.14 (Scalable querying of the live DBpedia data stream)

Pablo Estrada Tue, 24 Mar 2015 04:28:24 -0700

Hello guys!
I have finally submitted my proposal. It's called "Get up and walk! -
Adding live-ness to the Triple Pattern Fragments server" (Sorry about the
cheesy title, I couldn't resist :-) ).
If you'd like me to make any updates, or have any questions, please go
ahead!
Best,
Pablo

On Mon, Mar 23, 2015 at 5:17 AM Ruben Verborgh <ruben.verbo...@ugent.be>
wrote:

> Hi Pablo,
>
> > 1. I think we can implement a class called "HdtLiveDatasource", which
> > would share many characteristics with "HdtDatasource", but would have
> > some extra fields in the "settings" field of the datasource
> > configuration. Particularly, a field such as "update_source" pointing
> > to the changesets generated by the extraction framework. The
> > "HdtLiveDatasource" would keep a thread to do the work of retrieving
> > and updating the new data.
>
> That would indeed be a nice starting point!
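A rough sketch of what such a datasource entry might look like, written as a JavaScript object. Only "HdtLiveDatasource", "settings" and "update_source" come from the discussion above; the "file" key, the paths, the URL and the helper splitLiveSettings are assumptions made up for illustration, not an existing server API.

```javascript
// Hypothetical datasource entry. "HdtLiveDatasource" and "update_source"
// are the names proposed in this thread; everything else is a placeholder.
const datasourceConfig = {
  title: 'DBpedia Live',
  type: 'HdtLiveDatasource',
  settings: {
    file: 'data/dbpedia-live.hdt',                   // assumed path
    update_source: 'http://example.org/changesets/', // placeholder URL
  },
};

// Separate the settings a plain HdtDatasource would also accept from the
// extra field that only the live variant needs.
function splitLiveSettings(settings) {
  const { update_source: updateSource, ...baseSettings } = settings;
  if (typeof updateSource !== 'string' || updateSource.length === 0)
    throw new Error('HdtLiveDatasource requires settings.update_source');
  return { baseSettings, updateSource };
}
```

Keeping the extra field separate means the live class could hand baseSettings to the shared HdtDatasource logic unchanged and use updateSource only for fetching changesets.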
>
> > 2. Currently, there is not much locking/threading logic written into
> > the HdtDatasource class
>
> …because JavaScript is single-threaded.
> The server can start up multiple workers,
> but they all live in separate processes.
>
> > So if we were to modify the HDT file 'live', we'd need to add locking,
> > and all necessary logic for concurrency. Modifying the same file would
> > require keeping an exclusive lock for a potentially long time. This
> > option might have big contention.
>
> Apart from the threads, the reasoning is correct.
> A decent switchover mechanism needs to be chosen,
> perhaps temporarily allowing out-of-date data.
>
> > 3. Instead of rewriting the same file, we could write a new file, just
> > adding the new data. That way we would only need to have a short
> > exclusive lock to switch from old to new file.
>
> I don't think we would really need a lock for that?
> Simply a green light “you can switch now” should be fine.
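The "green light" switchover can be sketched roughly as follows. This is a minimal illustration, assuming in-memory arrays of triples as stand-ins for the old and new HDT files; SwitchableSource, switchTo and matches are hypothetical names, not part of the server. Because each worker process runs JavaScript on a single thread, replacing the reference is one synchronous step and cannot interleave with a query callback in that process.

```javascript
// Readers always go through `active`; the updater prepares the next
// snapshot on the side and swaps the reference once it is complete.
class SwitchableSource {
  constructor(initial) {
    this.active = initial; // snapshot currently answering queries
  }

  // A query captures the active snapshot once and uses it until it is
  // done, so a switch in between does not affect it.
  query(pattern) {
    const snapshot = this.active;
    return snapshot.filter(triple => matches(triple, pattern));
  }

  // The "green light": call this only after the new snapshot is fully
  // written. Later queries see the new data; no lock is taken.
  switchTo(next) {
    this.active = next;
  }
}

// A triple matches when every component given in the pattern is equal.
function matches(triple, pattern) {
  return ['subject', 'predicate', 'object'].every(
    key => pattern[key] === undefined || pattern[key] === triple[key]);
}
```

Until switchTo is called, queries keep seeing the old snapshot, which corresponds to temporarily allowing out-of-date data.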
>
> > This option would have little contention, but large storage
> > requirements. I am assuming that the network is the major bottleneck
> > for the server, so I hope having one thread writing a new file to disk,
> > and other threads retrieving information from another file will not
> > make the hard drive into a bottleneck.
>
> Storage is cheap, so that would indeed not be an issue.
> And indeed, network is the major bottleneck.
>
> > What do you think about these options, Ruben?
>
> Good thinking :-)
>
> Ruben
