Hi Pablo,

> 1. I think we can implement a class called "HdtLiveDatasource", which would 
> share many characteristics with "HdtDatasource", but would have some extra 
> fields in the "settings" field of the datasource configuration. Particularly, 
> a field such as "update_source" pointing to the changesets generated by the 
> extraction framework. The "HdtLiveDatasource" would keep a thread to do the 
> work of retrieving and updating the new data.

That would indeed be a nice starting point!
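
For concreteness, the configuration could look roughly like the sketch
below. The type name and the "update_source" field are the ones you
propose; the datasource key, title, file path, and URL are placeholders
I made up, and the layout is only assumed to mirror how an HdtDatasource
entry is configured.

```javascript
// Hypothetical configuration sketch, not an existing server feature.
// "HdtLiveDatasource" and "update_source" come from the proposal above;
// every value below is a placeholder.
const config = {
  datasources: {
    'dbpedia-live': {
      title: 'DBpedia Live',
      type: 'HdtLiveDatasource',
      settings: {
        // HDT file that is served initially (placeholder path)
        file: 'data/dbpedia.hdt',
        // Where the extraction framework publishes changesets (placeholder URL)
        update_source: 'http://example.org/changesets/',
      },
    },
  },
};
```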

> 2. Currently, there is not much locking/threading logic written into the 
> HdtDatasource class

…because JavaScript is single-threaded.
The server can start up multiple workers,
but they all live in separate processes.
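
So the "thread" that retrieves updates would in practice be a timer on
the event loop. A minimal sketch of that idea follows; the class name,
the two callbacks, and the numeric "sequence" property on changesets are
all assumptions for illustration, not part of the server or of the
extraction framework.

```javascript
// Hypothetical sketch: polls for changesets with a timer instead of a thread.
class ChangesetPoller {
  // fetchChangesets(since) is assumed to return a Promise of an array of
  // changesets, each carrying a numeric `sequence` property.
  // applyChangeset(changeset) is assumed to return a Promise.
  constructor(fetchChangesets, applyChangeset, intervalMs = 60000) {
    this.fetchChangesets = fetchChangesets;
    this.applyChangeset = applyChangeset;
    this.intervalMs = intervalMs;
    this.lastSequence = 0; // sequence number of the last applied changeset
    this.timer = null;
    this.busy = false;
  }

  // Retrieves and applies all changesets newer than the last applied one.
  // Resolves to the number of changesets applied.
  async poll() {
    if (this.busy) return 0; // a previous poll is still running
    this.busy = true;
    try {
      const changesets = await this.fetchChangesets(this.lastSequence);
      for (const changeset of changesets) {
        await this.applyChangeset(changeset);
        this.lastSequence = changeset.sequence;
      }
      return changesets.length;
    } finally {
      this.busy = false;
    }
  }

  // Starts periodic polling; errors are logged so the timer keeps running.
  start() {
    if (this.timer === null) {
      this.timer = setInterval(
        () => this.poll().catch(error => console.error(error)),
        this.intervalMs);
    }
  }

  // Stops periodic polling.
  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }
}
```

Since every worker is a separate process, each worker would run its own
poller, or a single process would do the updating and signal the others.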

> So if we were to modify the HDT file 'live', we'd need to add locking, and 
> all necessary logic for concurrency. Modifying the same file would require to 
> keep an exclusive lock for a potentially long time. This option might have 
> big contention.

Apart from the threads, the reasoning is correct.
A decent switchover mechanism needs to be chosen,
perhaps temporarily allowing out-of-date data.

> 3. Instead of rewriting the same file, we could write a new file, just adding 
> the new data. That way we would only need to have a short exclusive lock to 
> switch from old to new file.

I don't think we would really need a lock for that?
Simply a green light “you can switch now” should be fine.
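
Within one process, the switch can be a single assignment, which no
query can observe halfway because the event loop runs one callback at a
time. A rough sketch, assuming each datasource object offers a
Promise-returning search(pattern) and a close() method; those two
methods and the wrapper class are hypothetical names, not the actual
HdtDatasource interface.

```javascript
// Hypothetical sketch: switches from the old HDT file to the new one
// without a lock, and closes the old one once its queries have finished.
class SwitchableDatasource {
  constructor(initial) {
    this.current = { datasource: initial, active: 0, retired: false };
  }

  // Runs a query against whichever datasource is current when it starts.
  async query(pattern) {
    const entry = this.current; // pin the file this query started on
    entry.active++;
    try {
      return await entry.datasource.search(pattern);
    } finally {
      entry.active--;
      this.closeIfRetired(entry);
    }
  }

  // The "green light": new queries use the new datasource from here on.
  switchTo(datasource) {
    const old = this.current;
    this.current = { datasource, active: 0, retired: false };
    old.retired = true;
    this.closeIfRetired(old);
  }

  // Closes a replaced datasource as soon as no query is using it anymore.
  closeIfRetired(entry) {
    if (entry.retired && entry.active === 0) entry.datasource.close();
  }
}
```

Queries that started before the switch finish on the old file, which is
the temporarily out-of-date data mentioned above.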

> This option would have little contention, but large storage requirements. I 
> am assuming that the network is the major bottleneck for the server, so I 
> hope having one thread writing a new file to disk, and other threads 
> retrieving information from another file will not make the hard drive into a 
> bottleneck.

Storage is cheap, so that would indeed not be an issue.
And indeed, network is the major bottleneck.

> What do you think about these options, Ruben?

Good thinking :-)

Ruben
_______________________________________________
Dbpedia-gsoc mailing list
Dbpedia-gsoc@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
