[ 
https://issues.apache.org/jira/browse/JENA-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460462#comment-13460462
 ] 

Andy Seaborne commented on JENA-327:
------------------------------------

(what counts as 'extremely large stores' in triple/quad count?)

There can be no absolute guarantees about manipulating the database files 
directly, because such guarantees would constrain future features that are 
not yet known.

A READ action does not depend on internal characteristics so it is the most 
stable option.

There is no need to flush the journal - just back it up like everything else.  
The requirement is that nothing is changing files and having a WRITE lock 
ensures that.  I can't see that changing but, theoretically, it could.
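
A minimal sketch of that file-copy approach, using only the Java standard 
library. The names BackupSketch, WRITER_LOCK and backupUnderWriteLock are 
hypothetical and not part of TDB; the ReentrantLock is only a stand-in for 
whatever excludes writers in a real deployment (for TDB, holding the WRITE 
lock as described above). The journal is not flushed: it is copied like any 
other file in the directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BackupSketch {
    // Assumption: a stand-in for the store's single-writer lock, NOT the TDB lock.
    static final ReentrantLock WRITER_LOCK = new ReentrantLock();

    /**
     * Copies every regular file directly under dbDir (journal included)
     * into backupDir while writers are excluded. Returns the file count.
     */
    static int backupUnderWriteLock(Path dbDir, Path backupDir) throws IOException {
        WRITER_LOCK.lock();               // no writer can change files from here on
        try {
            Files.createDirectories(backupDir);
            List<Path> files;
            try (Stream<Path> entries = Files.list(dbDir)) {
                files = entries.filter(Files::isRegularFile)
                               .collect(Collectors.toList());
            }
            for (Path file : files) {
                Files.copy(file, backupDir.resolve(file.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
            }
            return files.size();
        } finally {
            WRITER_LOCK.unlock();         // always release, even if a copy fails
        }
    }
}
```

The sketch only coordinates code that goes through the same lock object, so 
every writer in the application would have to take it as well.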

The third option is that you manage TDB activity yourself and hold everything 
up (perhaps also flushing the journal manually; that would likely work with 
all future writeback schemes, but it is not currently necessary).

What you can't have is detailed low-level guarantees and also evolution of the 
system in the future.

A transaction type of EXCLUSIVE might be useful to add as a general feature but 
it's not necessary currently.  Defining it and supporting it in future systems 
could turn out to be a burden so adding only when needed is a better way 
forward to my mind.
                
> TDB Tx transaction lock to permit backups
> -----------------------------------------
>
>                 Key: JENA-327
>                 URL: https://issues.apache.org/jira/browse/JENA-327
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB
>    Affects Versions: TDB 0.9.4
>            Reporter: Simon Helsen
>
> With large repositories, it is important to be able to create backups once in 
> a while. This is because recreating an rdf store with millions of triples can 
> be forbiddingly expensive. Moreover, it should be possible to take those 
> backups while still allowing read activity on the store as in many cases, a 
> complete shutdown is usually not possible. Before the introduction of tx, it 
> was relatively straightforward to provide the right locks on the client-side 
> to safely suspend any disk activity for a period of time enough to make a 
> backup of the index. 
> However, since tx, things have become slightly more complicated because TDB 
> Tx touches the disk at other times than when performing write/sync 
> activities. Right now, because of some understanding of how TDB Tx is 
> implemented, it is still possible for clients to avoid disk activities to 
> implement a backup process, but this dependency on TDB Tx implementation 
> details is not very good. Moreover, we anticipate that in the future, the 
> merging process from the journal into the main index may become entirely 
> asynchronous for performance reasons. The moment that happens, clients have no 
> control anymore as to when the disk is being touched.
> For this reason, we are requesting the following feature: a "backup" lock (for 
> lack of a better name). Its semantics are that when the lock is taken, TDB Tx 
> guarantees that no disk activity takes place and if necessary pauses 
> activities. In other words, no write transaction should be able to complete 
> and read transactions will not attempt to merge the journal. The idea would 
> be that regular read activities can still continue. The API could be as 
> simple as something like this:
> try {
>     dataset.begin(ReadWrite.BACKUP) ;
>     <do whatever is necessary to back up the index>
> } finally {
>     dataset.end() ;
> }
> As for the implementation, we suspect you currently have locks in place which 
> could be used to guarantee this behavior. E.g. could 
> txn.getBaseDataset().getLock().enterCriticalSection(Lock.WRITE) be sufficient?

