hi nico

On 5/25/06, Nicolas Toper <[EMAIL PROTECTED]> wrote:
Just to summarize everything we have said on this issue.

There are two kinds of locks: the jcr.Lock and the
EDU.oswego.cs.dl.util.concurrent.* locks. The two are essentially unrelated.
Am I correct?

There are no issues with jcr.Lock (we can still read a node).

We need some mutex from the util.concurrent package to avoid inconsistent IO
operations. I like Tobias's approach of adding a proxyPM. It seems easy. But
is this solution elegant enough and maintainable in the long run? Would it
help us later? (I think so, since it would allow delayed writes, which open
the way for a 2-phase locking algorithm.) I haven't been on this project long
enough to judge :p
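A minimal sketch of the mutex part (saveChanges() is a made-up name, just to
illustrate serializing the IO; this is not actual Jackrabbit code):

    import EDU.oswego.cs.dl.util.concurrent.Mutex;

    public class GuardedStore {

        // one mutex guarding all persistence IO for this workspace
        private final Mutex ioLock = new Mutex();

        public void saveChanges(Object changes) throws Exception {
            ioLock.acquire();     // block until no other IO is in flight
            try {
                // ... write 'changes' through the persistence manager ...
            } finally {
                ioLock.release(); // always release, even on failure
            }
        }
    }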

Why didn't Jackrabbit go for serializable transactions, by the way? I have
checked the code and it seems we have all the kinds of locks needed to
support 2PL (out of scope of the current project, of course).

If we plan to support serializable transactions soon, then case 2 is
acceptable. Is this the case?

About Tobias's ProxyPM: I am OK to write it. Although it is out of scope of
the initial project, you all seem to really need it, so let's go for it. Jukka?

For a specific workspace, I would still allow read operations from other
sessions and isolate all write access (this way there will be no conflicts).
I can even persist the modifications using an already existing PM in case of
a crash. One question though: I cannot guarantee that the transaction will
later be committed without exceptions. We can choose to ignore this issue or
add an asynchronous way to warn the session. What are your thoughts on this?
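A rough sketch of what the asynchronous warning could look like (purely
illustrative names, nothing like this exists in Jackrabbit today):

    // the session that queued a change during the backup registers one of
    // these; the deferred commit happens later and failures are reported back
    public interface DeferredCommitListener {

        // the queued change was eventually persisted
        void committed(Object changeLog);

        // replaying the queued change failed after the backup
        void commitFailed(Object changeLog, Exception cause);
    }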


we already have this scenario. a session's modifications are
potentially committed
asynchronously and the commit can fail for a number of reasons. that's
fine with me.

cheers
stefan

This means a modification to the core package. Are you all OK with this?


By the way, this kind of algorithm is called a pessimistic receiver-based
message logging algorithm. We use it in distributed systems.



Thanks for your support and ideas.
nico
My blog! http://www.deviant-abstraction.net !!




On 5/25/06, Tobias Bocanegra <[EMAIL PROTECTED]> wrote:
>
> i think there is a consensus of what backup levels there can be:
>
> 1) read locked workspaces
> 2) write locked workspaces
> 3) hot-backup (i.e. "SERIALIZABLE" isolation)
>
> in case 1, the entire workspace is completely locked (rw) and no one
> other than the backup-session can read-access the workspace. this is
> probably the easiest to implement and the least desirable.
>
> in case 2, the entire workspace becomes read-only, i.e. is
> write-locked. so all other sessions can continue reading the
> workspace, but are not allowed to write to it. this is also more or
> less easy to implement, introducing a 'global' lock in the lock
> manager.
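> something like a workspace-wide read/write latch could carry that 'global'
> lock (just a sketch with made-up names, not the real lock manager
> internals): normal writes take the shared side, the backup session takes
> the exclusive side, and plain reads never touch it.
>
>     import EDU.oswego.cs.dl.util.concurrent.ReadWriteLock;
>     import EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock;
>
>     public class WorkspaceWriteLatch {
>
>         private final ReadWriteLock rw = new WriterPreferenceReadWriteLock();
>
>         // backup session: blocks new writers, readers are unaffected
>         public void enterBackupMode() throws InterruptedException {
>             rw.writeLock().acquire();
>         }
>
>         public void leaveBackupMode() {
>             rw.writeLock().release();
>         }
>
>         // normal writes take the shared side, so they can run concurrently
>         // with each other but not while a backup holds the exclusive side
>         public void beforeWrite() throws InterruptedException {
>             rw.readLock().acquire();
>         }
>
>         public void afterWrite() {
>             rw.readLock().release();
>         }
>     }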
>
> in case 3, all sessions can continue operations on the workspace, but
> the backup-session sees a snapshot view of the workspace. this would
> be easy, if we had serializable isolated transactions, which we don't
> :-(
>
> for larger productive environments, only case 3 is acceptable. the way
> i see of how to implement this is to create a
> 'proxy-persistencemanager' that sits between the
> shareditemstatemanager and the real persistencemanager. during normal
> operation, it just passes the changes down to the real pm, but in
> backup-mode, it keeps its own storage for the changes that occur
> during the backup. when backup is finished, it resends all changes
> down to the real pm. using this mechanism, you have a stable snapshot
> of the states in the real pm during backup mode. the export would then
> access the real pm directly.
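> roughly like this (just a sketch, the names are made up and not the real
> persistence manager spi):
>
>     import java.util.ArrayList;
>     import java.util.Iterator;
>     import java.util.List;
>
>     // hypothetical minimal pm interface, only for the sketch
>     interface Pm {
>         void store(Object changeLog) throws Exception;
>     }
>
>     // passes changes through normally, buffers them while in backup mode
>     public class ProxyPm implements Pm {
>
>         private final Pm real;
>         private final List buffered = new ArrayList();
>         private boolean backupMode = false;
>
>         public ProxyPm(Pm real) { this.real = real; }
>
>         public synchronized void store(Object changeLog) throws Exception {
>             if (backupMode) {
>                 buffered.add(changeLog);   // hold back, keep the real pm stable
>             } else {
>                 real.store(changeLog);     // normal operation: pass through
>             }
>         }
>
>         public synchronized void beginBackup() { backupMode = true; }
>
>         public synchronized void endBackup() throws Exception {
>             // resend everything that happened during the backup
>             for (Iterator it = buffered.iterator(); it.hasNext();) {
>                 real.store(it.next());
>             }
>             buffered.clear();
>             backupMode = false;
>         }
>     }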
>
> regards, toby
>
>
> On 5/25/06, Nicolas Toper <[EMAIL PROTECTED]> wrote:
> > Hi David,
> >
> > Sorry to have been unclear.
> >
> > What I meant is we have two different kinds of backup to perform.
> >
> > In one use case, which I call "regular backup", it is the kind of backup
> > you perform every night. You do not mind missing content that was just
> > updated, since you will have it the day after.
> >
> > In the other use case, which I call "exceptional backup", you want to have
> > all the data because, for instance, you will destroy the repository
> > afterwards.
> >
> > These two differ, I think, only in small points. For instance, for
> > "regular backup", we don't care about transactions that were started but
> > not committed. In the second one, we do.
> >
> > I propose to support only the first use case. The second one could easily
> > be added later.
> >
> > I don't know how Jackrabbit is used in production environments. Is it
> > feasible to lock workspaces one at a time, or is that too cumbersome for
> > the customer?
> >
> > For instance, if backing up a workspace requires locking it for two
> > minutes, then it can be done without affecting availability (but it would
> > affect reliability). We need data to estimate whether it is needed. Can
> > you give me the size of a typical workspace, please?
> >
> > I am OK with recording the transaction and committing it after the lock
> > has been released, but this means changing the semantics of Jackrabbit (a
> > transaction initiated while a lock is held would be performed after the
> > lock is released instead of raising an exception), and I am not sure
> > everybody would think that is a good idea. We would need to add a
> > transaction log (is there one already?) and parse transactions to detect
> > conflicts (or maybe capture exceptions). We would no longer be able to
> > guarantee that a transaction is persistent, and it might have an impact on
> > performance. And what about timeouts when running a transaction?
> >
> > Another idea would be to monitor Jackrabbit and launch the backup when
> > there is a high probability that no transactions are going to be started.
> > But I think sysadmins already know when load is minimal on their systems.
> >
> > Another idea would be, as Miro stated, to use a lower-level strategy
> > (working at the DB level or directly on the FS). It was actually my first
> > backup strategy, but Jukka thought we should be able to use the tool to
> > migrate from one PM to another.
> >
> > Here is my suggestion on the locking strategy: we can extend the backup
> > tool later if needed. Right now, even with a global lock, it is an
> > improvement over the current situation. And I need to release the project
> > before August 21.
> >
> > I would prefer to start with locking one workspace at a time and, if I
> > still have time, then find a way to work with minimal locking. I will most
> > probably keep working on Jackrabbit after the Google SoC is over. Are you
> > OK with this approach?
> >
> > We are OK on the restore operation. Good idea for the replace-or-ignore
> > option, but I would recommend building it only for existing nodes :p
> > Properties might be more difficult to handle and not as useful (and they
> > raise a lot more questions).
> >
> > nico
> > My blog! http://www.deviant-abstraction.net !!
> >
> >
>
>

