Hi, I think we all agree now on how to handle "hot backup" and how to avoid write-locking a workspace.
This functionality was not initially planned in my proposal, but since we need it, I will implement it. I will write a summary on this issue in the next few days for your validation.

Cheers,
Nicolas

My blog! http://www.deviant-abstraction.net !!

On 5/26/06, Stefan Guggisberg <[EMAIL PROTECTED]> wrote:
hi nico

On 5/25/06, Nicolas Toper <[EMAIL PROTECTED]> wrote:
> Just to summarize everything we have said on this issue.
>
> There are two kinds of lock: jcr.Lock and the
> EDU.oswego.cs.dl.util.concurrent.* locks. The two are essentially
> unrelated. Am I correct?
>
> There are no issues with jcr.Lock (we can still read a node).
>
> We need some mutex to avoid inconsistent IO operations, using the
> util.concurrent package. I like Tobias's approach of adding a proxy PM.
> It seems easy. But is this solution elegant enough and maintainable in
> the long run? Would it help us later? (I think so, since it would allow
> delayed writes, which open the way to a two-phase locking algorithm.)
> I haven't been on this project long enough to judge :p
>
> Why didn't Jackrabbit go for serializable transactions, by the way? I
> have checked the code and it seems we have all the kinds of locks
> needed to support 2PL (out of scope of the current project, of course).
>
> If we plan to support serializable transactions soon, then case 2 is
> acceptable. Is this the case?
>
> About Tobias's proxy PM: I am OK to write it. Although it is out of
> scope of the initial project, you all seem to really need it, so let's
> go for it. Jukka?
>
> For a specific workspace, I would still allow read operations from
> other sessions and isolate all write access (this way there will be no
> conflict). I can even persist the modifications using an already
> existing PM in case of a crash. One question, though: I cannot
> guarantee that the transaction will later be committed without an
> exception. We can choose to ignore this issue or add an asynchronous
> way to warn the session. What are your thoughts on this?

we already have this scenario. a session's modifications are potentially
committed asynchronously and the commit can fail for a number of
reasons. that's fine with me.

cheers
stefan

> This means a modification in the core package. Are you all OK with
> this?
>
> By the way, this kind of algorithm is called pessimistic receiver-based
> message logging. We use it in distributed systems.
>
> Thanks for your support and ideas.
> nico
> My blog! http://www.deviant-abstraction.net !!
>
> On 5/25/06, Tobias Bocanegra <[EMAIL PROTECTED]> wrote:
> >
> > i think there is a consensus on what backup levels there can be:
> >
> > 1) read-locked workspaces
> > 2) write-locked workspaces
> > 3) hot backup (i.e. "SERIALIZABLE" isolation)
> >
> > in case 1, the entire workspace is completely locked (rw) and no one
> > other than the backup session can read-access the workspace. this is
> > probably the easiest to implement and the least desirable.
> >
> > in case 2, the entire workspace becomes read-only, i.e. is
> > write-locked. all other sessions can continue reading the workspace,
> > but are not allowed to write to it. this is also more or less easy
> > to implement, introducing a 'global' lock in the lock manager.
> >
> > in case 3, all sessions can continue operations on the workspace,
> > but the backup session sees a snapshot view of the workspace. this
> > would be easy if we had serializable isolated transactions, which we
> > don't :-(
> >
> > for larger production environments, only case 3 is acceptable. the
> > way i see to implement this is to create a 'proxy persistence
> > manager' that sits between the SharedItemStateManager and the real
> > persistence manager. during normal operation, it just passes the
> > changes down to the real pm, but in backup mode, it keeps its own
> > storage for the changes that occur during the backup. when the
> > backup is finished, it resends all changes down to the real pm.
> > using this mechanism, you have a stable snapshot of the states in
> > the real pm during backup mode. the export would then access the
> > real pm directly.
> >
> > regards, toby
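A minimal sketch of the proxy persistence manager toby describes, assuming
a simplified store(ChangeLog) interface; BackupProxyPM, ChangeLog, and the
buffering scheme here are illustrative stand-ins, not Jackrabbit's actual
PersistenceManager API:

    // illustrative sketch only -- simplified stand-ins for Jackrabbit's
    // real PersistenceManager / ChangeLog interfaces
    import java.util.ArrayList;
    import java.util.List;

    interface PersistenceManager {
        void store(ChangeLog changes) throws Exception;
    }

    class ChangeLog { /* a set of item state changes */ }

    class BackupProxyPM implements PersistenceManager {

        private final PersistenceManager realPM;
        private final List buffered = new ArrayList(); // held back during backup
        private boolean backupMode = false;

        BackupProxyPM(PersistenceManager realPM) {
            this.realPM = realPM;
        }

        // called from the SharedItemStateManager side
        public synchronized void store(ChangeLog changes) throws Exception {
            if (backupMode) {
                buffered.add(changes);  // keep our own storage during backup
            } else {
                realPM.store(changes);  // normal operation: pass straight down
            }
        }

        public synchronized void startBackup() {
            backupMode = true;          // real pm now stays a stable snapshot
        }

        public synchronized void endBackup() throws Exception {
            backupMode = false;
            for (int i = 0; i < buffered.size(); i++) {
                realPM.store((ChangeLog) buffered.get(i)); // resend changes
            }
            buffered.clear();
        }
    }

While store() buffers, the export reads directly from the real pm, which
is what makes the snapshot stable.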
> >
> > On 5/25/06, Nicolas Toper <[EMAIL PROTECTED]> wrote:
> > > Hi David,
> > >
> > > Sorry to have been unclear.
> > >
> > > What I meant is that we have two different kinds of backup to
> > > perform.
> > >
> > > One use case I call "regular backup"; it is the kind of backup you
> > > perform every night. You do not care about missing content updated
> > > just now, since you will have it the day after.
> > >
> > > The other use case I call "exceptional backup": you want to have
> > > all the data because, for instance, you will destroy the repository
> > > afterwards.
> > >
> > > These two differ, I think, in small points. For instance, for a
> > > "regular backup" we don't care about transactions started but not
> > > committed. For the second one, we do.
> > >
> > > I propose to support only the first use case. The second one could
> > > easily be added later.
> > >
> > > I don't know how Jackrabbit is used in production environments. Is
> > > it feasible to lock workspaces one at a time, or is that too
> > > cumbersome for the customer?
> > >
> > > For instance, if backing up a workspace needs a two-minute
> > > workspace lock, then it can be done without affecting availability
> > > (but it would affect reliability). We need data to estimate whether
> > > this is acceptable. Can you give me the size of a typical
> > > workspace, please?
> > >
> > > I am OK to record transactions and commit them after the lock has
> > > been released, but this means changing the semantics of Jackrabbit
> > > (a transaction initiated while the lock is held would be performed
> > > after the lock is released instead of raising an exception), and I
> > > am not sure everybody would think that is a good idea. We would
> > > need to add a transaction log (is there one already?) and parse
> > > transactions to detect conflicts (or maybe capture exceptions). We
> > > would no longer be able to guarantee that a transaction is
> > > persistent, and it might have an impact on performance. And what
> > > about timeouts while running a transaction?
> > >
> > > Another idea would be to monitor Jackrabbit and launch the backup
> > > when there is a high probability that no transactions are going to
> > > be started. But I think sysadmins already know when load is minimal
> > > on their systems.
> > >
> > > Another idea would be, as Miro stated, to use a lower-level
> > > strategy (working at the DB level or directly on the FS). It was
> > > actually my first backup strategy, but Jukka thought the tool
> > > should also be usable to migrate from one PM to another.
> > >
> > > Here is my suggestion on the locking strategy: we can extend the
> > > backup tool later if needed. Right now, even with a global lock, it
> > > is an improvement over the current situation. And I need to release
> > > the project before August 21.
> > >
> > > I would prefer to start with locking one workspace at a time and,
> > > if I still have time, then find a way to work with minimal locking.
> > > I will most probably keep working on Jackrabbit after the Google
> > > SoC is over. Are you OK with this approach?
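For the interim global write lock (toby's case 2, and the
one-workspace-at-a-time strategy above), one plausible shape, sketched
with the util.concurrent package already mentioned in this thread;
WorkspaceWriteGate and the way it would be wired in are assumptions, not
existing Jackrabbit code:

    // sketch of a 'global' workspace write lock for case 2: ordinary
    // session writes share the read side of a ReadWriteLock, the backup
    // session takes the write side, so the backup excludes all writers
    // while plain reads never touch the gate at all
    import EDU.oswego.cs.dl.util.concurrent.ReadWriteLock;
    import EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock;

    class WorkspaceWriteGate {

        private final ReadWriteLock rwLock =
                new WriterPreferenceReadWriteLock();

        // every ordinary session write wraps its change like this
        void sessionWrite(Runnable change) throws InterruptedException {
            rwLock.readLock().acquire();   // writers run concurrently
            try {
                change.run();
            } finally {
                rwLock.readLock().release();
            }
        }

        // the backup session holds the writer side for the whole export
        void backup(Runnable export) throws InterruptedException {
            rwLock.writeLock().acquire();  // blocks new session writes and
            try {                          // waits for in-flight ones
                export.run();
            } finally {
                rwLock.writeLock().release();
            }
        }
    }

Inverting the lock sides like this keeps normal writers concurrent with
each other while still giving the backup session exclusive quiescence.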
> > >
> > > We are OK on the restore operation. Good idea for the replace or
> > > ignore option, but I would recommend building it only for existing
> > > nodes :p Properties might be more difficult to handle and not as
> > > useful (and it raises a lot more questions).
> > >
> > > nico
> > > My blog! http://www.deviant-abstraction.net !!
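A sketch of what the replace-or-ignore choice for existing nodes could
look like on the restore side; Restorer and BackupEntry are hypothetical,
only the javax.jcr calls are real API:

    // hypothetical restore policy: what to do when a node from the
    // backup already exists at the target path
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    class Restorer {

        static final int REPLACE = 0; // overwrite existing node with backup copy
        static final int IGNORE  = 1; // keep existing node, skip backup copy

        private final int conflictPolicy;

        Restorer(int conflictPolicy) {
            this.conflictPolicy = conflictPolicy;
        }

        void restoreNode(Session session, String path, BackupEntry entry)
                throws RepositoryException {
            if (session.itemExists(path)) {
                if (conflictPolicy == IGNORE) {
                    return;                     // leave existing node untouched
                }
                session.getItem(path).remove(); // REPLACE: drop, then re-import
            }
            entry.importInto(session, path);    // hypothetical deserialization
            session.save();
        }

        interface BackupEntry {
            void importInto(Session session, String path)
                    throws RepositoryException;
        }
    }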