Re: Google Summer of Code project for Jackrabbit

Nicolas Toper Thu, 25 May 2006 11:20:15 -0700

Just to summarize everything we have said on this issue.


There are two kinds of locks: jcr.Lock and the ones in
EDU.oswego.cs.dl.util.concurrent.*. The two are largely unrelated. Am I
correct?

There are no issues with jcr.Lock (we can still read a node).

We need some mutex from the util.concurrent package to avoid inconsistent IO
operations. I like Tobias's approach of adding a proxyPM. It seems easy. But
is this solution elegant enough and maintainable in the long run? Would it
help us later? (I think so, since it would allow delayed writes, which opens
the way for a two-phase locking algorithm.) I have not been on this project
long enough to judge :p

Why didn't Jackrabbit go for serializable transactions, by the way? I have
checked the code and it seems we have all the kinds of locks needed to
support 2PL (out of scope of the current project, of course).

If we plan to support serializable transactions soon, then case 2 is
acceptable. Is this the case?

About Tobias's ProxyPM: I am OK to write it. Although it is out of scope of
the initial project, you all seem to really need it, so let's go for it.
Jukka?

For a specific workspace, I would still allow read operations from other
sessions and isolate all write access (this way there will be no conflict).
I can even persist the modifications using an already existing PM in case of
a crash. One question though: I cannot guarantee that the transaction would
later be committed without an exception. We can choose to ignore this issue
or add an asynchronous way to warn the session. What are your thoughts on
this?

This means a modification in the core package. Are you all OK with this?


By the way, this kind of algorithm is called a pessimistic receiver-based
message logging algorithm. We use it in distributed systems.



Thanks for your support and ideas.
nico
My blog! http://www.deviant-abstraction.net !!




On 5/25/06, Tobias Bocanegra < [EMAIL PROTECTED]> wrote:


i think there is a consensus of what backup levels there can be:

1) read locked workspaces
2) write locked workspaces
3) hot-backup (i.e. "SERIALIZABLE" isolation)

in case 1, the entire workspace is completely locked (rw) and no one
else than the backup-session can read-access the workspace. this is
probably the easiest to implement and the least desirable.

in case 2, the entire workspace becomes read-only, i.e. is
write-locked. so all other sessions can continue reading the
workspace, but are not allowed to write to it. this is also more or
less easy to implement, introducing a 'global' lock in the lock
manager.

in case 3, all sessions can continue operations on the workspace, but
the backup-session sees a snapshot view of the workspace. this would
be easy, if we had serializable isolated transactions, which we don't
:-(
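As a rough illustration of case 2 (write-locked workspace), here is a minimal
Java sketch. It is not Jackrabbit code: the class WriteLockedWorkspace, its
methods, and the use of IllegalStateException are all assumptions made for
this example, and the real lock manager works on item states rather than
strings. It only shows the idea of a single 'global' flag that lets reads
through and rejects writes while a backup is running.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a workspace; not a Jackrabbit class.
class WriteLockedWorkspace {
    private final Map<String, String> items = new ConcurrentHashMap<>();
    private boolean writeLocked; // the 'global' lock, guarded by 'this'

    // Reads are never blocked, even while the backup is running.
    public String read(String id) {
        return items.get(id);
    }

    // Writes are rejected while the workspace is write-locked.
    public synchronized void write(String id, String value) {
        if (writeLocked) {
            throw new IllegalStateException("workspace is write-locked for backup");
        }
        items.put(id, value);
    }

    // Synchronized, so any write already in progress finishes first.
    public synchronized void beginBackup() {
        writeLocked = true;
    }

    public synchronized void endBackup() {
        writeLocked = false;
    }
}
```

In this sketch a rejected write raises an exception right away. Queueing the
write until the lock is released would be a different design with different
semantics for the calling session.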

for larger production environments, only case 3 is acceptable. the way
i see to implement this is to create a
'proxy-persistencemanager' that sits between the
shareditemstatemanager and the real persistencemanager. during normal
operation, it just passes the changes down to the real pm, but in
backup-mode, it keeps its own storage for the changes that occur
during the backup. when the backup is finished, it resends all changes
down to the real pm. using this mechanism, you have a stable snapshot
of the states in the real pm during backup mode. the export would then
access the real pm directly.
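The proxy idea described above can be sketched roughly as follows. This is a
minimal Java illustration under stated assumptions, not Jackrabbit's actual
PersistenceManager interface: SimplePM, InMemoryPM and ProxyPM are
hypothetical names, item states are reduced to strings, and deletions,
crash recovery and replay failures are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, heavily simplified persistence manager interface.
interface SimplePM {
    void store(String itemId, String state);
    String load(String itemId);
}

// In-memory stand-in for the "real" pm, for illustration only.
class InMemoryPM implements SimplePM {
    private final Map<String, String> data = new LinkedHashMap<>();
    public synchronized void store(String itemId, String state) { data.put(itemId, state); }
    public synchronized String load(String itemId) { return data.get(itemId); }
}

// Sits between the callers and the real pm.
class ProxyPM implements SimplePM {
    private final SimplePM real;
    private final Map<String, String> pending = new LinkedHashMap<>();
    private boolean backupMode;

    ProxyPM(SimplePM real) { this.real = real; }

    // From here on the real pm is a stable snapshot.
    public synchronized void beginBackup() { backupMode = true; }

    // Replay the buffered changes to the real pm, then resume pass-through.
    public synchronized void endBackup() {
        for (Map.Entry<String, String> e : pending.entrySet()) {
            real.store(e.getKey(), e.getValue());
        }
        pending.clear();
        backupMode = false;
    }

    // Normal operation: pass through. Backup mode: keep the change locally.
    public synchronized void store(String itemId, String state) {
        if (backupMode) {
            pending.put(itemId, state);
        } else {
            real.store(itemId, state);
        }
    }

    // Regular sessions see their buffered changes; the backup reads 'real' directly.
    public synchronized String load(String itemId) {
        if (backupMode && pending.containsKey(itemId)) {
            return pending.get(itemId);
        }
        return real.load(itemId);
    }
}
```

In this sketch the backup session would read from the InMemoryPM instance
directly, while all other sessions go through ProxyPM. Changes buffered in
memory are lost if the process crashes during the backup, so a real
implementation would probably need to persist them somewhere.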

regards, toby


On 5/25/06, Nicolas Toper < [EMAIL PROTECTED]> wrote:
> Hi David,
>
> Sorry to have been unclear.
>
> What I meant is we have two different kinds of backup to perform.
>
> In one use case, which I call "regular backup", you perform the backup
> every night. You do not care about missing content that was just updated,
> since you will get it the day after.
>
> In the other use case, which I call "exceptional backup", you want to have
> all the data because, for instance, you will destroy the repository
> afterwards.
>
> I think the two differ only in small points. For instance, for "regular
> backup", we don't care about transactions started but not committed. In
> the second one, we do.
>
> I propose to support only the first use case. The second one could easily
> be added later.
>
> I don't know how Jackrabbit is used in production environments. Is it
> feasible to lock one workspace at a time, or is it too cumbersome for the
> customer?
>
> For instance, if backing up a workspace needs a two-minute workspace
> lock, then it can be done without affecting availability (but it would
> affect reliability). We need data to estimate whether it is needed. Can
> you give me the size of a typical workspace, please?
>
> I am OK to record the transaction and commit it after the locking has
> occurred, but this means changing the semantics of Jackrabbit (a
> transaction initiated while a lock is held would be performed after the
> lock is released instead of raising an exception), and I am not sure
> everybody would think it is a good idea. We would need to add a
> transaction log (is there one already?) and parse transactions to detect
> conflicts (or capture exceptions, maybe). We would no longer be able to
> guarantee that a transaction is persistent, and it might have an impact
> on performance. And what about timeouts when running a transaction?
>
> Another idea would be to monitor Jackrabbit and launch the backup when
> there is a high probability that no transaction is going to be started.
> But I think sysadmins already know when load is minimal on their systems.
>
> Another idea would be, as Miro stated, to use a lower-level strategy
> (working at the DB level or directly on the FS). It was actually my first
> backup strategy, but Jukka thought we have to be able to use the tool to
> migrate from one PM to another.
>
> Here is my suggestion on the locking strategy: we can extend the backup
> tool later if needed. Right now, even with a global lock, it is an
> improvement over the current situation. And I need to release the
> project before August 21.
>
> I would prefer to start with locking one workspace at a time and, if I
> still have time, then find a way to work with minimal locking. I will
> most probably keep working on Jackrabbit after the Google SoC is over.
> Are you OK with this approach?
>
> We are OK on the restore operation. Good idea for the replace or ignore
> option, but I would recommend building it only for existing nodes :p
> Properties might be more difficult to handle and not as useful (and it
> raises a lot more questions).
>
> nico
> My blog! http://www.deviant-abstraction.net !!
>
>
