[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-04 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406516#comment-13406516
 ] 

Jukka Zitting commented on OAK-114:
---

Would it be possible to change "for at least 10 minutes" to "for at least 10 
minutes since last access"? That would make life easier for things like large 
export processes that may want to stream out potentially gigabytes of data from 
the repository.

> MicroKernel API: specify retention policy for old revisions
> ---
>
> Key: OAK-114
> URL: https://issues.apache.org/jira/browse/OAK-114
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: mk
> Reporter: Stefan Guggisberg
> Assignee: Stefan Guggisberg
> Attachments: OAK-114.patch
>
>
> the MicroKernel API javadoc should specify the minimal guaranteed retention 
> period for old revisions. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-04 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406536#comment-13406536
 ] 

Jukka Zitting commented on OAK-114:
---

The "last access" amendment would also solve Michael's concern if we define 
"access" as the revision id being returned from or passed to one of the MK 
methods.





[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-04 Thread Stefan Guggisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406609#comment-13406609
 ] 

Stefan Guggisberg commented on OAK-114:
---

> Would it be possible to change "for at least 10 minutes" to "for at least 10 
> minutes since last access"?

the 'extend lease' model, apart from introducing complex state management 
requirements in the microkernel, would e.g. allow misbehaved clients to 
compromise the stability of the mk. a client could force the mk to keep old 
revisions forever and prevent vital gc cycles.

i therefore don't think that we should allow clients to (explicitly or 
implicitly) extend the life span of a specific revision.





[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-04 Thread Stefan Guggisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406611#comment-13406611
 ] 

Stefan Guggisberg commented on OAK-114:
---

> there is a race for example between MicroKernel.getRevisionHistory and 
> MicroKernel.getNodes: whatever the fixed retention time might be, when the 
> garbage collector runs after the first but before the second call, the latter 
> will fail.

i can't follow this argument. with my proposal a client is minimally guaranteed 
to be able to read revisions no older than N minutes.





[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406619#comment-13406619
 ] 

Michael Dürig commented on OAK-114:
---

bq. i can't follow this argument. with my proposal a client is minimally 
guaranteed to be able to read revisions no older than N minutes.

Yes and that guarantee is IMO too weak.





[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406622#comment-13406622
 ] 

Michael Dürig commented on OAK-114:
---

BTW: why is 10 minutes the correct value?





[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-04 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406637#comment-13406637
 ] 

Jukka Zitting commented on OAK-114:
---

bq. i can't follow this argument

Here's a snippet of code that illustrates Michael's point:

{code}
String revision = mk.getHeadRevision();
mk.commit(...);  // Could occur in another thread
TimeUnit.MINUTES.sleep(5);   // Could be any delay <10mins, or no delay at all
mk.getNodes("/", revision, ...);
{code}

Say the {{revision}} returned from the first call was committed something like 
an hour ago. Then by the time the {{getNodes}} call is reached, the garbage 
collector may already have removed that revision, since it is older than 
10 minutes and is no longer the latest revision in the repository.

If that problem isn't fixed, a client can't make any reasonable assumptions 
about how long it can expect a particular revision to stay alive. The only way 
for a client to guarantee that it can see a given revision for at least the 
next 10 minutes would be for it to directly commit that revision, but that's 
definitely not something we want read-only clients to be doing.





[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-05 Thread Dominique Pfister (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406910#comment-13406910
 ] 

Dominique Pfister commented on OAK-114:
---

The javadoc is possibly not clear enough: a revision returned by 
getHeadRevision remains accessible for _at least_ 10 minutes or _even longer_ 
if it is still the head revision, regardless of the time it was committed. So 
in Jukka's snippet above, the getNodes call wouldn't fail, because less than 10 
minutes have passed since getHeadRevision returned that revision.

Anyway, I think we really need some performance figures first, before we can 
decide whether this policy is too aggressive. OTOH, the current GC logic is 
quite small and straightforward, so it shouldn't be difficult to change it at a 
later time if the need arises.
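
The rule as described above can be modelled in a few lines. This is only a rough sketch under stated assumptions, not the GC code from the attached patch: the class {{HeadLeaseRetention}}, its methods, and the explicit {{nowMillis}} parameter are hypothetical, and it assumes that {{commit}} counts as handing out the new revision id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the retention rule discussed here; not Oak's GC code. */
class HeadLeaseRetention {
    static final long RETENTION_MILLIS = TimeUnit.MINUTES.toMillis(10);

    // When each revision id was last handed out as the head revision.
    private final Map<String, Long> lastHandedOut = new HashMap<>();
    private String head;

    /** Makes the given revision the new head; assumed to count as handing it out. */
    String commit(String revision, long nowMillis) {
        head = revision;
        lastHandedOut.put(revision, nowMillis);
        return revision;
    }

    /** Returns the head revision and refreshes its retention window. */
    String getHeadRevision(long nowMillis) {
        lastHandedOut.put(head, nowMillis);
        return head;
    }

    /** True if the garbage collector would be allowed to drop the revision. */
    boolean isCollectible(String revision, long nowMillis) {
        if (revision.equals(head)) {
            return false; // the head revision is never collected
        }
        Long last = lastHandedOut.get(revision);
        return last == null || nowMillis - last > RETENTION_MILLIS;
    }
}
```

In this model, a revision committed an hour ago but returned by {{getHeadRevision}} five minutes ago is not collectible, even if another commit has since replaced it as head. It becomes collectible once more than 10 minutes have passed since it was last returned.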





[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-05 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406997#comment-13406997
 ] 

Jukka Zitting commented on OAK-114:
---

bq. The javadoc is possibly not clear enough: a revision returned by 
getHeadRevision remains accessible for at least 10 minutes ...

Ah, you're right. That solves the problem above.

If we are keeping track of when a particular revision was last returned by 
{{getHeadRevision}}, wouldn't it be simple to use the same mechanism to also 
keep track of when revisions are returned from or passed to other 
{{MicroKernel}} methods? I don't see how that would imply any more "complex 
state management" than what's already needed.

The benefit of switching from "last returned as head revision" to "last 
accessed/seen" for figuring out when a revision is still needed is that we can 
allow unused revisions to expire much faster. With the "last accessed/seen" 
pattern there'll be no problem with an expiry time of just a few seconds, which 
would in most cases allow the garbage collector to be much more aggressive than 
with the 10 minute time proposed here.

To illustrate the difference, consider a heavy write case where we have 100 
commits per second hitting the repository. With a 10-minute expiry time the 
repository will have to keep track of something like 60k revisions, whereas 
with a shorter expiry time we can easily push that down to just a few hundred 
revisions.
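
The arithmetic behind those figures, as a small sketch. The helper name {{retainedRevisions}} is made up for illustration, and the 3-second expiry is just one example of "a few seconds"; a steady commit rate is assumed.

```java
/** Back-of-the-envelope estimate; the names here are illustrative only. */
class RevisionBacklog {
    /** Revisions committed within one expiry window at a steady commit rate. */
    static long retainedRevisions(long commitsPerSecond, long expirySeconds) {
        return commitsPerSecond * expirySeconds;
    }

    public static void main(String[] args) {
        // 100 commits/s with a 10-minute (600 s) expiry: 60000 revisions
        System.out.println(retainedRevisions(100, 10 * 60));
        // 100 commits/s with an assumed 3 s expiry: 300 revisions
        System.out.println(retainedRevisions(100, 3));
    }
}
```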





[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-05 Thread Dominique Pfister (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407201#comment-13407201
 ] 

Dominique Pfister commented on OAK-114:
---

bq. If we are keeping track of when a particular revision was last 
returned by getHeadRevision, wouldn't it be simple to use the same mechanism to 
also keep track of when revisions are returned from or passed to other 
MicroKernel methods? I don't see how that would imply any more "complex state 
management" than what's already needed.

If we just remember the earliest revision returned by getHeadRevision, we need 
just one field and the next GC cycle can skip all revisions committed later. If 
we remember all revisions accessed, we'll end up with some possibly sparse list 
of revisions, and the GC cycle would need to re-link these revisions - modify 
parent commit, re-calculate diff - to get a consistent view.
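
To make the single-field variant concrete, here is a minimal sketch. It is an illustration under assumptions, not the actual GC implementation: {{SingleMarkerGc}} is a hypothetical class, revision ids are assumed to be monotonically increasing longs, at least one commit is assumed to exist before {{getHeadRevision}} is called, and the marker is simply reset on every GC cycle (as if one cycle ran per retention period).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the "one field" marker; not Oak's GC code. */
class SingleMarkerGc {
    private final Deque<Long> journal = new ArrayDeque<>(); // oldest revision first
    private long earliestReturned = Long.MAX_VALUE;          // the single field

    void commit(long revisionId) {
        journal.addLast(revisionId);
    }

    /** Returns the head and remembers the earliest head handed out this cycle. */
    long getHeadRevision() {
        long head = journal.getLast();
        earliestReturned = Math.min(earliestReturned, head);
        return head;
    }

    /** Drops every revision older than the marker; returns how many were removed. */
    int gc() {
        if (journal.isEmpty()) {
            return 0;
        }
        long head = journal.getLast();
        long keepFrom = Math.min(earliestReturned, head); // always keep the head
        int removed = 0;
        while (journal.peekFirst() < keepFrom) {
            journal.removeFirst();
            removed++;
        }
        earliestReturned = Long.MAX_VALUE; // start a fresh cycle
        return removed;
    }

    int size() {
        return journal.size();
    }
}
```

Because the GC only ever trims a prefix of the journal, the surviving revisions stay contiguous and nothing has to be re-linked. Tracking an arbitrary set of accessed revisions would instead leave gaps in the middle of the journal.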

bq. The benefit of switching from "last returned as head revision" to 
"last accessed/seen" for figuring out when a revision is still needed is that 
we can allow unused revisions to expire much faster. With the "last accessed/seen" 
pattern there'll be no problem with an expiry time of just a few seconds, which 
would in most cases allow the garbage collector to be much more aggressive than 
with the 10 minute time proposed here.

I can see the advantage, but this would leave the door open for some bogus 
polling client that keeps some very old revision alive, which I'd like to avoid.






[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-05 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407238#comment-13407238
 ] 

Jukka Zitting commented on OAK-114:
---

bq. If we just remember the earliest revision returned by getHeadRevision, we 
need just one field and the next GC cycle can skip all revisions committed 
later.

Good point, though one could also use the same mechanism with the "last 
accessed/seen" alternative by just keeping track of the oldest revision 
accessed during the past few seconds. That opens up the possibility of the 
journal growing without limit if a client keeps an old revision alive for an 
extended amount of time, but there are ways to deal with such cases on a higher 
level (see below).
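
A rough sketch of that idea, under assumptions: {{RecentAccessMarker}} and its methods are hypothetical names, revision ids are assumed to be monotonically increasing longs, and the caller is assumed to report every revision id returned from or passed to an MK method via {{touch}}, with non-decreasing timestamps.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical "oldest revision accessed recently" tracker; not Oak code. */
class RecentAccessMarker {
    private final long windowMillis;
    // Entries are {timeMillis, revisionId}, appended in time order.
    private final Deque<long[]> accesses = new ArrayDeque<>();

    RecentAccessMarker(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records that a revision id was returned from or passed to an MK method. */
    void touch(long revisionId, long nowMillis) {
        accesses.addLast(new long[] {nowMillis, revisionId});
    }

    /** Oldest revision the GC must keep; everything older may be dropped. */
    long oldestLiveRevision(long nowMillis, long headRevision) {
        // Forget accesses that fell out of the window.
        while (!accesses.isEmpty() && nowMillis - accesses.peekFirst()[0] > windowMillis) {
            accesses.removeFirst();
        }
        long oldest = headRevision;
        for (long[] access : accesses) {
            oldest = Math.min(oldest, access[1]);
        }
        return oldest;
    }
}
```

The sketch also shows the growth concern: as long as some client keeps touching revision 10 within each window, {{oldestLiveRevision}} keeps returning 10 and nothing newer than that can be collected.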

bq. If we remember all revisions accessed, we'll end up with some possibly 
sparse list of revisions, and the GC cycle would need to re-link these 
revisions - modify parent commit, re-calculate diff - to get a consistent view.

The MicroKernel interface doesn't require the implementation to store such 
information (parent commit, diff, etc.), and I see no big reason why an 
implementation would want to. And since some old revisions will in any case be 
thrown away by garbage collection, such information would need to be rewritten 
regardless of whether the resulting revision list is sparse or not.

bq. I can see the advantage, but this would leave the door open for some bogus 
polling client that keeps some very old revision alive, which I'd like to avoid.

Such denial-of-service problems are best handled on the application or 
deployment level, possibly as an explicit admin operation. The alternative 
approach is just as vulnerable to similar problems, as a bogus client could 
keep making small commits to fill up the journal with dummy revisions that the 
garbage collector couldn't throw away until 10 minutes later. Or it could do a 
copy commit like {{*"/content":"/revisionX"}} to preserve content from a 
specific revision indefinitely regardless of what the garbage collector does. 
There simply is no good way to control or prevent such behavior on this level.






[jira] [Commented] (OAK-114) MicroKernel API: specify retention policy for old revisions

2012-07-06 Thread Stefan Guggisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407948#comment-13407948
 ] 

Stefan Guggisberg commented on OAK-114:
---

i agree that large exports are an important use case which is not covered by my 
proposal (predefined fixed retention period). determining the 'ideal' value of 
the fixed retention period is problematic and depends on the use 
cases/applications. 

jukka's 'implicit lease on access' model OTOH has other shortcomings 
(implementation complexity, potential compromise of server stability).

it seems like a single approach doesn't fit all use cases.  

for now i suggest we postpone the discussion until we've gathered enough 
information/experiences about the implications of the different approaches. it 
might well be that we end up leaving the retention policy 
undefined/implementation specific.
