[jira] [Created] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-08-26 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-4712:
--

 Summary: Publish S3DataStore stats in JMX MBean
 Key: OAK-4712
 URL: https://issues.apache.org/jira/browse/OAK-4712
 Project: Jackrabbit Oak
  Issue Type: New Feature
  Components: blob
Reporter: Matt Ryan


This feature is to publish statistics about the S3DataStore via a JMX MBean.
There are two statistics suggested (sketched below):
* Indicate the number of files actively being synchronized from the local cache
into S3
* Given a path to a local file, indicate whether synchronization of the file
into S3 has completed
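
For illustration only, here is a minimal sketch of what such an MBean interface
could look like.  The interface name and the first method name are assumptions,
not taken from any patch; {{isFileSynced}} matches the method discussed later in
this issue.

{noformat}
// Hypothetical sketch of the suggested MBean; names are illustrative.
public interface S3DataStoreStatsMBean {

    // Number of files currently being synchronized from the local cache to S3
    long getActiveSyncs();

    // Whether synchronization of the given file into S3 has completed
    boolean isFileSynced(String path);
}
{noformat}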






[jira] [Updated] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-08-26 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4712:
---
Attachment: OAK-4712.1.diff

First attempt to implement the feature, looking for feedback.

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.1.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Comment Edited] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-08-26 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440043#comment-15440043
 ] 

Matt Ryan edited comment on OAK-4712 at 8/26/16 10:24 PM:
--

OAK-4712.1.diff attached; first attempt to implement the feature, looking for 
feedback.


was (Author: mattvryan):
First attempt to implement the feature, looking for feedback.

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.1.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Commented] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-08-26 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440096#comment-15440096
 ] 

Matt Ryan commented on OAK-4712:


Some questions that need to be resolved:
# For the request to query the sync state of a file, should the file name
provided be a full path to a local file or a path to a file within Oak (e.g.
/content/dam/myimage.jpg)?  The current implementation uses a local file path,
but I have been wondering if it should be an Oak path.
# For the request to query the sync state of a file, when converting from the
externally-supplied file name to an internal DataIdentifier, this
implementation performs the same calculation to determine the internal ID name
as is done when a file is stored (roughly sketched below).  I have a number of
concerns with this:
   - It is inefficient: the entire file has to be read and digested in order
to compute the internal ID.  This takes a long time for large assets.
   - I've essentially duplicated the logic from CachingDataStore into
S3DataStore to compute the internal ID.  I hate duplicating the code, but I am
trying to avoid exposing internal IDs in the API, and I am not seeing a good
way in the current implementation to avoid this without either modifying the
public API of CachingDataStore, or exposing the internal ID via the API, or
both.
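
For context, a rough sketch of the kind of content-digest calculation being
duplicated, assuming a content-addressed scheme.  The helper name and the
digest algorithm are assumptions; the real algorithm is whatever
CachingDataStore is configured to use.

{noformat}
// Illustrative only: the internal ID is the hex-encoded digest of the
// file's entire contents, which is why the whole file must be read just
// to answer a sync-state query.
// Requires java.io.*, java.security.MessageDigest,
// java.security.DigestInputStream.
static String computeInternalId(File file)
        throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-1"); // assumed
    try (InputStream in = new DigestInputStream(
            new BufferedInputStream(new FileInputStream(file)), digest)) {
        byte[] buffer = new byte[8192];
        while (in.read(buffer) != -1) {
            // read to the end purely to feed the digest
        }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b));
    }
    return hex.toString();
}
{noformat}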

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.1.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Commented] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-08-26 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440102#comment-15440102
 ] 

Matt Ryan commented on OAK-4712:


It should be noted that there is one known problem with this patch.  It appears
to work fine until I delete a file.  For example, if I delete an asset via the
REST API, I will see the asset deleted in CRXDE; however, the file still
remains in S3.  This MBean as implemented only knows how to check with the
S3DataStore and the corresponding backend, and these all appear to believe the
file still exists.  So the MBean continues to report that the file's sync state
is synchronized (i.e. isFileSynced() returns true) even though the file has
been removed from the JCR.  This point will also need to be resolved before the
patch is ready.
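
To illustrate the failure mode (a sketch only, not the patch): a check that
consults only the data store and its backend has no view of JCR-level
deletions, so a blob removed from the repository but not yet garbage-collected
out of S3 still reads as synced.

{noformat}
// Sketch: Backend.exists() answers "is the blob in S3?", which is not the
// same question as "does the repository still reference this file?".
// Backend and DataIdentifier are the Jackrabbit data store types.
static boolean isFileSynced(Backend backend, File localFile) throws Exception {
    String id = computeInternalId(localFile); // digest helper sketched earlier
    return backend.exists(new DataIdentifier(id)); // still true after deletion
}
{noformat}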

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.1.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Updated] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-08-26 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4712:
---
Attachment: (was: OAK-4712.1.diff)

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Updated] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-08-26 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4712:
---
Attachment: OAK-4712.2.diff

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Comment Edited] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-08-26 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440043#comment-15440043
 ] 

Matt Ryan edited comment on OAK-4712 at 8/26/16 10:51 PM:
--

OAK-4712.2.diff attached; first attempt to implement the feature, looking for 
feedback.


was (Author: mattvryan):
OAK-4712.1.diff attached; first attempt to implement the feature, looking for 
feedback.

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Updated] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-01 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4712:
---
Attachment: OAK-4712.3.diff

I'm uploading a new diff which I believe addresses the previous issues.  
Feedback and review appreciated.

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Commented] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-01 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457478#comment-15457478
 ] 

Matt Ryan commented on OAK-4712:


[~amjain] I thought of the same issue that you mentioned: the path to the
node in question may not actually have the binary property.  This was certainly
true in my testing, where I was working with nodes of type dam:Asset.  In that
case, an image named "pic.jpg" might actually have the binary property at the
path "/content/dam/pic.jpg/jcr:content/renditions/original/jcr:content".

I will seek clarification.  But after some thought it occurred to me that
it shouldn't be Oak's responsibility to figure out what the user making the
request may have meant if they were to provide e.g. "/content/dam/pic.jpg".  In
my way of thinking, only the full path to the node containing the binary
property makes sense at the Oak level.  Applications with a more specific
purpose than Oak could leverage the functionality by interpreting user input
for their specific use case.
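
To make that concrete, a rough sketch of resolving a full node path to its
binary using Oak's {{NodeState}} API.  The helper name is hypothetical and
error handling is omitted.

{noformat}
// Walk the path to the node and return the Blob in its jcr:data property,
// or null if the node or property does not exist.
static Blob resolveBlob(NodeStore nodeStore, String nodePath) {
    NodeState node = nodeStore.getRoot();
    for (String name : PathUtils.elements(nodePath)) {
        node = node.getChildNode(name);
        if (!node.exists()) {
            return null;
        }
    }
    PropertyState data = node.getProperty("jcr:data");
    return data == null ? null : data.getValue(Type.BINARY);
}
{noformat}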

WDYT?

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Created] (OAK-4772) SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest oak-core

2016-09-07 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-4772:
--

 Summary: SegmentDataStoreBlobGCIT.java in segment-tar will not 
build with latest oak-core
 Key: OAK-4772
 URL: https://issues.apache.org/jira/browse/OAK-4772
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core, segment-tar
Affects Versions: 1.5.7
Reporter: Matt Ryan


```oak-segment-tar``` builds against ```oak-core``` version 1.5.5, but 
specifying a later ```oak.version``` (e.g. 1.5.7) causes the build to fail in 
```SegmentDataStoreBlobGCIT.java``` due to a mismatching override of a parent 
class in ```oak-core```.





[jira] [Updated] (OAK-4772) SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest oak-core

2016-09-07 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4772:
---
Description: oak-segment-tar``` builds against ```oak-core``` version 
1.5.5, but specifying a later ```oak.version``` (e.g. 1.5.7) causes the build 
to fail in ```SegmentDataStoreBlobGCIT.java``` due to a mismatching override of 
a parent class in ```oak-core```.  (was: ```oak-segment-tar``` builds against 
```oak-core``` version 1.5.5, but specifying a later ```oak.version``` (e.g. 
1.5.7) causes the build to fail in ```SegmentDataStoreBlobGCIT.java``` due to a 
mismatching override of a parent class in ```oak-core```.)

> SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest 
> oak-core
> 
>
> Key: OAK-4772
> URL: https://issues.apache.org/jira/browse/OAK-4772
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segment-tar
>Affects Versions: 1.5.7
>Reporter: Matt Ryan
>
> oak-segment-tar``` builds against ```oak-core``` version 1.5.5, but 
> specifying a later ```oak.version``` (e.g. 1.5.7) causes the build to fail in 
> ```SegmentDataStoreBlobGCIT.java``` due to a mismatching override of a parent 
> class in ```oak-core```.





[jira] [Updated] (OAK-4772) SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest oak-core

2016-09-07 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4772:
---
Description: {{oak-segment-tar}} builds against {{oak-core}} version 1.5.5, 
but specifying a later {{oak.version}} (e.g. 1.5.7) causes the build to fail in 
{{SegmentDataStoreBlobGCIT.java}} due to a mismatching override of a parent 
class in {{oak-core}}.  (was: oak-segment-tar``` builds against ```oak-core``` 
version 1.5.5, but specifying a later ```oak.version``` (e.g. 1.5.7) causes the 
build to fail in ```SegmentDataStoreBlobGCIT.java``` due to a mismatching 
override of a parent class in ```oak-core```.)

> SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest 
> oak-core
> 
>
> Key: OAK-4772
> URL: https://issues.apache.org/jira/browse/OAK-4772
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segment-tar
>Affects Versions: 1.5.7
>Reporter: Matt Ryan
>
> {{oak-segment-tar}} builds against {{oak-core}} version 1.5.5, but specifying 
> a later {{oak.version}} (e.g. 1.5.7) causes the build to fail in 
> {{SegmentDataStoreBlobGCIT.java}} due to a mismatching override of a parent 
> class in {{oak-core}}.





[jira] [Commented] (OAK-4772) SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest oak-core

2016-09-07 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471643#comment-15471643
 ] 

Matt Ryan commented on OAK-4772:


To reproduce, check out the latest, then in the {{oak-segment-tar}} directory, 
run:

{noformat}
mvn clean package -Doak.version=1.5.7
{noformat}

You should see the following errors:

{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
(default-testCompile) on project oak-segment-tar: Compilation failure: 
Compilation failure:
[ERROR] 
/oak/oak-segment-tar/src/test/java/org/apache/jackrabbit/oak/segment/SegmentDataStoreBlobGCIT.java:[474,8]
 error: method does not override or implement a method from a supertype
[ERROR] 
/oak/oak-segment-tar/src/test/java/org/apache/jackrabbit/oak/segment/SegmentDataStoreBlobGCIT.java:[496,39]
 error: method sweep in class MarkSweepGarbageCollector cannot be applied to 
given types;
{noformat}

Building with version 1.5.5 (or not specifying a version at all) works fine.

> SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest 
> oak-core
> 
>
> Key: OAK-4772
> URL: https://issues.apache.org/jira/browse/OAK-4772
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segment-tar
>Affects Versions: 1.5.7
>Reporter: Matt Ryan
>
> {{oak-segment-tar}} builds against {{oak-core}} version 1.5.5, but specifying 
> a later {{oak.version}} (e.g. 1.5.7) causes the build to fail in 
> {{SegmentDataStoreBlobGCIT.java}} due to a mismatching override of a parent 
> class in {{oak-core}}.





[jira] [Updated] (OAK-4772) SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest oak-core

2016-09-07 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4772:
---
Attachment: OAK-4772.1.diff

Patch to address this issue.

> SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest 
> oak-core
> 
>
> Key: OAK-4772
> URL: https://issues.apache.org/jira/browse/OAK-4772
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segment-tar
>Affects Versions: 1.5.7
>Reporter: Matt Ryan
> Attachments: OAK-4772.1.diff
>
>
> {{oak-segment-tar}} builds against {{oak-core}} version 1.5.5, but specifying 
> a later {{oak.version}} (e.g. 1.5.7) causes the build to fail in 
> {{SegmentDataStoreBlobGCIT.java}} due to a mismatching override of a parent 
> class in {{oak-core}}.





[jira] [Comment Edited] (OAK-4772) SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest oak-core

2016-09-07 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471653#comment-15471653
 ] 

Matt Ryan edited comment on OAK-4772 at 9/7/16 8:07 PM:


Patch to address this issue attached.


was (Author: mattvryan):
Patch to address this issue.

> SegmentDataStoreBlobGCIT.java in segment-tar will not build with latest 
> oak-core
> 
>
> Key: OAK-4772
> URL: https://issues.apache.org/jira/browse/OAK-4772
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segment-tar
>Affects Versions: 1.5.7
>Reporter: Matt Ryan
> Attachments: OAK-4772.1.diff
>
>
> {{oak-segment-tar}} builds against {{oak-core}} version 1.5.5, but specifying 
> a later {{oak.version}} (e.g. 1.5.7) causes the build to fail in 
> {{SegmentDataStoreBlobGCIT.java}} due to a mismatching override of a parent 
> class in {{oak-core}}.





[jira] [Updated] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-07 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4712:
---
Attachment: OAK-4712.4.diff

Requested changes applied in this patch.  It eliminates {{NodeIdMapper}} and
consolidates the resolution logic into {{S3DataStoreStats}} for all relevant
{{NodeStoreService}} classes.

Some items of note:
- {{S3DataStoreStats}} was moved from {{oak-blob-cloud}} to {{oak-core}},
alongside some other S3-related classes.  Moving the path resolution logic into
{{S3DataStoreStats}} required this change since it relies on other classes in
{{oak-core}} to work, and since {{oak-core}} already depends on
{{oak-blob-cloud}}, leaving {{S3DataStoreStats}} in {{oak-blob-cloud}} would
have resulted in a circular dependency.
- A new interface, {{BlobIdBlob}}, was added, extending {{Blob}} (see the
sketch below).  This represents a kind of blob that also has a blob ID that can
be retrieved.  {{SegmentBlob}} and {{BlobStoreBlob}} now implement this
interface, since both support {{getBlobId()}}.
- The active {{NodeStore}} is provided to the MBean at construction time.  I
chose this approach over OSGi injection because the MBean also requires an
additional parameter, the {{S3DataStore}}, and injection would have required
that we also inject the {{S3DataStore}}, which at first glance seems more
trouble than it is worth (open to discussion, however).
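
For reference, the rough shape of the new interface as described above; javadoc
and exact wording may differ from the patch.

{noformat}
package org.apache.jackrabbit.oak.api;

// A Blob whose underlying blob ID can be retrieved.
public interface BlobIdBlob extends Blob {

    // Returns the blob ID of this blob, or null if none is available.
    String getBlobId();
}
{noformat}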

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Comment Edited] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-07 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471936#comment-15471936
 ] 

Matt Ryan edited comment on OAK-4712 at 9/7/16 10:07 PM:
-

Requested changes applied in the latest patch (OAK-4712.4.diff).  It
eliminates {{NodeIdMapper}} and consolidates the resolution logic into
{{S3DataStoreStats}} for all relevant {{NodeStoreService}} classes.

Some items of note:
- {{S3DataStoreStats}} was moved from {{oak-blob-cloud}} to {{oak-core}},
alongside some other S3-related classes.  Moving the path resolution logic into
{{S3DataStoreStats}} required this change since it relies on other classes in
{{oak-core}} to work, and since {{oak-core}} already depends on
{{oak-blob-cloud}}, leaving {{S3DataStoreStats}} in {{oak-blob-cloud}} would
have resulted in a circular dependency.
- A new interface, {{BlobIdBlob}}, was added, extending {{Blob}}.  This
represents a kind of blob that also has a blob ID that can be retrieved.
{{SegmentBlob}} and {{BlobStoreBlob}} now implement this interface, since both
support {{getBlobId()}}.
- The active {{NodeStore}} is provided to the MBean at construction time.  I
chose this approach over OSGi injection because the MBean also requires an
additional parameter, the {{S3DataStore}}, and injection would have required
that we also inject the {{S3DataStore}}, which at first glance seems more
trouble than it is worth (open to discussion, however).


was (Author: mattvryan):
Requested changes applied in this patch.  It eliminates {{NodeIdMapper}} and 
consolidates the resolution logic into {{S3DataStoreStats}} for all relevant 
{{NodeStoreService}} classes.

Some items of note:
- {{S3DataStoreStats}} was moved from {{oak-blob-cloud}} to {{oak-core}}, 
alongside some other S3-related classes.  Moving the path resolution logic into 
{{S3DataStoreStats}} required this change since it relies on other classes in 
{{oak-core}} to work, and since {{oak-core}} already depends on 
{{oak-blob-cloud}}, leaving S3DataStoreStats in {{oak-blob-cloud}} would have 
ended up with a circular dependency.
- A new interface, {{BlobIdBlob}}, was added, extending {{Blob}}.  This 
represents a kind of blob that also has a blob ID that can be retrieved.  
{{SegmentBlob}} and {{BlobStoreBlob}} now implement this interface, since both 
support {{getBlobId()}}.
- The active nodeStore is provided to the MBean at construction time.  I chose 
this approach over OSGi injection because the MBean also requires an additional 
parameter, the {{S3DataStore}}, and injection would have required that we also 
inject the {{S3DataStore}}, which at first look seems more trouble than it is 
worth (open to discussion however).

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Commented] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-08 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474122#comment-15474122
 ] 

Matt Ryan commented on OAK-4712:


Anyone working on this issue should at least be aware of OAK-4772, which was
discovered and reported while I was working on a patch for this issue.  I'm not
sure exactly what the resolution for OAK-4772 will be, although I've submitted
a patch for it, but I think we may have build problems with this issue until
OAK-4772 is addressed.

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Commented] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-08 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474395#comment-15474395
 ] 

Matt Ryan commented on OAK-4712:


One question we should discuss:  while creating ```BlobIdBlob``` with the 
```getBlobId()``` method doesn't change the visibility of ```getBlobId()``` in 
implementing classes, it does move the method to a different package which 
might have a different exposure surface in OSGi.  Is 
```org.apache.jackrabbit.oak.api``` the right place for ```BlobIdBlob```?

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Comment Edited] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-08 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474395#comment-15474395
 ] 

Matt Ryan edited comment on OAK-4712 at 9/8/16 4:52 PM:


One question we should discuss:  while creating {{BlobIdBlob}} with the 
{{getBlobId()}} method doesn't change the visibility of {{getBlobId()}} in 
implementing classes, it does move the method to a different package which 
might have a different exposure surface in OSGi.  Is 
{{org.apache.jackrabbit.oak.api}} the right place for {{BlobIdBlob}}?


was (Author: mattvryan):
One question we should discuss:  while creating ```BlobIdBlob``` with the 
```getBlobId()``` method doesn't change the visibility of ```getBlobId()``` in 
implementing classes, it does move the method to a different package which 
might have a different exposure surface in OSGi.  Is 
```org.apache.jackrabbit.oak.api``` the right place for ```BlobIdBlob```?

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Updated] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-09 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4712:
---
Attachment: OAK-4712.5.diff

I uploaded OAK-4712.5.diff, with the changes suggested.  Please review.

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff, 
> OAK-4712.5.diff
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Updated] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-12 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4712:
---
Attachment: OAK-4712.7.patch

This patch applies some of the unit test changes suggested on the last patch.

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
>Assignee: Amit Jain
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff, 
> OAK-4712.5.diff, OAK-4712.7.patch, OAK_4712_6_Amit.patch
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Commented] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-12 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485434#comment-15485434
 ] 

Matt Ryan commented on OAK-4712:


[~amjain] The latest patch I submitted addresses some of the issues you raised.
Rather than wait until all the issues have been addressed, I wanted to get this
out for feedback.

The following changes were applied:
* Use a mocked S3DS in S3DataStoreStatsTest, where possible
* Remove usage of test file
* Remove dependency on S3 data store and associated properties - no S3 
configuration required

I wasn't able to simply mock S3DataStore in every test case.  Some of the tests
require that I be able to control the backend (for example, checking whether a
sync has completed).  For these I am still using a very simple S3DataStore
subclass (sketched below) that allows me to replace the backend with something
that makes testing more meaningful.
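
A simplified sketch of that subclass; the class and setter names are
illustrative, and it assumes {{CachingDataStore.createBackend()}} is the
override point.

{noformat}
// Illustrative only: lets a test swap in a controllable backend so that
// sync completion (and failure) can be simulated deterministically.
class TestableS3DataStore extends S3DataStore {
    private Backend testBackend;

    void setTestBackend(Backend backend) {
        this.testBackend = backend;
    }

    @Override
    protected Backend createBackend() {
        return testBackend != null ? testBackend : super.createBackend();
    }
}
{noformat}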

I have not yet moved any of the tests into oak-blob-cloud.  I'm not sure it
makes much sense to do so.  Perhaps I am missing something?  The purpose of the
tests is to exercise the functionality being added, which is mostly in
S3DataStoreStats, which lives in oak-core.  Each test creates an
S3DataStoreStats object as the system under test.  I tried moving some things
to oak-blob-cloud, but it looked to me like any test I moved would end up so
limited in what it could exercise that it wouldn't really be able to test much.

I will continue working on the use of MemoryNodeStore instead of the node store 
mock, and the addition of more tests like those suggested.

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
>Assignee: Amit Jain
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff, 
> OAK-4712.5.diff, OAK-4712.7.patch, OAK_4712_6_Amit.patch
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Updated] (OAK-4712) Publish S3DataStore stats in JMX MBean

2016-09-13 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-4712:
---
Attachment: OAK-4712.8.patch

Uploaded OAK-4712.8.patch, which includes the changes from the version 7 patch 
as well as additional unit test refactoring, use of MemoryNodeStore, and 
additional unit tests as requested by [~amjain].

> Publish S3DataStore stats in JMX MBean
> --
>
> Key: OAK-4712
> URL: https://issues.apache.org/jira/browse/OAK-4712
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Matt Ryan
>Assignee: Amit Jain
> Attachments: OAK-4712.2.diff, OAK-4712.3.diff, OAK-4712.4.diff, 
> OAK-4712.5.diff, OAK-4712.7.patch, OAK-4712.8.patch, OAK_4712_6_Amit.patch
>
>
> This feature is to publish statistics about the S3DataStore via a JMX MBean.  
> There are two statistics suggested:
> * Indicate the number of files actively being synchronized from the local 
> cache into S3
> * Given a path to a local file, indicate whether synchronization of the file
> into S3 has completed





[jira] [Created] (OAK-5977) Document enhancements in S3DataStore in 1.6

2017-03-23 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-5977:
--

 Summary: Document enhancements in S3DataStore in 1.6
 Key: OAK-5977
 URL: https://issues.apache.org/jira/browse/OAK-5977
 Project: Jackrabbit Oak
  Issue Type: Technical task
  Components: doc
Reporter: Matt Ryan
 Fix For: 1.7.0


This task is meant to collect and refer to work done in the 1.6 release which
needs to be documented in the Oak docs.  Especially those enhancements which
impact system administration, or new features which need to be enabled as per
requirements, should be documented.

Some related issues include:
OAK-4837
OAK-4712






[jira] [Created] (OAK-6004) [S3DataStore] Remove redundant packages for S3DataStore

2017-03-29 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-6004:
--

 Summary: [S3DataStore] Remove redundant packages for S3DataStore
 Key: OAK-6004
 URL: https://issues.apache.org/jira/browse/OAK-6004
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: blob
Affects Versions: 1.6.1
Reporter: Matt Ryan
Priority: Minor


In the {{oak-blob-cloud}} module, there are two packages that appear to be 
redundant:  {{org.apache.jackrabbit.oak.blob.cloud.s3}} and 
{{org.apache.jackrabbit.oak.blob.cloud.aws.s3}} (note the "aws" part).  To 
avoid confusion we should remove one (I believe the "aws.s3" one is the 
outdated one, but I'm not 100% sure).





[jira] [Created] (OAK-6163) Add unit test coverage for IOUtils.writeInt/writeLong and IOUtils.readInt/readLong

2017-05-03 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-6163:
--

 Summary: Add unit test coverage for IOUtils.writeInt/writeLong and 
IOUtils.readInt/readLong
 Key: OAK-6163
 URL: https://issues.apache.org/jira/browse/OAK-6163
 Project: Jackrabbit Oak
  Issue Type: Test
  Components: commons
Reporter: Matt Ryan
Priority: Minor


There is no unit test coverage for IOUtils.writeInt(), IOUtils.writeLong(), 
IOUtils.readInt(), and IOUtils.readLong() in oak-commons.

I am working on a patch and will have one to submit shortly.
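
The tests can be simple round-trips.  A sketch (JUnit 4), assuming the
oak-commons {{IOUtils}} signatures {{writeInt(OutputStream, int)}} and
{{readInt(InputStream)}}; writeLong/readLong follow the same pattern:

{noformat}
@Test
public void testWriteReadInt() throws IOException {
    int[] samples = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
    for (int expected : samples) {
        // write the value, then read it back from the same bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        IOUtils.writeInt(out, expected);
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        assertEquals(expected, IOUtils.readInt(in));
    }
}
{noformat}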





[jira] [Updated] (OAK-6163) Add unit test coverage for IOUtils.writeInt/writeLong and IOUtils.readInt/readLong

2017-05-03 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-6163:
---
Attachment: OAK-6163.1.patch

Attached patch file that adds unit tests to cover IOUtils.writeInt(),
IOUtils.writeLong(), IOUtils.readInt(), and IOUtils.readLong().

> Add unit test coverage for IOUtils.writeInt/writeLong and 
> IOUtils.readInt/readLong
> --
>
> Key: OAK-6163
> URL: https://issues.apache.org/jira/browse/OAK-6163
> Project: Jackrabbit Oak
>  Issue Type: Test
>  Components: commons
>Reporter: Matt Ryan
>Priority: Minor
>  Labels: easyfix, patch, test
> Attachments: OAK-6163.1.patch
>
>
> There is no unit test coverage for IOUtils.writeInt(), IOUtils.writeLong(), 
> IOUtils.readInt(), and IOUtils.readLong() in oak-commons.
> I am working on a patch and will have one to submit shortly.





[jira] [Commented] (OAK-6164) IOUtils.nextPowerOf2() returns lower power of 2 for very high int values

2017-05-03 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995686#comment-15995686
 ] 

Matt Ryan commented on OAK-6164:


I have a patch for this, with a unit test, that I will submit shortly.

> IOUtils.nextPowerOf2() returns lower power of 2 for very high int values
> 
>
> Key: OAK-6164
> URL: https://issues.apache.org/jira/browse/OAK-6164
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Matt Ryan
>Priority: Minor
>
> In the IOUtils.nextPowerOf2() method, all int values are accepted as input.
> However, there are valid signed integer values that this method accepts as
> input, but for which a lower power of 2 value is returned.
> This occurs for values that are valid signed integer values that are greater
> than the highest possible power of two value in the signed integer range.
> Signed integer values have the maximum value of 0x7FFFFFFF, but the maximum
> possible power of two in the signed integer range is 0x40000000.  (The
> current implementation incorrectly identifies the maximum possible power of
> two as 0x3FFFFFFF, due to how it is computed by doing integer division of
> 0x7FFFFFFF / 2.)
> In the current implementation any input in the range of [0x40000001,
> 0x7FFFFFFF] is a valid signed integer input, but the method will return
> 0x3FFFFFFF as the next valid max power of 2.
> Two minor things need to be fixed:
> * If the input is 0x40000000, the return value needs to be 0x40000000,
> instead of 0x3FFFFFFF which is not a valid power of 2.
> * If the input is in the range [0x40000001, 0x7FFFFFFF] I propose the method
> should instead throw an IllegalArgumentException and indicate that it is not
> possible to compute a next power of 2 for a number in that range.





[jira] [Created] (OAK-6164) IOUtils.nextPowerOf2() returns lower power of 2 for very high int values

2017-05-03 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-6164:
--

 Summary: IOUtils.nextPowerOf2() returns lower power of 2 for very 
high int values
 Key: OAK-6164
 URL: https://issues.apache.org/jira/browse/OAK-6164
 Project: Jackrabbit Oak
  Issue Type: Bug
Reporter: Matt Ryan
Priority: Minor


In the IOUtils.nextPowerOf2() method, all int values are accepted as input.
However, there are valid signed integer values that this method accepts as
input, but for which a lower power of 2 value is returned.

This occurs for values that are valid signed integer values that are greater
than the highest possible power of two value in the signed integer range.
Signed integer values have the maximum value of 0x7FFFFFFF, but the maximum
possible power of two in the signed integer range is 0x40000000.  (The current
implementation incorrectly identifies the maximum possible power of two as
0x3FFFFFFF, due to how it is computed by doing integer division of
0x7FFFFFFF / 2.)

In the current implementation any input in the range of [0x40000001,
0x7FFFFFFF] is a valid signed integer input, but the method will return
0x3FFFFFFF as the next valid max power of 2.

Two minor things need to be fixed (sketched below):
* If the input is 0x40000000, the return value needs to be 0x40000000, instead
of 0x3FFFFFFF which is not a valid power of 2.
* If the input is in the range [0x40000001, 0x7FFFFFFF] I propose the method
should instead throw an IllegalArgumentException and indicate that it is not
possible to compute a next power of 2 for a number in that range.
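
A sketch of a fix implementing the two points above (illustrative, not the
attached patch):

{noformat}
// Return the smallest power of 2 >= x.  0x40000000 is the largest power
// of two representable as a signed int, so anything above it is rejected.
public static int nextPowerOf2(int x) {
    if (x > 0x40000000) {
        throw new IllegalArgumentException(
                "no power of 2 >= " + x + " fits in a signed int");
    }
    int n = 1;
    while (n < x) {
        n <<= 1; // safe: n never exceeds 0x40000000 here
    }
    return n;
}
{noformat}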





[jira] [Updated] (OAK-6164) IOUtils.nextPowerOf2() returns lower power of 2 for very high int values

2017-05-03 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-6164:
---
Attachment: OAK-6164.patch.1

Attached patch with a fix, and with a unit test for IOUtils.nextPowerOf2().

> IOUtils.nextPowerOf2() returns lower power of 2 for very high int values
> 
>
> Key: OAK-6164
> URL: https://issues.apache.org/jira/browse/OAK-6164
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Matt Ryan
>Priority: Minor
> Attachments: OAK-6164.patch.1
>
>
> In the IOUtils.nextPowerOf2() method, all int values are accepted as input.
> However, there are valid signed integer values that this method accepts as
> input, but for which a lower power of 2 value is returned.
> This occurs for values that are valid signed integer values that are greater
> than the highest possible power of two value in the signed integer range.
> Signed integer values have the maximum value of 0x7FFFFFFF, but the maximum
> possible power of two in the signed integer range is 0x40000000.  (The
> current implementation incorrectly identifies the maximum possible power of
> two as 0x3FFFFFFF, due to how it is computed by doing integer division of
> 0x7FFFFFFF / 2.)
> In the current implementation any input in the range of [0x40000001,
> 0x7FFFFFFF] is a valid signed integer input, but the method will return
> 0x3FFFFFFF as the next valid max power of 2.
> Two minor things need to be fixed:
> * If the input is 0x40000000, the return value needs to be 0x40000000,
> instead of 0x3FFFFFFF which is not a valid power of 2.
> * If the input is in the range [0x40000001, 0x7FFFFFFF] I propose the method
> should instead throw an IllegalArgumentException and indicate that it is not
> possible to compute a next power of 2 for a number in that range.





[jira] [Updated] (OAK-6164) IOUtils.nextPowerOf2() returns lower power of 2 for very high int values

2017-05-03 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-6164:
---
Attachment: OAK-6164.patch.2

As I thought a bit about patch 1, it occurred to me that adding a throws 
declaration for IllegalArgumentException is not really needed since it is a 
RuntimeException, but doing so changes the public API signature.  So I've 
attached OAK-6164.patch.2 which removes the throws declaration.

> IOUtils.nextPowerOf2() returns lower power of 2 for very high int values
> 
>
> Key: OAK-6164
> URL: https://issues.apache.org/jira/browse/OAK-6164
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Matt Ryan
>Priority: Minor
> Attachments: OAK-6164.patch.1, OAK-6164.patch.2
>
>
> In the IOUtils.nextPowerOf2() method, all int values are accepted as input.
> However, there are valid signed integer values that this method accepts as
> input, but for which a lower power of 2 value is returned.
> This occurs for values that are valid signed integer values that are greater
> than the highest possible power of two value in the signed integer range.
> Signed integer values have the maximum value of 0x7FFFFFFF, but the maximum
> possible power of two in the signed integer range is 0x40000000.  (The
> current implementation incorrectly identifies the maximum possible power of
> two as 0x3FFFFFFF, due to how it is computed by doing integer division of
> 0x7FFFFFFF / 2.)
> In the current implementation any input in the range of [0x40000001,
> 0x7FFFFFFF] is a valid signed integer input, but the method will return
> 0x3FFFFFFF as the next valid max power of 2.
> Two minor things need to be fixed:
> * If the input is 0x40000000, the return value needs to be 0x40000000,
> instead of 0x3FFFFFFF which is not a valid power of 2.
> * If the input is in the range [0x40000001, 0x7FFFFFFF] I propose the method
> should instead throw an IllegalArgumentException and indicate that it is not
> possible to compute a next power of 2 for a number in that range.





[jira] [Commented] (OAK-6164) IOUtils.nextPowerOf2() returns lower power of 2 for very high int values

2017-05-04 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996899#comment-15996899
 ] 

Matt Ryan commented on OAK-6164:


[~mduerig] I wondered about doing that also.  The problem is that there will
always be a range of values for which we can't compute the next highest power
of two, for any integer type.  However, it may work to accept an {{int}} as the
input but return a {{long}} as the output.

For any value in the range [0x40000001, 0x7FFFFFFF], the next highest power of
2 would be 0x80000000.  Interpreted as an {{int}} value this is actually
{{Integer.MIN_VALUE}} because of the sign bit, but this of course works as a
{{long}}.  So this can work for all valid positive {{int}} input values.  But
if the caller were to cast the result back to an {{int}} they would then end up
with a negative number (-2147483648) even though the {{long}} value is a
positive value (2147483648).

So that's a minor problem, but I think your point is that this would be better
than not accepting all valid positive {{int}} values and throwing an exception
when the range is exceeded.  I tend to agree.

I'll make another patch.
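
Concretely, a sketch of the int-in/long-out variant being discussed:

{noformat}
// Widening the return type to long lets every positive int input get a
// correct answer: 0x80000000L for inputs in [0x40000001, 0x7FFFFFFF].
public static long nextPowerOf2(int x) {
    long n = 1;
    while (n < x) {
        n <<= 1; // as a long this reaches 0x80000000 without wrapping
    }
    return n;
}
{noformat}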

> IOUtils.nextPowerOf2() returns lower power of 2 for very high int values
> 
>
> Key: OAK-6164
> URL: https://issues.apache.org/jira/browse/OAK-6164
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: commons
>Reporter: Matt Ryan
>Assignee: Michael Dürig
>Priority: Minor
> Fix For: 1.7.0, 1.8
>
> Attachments: OAK-6164.patch.1, OAK-6164.patch.2
>
>
> In the IOUtils.nextPowerOf2() method, all int values are accepted as input.
> However, there are valid signed integer values that this method accepts as
> input, but for which a lower power of 2 value is returned.
> This occurs for values that are valid signed integer values that are greater
> than the highest possible power of two value in the signed integer range.
> Signed integer values have the maximum value of 0x7FFFFFFF, but the maximum
> possible power of two in the signed integer range is 0x40000000.  (The
> current implementation incorrectly identifies the maximum possible power of
> two as 0x3FFFFFFF, due to how it is computed by doing integer division of
> 0x7FFFFFFF / 2.)
> In the current implementation any input in the range of [0x40000001,
> 0x7FFFFFFF] is a valid signed integer input, but the method will return
> 0x3FFFFFFF as the next valid max power of 2.
> Two minor things need to be fixed:
> * If the input is 0x40000000, the return value needs to be 0x40000000,
> instead of 0x3FFFFFFF which is not a valid power of 2.
> * If the input is in the range [0x40000001, 0x7FFFFFFF] I propose the method
> should instead throw an IllegalArgumentException and indicate that it is not
> possible to compute a next power of 2 for a number in that range.





[jira] [Comment Edited] (OAK-6164) IOUtils.nextPowerOf2() returns lower power of 2 for very high int values

2017-05-04 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996899#comment-15996899
 ] 

Matt Ryan edited comment on OAK-6164 at 5/4/17 3:20 PM:


[~mduerig] I wondered about doing that also.  The problem is that there will
always be a range of values for which we can't compute the next highest power
of two, for any integer type.  However, it may work to accept an {{int}} as the
input but return a {{long}} as the output.

For any value in the range [0x40000001, 0x7FFFFFFF], the next highest power of
2 would be 0x80000000.  Interpreted as an {{int}} value this is actually
{{Integer.MIN_VALUE}} because of the sign bit, but this of course works as a
{{long}}.  So this can work for all valid positive {{int}} input values.  But
if the caller were to cast the result back to an {{int}} they would then end up
with a negative number (-2147483648) even though the {{long}} value is a
positive value (2147483648).

So that's a minor problem, but I think your point is that this would be better
than not accepting all valid positive {{int}} values and throwing an exception
when the range is exceeded.  I tend to agree.

However, this does have the effect of changing the function signature (return
value from {{int}} to {{long}}).  I don't think the function is actually used
anywhere else in Oak so I'm not sure making that change matters too much, but
it does technically constitute an API change.

I'll make another patch anyway, and then we can decide which we like best.


was (Author: mattvryan):
[~mduerig] I wondered about doing that also.  The problem is that there will
always be a range of values for which we can't compute the next highest power
of two, for any integer type.  However, it may work to accept an {{int}} as the
input but return a {{long}} as the output.

For any value in the range [0x40000001, 0x7FFFFFFF], the next highest power of
2 would be 0x80000000.  Interpreted as an {{int}} value this is actually
{{Integer.MIN_VALUE}} because of the sign bit, but this of course works as a
{{long}}.  So this can work for all valid positive {{int}} input values.  But
if the caller were to cast the result back to an {{int}} they would then end up
with a negative number (-2147483648) even though the {{long}} value is a
positive value (2147483648).

So that's a minor problem, but I think your point is that this would be better
than not accepting all valid positive {{int}} values and throwing an exception
when the range is exceeded.  I tend to agree.

I'll make another patch.

> IOUtils.nextPowerOf2() returns lower power of 2 for very high int values
> 
>
> Key: OAK-6164
> URL: https://issues.apache.org/jira/browse/OAK-6164
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: commons
>Reporter: Matt Ryan
>Assignee: Michael Dürig
>Priority: Minor
> Fix For: 1.7.0, 1.8
>
> Attachments: OAK-6164.patch.1, OAK-6164.patch.2
>
>
> In the IOUtils.nextPowerOf2() method, all int values are accepted as input.
> However, there are valid signed integer values that this method accepts as
> input, but for which a lower power of 2 value is returned.
> This occurs for values that are valid signed integer values that are greater
> than the highest possible power of two value in the signed integer range.
> Signed integer values have the maximum value of 0x7FFFFFFF, but the maximum
> possible power of two in the signed integer range is 0x40000000.  (The
> current implementation incorrectly identifies the maximum possible power of
> two as 0x3FFFFFFF, due to how it is computed by doing integer division of
> 0x7FFFFFFF / 2.)
> In the current implementation any input in the range of [0x40000001,
> 0x7FFFFFFF] is a valid signed integer input, but the method will return
> 0x3FFFFFFF as the next valid max power of 2.
> Two minor things need to be fixed:
> * If the input is 0x40000000, the return value needs to be 0x40000000,
> instead of 0x3FFFFFFF which is not a valid power of 2.
> * If the input is in the range [0x40000001, 0x7FFFFFFF] I propose the method
> should instead throw an IllegalArgumentException and indicate that it is not
> possible to compute a next power of 2 for a number in that range.





[jira] [Updated] (OAK-6164) IOUtils.nextPowerOf2() returns lower power of 2 for very high int values

2017-05-04 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-6164:
---
Attachment: OAK-6164.patch.3

Attached patch 3.  In this patch, the return type of nextPowerOf2 is changed 
from {{int}} to {{long}}, and it no longer throws {{IllegalArgumentException}}. 
 Unit tests have been updated.

> IOUtils.nextPowerOf2() returns lower power of 2 for very high int values
> 
>
> Key: OAK-6164
> URL: https://issues.apache.org/jira/browse/OAK-6164
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: commons
>Reporter: Matt Ryan
>Assignee: Michael Dürig
>Priority: Minor
> Fix For: 1.7.0, 1.8
>
> Attachments: OAK-6164.patch.1, OAK-6164.patch.2, OAK-6164.patch.3
>
>
> In the IOUtils.nextPowerOf2() method, all int values are accepted as input.
> However, there are valid signed integer values that this method accepts as
> input, but for which a lower power of 2 value is returned.
> This occurs for values that are valid signed integer values that are greater
> than the highest possible power of two value in the signed integer range.
> Signed integer values have the maximum value of 0x7FFFFFFF, but the maximum
> possible power of two in the signed integer range is 0x40000000.  (The
> current implementation incorrectly identifies the maximum possible power of
> two as 0x3FFFFFFF, due to how it is computed by doing integer division of
> 0x7FFFFFFF / 2.)
> In the current implementation any input in the range of [0x40000001,
> 0x7FFFFFFF] is a valid signed integer input, but the method will return
> 0x3FFFFFFF as the next valid max power of 2.
> Two minor things need to be fixed:
> * If the input is 0x40000000, the return value needs to be 0x40000000,
> instead of 0x3FFFFFFF which is not a valid power of 2.
> * If the input is in the range [0x40000001, 0x7FFFFFFF] I propose the method
> should instead throw an IllegalArgumentException and indicate that it is not
> possible to compute a next power of 2 for a number in that range.





[jira] [Created] (OAK-6173) Add unit test coverage for IOUtils.copy

2017-05-04 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-6173:
--

 Summary: Add unit test coverage for IOUtils.copy
 Key: OAK-6173
 URL: https://issues.apache.org/jira/browse/OAK-6173
 Project: Jackrabbit Oak
  Issue Type: Test
  Components: commons
Reporter: Matt Ryan
Priority: Minor
 Fix For: 1.7.0, 1.8


There is no unit test coverage for {{IOUtils.copy}} in {{oak-commons}}.

I will add a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (OAK-6173) Add unit test coverage for IOUtils.copy

2017-05-04 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-6173:
---
Attachment: OAK-6173.patch.1

Attached patch with unit test implemented.
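
For reference, a minimal sketch of the kind of test added (illustrative only, 
assuming the {{copy(InputStream, OutputStream)}} signature; see the attached 
patch for the actual test):

{noformat}
import static org.junit.Assert.assertArrayEquals;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.jackrabbit.oak.commons.IOUtils;
import org.junit.Test;

public class IOUtilsCopyTest {
    @Test
    public void copyStream() throws IOException {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        IOUtils.copy(in, out);
        // The copied bytes must match the source exactly.
        assertArrayEquals(data, out.toByteArray());
    }
}
{noformat}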

> Add unit test coverage for IOUtils.copy
> ---
>
> Key: OAK-6173
> URL: https://issues.apache.org/jira/browse/OAK-6173
> Project: Jackrabbit Oak
>  Issue Type: Test
>  Components: commons
>Reporter: Matt Ryan
>Priority: Minor
>  Labels: unit-test-missing
> Fix For: 1.7.0, 1.8
>
> Attachments: OAK-6173.patch.1
>
>
> There is no unit test coverage for {{IOUtils.copy}} in {{oak-commons}}.
> I will add a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (OAK-6175) Add unit test coverage for IOUtils.humanReadableByteCount

2017-05-04 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-6175:
--

 Summary: Add unit test coverage for IOUtils.humanReadableByteCount
 Key: OAK-6175
 URL: https://issues.apache.org/jira/browse/OAK-6175
 Project: Jackrabbit Oak
  Issue Type: Test
  Components: commons
Reporter: Matt Ryan
Priority: Minor
 Fix For: 1.7.0, 1.8


There is no unit test coverage for {{IOUtils.humanReadableByteCount}} in 
{{oak-commons}}.

I will add a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (OAK-6175) Add unit test coverage for IOUtils.humanReadableByteCount

2017-05-04 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-6175:
---
Attachment: OAK-6175.patch.1

Attached patch file implementing unit test for 
{{IOUtils.humanReadableByteCount}}.

> Add unit test coverage for IOUtils.humanReadableByteCount
> -
>
> Key: OAK-6175
> URL: https://issues.apache.org/jira/browse/OAK-6175
> Project: Jackrabbit Oak
>  Issue Type: Test
>  Components: commons
>Reporter: Matt Ryan
>Priority: Minor
>  Labels: unit-test-missing
> Fix For: 1.7.0, 1.8
>
> Attachments: OAK-6175.patch.1
>
>
> There is no unit test coverage for {{IOUtils.humanReadableByteCount}} in 
> {{oak-commons}}.
> I will add a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (OAK-6175) Add unit test coverage for IOUtils.humanReadableByteCount

2017-05-08 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-6175:
---
Attachment: OAK-6175.patch.4
OAK-6175.patch.3
OAK-6175.patch.2

I tend to agree with the hard-wired locale approach, but I also feel sheepish 
for having failed to test for that.

I've attached three new patches, and we can discuss which approach we prefer.

The first patch file, {{OAK-6175.patch.2}}, implements the hard-wired locale 
approach, specifying {{Locale.ENGLISH}} as the locale in the 
{{String.format()}} used to generate the result.

The second patch file, {{OAK-6175.patch.3}}, expands the test expectations so 
that the test works in whatever locale the code runs in.  The output thus 
depends on the locale, but the tests generate expected results based on that 
same locale.

The third patch file, {{OAK-6175.patch.4}}, defaults to {{Locale.ENGLISH}} but 
also allows passing in a specific locale; the output is generated using the 
locale passed in, falling back to {{Locale.ENGLISH}} if none was provided.
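
To make the first option concrete, a rough sketch of the hard-wired locale 
approach (the method body here is illustrative, not the actual {{oak-commons}} 
code):

{noformat}
import java.util.Locale;

public static String humanReadableByteCount(long bytes) {
    if (bytes < 1024) {
        return bytes + " B";
    }
    int exp = (int) (Math.log(bytes) / Math.log(1024));
    char unit = "KMGTPE".charAt(exp - 1);
    // Locale.ENGLISH pins the decimal separator, so "1.5 KB" is produced
    // regardless of the JVM's default locale.
    return String.format(Locale.ENGLISH, "%.1f %sB",
            bytes / Math.pow(1024, exp), unit);
}
{noformat}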

> Add unit test coverage for IOUtils.humanReadableByteCount
> -
>
> Key: OAK-6175
> URL: https://issues.apache.org/jira/browse/OAK-6175
> Project: Jackrabbit Oak
>  Issue Type: Test
>  Components: commons
>Reporter: Matt Ryan
>Assignee: Michael Dürig
>Priority: Minor
>  Labels: unit-test-missing
> Fix For: 1.7.0, 1.8
>
> Attachments: OAK-6175.patch.1, OAK-6175.patch.2, OAK-6175.patch.3, 
> OAK-6175.patch.4
>
>
> There is no unit test coverage for {{IOUtils.humanReadableByteCount}} in 
> {{oak-commons}}.
> I will add a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (OAK-6186) Add unit test coverage for LongUtils.safeAdd()

2017-05-08 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-6186:
--

 Summary: Add unit test coverage for LongUtils.safeAdd()
 Key: OAK-6186
 URL: https://issues.apache.org/jira/browse/OAK-6186
 Project: Jackrabbit Oak
  Issue Type: Test
  Components: commons
Reporter: Matt Ryan
Priority: Minor


Add a unit test for LongUtils.safeAdd().

I will upload a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (OAK-6186) Add unit test coverage for LongUtils.safeAdd()

2017-05-08 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-6186:
---
Attachment: OAK-6186.patch.1

Attached patch with unit test for LongUtils.safeAdd().
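
A minimal sketch of the coverage (illustrative; it assumes {{safeAdd}} 
saturates at {{Long.MAX_VALUE}} on overflow and that the class lives in 
{{oak-commons}} - see the attached patch for the actual assertions):

{noformat}
import static org.junit.Assert.assertEquals;

import org.apache.jackrabbit.oak.commons.LongUtils;
import org.junit.Test;

public class LongUtilsSafeAddTest {
    @Test
    public void safeAdd() {
        assertEquals(5L, LongUtils.safeAdd(2L, 3L));
        // Assumed overflow behavior: the sum saturates at Long.MAX_VALUE.
        assertEquals(Long.MAX_VALUE, LongUtils.safeAdd(Long.MAX_VALUE, 1L));
    }
}
{noformat}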

> Add unit test coverage for LongUtils.safeAdd()
> --
>
> Key: OAK-6186
> URL: https://issues.apache.org/jira/browse/OAK-6186
> Project: Jackrabbit Oak
>  Issue Type: Test
>  Components: commons
>Reporter: Matt Ryan
>Priority: Minor
>  Labels: unit-test-missing
> Attachments: OAK-6186.patch.1
>
>
> Add a unit test for LongUtils.safeAdd().
> I will upload a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (OAK-6305) Add unit test coverage for SipHash

2017-06-02 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-6305:
--

 Summary: Add unit test coverage for SipHash
 Key: OAK-6305
 URL: https://issues.apache.org/jira/browse/OAK-6305
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: commons
Affects Versions: 1.6.1, 1.7.0, 1.6.0
Reporter: Matt Ryan
Priority: Minor


We should add a unit test for {{org.apache.jackrabbit.oak.commons.SipHash}}.  
Beyond simply attaining a code coverage number, it can provide some assurance 
that the functionality remains consistent should there ever be an 
implementation change.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-6305) Add unit test coverage for SipHash

2017-06-02 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035285#comment-16035285
 ] 

Matt Ryan commented on OAK-6305:


I created a pull request with the missing unit test:  
https://github.com/apache/jackrabbit-oak/pull/60

This also has the effect of bringing the code coverage percentage for 
{{oak-commons}} over 70%.

> Add unit test coverage for SipHash
> --
>
> Key: OAK-6305
> URL: https://issues.apache.org/jira/browse/OAK-6305
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: commons
>Affects Versions: 1.6.0, 1.7.0, 1.6.1
>Reporter: Matt Ryan
>Priority: Minor
>  Labels: unit-test-missing
>
> We should add a unit test for {{org.apache.jackrabbit.oak.commons.SipHash}}.  
> Beyond simply attaining a code coverage number, it can provide some assurance 
> that the functionality remains consistent should there ever be an 
> implementation change.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-6388) Enable Azure shared access signature for blob store connector

2017-06-28 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066625#comment-16066625
 ] 

Matt Ryan commented on OAK-6388:


I'm still not comfortable with the comments in {{AzureConstants.java}}.  I 
don't think they have the effect of clarifying the code.  In particular, the 
usage of the word "key" in each comment confuses me, because it doesn't seem to 
match the name of each constant.

For example:
{noformat}
/**
 * Azure storage account name key
 */
public static final String AZURE_STORAGE_ACCOUNT_NAME = "accessKey";
{noformat}
The value of the string is "accessKey".  Originally this was to be consistent 
with S3, which uses the config names "accessKey" and "secretKey".  In Azure's 
case the equivalent is the storage account name.  It isn't a key; it is just a 
name.  If we are going to put a comment here (esp. a JavaDoc comment) I think 
we should explain what it is - for example:  "The Azure storage account name.  
Used as the access key to connect to a storage account."

{noformat}
/**
 * Azure shared access signature key
 */
public static final String AZURE_SAS = "azureSas";
{noformat}
It isn't a key; it is the shared access signature.  If I were unfamiliar with 
the code and with Azure storage, I might see this and wonder, "I have a shared 
access signature - but how do I get the key for it?"
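
Putting the suggestions together, the two comments could read something like 
this (suggested wording only, not a committed change):

{noformat}
/**
 * The Azure storage account name.  Used as the access key to connect to a
 * storage account.
 */
public static final String AZURE_STORAGE_ACCOUNT_NAME = "accessKey";

/**
 * The Azure shared access signature (SAS), an alternative to the account
 * access key that can be limited to certain operations.
 */
public static final String AZURE_SAS = "azureSas";
{noformat}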

> Enable Azure shared access signature for blob store connector
> -
>
> Key: OAK-6388
> URL: https://issues.apache.org/jira/browse/OAK-6388
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob-cloud
>Reporter: Andrei Kalfas
> Attachments: AzureSAS.patch, AzureSAS-v2.patch
>
>
> Azure storage account can be access with access keys or with shared access 
> signatures. Currently the blob connector only allows access keys, limiting 
> the use cases where the storage account must be regarded as a read only one. 
> Access keys enable all access while shared access signatures can be limited 
> to certain operations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6388) Enable Azure shared access signature for blob store connector

2017-06-28 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066783#comment-16066783
 ] 

Matt Ryan commented on OAK-6388:


[~akalfas] I was referring to the comment, not the value of the constant.  I 
agree that if we want to change "accessKey" to something else, that's a 
separate issue.  My point is that the JavaDoc comment should accurately 
describe what the variable actually represents; in this case the variable 
represents an account name, not a key, so the comment should reflect that.

> Enable Azure shared access signature for blob store connector
> -
>
> Key: OAK-6388
> URL: https://issues.apache.org/jira/browse/OAK-6388
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob-cloud
>Reporter: Andrei Kalfas
> Attachments: AzureSAS.patch, AzureSAS-v2.patch
>
>
> Azure storage account can be access with access keys or with shared access 
> signatures. Currently the blob connector only allows access keys, limiting 
> the use cases where the storage account must be regarded as a read only one. 
> Access keys enable all access while shared access signatures can be limited 
> to certain operations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-5960) Multi blob store support

2017-07-17 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090046#comment-16090046
 ] 

Matt Ryan commented on OAK-5960:


AKA CompositeDataStore (not ConsolidatedDataStore) - this is my fault; for some 
reason the word "Consolidated" keeps sticking in my brain, but 
CompositeDataStore was the name chosen on oak-dev.

I'm pretty sure [~tmueller] got the name from me while I had the wrong name 
stuck in my brain - my bad.

> Multi blob store support
> 
>
> Key: OAK-5960
> URL: https://issues.apache.org/jira/browse/OAK-5960
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: blob
>Reporter: Amit Jain
>
> Epic to collect issues and discussion for support for multi blob store which 
> could potentially support the following scenarios:
> * A primary writable blob store and read only secondary blob stores - useful 
> for cloud cross-geo deployments
> * Typed blob store where based on type passed the blobs are written and read 
> from corresponding blob stores.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (OAK-5960) Multi blob store support

2017-12-19 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan reassigned OAK-5960:
--

Assignee: Matt Ryan

> Multi blob store support
> 
>
> Key: OAK-5960
> URL: https://issues.apache.org/jira/browse/OAK-5960
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: blob
>Reporter: Amit Jain
>Assignee: Matt Ryan
>
> Epic to collect issues and discussion for support for multi blob store which 
> could potentially support the following scenarios:
> * A primary writable blob store and read only secondary blob stores - useful 
> for cloud cross-geo deployments
> * Typed blob store where based on type passed the blobs are written and read 
> from corresponding blob stores.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2017-12-19 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7083:
--

 Summary: CompositeDataStore - ReadOnly/ReadWrite Delegate Support
 Key: OAK-7083
 URL: https://issues.apache.org/jira/browse/OAK-7083
 Project: Jackrabbit Oak
  Issue Type: New Feature
  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
Reporter: Matt Ryan
Assignee: Matt Ryan


Support a specific composite data store use case, which is the following:
* One instance uses no composite data store, but instead is using a single 
standard Oak data store (e.g. FileDataStore)
* Another instance is created by snapshotting the first instance node store, 
and then uses a composite data store to refer to the first instance's data 
store read-only, and refers to a second data store as a writable data store

One way this can be used is in creating a test or staging instance from a 
production instance.  At creation, the test instance will look like production, 
but any changes made to the test instance do not affect production.  The test 
instance can be quickly created from production by cloning only the node store, 
and not requiring a copy of all the data in the data store.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (OAK-5960) Multi blob store support

2017-12-19 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan reassigned OAK-5960:
--

Assignee: (was: Matt Ryan)

> Multi blob store support
> 
>
> Key: OAK-5960
> URL: https://issues.apache.org/jira/browse/OAK-5960
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: blob
>Reporter: Amit Jain
>
> Epic to collect issues and discussion for support for multi blob store which 
> could potentially support the following scenarios:
> * A primary writable blob store and read only secondary blob stores - useful 
> for cloud cross-geo deployments
> * Typed blob store where based on type passed the blobs are written and read 
> from corresponding blob stores.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7084) Implement CompositeDataStore and CompositeDataStoreService

2017-12-19 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7084:
--

 Summary: Implement CompositeDataStore and CompositeDataStoreService
 Key: OAK-7084
 URL: https://issues.apache.org/jira/browse/OAK-7084
 Project: Jackrabbit Oak
  Issue Type: Technical task
Reporter: Matt Ryan
Assignee: Matt Ryan


This task is to wire up the composite data store and the corresponding service, 
respond to OSGi events when delegates are created, register the service 
appropriately, implement the data store portions, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7084) Implement CompositeDataStore and CompositeDataStoreService

2017-12-19 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297199#comment-16297199
 ] 

Matt Ryan commented on OAK-7084:


There is a pull request open for review at 
https://github.com/apache/jackrabbit-oak/pull/74.

> Implement CompositeDataStore and CompositeDataStoreService
> --
>
> Key: OAK-7084
> URL: https://issues.apache.org/jira/browse/OAK-7084
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>
> This task is to wire up the composite data store and the corresponding 
> service, respond to OSGi events when delegates are created, register the 
> service appropriately, implement the data store portions, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7084) Implement CompositeDataStore and CompositeDataStoreService

2017-12-19 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297219#comment-16297219
 ] 

Matt Ryan commented on OAK-7084:


The composite data store and delegates are configured as follows:

Delegates are configured via a factory.  Each delegate must specify a role name 
(any string) and any other configuration specific to that delegate (e.g. 
readOnly=true).  The composite configuration simply takes a "roles" property 
whose value is a comma-delimited list of the roles the composite uses, for 
example "roles=production,staging".

> Implement CompositeDataStore and CompositeDataStoreService
> --
>
> Key: OAK-7084
> URL: https://issues.apache.org/jira/browse/OAK-7084
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>
> This task is to wire up the composite data store and the corresponding 
> service, respond to OSGi events when delegates are created, register the 
> service appropriately, implement the data store portions, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7086) Add FileDataStoreFactory

2017-12-19 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7086:
--

 Summary: Add FileDataStoreFactory
 Key: OAK-7086
 URL: https://issues.apache.org/jira/browse/OAK-7086
 Project: Jackrabbit Oak
  Issue Type: Technical task
Reporter: Matt Ryan
Assignee: Matt Ryan


This allows FileDataStore to be used as a delegate for CompositeDataStore.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7086) Add FileDataStoreFactory

2017-12-19 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297222#comment-16297222
 ] 

Matt Ryan commented on OAK-7086:


The original implementation for this was submitted for review in this pull 
request:  https://github.com/apache/jackrabbit-oak/pull/71

Since that time it has been merged into the primary pull request for composite 
data store, which is open for review:  
https://github.com/apache/jackrabbit-oak/pull/74

> Add FileDataStoreFactory
> 
>
> Key: OAK-7086
> URL: https://issues.apache.org/jira/browse/OAK-7086
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>
> This allows FileDataStore to be used as a delegate for CompositeDataStore.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7087) Add S3DataStoreFactory

2017-12-19 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7087:
--

 Summary: Add S3DataStoreFactory
 Key: OAK-7087
 URL: https://issues.apache.org/jira/browse/OAK-7087
 Project: Jackrabbit Oak
  Issue Type: Technical task
Reporter: Matt Ryan
Assignee: Matt Ryan


Support using S3DataStore as a delegate for CompositeDataStore.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7087) Add S3DataStoreFactory

2017-12-19 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297242#comment-16297242
 ] 

Matt Ryan commented on OAK-7087:


Available for review in the composite data store pull request:  
https://github.com/apache/jackrabbit-oak/pull/74

> Add S3DataStoreFactory
> --
>
> Key: OAK-7087
> URL: https://issues.apache.org/jira/browse/OAK-7087
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>
> Support using S3DataStore as a delegate for CompositeDataStore.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7088) Add AzureDataStoreFactory

2017-12-19 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7088:
--

 Summary: Add AzureDataStoreFactory
 Key: OAK-7088
 URL: https://issues.apache.org/jira/browse/OAK-7088
 Project: Jackrabbit Oak
  Issue Type: Technical task
Reporter: Matt Ryan
Assignee: Matt Ryan


Support using AzureDataStore as a delegate for CompositeDataStore.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7089) Populate composite data store blob ID table at startup

2017-12-19 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7089:
--

 Summary: Populate composite data store blob ID table at startup
 Key: OAK-7089
 URL: https://issues.apache.org/jira/browse/OAK-7089
 Project: Jackrabbit Oak
  Issue Type: Technical task
Reporter: Matt Ryan
Assignee: Matt Ryan


The composite data store blob ID mapping table is used to quickly find the 
delegate that handles a given blob ID.  At startup we need to fill the table 
with appropriate mappings in order for it to be useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7090) Use Bloom filters for composite data store blob ID lookup table

2017-12-19 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7090:
--

 Summary: Use Bloom filters for composite data store blob ID lookup 
table
 Key: OAK-7090
 URL: https://issues.apache.org/jira/browse/OAK-7090
 Project: Jackrabbit Oak
  Issue Type: Technical task
Reporter: Matt Ryan


The composite data store attempts to keep a mapping of blob IDs to the 
delegates where each blob should be found.  We should use Bloom filters to make 
this mapping more efficient.

There are a couple of challenges with implementing Bloom filters for this 
purpose.
# Determining the appropriate size of the Bloom filter.  Assuming OAK-7089 is 
completed before this one, we should have a reasonable guess as to the number 
of blob IDs at startup time, but this may change over time.  This may require a 
task to rebuild the table for a more appropriate size once the table becomes 
too full (too many false positives).
# Handling deletions.  Once a record has been deleted, the corresponding blob 
ID may also need to be removed (similar algorithm to data store GC).  Bloom 
filters don't typically handle deletions though.  This may require something 
like e.g. [Invertible Bloom 
Filter|http://www.i-programmer.info/programming/theory/4641-the-invertible-bloom-filter.html],
 or this may be as simple as using data store GC time to rebuild the Bloom 
filter appropriately.
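
A minimal sketch of a per-delegate Bloom filter table (assuming Guava's 
{{BloomFilter}}; names are illustrative, and deletions/resizing are ignored, 
per the challenges above):

{noformat}
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DelegateBlobIdIndex {
    // One filter per delegate role; a hit means "this delegate may hold the blob".
    private final Map<String, BloomFilter<CharSequence>> filters = new LinkedHashMap<>();

    public DelegateBlobIdIndex(Iterable<String> roles, long expectedBlobIds) {
        for (String role : roles) {
            // 1% false-positive rate; would need rebuilding as the store grows.
            filters.put(role, BloomFilter.create(
                    Funnels.stringFunnel(StandardCharsets.UTF_8), expectedBlobIds, 0.01));
        }
    }

    public void record(String role, String blobId) {
        filters.get(role).put(blobId);
    }

    // False positives are possible, so callers must still confirm with the
    // delegate; false negatives are not, so misses can be skipped safely.
    public List<String> candidateDelegates(String blobId) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, BloomFilter<CharSequence>> e : filters.entrySet()) {
            if (e.getValue().mightContain(blobId)) {
                candidates.add(e.getKey());
            }
        }
        return candidates;
    }
}
{noformat}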



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (OAK-7089) Populate composite data store blob ID table at startup

2017-12-19 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan reassigned OAK-7089:
--

Assignee: (was: Matt Ryan)

> Populate composite data store blob ID table at startup
> --
>
> Key: OAK-7089
> URL: https://issues.apache.org/jira/browse/OAK-7089
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>
> The composite data store blob ID mapping table is used to quickly find the 
> delegate that is handling this blob ID.  At startup we need to fill the table 
> with appropriate mappings in order for it to be useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7091) Avoid streaming data twice in composite data store

2017-12-19 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7091:
--

 Summary: Avoid streaming data twice in composite data store
 Key: OAK-7091
 URL: https://issues.apache.org/jira/browse/OAK-7091
 Project: Jackrabbit Oak
  Issue Type: Technical task
Reporter: Matt Ryan
Assignee: Matt Ryan


When adding a new record to an Oak instance that is using composite data store, 
the blob stream will be read twice before it is stored - once by the composite 
data store (to determine the blob ID) and again by the delegate.  We could add 
a method to the CompositeDataStoreAware interface wherein the data store can be 
told which blob ID to use (from the composite) so that it doesn't have to 
process the stream again.  Then the composite data store, after having read the 
stream to a temporary file, can pass an input stream from the temporary file to 
the delegate along with the computed blob ID, to avoid reading the stream twice.
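
A rough sketch of the proposed addition (the method name and shape are 
illustrative only, not a settled API):

{noformat}
import java.io.InputStream;
import org.apache.jackrabbit.core.data.DataIdentifier;
import org.apache.jackrabbit.core.data.DataRecord;
import org.apache.jackrabbit.core.data.DataStoreException;

public interface CompositeDataStoreAware {
    /**
     * Add a record using a blob ID already computed by the composite, so the
     * delegate does not need to re-read the stream to derive it.
     */
    DataRecord addRecord(InputStream stream, DataIdentifier precomputedId)
            throws DataStoreException;
}
{noformat}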



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7091) Avoid streaming data twice in composite data store

2018-01-05 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313415#comment-16313415
 ] 

Matt Ryan commented on OAK-7091:


The first scenario I'm developing for the composite data store supports a 
single read-only delegate and a single writable delegate, so this capability is 
technically not needed until the composite data store supports multiple 
writable delegates.  Instead, for now, the composite data store can just pass 
the stream along to the only writable delegate.

If/when this capability is added to the composite data store, we could also add 
a method to the delegate handler to ask how many writable delegates exist.  If 
there is only one, the composite data store can optimize and avoid computing 
the blob ID, and simply pass the stream along to the only writable delegate.

> Avoid streaming data twice in composite data store
> --
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>
> When adding a new record to an Oak instance that is using composite data 
> store, the blob stream will be read twice before it is stored - once by the 
> composite data store (to determine the blob ID) and again by the delegate.  
> We could add a method to the CompositeDataStoreAware interface wherein the 
> data store can be told which blob ID to use (from the composite) so that it 
> doesn't have to process the stream again.  Then the composite data store, 
> after having read the stream to a temporary file, can pass an input stream 
> from the temporary file to the delegate along with the computed blob ID, to 
> avoid reading the stream twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7091) Avoid streaming data twice in composite data store

2018-01-05 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-7091:
---
Description: 
When adding a new record to an Oak instance that is using composite data store, 
the blob stream will be read twice before it is stored - once by the composite 
data store (to determine the blob ID) and again by the delegate.  This is 
necessary because if there are multiple writable delegates and one delegate 
already has a matching blob, the composite should call {{addRecord()}} on the 
delegate that has the matching blob, which may not be the highest priority 
delegate.  So we need to know the blob ID in order to select the correct 
writable delegate.

We could add a method to the CompositeDataStoreAware interface wherein the data 
store can be told which blob ID to use (from the composite) so that it doesn't 
have to process the stream again.  Then the composite data store, after having 
read the stream to a temporary file, can pass an input stream from the 
temporary file to the delegate along with the computed blob ID, to avoid 
reading the stream twice.

  was:When adding a new record to an Oak instance that is using composite data 
store, the blob stream will be read twice before it is stored - once by the 
composite data store (to determine the blob ID) and again by the delegate.  We 
could add a method to the CompositeDataStoreAware interface wherein the data 
store can be told which blob ID to use (from the composite) so that it doesn't 
have to process the stream again.  Then the composite data store, after having 
read the stream to a temporary file, can pass an input stream from the 
temporary file to the delegate along with the computed blob ID, to avoid 
reading the stream twice.


> Avoid streaming data twice in composite data store
> --
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>
> When adding a new record to an Oak instance that is using composite data 
> store, the blob stream will be read twice before it is stored - once by the 
> composite data store (to determine the blob ID) and again by the delegate.  
> This is necessary because if there are multiple writable delegates and one 
> delegate already has a matching blob, the composite should call 
> {{addRecord()}} on the delegate that has the matching blob, which may not be 
> the highest priority delegate.  So we need to know the blob ID in order to 
> select the correct writable delegate.
> We could add a method to the CompositeDataStoreAware interface wherein the 
> data store can be told which blob ID to use (from the composite) so that it 
> doesn't have to process the stream again.  Then the composite data store, 
> after having read the stream to a temporary file, can pass an input stream 
> from the temporary file to the delegate along with the computed blob ID, to 
> avoid reading the stream twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (OAK-7091) Avoid streaming data twice in composite data store

2018-01-05 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan reassigned OAK-7091:
--

Assignee: (was: Matt Ryan)

> Avoid streaming data twice in composite data store
> --
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>
> When adding a new record to an Oak instance that is using composite data 
> store, the blob stream will be read twice before it is stored - once by the 
> composite data store (to determine the blob ID) and again by the delegate.  
> This is necessary because if there are multiple writable delegates and one 
> delegate already has a matching blob, the composite should call 
> {{addRecord()}} on the delegate that has the matching blob, which may not be 
> the highest priority delegate.  So we need to know the blob ID in order to 
> select the correct writable delegate.
> We could add a method to the CompositeDataStoreAware interface wherein the 
> data store can be told which blob ID to use (from the composite) so that it 
> doesn't have to process the stream again.  Then the composite data store, 
> after having read the stream to a temporary file, can pass an input stream 
> from the temporary file to the delegate along with the computed blob ID, to 
> avoid reading the stream twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7091) Avoid streaming data twice in composite data store

2018-01-05 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-7091:
---
Issue Type: Task  (was: Technical task)
Parent: (was: OAK-7083)

> Avoid streaming data twice in composite data store
> --
>
> Key: OAK-7091
> URL: https://issues.apache.org/jira/browse/OAK-7091
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>
> When adding a new record to an Oak instance that is using composite data 
> store, the blob stream will be read twice before it is stored - once by the 
> composite data store (to determine the blob ID) and again by the delegate.  
> This is necessary because if there are multiple writable delegates and one 
> delegate already has a matching blob, the composite should call 
> {{addRecord()}} on the delegate that has the matching blob, which may not be 
> the highest priority delegate.  So we need to know the blob ID in order to 
> select the correct writable delegate.
> We could add a method to the CompositeDataStoreAware interface wherein the 
> data store can be told which blob ID to use (from the composite) so that it 
> doesn't have to process the stream again.  Then the composite data store, 
> after having read the stream to a temporary file, can pass an input stream 
> from the temporary file to the delegate along with the computed blob ID, to 
> avoid reading the stream twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7241) oak-run documentation typo for "checkpoints" command

2018-02-02 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7241:
--

 Summary: oak-run documentation typo for "checkpoints" command
 Key: OAK-7241
 URL: https://issues.apache.org/jira/browse/OAK-7241
 Project: Jackrabbit Oak
  Issue Type: Bug
Affects Versions: 1.8.1
Reporter: Matt Ryan


In online documentation for {{oak-run}}, the "checkpoints" command is described 
as follows:
java -mx4g -jar oak-run-*.jar checkpoint 
This command is not recognized and results in {{oak-run}} displaying available 
commands, one of which is "checkpoints" (with an "s" at the end).

"checkpoints" seems to match the {{oak-run}} {{README.md}} file:  
[https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/README.md]

as well as the code:  
[https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/run/AvailableModes.java]

Changing the command above to use "checkpoints" appears to work, so I assume it 
is just a typo in {{oak-doc}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (OAK-7241) oak-run documentation typo for "checkpoints" command

2018-02-02 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan reassigned OAK-7241:
--

Assignee: Matt Ryan

> oak-run documentation typo for "checkpoints" command
> 
>
> Key: OAK-7241
> URL: https://issues.apache.org/jira/browse/OAK-7241
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8.1
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Minor
>
> In online documentation for {{oak-run}}, the "checkpoints" command is 
> described as follows:
> java -mx4g -jar oak-run-*.jar checkpoint 
> This command is not recognized and results in {{oak-run}} displaying 
> available commands, one of which is "checkpoints" (with an "s" at the end).
> "checkpoints" seems to match the {{oak-run}} {{README.md}} file:  
> [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/README.md]
> as well as the code:  
> [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/run/AvailableModes.java]
> Changing the command above to use "checkpoints" appears to work, so I assume 
> it is just a typo in {{oak-doc}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-7241) oak-run documentation typo for "checkpoints" command

2018-02-02 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350978#comment-16350978
 ] 

Matt Ryan commented on OAK-7241:


Submitted pull request:  [https://github.com/apache/jackrabbit-oak/pull/78]

 

> oak-run documentation typo for "checkpoints" command
> 
>
> Key: OAK-7241
> URL: https://issues.apache.org/jira/browse/OAK-7241
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8.1
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Minor
>
> In online documentation for {{oak-run}}, the "checkpoints" command is 
> described as follows:
> java -mx4g -jar oak-run-*.jar checkpoint 
> This command is not recognized and results in {{oak-run}} displaying 
> available commands, one of which is "checkpoints" (with an "s" at the end).
> "checkpoints" seems to match the {{oak-run}} {{README.md}} file:  
> [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/README.md]
> as well as the code:  
> [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/run/AvailableModes.java]
> Changing the command above to use "checkpoints" appears to work, so I assume 
> it is just a typo in {{oak-doc}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (OAK-7241) oak-run documentation typo for "checkpoints" command

2018-02-02 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-7241:
---
Description: 
In online documentation for {{oak-run}}, the "checkpoints" command is described 
as follows:
{{java -mx4g -jar oak-run*.jar checkpoint }}
 This command is not recognized and results in {{oak-run}} displaying available 
commands, one of which is "checkpoints" (with an "s" at the end).

"checkpoints" seems to match the {{oak-run}} {{README.md}} file:  
[https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/README.md]

as well as the code:  
[https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/run/AvailableModes.java]

Changing the command above to use "checkpoints" appears to work, so I assume it 
is just a typo in {{oak-doc}}.

  was:
In online documentation for {{oak-run}}, the "checkpoints" command is described 
as follows:
java -mx4g -jar oak-run-*.jar checkpoint 
This command is not recognized and results in {{oak-run}} displaying available 
commands, one of which is "checkpoints" (with an "s" at the end).

"checkpoints" seems to match the {{oak-run}} {{README.md}} file:  
[https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/README.md]

as well as the code:  
[https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/run/AvailableModes.java]

Changing the command above to use "checkpoints" appears to work, so I assume it 
is just a typo in {{oak-doc}}.


> oak-run documentation typo for "checkpoints" command
> 
>
> Key: OAK-7241
> URL: https://issues.apache.org/jira/browse/OAK-7241
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8.1
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Minor
>
> In online documentation for {{oak-run}}, the "checkpoints" command is 
> described as follows:
> {{java -mx4g -jar oak-run*.jar checkpoint }}
>  This command is not recognized and results in {{oak-run}} displaying 
> available commands, one of which is "checkpoints" (with an "s" at the end).
> "checkpoints" seems to match the {{oak-run}} {{README.md}} file:  
> [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/README.md]
> as well as the code:  
> [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/run/AvailableModes.java]
> Changing the command above to use "checkpoints" appears to work, so I assume 
> it is just a typo in {{oak-doc}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (OAK-7260) oak-run throws IllegalStateException in backup mode

2018-02-12 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7260:
--

 Summary: oak-run throws IllegalStateException in backup mode
 Key: OAK-7260
 URL: https://issues.apache.org/jira/browse/OAK-7260
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: oak-run
Reporter: Matt Ryan


I try to run {{oak-run}} to back up my repository according to the [online 
documentation|https://jackrabbit.apache.org/oak/docs/command_line.html], like 
this:

{{java -mx4g -jar oak-run-1.10-SNAPSHOT.jar backup repository/segmentstore 
./repo.bak}}

When I do this I get an {{IllegalStateException}}:

{{java.lang.IllegalStateException: Attempt to read external blob 
with}}{{blobId}}{{[3e15c88da9b46267141b60826c0734007c3e5a5f804c953edb0280eda680a9be#174188]
 without}}{{specifying BlobStore}}

 

It is unclear how to proceed, based on the documentation.  It doesn't tell me 
how to specify a blob store path.  Expected behavior should be:
 # Tool help gives correct options for correct usage
 # Online documentation gives correct options for correct usage
 # Tool handles exception gracefully instead of throwing stack trace

 

Configuration:  Default Oak setup with SegmentNodeStore and FileDataStore, no 
custom configuration.  I tested with {{oak-run}} built from sources as well as 
multiple downloaded versions, all gave the same behavior.  Path to repository 
is correct and I verified the blob actually does exist in the blob store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-7260) oak-run throws IllegalStateException in backup mode

2018-02-12 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361629#comment-16361629
 ] 

Matt Ryan commented on OAK-7260:


Attached stack trace [^OAK-7260-stacktrace.txt].

> oak-run throws IllegalStateException in backup mode
> ---
>
> Key: OAK-7260
> URL: https://issues.apache.org/jira/browse/OAK-7260
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: oak-run
>Reporter: Matt Ryan
>Priority: Minor
> Attachments: OAK-7260-stacktrace.txt
>
>
> I try to run {{oak-run}} to back up my repository according to the [online 
> documentation|https://jackrabbit.apache.org/oak/docs/command_line.html], like 
> this:
> {{java -mx4g -jar oak-run-1.10-SNAPSHOT.jar backup repository/segmentstore 
> ./repo.bak}}
> When I do this I get an {{IllegalStateException}}:
> {{java.lang.IllegalStateException: Attempt to read external blob 
> with}}{{blobId}}{{[3e15c88da9b46267141b60826c0734007c3e5a5f804c953edb0280eda680a9be#174188]
>  without}}{{specifying BlobStore}}
>  
> It is unclear how to proceed, based on the documentation.  It doesn't tell me 
> how to specify a blob store path.  Expected behavior should be:
>  # Tool help gives correct options for correct usage
>  # Online documentation gives correct options for correct usage
>  # Tool handles exception gracefully instead of throwing stack trace
>  
> Configuration:  Default Oak setup with SegmentNodeStore and FileDataStore, no 
> custom configuration.  I tested with {{oak-run}} built from sources as well 
> as multiple downloaded versions, all gave the same behavior.  Path to 
> repository is correct and I verified the blob actually does exist in the blob 
> store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (OAK-7260) oak-run throws IllegalStateException in backup mode

2018-02-12 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-7260:
---
Attachment: OAK-7260-stacktrace.txt

> oak-run throws IllegalStateException in backup mode
> ---
>
> Key: OAK-7260
> URL: https://issues.apache.org/jira/browse/OAK-7260
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: oak-run
>Reporter: Matt Ryan
>Priority: Minor
> Attachments: OAK-7260-stacktrace.txt
>
>
> I try to run {{oak-run}} to back up my repository according to the [online 
> documentation|https://jackrabbit.apache.org/oak/docs/command_line.html], like 
> this:
> {{java -mx4g -jar oak-run-1.10-SNAPSHOT.jar backup repository/segmentstore 
> ./repo.bak}}
> When I do this I get an {{IllegalStateException}}:
> {{java.lang.IllegalStateException: Attempt to read external blob 
> with}}{{blobId}}{{[3e15c88da9b46267141b60826c0734007c3e5a5f804c953edb0280eda680a9be#174188]
>  without}}{{specifying BlobStore}}
>  
> It is unclear how to proceed, based on the documentation.  It doesn't tell me 
> how to specify a blob store path.  Expected behavior should be:
>  # Tool help gives correct options for correct usage
>  # Online documentation gives correct options for correct usage
>  # Tool handles exception gracefully instead of throwing stack trace
>  
> Configuration:  Default Oak setup with SegmentNodeStore and FileDataStore, no 
> custom configuration.  I tested with {{oak-run}} built from sources as well 
> as multiple downloaded versions, all gave the same behavior.  Path to 
> repository is correct and I verified the blob actually does exist in the blob 
> store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-7260) oak-run throws IllegalStateException in backup mode

2018-02-12 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361632#comment-16361632
 ] 

Matt Ryan commented on OAK-7260:


I also found documentation 
[here|https://jackrabbit.apache.org/oak/docs/nodestore/segment/overview.html#backup]
 that differs from the source I mentioned in the description.  Which one is 
official?  Should one be retired or made to point to the other?

> oak-run throws IllegalStateException in backup mode
> ---
>
> Key: OAK-7260
> URL: https://issues.apache.org/jira/browse/OAK-7260
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: oak-run
>Reporter: Matt Ryan
>Priority: Minor
> Attachments: OAK-7260-stacktrace.txt
>
>
> I try to run {{oak-run}} to back up my repository according to the [online 
> documentation|https://jackrabbit.apache.org/oak/docs/command_line.html], like 
> this:
> {{java -mx4g -jar oak-run-1.10-SNAPSHOT.jar backup repository/segmentstore 
> ./repo.bak}}
> When I do this I get an {{IllegalStateException}}:
> {{java.lang.IllegalStateException: Attempt to read external blob 
> with}}{{blobId}}{{[3e15c88da9b46267141b60826c0734007c3e5a5f804c953edb0280eda680a9be#174188]
>  without}}{{specifying BlobStore}}
>  
> It is unclear how to proceed, based on the documentation.  It doesn't tell me 
> how to specify a blob store path.  Expected behavior should be:
>  # Tool help gives correct options for correct usage
>  # Online documentation gives correct options for correct usage
>  # Tool handles exception gracefully instead of throwing stack trace
>  
> Configuration:  Default Oak setup with SegmentNodeStore and FileDataStore, no 
> custom configuration.  I tested with {{oak-run}} built from sources as well 
> as multiple downloaded versions, all gave the same behavior.  Path to 
> repository is correct and I verified the blob actually does exist in the blob 
> store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (OAK-7260) oak-run throws IllegalStateException in backup mode

2018-02-12 Thread Matt Ryan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-7260:
---
Description: 
I try to run {{oak-run}} to back up my repository according to the [online 
documentation|https://jackrabbit.apache.org/oak/docs/command_line.html], like 
this:

{{java -mx4g -jar oak-run-1.10-SNAPSHOT.jar backup repository/segmentstore 
./repo.bak}}

When I do this I get an {{IllegalStateException}}:

{{java.lang.IllegalStateException: Attempt to read external blob with 
}}{{blobId }}
{{[ 3e15c88da9b46267141b60826c0734007c3e5a5f804c953edb0280eda680a9be#174188 ]}}
{{without specifying BlobStore}}

 

It is unclear how to proceed, based on the documentation.  It doesn't tell me 
how to specify a blob store path.  Expected behavior should be:
 # Tool help gives correct options for correct usage
 # Online documentation gives correct options for correct usage
 # Tool handles exception gracefully instead of throwing stack trace

 

Configuration:  Default Oak setup with SegmentNodeStore and FileDataStore, no 
custom configuration.  I tested with {{oak-run}} built from sources as well as 
multiple downloaded versions, all gave the same behavior.  Path to repository 
is correct and I verified the blob actually does exist in the blob store.

  was:
I try to run {{oak-run}} to back up my repository according to the [online 
documentation|https://jackrabbit.apache.org/oak/docs/command_line.html], like 
this:

{{java -mx4g -jar oak-run-1.10-SNAPSHOT.jar backup repository/segmentstore 
./repo.bak}}

When I do this I get an {{IllegalStateException}}:

{{java.lang.IllegalStateException: Attempt to read external blob 
with}}{{blobId}}{{[3e15c88da9b46267141b60826c0734007c3e5a5f804c953edb0280eda680a9be#174188]
 without}}{{specifying BlobStore}}

 

It is unclear how to proceed, based on the documentation.  It doesn't tell me 
how to specify a blob store path.  Expected behavior should be:
 # Tool help gives correct options for correct usage
 # Online documentation gives correct options for correct usage
 # Tool handles exception gracefully instead of throwing stack trace

 

Configuration:  Default Oak setup with SegmentNodeStore and FileDataStore, no 
custom configuration.  I tested with {{oak-run}} built from sources as well as 
multiple downloaded versions, all gave the same behavior.  Path to repository 
is correct and I verified the blob actually does exist in the blob store.


> oak-run throws IllegalStateException in backup mode
> ---
>
> Key: OAK-7260
> URL: https://issues.apache.org/jira/browse/OAK-7260
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: oak-run
>Reporter: Matt Ryan
>Priority: Minor
> Attachments: OAK-7260-stacktrace.txt
>
>
> I try to run {{oak-run}} to back up my repository according to the [online 
> documentation|https://jackrabbit.apache.org/oak/docs/command_line.html], like 
> this:
> {{java -mx4g -jar oak-run-1.10-SNAPSHOT.jar backup repository/segmentstore 
> ./repo.bak}}
> When I do this I get an {{IllegalStateException}}:
> {{java.lang.IllegalStateException: Attempt to read external blob with 
> }}{{blobId }}
> {{[ 3e15c88da9b46267141b60826c0734007c3e5a5f804c953edb0280eda680a9be#174188 
> ]}}
> {{without specifying BlobStore}}
>  
> It is unclear how to proceed, based on the documentation.  It doesn't tell me 
> how to specify a blob store path.  Expected behavior should be:
>  # Tool help gives correct options for correct usage
>  # Online documentation gives correct options for correct usage
>  # Tool handles exception gracefully instead of throwing stack trace
>  
> Configuration:  Default Oak setup with SegmentNodeStore and FileDataStore, no 
> custom configuration.  I tested with {{oak-run}} built from sources as well 
> as multiple downloaded versions, all gave the same behavior.  Path to 
> repository is correct and I verified the blob actually does exist in the blob 
> store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-22 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373153#comment-16373153
 ] 

Matt Ryan commented on OAK-7083:


The following pull requests have been created for this issue:
 * [https://github.com/apache/jackrabbit-oak/pull/71] - Changes made to 
{{oak-blob-plugins}} and {{oak-blob}} to support {{CompositeDataStore}}.  This 
includes the following:

 ** Adding a {{DataStoreProvider}} interface to {{oak-blob}} so that delegate 
data stores can associate a role with themselves (a rough sketch follows this 
list).
 ** Implementing {{AbstractDataStoreFactory}} in {{oak-blob-plugins}} so 
multiple data stores can be configured as factory classes.
 ** Implementing {{FileDataStoreFactory}} in {{oak-blob-plugins}} to provide 
this capability for {{FileDataStore}}.
 * 
[https://github.com/apache/jackrabbit-oak/pull/80|https://github.com/apache/jackrabbit-oak/pull/80/files]
 - Changes made to {{oak-blob-plugins}} to {{MarkSweepGarbageCollector}} so 
that garbage collection will work for {{CompositeDataStore}}.
 * [https://github.com/apache/jackrabbit-oak/pull/74] - Includes the following 
changes:
 ** Addition of {{S3DataStoreFactory}} to {{oak-blob-cloud}}
 ** Creation of {{oak-blob-composite}} which implements {{CompositeDataStore}} 
(service, data store, supporting code, unit tests, etc.)
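
As a rough sketch of the {{DataStoreProvider}} idea referenced above (see the 
pull request for the actual interface):

{noformat}
import org.apache.jackrabbit.core.data.DataStore;

public interface DataStoreProvider {
    // The wrapped data store acting as a delegate.
    DataStore getDataStore();

    // The role string (e.g. "production") used to select this delegate.
    String getRole();
}
{noformat}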

 

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> 
>
> Key: OAK-7083
> URL: https://issues.apache.org/jira/browse/OAK-7083
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Major
>
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-22 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373219#comment-16373219
 ] 

Matt Ryan commented on OAK-7083:


I'm going to be moving a conversation from the oak-dev list to the ticket over 
the next few comments.

In doing so, I want to make sure we consider the larger picture.  It is 
important to remember that right now we are talking about a very specific use 
case of {{CompositeDataStore}}, which is the purpose of this issue:  To support 
the ReadOnly/ReadWrite Delegate scenario.  However, we of course want to avoid 
making design decisions that limit the usability of the {{CompositeDataStore}} 
in the future.  Other uses may not always have exactly one read-only delegate - 
there may be multiple writable delegates, with zero or more read-only 
delegates, in any number of combinations.  I want to avoid designing ourselves 
into a corner where we can't support other use cases easily.

 



[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-22 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373222#comment-16373222
 ] 

Matt Ryan commented on OAK-7083:


Since [https://github.com/apache/jackrabbit-oak/pull/80] entails a change to 
the {{MarkSweepGarbageCollector}}, we should discuss the change here to see if 
there are concerns with it or if there is a better approach.
 
Let me try to briefly explain the change I've proposed to the 
{{MarkSweepGarbageCollector}}.
 
In the use case I tested there are two Oak repositories, one which we will call 
primary and one which we will call secondary.  Primary gets created first; 
secondary is created by cloning the node store of primary, then using a 
{{CompositeDataStore}} to have two delegate data stores.  The first delegate is 
the same as the data store for primary, in read-only mode.  The second delegate 
is only accessible by the secondary repo.
 
Let the data store shared by primary and secondary be called DS_P and the data 
store being used only by the secondary be called DS_S.  DS_P can be read by 
secondary but not modified, so all changes on secondary are saved in DS_S.  
Primary can still make changes to DS_P.
 
Suppose after creating both repositories, records A and B are deleted from the 
primary repo, and records B and C are deleted from the secondary repo.  Since 
DS_P is shared, only blob B should actually be deleted from DS_P via GC.  After 
both repositories run their “mark” phase, the primary repo created a 
“references” file in DS_P excluding A and B, meaning primary thinks A and B can 
both be deleted.  And the secondary repo created a “references” file in DS_P 
excluding B and C, meaning secondary thinks B and C can both be deleted.
 
Suppose then primary runs the sweep phase first.  It will first verify that it 
has a references file for each repository file in DS_P.  Since both primary and 
secondary put one there this test passes.  It will then merge all the data in 
all the references files in DS_P with its own local view of the existing blobs, 
and come up with a set of blobs to delete.  Primary will conclude that blobs B 
and C should be deleted - B because both primary and secondary said it is 
deleted, and C because secondary said it should be deleted and primary has no 
knowledge of C so it will assume it is okay to delete.  At this point primary 
will delete B and try to delete C and fail (which is ok).  Then primary will 
delete its “references” file from DS_P and call the sweep phase complete.
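
In rough pseudocode, the merge step looks like this (the names and collections
are illustrative stand-ins, not the actual {{MarkSweepGarbageCollector}}
internals):
{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SweepCandidateSketch {
    // "mergedBlobView" stands for whatever listing of candidate ids the
    // sweeping repository assembles (its own view of existing blobs, merged
    // with ids seen in the shared store).  Anything not referenced by ANY
    // repository is attempted for deletion; in the example above that yields
    // B and C for the primary, and the attempt on C simply fails.
    static Set<String> candidatesForDeletion(List<Set<String>> referencesFiles,
                                             Set<String> mergedBlobView) {
        Set<String> referenced = new HashSet<>();
        for (Set<String> refs : referencesFiles) {
            referenced.addAll(refs);
        }
        Set<String> candidates = new HashSet<>(mergedBlobView);
        candidates.removeAll(referenced);
        return candidates;
    }
}
{code}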
 
Now the problem comes when secondary tries to run the sweep phase.  It will 
first try to verify that a references file exists for each repository file in 
DS_P - and fail.  This fails because primary deleted its references file 
already.  Thus secondary will cancel GC and thus blob C never ends up getting 
deleted.  Note that secondary must delete C because it is the only repository 
that knows about C.
 
This same situation exists also if secondary sweeps first.  If record D was 
created by primary after secondary was cloned, then D is deleted by primary, 
secondary never knows about blob D so it cannot delete it during the sweep 
phase - it can only be deleted by primary.
 
 
The change I made to the garbage collector is that when a repository finishes 
the sweep phase, it doesn’t necessarily delete the references file.  Instead it 
marks the data store with a “sweepComplete” file indicating that this 
repository finished the sweep phase.  When there is a “sweepComplete” file for 
every repository (in other words, the last repository to sweep), then all the 
references files are deleted.
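
A minimal sketch of that completion check, assuming the {{SharedDataStore}}
metadata-record API ({{addMetadataRecord}}, {{getAllMetadataRecords}},
{{deleteAllMetadataRecords}}); the record names here are illustrative:
{code:java}
import java.io.ByteArrayInputStream;
import org.apache.jackrabbit.oak.plugins.blob.SharedDataStore;

class SweepCompletionSketch {
    // Instead of deleting the "references" file right after sweeping, record
    // that this repository swept; only the last sweeper cleans everything up.
    static void markSweepComplete(SharedDataStore ds, String repoId) throws Exception {
        ds.addMetadataRecord(new ByteArrayInputStream(new byte[0]),
                "sweepComplete-" + repoId);
        int registered = ds.getAllMetadataRecords("repository").size();
        int swept = ds.getAllMetadataRecords("sweepComplete").size();
        if (swept >= registered) {
            ds.deleteAllMetadataRecords("references");
            ds.deleteAllMetadataRecords("sweepComplete");
        }
    }
}
{code}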
 
I wrote an integration test to test DSGC for this specific composite data 
store use case at 
[https://github.com/mattvryan/jackrabbit-oak/blob/39b33fe94a055ef588791f238eb85734c34062f3/oak-blob-composite/src/test/java/org/apache/jackrabbit/oak/blob/composite/CompositeDataStoreRORWIT.java].
 
 
All the Oak unit tests pass with this change.  I am interested in any concerns 
about unforeseen consequences of this change that others on-list may have.  
Also there’s the issue that sweeping must now be done by every repository 
sharing the data store, which introduces some inefficiency.  I’m open to 
changes or to a different approach as long as it still solves the problem 
described above.


[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-22 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373229#comment-16373229
 ] 

Matt Ryan commented on OAK-7083:


(From [~amjain] via oak-dev):

 
{quote}Now the problem comes when secondary tries to run the sweep phase.  It 
will first try to verify that a references file exists for each repository file 
in DS_P - and fail.  This fails because primary deleted its references file 
already.  Thus secondary will cancel GC and thus blob C never ends up getting 
deleted.  Note that secondary must delete C because it is the only repository 
that knows about C.

This same situation exists also if secondary sweeps first.  If record D was 
created by primary after secondary was cloned, then D is deleted by primary, 
secondary never knows about blob D so it cannot delete it during the sweep 
phase - it can only be deleted by primary.
{quote}
 

The solution for {{SharedDataStore}} currently is to require all repositories 
to run a Mark phase then run the Sweep phase on one of them.

 
{quote}The change I made to the garbage collector is that when a repository 
finishes the sweep phase, it doesn’t necessarily delete the references file.  
Instead it marks the data store with a “sweepComplete” file indicating that 
this repository finished the sweep phase.  When there is a “sweepComplete” file 
for every repository (in other words, the last repository to sweep), then all 
the references files are deleted.
{quote}
 

Well, currently the problem is that all repositories are not required to run 
the sweep phase. The solution above would have been OK when GC is run manually 
at different times, as in your case. But in real-world applications there is 
typically a cron job (e.g. an AEM maintenance task) which could be set up to 
execute weekly at a particular time on all repositories. In that case, almost 
always only the repository which finished the Mark phase last would be able to 
execute the Sweep phase, as it would be the only repository to see all the 
reference files for the other repos (others executing before it would fail). 
This is still OK for the {{SharedDataStore}} use cases we have. But with the 
above solution, since not all repositories would be able to run the sweep 
phase, the reference files won't be cleaned up. 

Besides, there's the problem of the Sweep phase on the primary encountering 
blobs it does not know about (from the secondary) which it cannot delete, 
creating an unpleasant experience. As I understand it, the Primary could be a 
production system, and having these sorts of errors crop up would be 
problematic. 

So, generically the solution would be to use the shared {{DataStore}} GC 
paradigm we currently have which requires Mark phase to be run on all 
repositories before running a Sweep. 

For this specific use case some observations and quick rough sketch of a 
possible solution:
 * The {{DataStore}}s for the 2 repositories - Primary & Secondary can be 
thought of as Shared & Private
 ** Primary does not know about Secondary and could be an existing repository 
and thus does not know about the {{DataStore}} of the Secondary as well. In 
other words it could even function as a normal {{DataStore}} and need not be a 
{{CompositeDataStore}}.
 ** Secondary does need to know about the Primary and thus registers itself as 
sharing the Primary {{DataStore}}.
 * Encode the blob ids on the Secondary with the {{DataStore}} location/type 
with which we can distinguish the blob ids belonging to the respective 
{{DataStore}}s.
 * Secondary's Mark phase only redirects the Primary owned blobids to the 
references file in the Primary's {{DataStore}} (Primary's {{DataStore}} 
operating as Shared).
 * Secondary executes GC for its {{DataStore}} independently and does not worry 
about the Shared blobids (already taken care of above). 


I presume some of the above steps are required to enable a generic or even some 
restricted {{CompositeDataStore}} solutions.


[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-22 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373241#comment-16373241
 ] 

Matt Ryan commented on OAK-7083:


 
{quote} bq. Now the problem comes when secondary tries to run the sweep phase. 
It will first try to verify that a references file exists for each repository 
file in DS_P - and fail. This fails because primary deleted its references file 
already. Thus secondary will cancel GC and thus blob C never ends up getting 
deleted. Note that secondary must delete C because it is the only repository 
that knows about C.
 bq. This same situation exists also if secondary sweeps first. If record D was 
created by primary after secondary was cloned, then D is deleted by primary, 
secondary never knows about blob D so it cannot delete it during the sweep 
phase - it can only be deleted by primary.

The solution for SharedDataStore currently is to require all repositories to 
run a Mark phase then run the Sweep phase on one of them.
{quote}

Yes.  Sorry, I didn’t mention that.  I was trying to be brief and ended up 
being unclear.  In the situation I described above it is definitely running the 
mark phase first and then the sweep phase.  The problem is still as I described 
- no matter which one runs sweep first, it cannot delete all the binaries that 
may possibly have been deleted on both systems.

{quote} bq. The change I made to the garbage collector is that when a 
repository finishes the sweep phase, it doesn’t necessarily delete the 
references file. Instead it marks the data store with a “sweepComplete” file 
indicating that this repository finished the sweep phase. When there is a 
“sweepComplete” file for every repository (in other words, the last repository 
to sweep), then all the references files are deleted.

Well currently the problem is that all repositories are not required to run the 
sweep phase. The solution above would have been ok when the GC is to be run 
manually at different times as in your case.
{quote}
Exactly - in the case I’ve described both have to successfully run a sweep or 
not all binaries will be deleted.
{quote}But in real-world applications there is typically a cron job (e.g. an 
AEM maintenance task) which could be set up to execute weekly at a particular 
time on all repositories. In that case, almost always only the repository 
which finished the Mark phase last would be able to execute the Sweep phase, 
as it would be the only repository to see all the reference files for the 
other repos (others executing before it would fail). This is still OK for the 
{{SharedDataStore}} use cases we have. But with the above solution, since not 
all repositories would be able to run the sweep phase, the reference files 
won't be cleaned up.
{quote}
A very valid point.  I'll need to think that one through some more.
{quote}Besides, there's the problem of the Sweep phase on the primary 
encountering blobs it does not know about (from the secondary) which it cannot 
delete, creating an unpleasant experience. As I understand it, the Primary 
could be a production system, and having these sorts of errors crop up would 
be problematic.
{quote}
If they are regarded as errors, yes.  Currently this logs a WARN level message 
(not an ERROR) which suggests that sometimes not all the binaries targeted for 
deletion will actually be deleted.

So this might be an issue of setting clear expectations.  But I do see the 
point.
{quote}So, generically the solution would be to use the shared {{DataStore}} GC 
paradigm we currently have which requires Mark phase to be run on all 
repositories before running a Sweep.
{quote}
Yes - like I said this is being done, it still requires that both repos do a 
sweep.
{quote}For this specific use case some observations and quick rough sketch of a 
possible solution:
 * The {{DataStore}}s for the 2 repositories - Primary & Secondary can be 
thought of as Shared & Private
 ** Primary does not know about Secondary and could be an existing repository 
and thus does not know about the {{DataStore}} of the Secondary as well. In 
other words it could even function as a normal {{DataStore}} and need not be a 
{{CompositeDataStore}}.
 ** Secondary does need to know about the Primary and thus registers itself as 
sharing the Primary {{DataStore}}.
 * Encode the blob ids on the Secondary with the {{DataStore}} location/type 
with which we can distinguish the blob ids belonging to the respective 
{{DataStore}}s.{quote}
That’s a solution that only works in this very specific use case of 
CompositeDataStore.  In the future if we were ever to want to support different 
scenarios we would then have to reconsider how it encodes blobs for each 
delegate.  Would that mean that data written to a data store by the 
CompositeDataStore could not be read by another CompositeDataStore referencing 
the same delegate?
{quote} * Secondary's Mark phase only redirects

[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-22 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373250#comment-16373250
 ] 

Matt Ryan commented on OAK-7083:


(From [~amjain] via oak-dev)
{quote}{quote}The solution for {{SharedDataStore}} currently is to require all 
repositories to run a Mark phase then run the Sweep phase on one of them.
{quote}
Yes. Sorry, I didn’t mention that. I was trying to be brief and ended up being 
unclear. In the situation I described above it is definitely running the mark 
phase first and then the sweep phase. The problem is still as I described - no 
matter which one runs sweep first, it cannot delete all the binaries that may 
possibly have been deleted on both systems.
{quote}
The problem is because that's how the systems are set up. For this particular 
problem on the Secondary there is no reason to even account for the Primary's 
datastore as it should not and cannot delete anything in there.
{quote}{quote}Besides, there's the problem of the Sweep phase on the primary 
encountering blobs it does not know about (from the secondary) which it cannot 
delete, creating an unpleasant experience. As I understand it, the Primary 
could be a production system, and having these sorts of errors crop up would 
be problematic.
{quote}
If they are regarded as errors, yes. Currently this logs a WARN level message 
(not an ERROR) which suggests that sometimes not all the binaries targeted for 
deletion will actually be deleted.
 So this might be an issue of setting clear expectations. But I do see the 
point.
{quote}
Yes, these are logged as WARN as they are not fatal, but empirically they are 
problematic and are questioned by customers. Apart from that there is also a 
performance impact, as each binary is attempted for deletion, which incurs a 
penalty.
{quote}{quote}Encode the blob ids on the Secondary with the {{DataStore}} 
location/type with which we can distinguish the blob ids belonging to the 
respective {{DataStore}}s.
{quote}
That’s a solution that only works in this very specific use case of 
{{CompositeDataStore}}. In the future if we were ever to want to support 
different scenarios we would then have to reconsider how it encodes blobs for 
each delegate. Would that mean that data written to a data store by the 
{{CompositeDataStore}} could not be read by another {{CompositeDataStore}} 
referencing the same delegate?
{quote}
But encoding of blob ids is needed anyway, irrespective of GC, no? Otherwise, 
how does the {{CompositeDataStore}} redirect the CRUD calls to the respective 
DSs? And I did not understand how encoding the blob id with information about 
the DS precludes it from reading. It has to have the same semantics for the 
same delegate. But yes, it does preclude moving the blobs from one subspace to 
another. But I don't think that's the use case anyway.
{quote}{quote}Secondary's Mark phase only redirects the Primary owned blobids 
to the references file in the Primary's {{DataStore}} (Primary's DataStore 
operating as Shared).
{quote}
The {{DataStore}} has no knowledge of the garbage collection stages. So IIUC 
this would require creating a new garbage collector that is aware of composite 
data stores and has the ability to interact with the {{CompositeDataStore}} in 
a tightly coupled fashion. Either that or we would have to enhance the data 
store API (for example, add a new interface or extend an interface so it can be 
precisely controlled by the garbage collector). Or both.
{quote}
{{DataStore}} does not have knowledge of when GC is taking place. But it does 
have helper methods which are used by GC. Yes, I would think that the methods 
currently existing for the purpose of GC need to be enhanced, and the 
Composite would have some intelligence on the execution of some methods (e.g. 
delete and the metadata methods) with some information about the delegates.
{quote}{quote}Secondary executes GC for its {{DataStore}} independently and 
does not worry about the Shared blobids (already taken care of above).
{quote}
Same issue - GC happens outside of the control of the {{DataStore}}.

It’s a good idea Amit - something I struggled with for quite a while. I 
considered the same approach as well. But it tightly binds garbage collection 
to the data store, whereas currently they are very loosely bound. GC leverages 
the {{DataStore}} APIs to do GC tasks (like reading and writing metadata 
files) but the {{DataStore}} doesn’t have any knowledge that GC is even 
happening.

So I don’t see how the {{CompositeDataStore}} could control execution of GC 
only on the independent data store.
{quote}
It does not control execution of the GC but it does control the GC helper 
methods and uses info already available with it for the delegates. Also, we 
could simply have GC instances bound to each delegate {{DataStore}}. This also 
would be similar to a case where we use the {{CompositeDataStore}} for 
internal


[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-22 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373665#comment-16373665
 ] 

Matt Ryan commented on OAK-7083:


Thanks [~amjain] for your comments and review so far.

Since there are a lot of questions I'm going to try to distill down to what I 
think are key issues and then work through the dependent issues as they come.

Let's consider first the proposals to handle garbage collection for composite 
data stores.  I think there are three currently.  For reference, my original 
proposal is:  Change the MarkSweepGarbageCollector so we don't remove any 
"references" files from the metadata area until all repositories connected to a 
data store have attempted the sweep phase.  I think the three proposals are:
 # Move forward with the change I proposed.
 # Require that every repository complete the "mark" phase before any 
repository can attempt a "sweep" phase.
 # Use my proposal but only for repositories using CompositeDataStore.

h3. Proposal 1

I believe the concern with proposal 1 is that production repositories sharing 
the same data store may run GC on completely different schedules.  We can't be 
sure that all repositories complete a mark phase before any repository attempts 
a sweep phase.  In the context of my proposal, I believe what this means is 
that blobs that should be deleted may take longer to delete than expected - for 
example, it may require a couple of invocations.

In the normal shared data store use case I think the impact is that all of the 
connected repositories will try to run the sweep phase.  The same blobs will be 
deleted by the first sweeper as would have been deleted before.  It doesn't 
impact the ability to collect garbage, but may impact efficiency or give 
confusing log messages (which might be fixable).

In the composite data store use case since either repository may have the 
ability to delete blobs that the other repository cannot delete this may mean 
that it takes multiple cycles to do this.  For example, assuming a production 
and staging system, if the staging system deletes a node with a blob reference, 
and then runs mark and then sweep, the sweep may fail since the production 
hasn't done the mark phase yet (no "references" file from production repo).  
Later, the production system would mark and then sweep, deleting the blobs but 
unable to delete blobs on the staging side.  However, with my change the 
"references" files remain, so the next time the staging system runs mark and 
sweep it will be able to sweep since all the "references" files are still 
there, and then it will delete the blob that became unreferenced before.

So eventually I think blobs that should be collected will end up collected 
although it may take a while.
h3. Proposal 2

If we require that every repository complete the "mark" phase before any 
repository can attempt a "sweep" phase, it won't eliminate the need for every 
repository to perform the sweep.  This is still needed because each repository 
has binaries that only can be deleted by that repository.

What it could do is hopefully coordinate the sweep phases so that less time 
elapses than in proposal 1.

However, I think you still have to answer the question, what does a repository 
do if it is ready to sweep but not all repositories have completed the mark 
phase?  This is almost what we have now.  If not every repository has completed 
the mark phase, and one repository wants to sweep, what happens?  I assume it 
just cancels the sweep until the next scheduled GC time.  In which case I don't 
see how this is any better than proposal 1.
h3. Proposal 3

This proposal is to only use my GC changes with CompositeDataStore.  I'm not 
sure exactly what we mean by this.

We could say that it is only used in repositories that are using a 
CompositeDataStore.  This could be done, although it would probably require 
changing the node store code so that it obtains the garbage collector from a 
registered reference instead of instantiating it directly, and then having the 
different data stores register a garbage collector for use by the node store.  
It might complicate the dependency tree and other things depending on how the 
garbage collector becomes available to the node store (see the 
SegmentNodeStoreService code where the MarkSweepGarbageCollector is 
instantiated to see what I mean).

But it doesn't matter because this approach won't actually solve the problem, 
in my view.  The reason is that *both* of the systems participating have to use 
the same garbage collection algorithm.  In other words, if staging has the 
CompositeDataStore, it is going to rely upon the production system to write the 
"sweepComplete" metadata file and leave the "references" files in order for the 
staging system to successfully complete the "sweep" phase.  The production 
system isn't using CompositeDataStore, though, so if it is

[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-23 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375012#comment-16375012
 ] 

Matt Ryan commented on OAK-7083:


[~amitjain]

You are right that Proposal 1 and Proposal 2 are not that different.

I think you clarified something I didn't realize before - that in a normal 
{{SharedDataStore}} environment, only one of the systems will have the "sweep" 
phase scheduled.  In other words, GC sweep is only ever performed on one 
system.

This scenario CANNOT work with {{CompositeDataStore}} in the use case described 
here.  It cannot work because there will always be blobs that can only be 
deleted by one system or the other.  Once the production and staging systems 
are both running and operational, any binaries created on either system from 
that point onward that are later deleted can only be garbage collected by 
that system.

You said:
{quote}Then we don't have to make any change to the MarkSweepGarbageCollector 
and the process will be the same as currently for normal deployments.
{quote}
I do not agree with that statement.  I do not think it is possible for GC to 
work in an environment with {{CompositeDataStore}} unless some code changes are 
made to the garbage collector.  Either we need to make these changes to the 
{{MarkSweepGarbageCollector}} or we have to make another garbage collector that 
is designed to work with {{CompositeDataStore}}.
{quote}Essentially, I am Ok with proposal 2 for a start
{quote}
In my view, Proposal 2 offers no real advantage over Proposal 1 but adds 
complexity due to the requirement to coordinate between repositories on the 
mark phase, beyond what is already being done.  With both proposals it is 
still required to make changes to the garbage collector, and it is still 
required that ALL repositories perform the sweep phase.
{quote}then we can enhance with the proposal that I outlined of encoding the 
blob ids with the role/type of the DataStore. Could you please also add a 
response on the clarifications I had above.
{quote}
I haven't forgotten this part, Amit.  But currently no encoding is being done.  
That's not the approach that was taken.  I am not convinced yet that it is 
needed or that we want to tie blobs tightly to the repository that wrote them.

I'd like to first settle on an approach to this GC issue and then see if that 
additional step is actually required.



[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-23 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375021#comment-16375021
 ] 

Matt Ryan commented on OAK-7083:


[~amitjain] [~chetanm]
{quote}Is it a requirement that the Primary should not delete blobs used by the 
Secondary? 
{quote}
I don't know if that was spelled out as a requirement.  It seems like an 
undesirable situation to have the Primary delete blobs that are still being 
referenced by the Secondary.  For the specific use case this was meant to 
address (production system and staging system sharing the data store) it may be 
acceptable that the secondary system have missing blobs, although not ideal.  
But it certainly limits the usability of the system for other similar use cases.
{quote}it would still be strange for the Primary (a Production system) to have 
to co-ordinate GC with a Secondary (staging or a test system) which it has no 
knowledge about.
{quote}
Agreed.



[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-23 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375128#comment-16375128
 ] 

Matt Ryan commented on OAK-7083:


Let me try to describe in more detail how GC works with the 
{{CompositeDataStore}} without the change to {{MarkSweepGarbageCollector}}.

Let's start with the Primary repo.  It has three nodes in the repo, A1-A3. It 
is connected to data store DS_P, which has blobs A1_B-A3_B and a repository 
file in it.
{noformat}
P
  - A1
  - A2
  - A3

DS_P
  - repository-P
  - A1_B
  - A2_B
  - A3_B
{noformat}
At this point we create the Secondary repo by cloning the Primary. It has a 
copy of the Primary node store and refers directly to data store DS_P, which 
now has another repository file in it. Secondary also has another data store, 
DS_S, which is empty other than the repository file for the Secondary repo.
 Note that the {{CompositeDataStore}} writes the metadata files to EVERY 
delegate data store, so this way the repository file appears in every data 
store.
{noformat}
P                         S
  - A1                      - A1
  - A2                      - A2
  - A3                      - A3

DS_P                      DS_S
  - repository-P            - repository-S
  - repository-S
  - A1_B
  - A2_B
  - A3_B
{noformat}
Suppose on Primary nodes A2 and A3 are deleted, and on Secondary nodes A1 and 
A3 are deleted. No GC has been run yet. So now we have this:
{noformat}
P                         S
  - A1                      - (A1-deleted)
  - (A2-deleted)            - A2
  - (A3-deleted)            - (A3-deleted)

DS_P                      DS_S
  - repository-P            - repository-S
  - repository-S
  - A1_B
  - A2_B
  - A3_B
{noformat}
Now on Primary nodes B4 and B5 are created and on Secondary nodes C6 and C7 are 
created.
{noformat}
P                         S
  - A1                      - (A1-deleted)
  - (A2-deleted)            - A2
  - (A3-deleted)            - (A3-deleted)
  - B4                      - C6
  - B5                      - C7

DS_P                      DS_S
  - repository-P            - repository-S
  - repository-S            - C6_B
  - A1_B                    - C7_B
  - A2_B
  - A3_B
  - B4_B
  - B5_B
{noformat}
Now node B5 is deleted from Primary and node C7 is deleted from Secondary. 
Still no GC has been run yet.
{noformat}
P                         S
  - A1                      - (A1-deleted)
  - (A2-deleted)            - A2
  - (A3-deleted)            - (A3-deleted)
  - B4                      - C6
  - (B5-deleted)            - (C7-deleted)

DS_P                      DS_S
  - repository-P            - repository-S
  - repository-S            - C6_B
  - A1_B                    - C7_B
  - A2_B
  - A3_B
  - B4_B
  - B5_B
{noformat}
Let's suppose GC runs and Primary does a mark first. When it is done it will 
create a references file with all of the references it knows about that should 
be kept.
{noformat}
P                         S
  - A1                      - (A1-deleted)
  - (A2-deleted)            - A2
  - (A3-deleted)            - (A3-deleted)
  - B4                      - C6
  - (B5-deleted)            - (C7-deleted)

DS_P                      DS_S
  - repository-P            - repository-S
  - repository-S            - C6_B
  - references-P (A1, B4)   - C7_B
  - A1_B
  - A2_B
  - A3_B
  - B4_B
  - B5_B
{noformat}
Note at this point Primary cannot complete a sweep because Secondary has not 
performed a mark phase yet. There is no references file in DS_P yet from 
Secondary.

Later Secondary performs the mark phase and it creates a references file with 
all of the references it knows about that should be kept. Notice the references 
file gets created in both DS_S and DS_P, since {{CompositeDataStore}} performs 
metadata writes to all delegate data stores.
{noformat}
P                         S
  - A1                      - (A1-deleted)
  - (A2-deleted)            - A2
  - (A3-deleted)            - (A3-deleted)
  - B4                      - C6
  - (B5-deleted)            - (C7-deleted)

DS_P                      DS_S
  - repository-P            - repository-S
  - repository-S            - references-S (A2, C6)
  - references-P (A1, B4)

[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-27 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379140#comment-16379140
 ] 

Matt Ryan commented on OAK-7083:


{quote}It does not require any changes to the code which means that it is 
compatible to the Shared DataStore setups. It requires the Sweep phase to be 
run on all repositories only on the CompositeDataStore setups. Actually, the 
Shared DataStore(s) generally are required to co-ordinate the mark phase 
anyways. So, the GC steps would look something like these for the 
CompositeDataStore setups:
 * Run {{collectGarbage(true)}} on Primary (Only Mark phase on primary)
 * Run {{collectGarbage(false)}} on Secondary (Mark & Sweep on secondary thus, 
deletes garbage on Secondary's DataStore)
 * Run {{collectGarbage(true)}} on Secondary (Only Mark phase on secondary)
 * Run {{collectGarbage(false)}} on Primary (Mark & Sweep on primary thus, 
deletes garbage on Primary's DataStore){quote}
I see what you mean now [~amitjain], thanks for sticking with me this far.

I've updated the test to match this approach to coordinating marks and sweeps 
between repositories.  It works with the original {{MarkSweepGarbageCollector}} 
as you said.  So I've reverted that change out of the pull request and also 
added the updated test.  I re-ran all unit tests for Oak again and all are 
passing still.
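
For reference, the updated test drives the two collectors in roughly this
order; {{collectGarbage(boolean markOnly)}} is the existing
{{MarkSweepGarbageCollector}} entry point, and the collector instances here
are illustrative:
{code:java}
import org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector;

class CoordinatedGcSketch {
    // Mirrors the four steps quoted above; both collectors are configured
    // against their own repository but share the primary's DataStore metadata.
    static void run(MarkSweepGarbageCollector gcPrimary,
                    MarkSweepGarbageCollector gcSecondary) throws Exception {
        gcPrimary.collectGarbage(true);     // Mark only on primary
        gcSecondary.collectGarbage(false);  // Mark + sweep on secondary
        gcSecondary.collectGarbage(true);   // Mark only on secondary
        gcPrimary.collectGarbage(false);    // Mark + sweep on primary
    }
}
{code}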

 
{quote}I am not sure how the performance would be when there is no encoding 
unless you have some persistent structure to give you that information.
{quote}
When we discussed this at the last Oakathon the approach we decided to take was 
to implement Bloom filters in the {{CompositeDataStore}} to map blob IDs to 
data store delegates.  IIRC the approach was to tie into blob ID scanning at 
startup (OAK-7089) and to build Bloom filters mapping the blob IDs to the 
delegate (OAK-7090).  We acknowledged a couple of challenges, including:
 * Sizing the filters correctly, and the potential need to resize a filter over 
time
 * Handling deleted blobs

One approach to this would be to simply rebuild the filters every time data 
store garbage collection is run.  In this case we wouldn't need to worry about 
deletes and we could right-size the filter every time it is rebuilt.  But this 
doesn't work well if users don't run data store GC.

Another approach would be to implement more complex Bloom filters.

At present neither the Bloom filter approach nor the encoded blob ID approach 
is implemented in the current pull request.  Since the use case is for a 
primary repository with a single data store, and a secondary repository with 
only two delegate data stores (only one of which is writable), all writes go 
to a single data store in all cases, and reads on the secondary may sometimes 
look in the wrong place, not find the blob, and then try the other delegate to 
look up the blob.  Since the use case for this describes the secondary system 
as a "staging" or "test" system, I suggest it is acceptable for now to take 
the slight performance hit on the secondary system in favor of getting the 
code reviewed and getting it to where performance testing can be done.
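
As a rough sketch (purely illustrative, using Guava's {{BloomFilter}}; none of
this is in the pull request), the delegate lookup could eventually look like:
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import org.apache.jackrabbit.core.data.DataIdentifier;
import org.apache.jackrabbit.core.data.DataRecord;
import org.apache.jackrabbit.core.data.DataStore;

class BloomDelegateLookupSketch {
    // One filter per delegate, rebuilt e.g. at startup or after each GC run.
    private final Map<DataStore, BloomFilter<CharSequence>> filters = new LinkedHashMap<>();

    void rebuildFilter(DataStore delegate, Iterable<DataIdentifier> ids) {
        BloomFilter<CharSequence> filter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 100_000, 0.01);
        for (DataIdentifier id : ids) {
            filter.put(id.toString());
        }
        filters.put(delegate, filter);
    }

    DataRecord getRecord(DataIdentifier id) throws Exception {
        // First consult only delegates whose filter claims to contain the id.
        for (Map.Entry<DataStore, BloomFilter<CharSequence>> e : filters.entrySet()) {
            if (e.getValue().mightContain(id.toString())) {
                DataRecord record = e.getKey().getRecordIfStored(id);
                if (record != null) {
                    return record;
                }
            }
        }
        // Fall back to a full scan: a blob added since the last rebuild is in
        // no filter, so a miss above is not authoritative.
        for (DataStore delegate : filters.keySet()) {
            DataRecord record = delegate.getRecordIfStored(id);
            if (record != null) {
                return record;
            }
        }
        return null;
    }
}
{code}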
{quote}Have you run any performance tests Vs a standard setup?
{quote}
I have not run any performance tests, no.
{quote}IMHO it would be great if we can get that part sorted first.
{quote}
I agree that eventually some mechanism is required so the 
{{CompositeDataStore}} just knows which delegate to use.  I thought the 
eventual agreed-on approach was to use Bloom filters to address this problem.  
While the use of Bloom filters sounds a bit more complex, I do prefer that 
approach in order that we not tie blobs to a specific data store but rather 
keep a loose coupling between the stored data and the code that stored it.

At any rate neither approach is taken in the current codebase.  So I guess the 
question is, can we agree that it is acceptable to solve the mapping problem 
later given the currently targeted use case, or is this something we must 
address before the code can be accepted into Oak?


[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-02-27 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379210#comment-16379210
 ] 

Matt Ryan commented on OAK-7083:


One question I had [~amitjain] - does this imply that when 
{{CompositeDataStore}} is used we have to document very specific settings for 
data store GC?  I assume this is the case.



[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-03-29 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419629#comment-16419629
 ] 

Matt Ryan commented on OAK-7083:


At the recent Oakathon we agreed that I would attempt to implement the mapping 
of the data store identifier into the blob ID and see if we can get that done 
in a rather straightforward fashion; if so, then we can move forward with a 
basic mapping.

The way I was thinking this would be done was to extend the capability of the 
{{BlobOptions}} class so we could pass the encoded data store id along to the 
delegate during {{addRecord()}}.  However, this requires that the delegate 
implement {{TypedDataStore}} which {{OakFileDataStore}} does not implement.

I'm not sure of the best way to move forward from here and am interested to 
hear what the rest of the project thinks.  It seems the best solution is to 
update {{OakFileDataStore}} so it also implements {{TypedDataStore}}.  The 
issue here is that most of the functionality to add a record is implemented in 
{{FileDataStore}}, which is Jackrabbit code, not Oak code.  Avoiding 
duplicated code between the two classes would require modifying 
{{FileDataStore}} as well.  It's a lot of change just to make 
{{OakFileDataStore}} implement {{TypedDataStore}}.
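
For illustration only, the shape of the change would be roughly the following
hypothetical sketch; the hard part is that the actual write path lives in
Jackrabbit's {{FileDataStore}}, which knows nothing about options:
{code:java}
import java.io.InputStream;
import org.apache.jackrabbit.core.data.DataRecord;
import org.apache.jackrabbit.core.data.DataStoreException;
import org.apache.jackrabbit.core.data.FileDataStore;
import org.apache.jackrabbit.oak.plugins.blob.datastore.BlobOptions;
import org.apache.jackrabbit.oak.plugins.blob.datastore.TypedDataStore;

// Hypothetical: what OakFileDataStore would need to add.  Delegating to
// super.addRecord() simply discards the options; actually honoring them
// means duplicating or modifying FileDataStore's write logic in Jackrabbit.
class TypedFileDataStoreSketch extends FileDataStore implements TypedDataStore {
    @Override
    public DataRecord addRecord(InputStream input, BlobOptions options)
            throws DataStoreException {
        return super.addRecord(input);
    }
}
{code}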

Any other suggestions?

/cc [~amitjain] [~tmueller]

 



[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-04-09 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431429#comment-16431429
 ] 

Matt Ryan commented on OAK-7083:


Encoding after the delegate is called won't work as the blob has already been 
uploaded by the delegate by that point.

After thinking about this further I decided not to pursue the concept of 
encoding the data store identifier in the blob id for two additional reasons. 

First, this approach prevents any possible future uses in which a blob could be 
stored in more than one place.  Using the {{CompositeDataStore}} to store blobs 
in more than one data store and look them up based on some other criteria was 
one of the originally identified possible use cases (for example, storing 
multiple copies worldwide and retrieving from the closest data store, or 
implementing service-redundant storage by storing in AWS and Azure 
simultaneously).

Second, this approach will require a data migration for any existing user who 
then wants to start using {{CompositeDataStore}} - all existing blobs would 
have to be renamed.

So I've implemented a simpler solution based on a {{LoadingCache}} which is 
currently in the pull request.
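
In spirit, the {{LoadingCache}} mapping is something like the following sketch
(illustrative names and sizes; the actual code is in the pull request):
{code:java}
import java.util.List;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import org.apache.jackrabbit.core.data.DataIdentifier;
import org.apache.jackrabbit.core.data.DataRecord;
import org.apache.jackrabbit.core.data.DataStore;

class DelegateCacheSketch {
    private final List<DataStore> delegates;

    // Remembers which delegate resolved an identifier so repeated reads skip
    // the scan; entries age out of the cache and are simply re-resolved.
    private final LoadingCache<DataIdentifier, DataStore> delegateFor =
            CacheBuilder.newBuilder()
                    .maximumSize(10_000)
                    .build(new CacheLoader<DataIdentifier, DataStore>() {
                        @Override
                        public DataStore load(DataIdentifier id) throws Exception {
                            for (DataStore ds : delegates) {
                                if (ds.getRecordIfStored(id) != null) {
                                    return ds;
                                }
                            }
                            throw new Exception("No delegate has " + id);
                        }
                    });

    DelegateCacheSketch(List<DataStore> delegates) {
        this.delegates = delegates;
    }

    DataRecord getRecord(DataIdentifier id) throws Exception {
        return delegateFor.get(id).getRecord(id);
    }
}
{code}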



[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-04-10 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432553#comment-16432553
 ] 

Matt Ryan commented on OAK-7083:


{quote}What prevents it from working?
{quote}
I'm not having a problem knowing how to encode the data store ID into the blob 
ID.  I'm saying it won't work well with the API constraints we have and with 
the way {{CompositeDataStore}} is intended to work.

When {{CompositeDataStore.addRecord()}} is called, all it receives is an 
{{InputStream}} and possibly a {{BlobOptions}} just like any other data store 
implementation.  It then has to determine which delegate it should use to 
complete the operation.  Since the only supported use case for now has only one 
writable delegate, this is an easy choice.  The composite data store simply 
passes the arguments along to the delegate which performs the task of adding 
the record to the repository.  Of course, this returns a {{DataRecord}} which 
is then returned by the composite data store to the caller.  We can easily know 
the blob id of the blob that was just added by looking at the return value from 
the delegate {{addRecord()}} call.  But the blob has already been stored in the 
delegate data store by this time.

{{CompositeDataStore}} does not store anything on its own; this always happens 
via delegates.  Before the delegate is invoked we only have an {{InputStream}}. 
 After the delegate is invoked we have a resulting {{DataRecord}} and the blob 
is already stored.  So we don't have a blob ID before calling the delegate; 
after calling the delegate we have a blob ID but the blob has already been 
written.

In order to modify the blob ID within {{CompositeDataStore}}, we would have to 
wait until the delegate wrote the record, get the blob ID, modify it, and then 
rewrite the blob with the new ID.  How is that to be done if not by calling 
{{addRecord()}} on the delegate in the first place?  And even if we did that, 
how is the modified blob ID to be passed to the delegate data store?  Since 
{{OakFileDataStore}} doesn't implement {{TypedDataStore}} we cannot pass 
anything to {{addRecord()}} other than the input stream.  To change that would 
require modifying {{FileDataStore}} in Jackrabbit.

These are the options for modifying the blob ID as I see them:
 * Delegate writes the file, then the composite updates the blob ID and asks 
the delegate to write it again.  This means the file is written to the 
destination twice which seems like a bad idea.  It also would require extending 
the capabilities of {{BlobOptions}} to support providing the blob ID, and since 
{{OakFileDataStore}} doesn't implement {{TypedDataStore}} it doesn't currently 
take a {{BlobOptions}} so this would also require adding that capability in 
{{FileDataStore}} in Jackrabbit.
 * Delegate writes the file, then the composite updates the blob ID and 
rewrites the file itself.  This duplicates logic from the delegate into the 
composite data store, which is bad design for more than one reason, and still 
writes the file twice, which still seems like a bad idea.
 * Extend {{BlobOptions}} to accept some sort of transformer object or 
function (see the sketch after this list).  This was my original approach.  It 
allows the delegate to generate the correct ID without having to know anything 
about the custom encoding being done.  The blob ID is generated once and the 
record written to the destination once.  But it still requires the changes to 
{{FileDataStore}} in Jackrabbit so this approach can work with 
{{OakFileDataStore}}.
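
Here is the rough shape of that third option; purely hypothetical, since
{{BlobOptions}} has no such hook today:
{code:java}
import java.util.function.UnaryOperator;

// Hypothetical extension of BlobOptions: the composite supplies a transformer
// and the delegate applies it exactly once, at the moment the content-hash id
// is generated, so the record is written a single time under the final id.
class TransformingBlobOptionsSketch /* would extend BlobOptions */ {
    private final UnaryOperator<String> idTransformer;

    TransformingBlobOptionsSketch(UnaryOperator<String> idTransformer) {
        this.idTransformer = idTransformer;
    }

    String transformId(String contentHashId) {
        return idTransformer.apply(contentHashId); // e.g. append the data store id
    }
}
{code}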

All of these approaches feel very heavy for what we are trying to solve, 
especially since it will also:
 * Design us into a corner where we will not be able to support some of the 
originally identified possible use cases (namely, storing the same blob in 
multiple data stores).
 * Entail a data migration for any user that wants to move an existing 
installation to the {{CompositeDataStore}}.  In my view that alone should end 
the discussion about encoding the data store ID into the blob ID altogether.

Additionally, IIRC in the Oakathon we discussed we would look into the blob ID 
approach to see if we could quickly add it in before accepting the PR, and if 
it was an easy addition we would go ahead and put it in.  Otherwise, we would 
move forward with evaluating the PR for acceptance into Oak in order that 
full-scale performance testing can begin on it as soon as possible.

In my view the scope of the change to support this feature (which, after 
further thought, I don't think we should do at all) has gone beyond the "quick 
addition" level and become a rather significant change.

Based on that I propose we move forward with reviewing the PR if possible.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> 
>
> Key: OAK-7083
> URL: https://issues.apache.org/j

[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-04-13 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437704#comment-16437704
 ] 

Matt Ryan commented on OAK-7083:


[~amitjain] ah, thanks for the explanation.  I see what you are saying now.  As 
I understand it now, the logic flow for {{addRecord()}} goes like this:
 * The composite data store determines which delegate should add the new record 
and invokes {{addRecord()}} on that delegate.
 * The composite data store captures the resulting {{DataRecord}} and updates 
the blob ID to include the encoded data store ID.
 * This encoded version is what is stored in the node store.

And for reading the record:  When the composite data store is given a 
{{DataIdentifier}}, it extracts the data store ID from the identifier if there 
is one, uses that to look up the blob, and passes a modified data identifier, 
with the encoded data store ID removed, along to the delegate.

I can see now how that would work.
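
To make that flow concrete, here is a minimal sketch of the encode/decode step 
as I understand it.  The {{#}} separator and the helper names are assumptions 
for illustration, not the actual implementation:

{code:java}
// Illustrative sketch of the flow described above; the separator and all
// names are assumptions, not the actual Oak implementation.
public final class EncodedBlobId {

    private static final char SEPARATOR = '#';

    // addRecord() path: append the delegate's data store ID to the blob ID
    // the delegate returned, before the record reaches the node store.
    static String encode(String delegateBlobId, String dataStoreId) {
        return delegateBlobId + SEPARATOR + dataStoreId;
    }

    // Read path: recover the data store ID, if present, to route the lookup.
    static String extractDataStoreId(String storedBlobId) {
        int idx = storedBlobId.lastIndexOf(SEPARATOR);
        return idx < 0 ? null : storedBlobId.substring(idx + 1);
    }

    // Read path: strip the data store ID so the delegate sees the blob ID
    // it originally generated.
    static String stripDataStoreId(String storedBlobId) {
        int idx = storedBlobId.lastIndexOf(SEPARATOR);
        return idx < 0 ? storedBlobId : storedBlobId.substring(0, idx);
    }
}
{code}

A {{null}} from {{extractDataStoreId()}} would correspond to the fallback case 
discussed below, where the blob ID predates the composite and the delegates 
have to be searched.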

I still have concerns about data portability between systems that don't use the 
composite and those that do, and limiting our ability to make use of future 
scenarios.

For data portability, as I see it there are two main cases.
 * The first has to do with going from an Oak instance that doesn't use a 
composite to one that does.  In that case the node store wouldn't have any data 
store info encoded into the stored blob ids.  Once the composite data store is 
being used, requests to read would be done using a blob ID that doesn't include 
the data store information.  So we can look it up from the available delegates, 
which is the fallback approach.  My question is, if we modify the blob ID in 
the {{DataRecord}} being returned, will the node store apply the updated blob 
ID?  If not, any blob IDs in the node store prior to moving to the composite 
data store would never include the encoded data store ID, unless a data 
migration was performed.
 ** Similar to this is the issue that could arise if a data store identifier is 
ever changed or lost.  In that case, even though the blob IDs in the node 
store have an encoded data store identifier, that identifier is now invalid, 
so we would need a way to update the blob IDs stored in the node store.
 * The second has to do with going from an Oak instance that uses a composite 
to one that does not.  In this case, after the migration the Oak instance would 
have a node store full of blob IDs it could not understand: they would 
include encoded data store identifiers, but the logic to read the data store 
identifier out of the blob ID would no longer be loaded into Oak.  In these 
cases a data migration would be required.

For the possible future support of the use case where the composite stores a 
blob in more than one delegate, I'm not sure how we would support that other 
than to encode _all_ the data store identifiers into the blob ID.

What it all really boils down to is this:  Encoding the data store identifier 
into the blob ID creates a tight coupling between the data being stored and the 
specific implementation and configuration of data stores at a particular point 
in time, which has the effect of limiting our flexibility in supporting 
scenarios in the future without doing a data migration.

So it will be hard for me to be comfortable with the approach of encoding the 
data store ID into the blob ID unless someone can convince me that it is not a 
problem to create this tight coupling.  I will need my specific concerns 
addressed, and also to be put at ease about the general tight-coupling 
concern.

And even if we were to agree that encoding the data store ID into the blob ID 
is not a problem, after going through this in my mind I see that there are a 
number of cases that would need to be tested.  This isn't a quick job that will 
only take a couple of days.  The unit tests that will need to be written to 
verify functionality in the situations I've described above, as well as the 
normal case, will take quite a few days to complete.  So it still exceeds the 
scope of what we had in mind when we discussed it at the Oakathon, in my view.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> 
>
> Key: OAK-7083
> URL: https://issues.apache.org/jira/browse/OAK-7083
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Major
>
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 

[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-04-13 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437707#comment-16437707
 ] 

Matt Ryan commented on OAK-7083:


[~amitjain] I understand the concern about performance impact.  I'm thinking 
through the best way to handle that (my manager may want someone else to 
conduct the performance testing) so I'll have to get back to you on that one.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> 
>
> Key: OAK-7083
> URL: https://issues.apache.org/jira/browse/OAK-7083
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Major
>
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-05-01 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459793#comment-16459793
 ] 

Matt Ryan commented on OAK-7083:


Over the past few days I did some performance testing.  The test covers the 
"production/staging" scenario.

I set up a system representing production first.  A folder named "prod" was 
created and 10,000 random JPEG images were added to this folder.  Each JPEG 
image is approximately 100K in size.  Additional renditions were generated on 
these images as they were added so the actual number of blobs was higher 
(around 40,000).

I then cloned that system to the staging environment and created a new "stg" 
folder, adding another 10,000 random images just like (but distinct from) 
those in the "prod" folder.

Once that was done I created a content package of "stg" and added that back to 
the production system.  So at the end each system had a "prod" and 
"stg" folder with 10,000 base images (plus renditions) in each folder.  The 
content trees in each were essentially identical, but the "stg" system was 
using {{CompositeDataStore}} to access the "prod" folder read-only.

I wrote a JMeter test that would randomly choose between the "prod" and "stg" 
folders, then randomly choose one of the 10,000 images in that folder, and 
download it.  This step would be repeated 250,000 times in a single test run.  
The test ran with no delays between requests and metrics were collected to see 
how quickly the test would complete.
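
For reference, the sampler logic was roughly equivalent to the following loop 
(a sketch only; the host, content paths, and image naming are placeholders, 
and JMeter handled the timing and metrics):

{code:java}
// Rough Java equivalent of the JMeter sampler logic described above.
// The URL layout and file naming are assumptions for illustration.
import java.io.InputStream;
import java.net.URL;
import java.util.Random;

public class DownloadLoop {
    public static void main(String[] args) throws Exception {
        Random rnd = new Random();
        byte[] buffer = new byte[8192];
        for (int i = 0; i < 250_000; i++) {
            String folder = rnd.nextBoolean() ? "prod" : "stg"; // pick a folder
            int image = rnd.nextInt(10_000);                    // pick an image
            URL url = new URL("http://localhost:8080/content/dam/"
                    + folder + "/image-" + image + ".jpg");
            try (InputStream in = url.openStream()) {
                while (in.read(buffer) != -1) {
                    // drain the response; timing is the harness's job
                }
            }
        }
    }
}
{code}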

I initially ran 8 test runs on each system.  I later decided to expand this to 
20, so I did an additional 12 runs on the staging system and then took a break 
for the weekend.  Coming back, I did the additional 12 runs on the production 
system, but saw results that differed from the original runs on either system.  
I then reran 8 runs on the staging system and saw more consistent results.  
I'm reporting 20 runs on each system, done at comparable times on both, to 
attempt to show comparable results.  So some of the abnormal results from the 
original runs 11-20 on the staging system were replaced with later runs done 
at a similar time to runs 11-20 on production.

I collected the following metrics for each test run:  response time (minimum, 
maximum, and average), average requests per second, and average KB per second 
downloaded.

After collecting these metrics for 20 runs I computed min, max, and average 
values for each category.  When computing the average I threw out the extreme 
min and max values in order to avoid the average being swayed too much by an 
outlier.
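
In other words, the reported average is a trimmed mean over the 20 runs, along 
these lines (a sketch, assuming exactly one extreme value dropped at each end):

{code:java}
// Sketch of the averaging described above: sort the per-run values and
// drop the single extreme min and max before averaging.
import java.util.Arrays;

public class TrimmedAverage {
    static double trimmedAverage(double[] runs) {
        double[] sorted = runs.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) { // skip min and max
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }
}
{code}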

Then I compared the averaged metrics from one system to those of the other.

The results are below:

Staging system (with {{CompositeDataStore}}):
|*Run*|*Average Response Time (ms)*|*Min Response Time (ms)*|*Max Response Time 
(ms)*|*Requests/Sec*|*KB/sec*|
|1|3|1|522|240.1|24882.28|
|2|3|1|587|247.3|25635.84|
|3|3|1|647|251|26012.87|
|4|3|1|705|252.2|26143.79|
|5|2|1|718|248.6|25764.68|
|6|3|1|388|280.5|29071.79|
|7|3|1|353|269.2|27901.22|
|8|3|1|581|260.1|26953.55|
|9|3|1|537|288.6|29907.38|
|10|3|1|608|294.1|30481.24|
|11|3|1|414|288.7|29924.89|
|12|3|1|167|300.4|31136.34|
|13|3|1|593|272.9|28281.77|
|14|3|1|625|271.1|28101.01|
|15|3|1|452|272.5|28240.11|
|16|3|1|465|304.8|31594.93|
|17|3|1|525|287.6|29804.54|
|18|3|1|625|276.7|28678.59|
|19|3|1|650|273.5|28343.48|
|20|3|1|548|294.8|30551.06|

Production system (without {{CompositeDataStore}}):
|*Run*|*Average Response Time (ms)*|*Min Response Time (ms)*|*Max Response Time 
(ms)*|*Requests/Sec*|*KB/sec*|
|1|3|2|739|233.2|24507.37|
|2|3|1|489|254.4|26739.06|
|3|3|1|171|260.7|27398.34|
|4|3|1|526|234|24591.95|
|5|3|1|544|284.4|29888.1|
|6|3|1|520|284.8|29933.02|
|7|3|1|181|287.8|30242.71|
|8|3|1|364|268.3|28192.92|
|9|3|1|419|297.7|31283.1|
|10|3|1|155|293.3|30820.8|
|11|3|1|638|276.5|29059.47|
|12|3|1|638|197.6|20762.37|
|13|3|1|528|254.5|26741.96|
|14|3|1|619|264.5|27797.78|
|15|3|1|169|303.3|31873.11|
|16|3|1|433|298.7|31389.12|
|17|3|1|546|278.7|29288.03|
|18|3|1|522|275.2|28923.43|
|19|2|1|530|298|31322.16|
|20|3|1|520|303.9|31933.47|

Average and minimum response times are essentially identical between both 
systems.  Comparison of the other metrics below:
|| ||Without CDS||With CDS||CDS Performance Difference||% Change||
|*Max Response Time - Min Value 
(ms)*|155|167|{color:#d04437}+12{color}|{color:#d04437}+7.8%{color}|
|*Max Response Time - Max Value (ms)*|739 
|718|{color:#14892c}-21{color}|{color:#14892c}-2.8%{color}|
|*Max Response Time - Average Value 
(ms)*|464.28|545.83|{color:#d04437}+81.55{color}|{color:#d04437}+18%{color}|
|*Requests / Sec - Min 
Value*|197.6|240.1|{color:#14892c}+42.5{color}|{color:#14892c}+21.5%{color}|
|*Requests / Sec - Max 
Value*|303.9|304.8|{color:#14892c}+0.9{color}|{color:#14892c}+0.3%{color}|
|*Request / Sec - Average 
Value*|274.89|273.88|{color:#d04437}-1.01{color}|{color:#d04437}-0.4%{color}|

[jira] [Comment Edited] (OAK-7083) CompositeDataStore - ReadOnly/ReadWrite Delegate Support

2018-05-01 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459793#comment-16459793
 ] 

Matt Ryan edited comment on OAK-7083 at 5/1/18 4:00 PM:


Over the past few days I did some performance testing.  The test covers the 
"production/staging" scenario.

I set up a system representing production first.  A folder named "prod" was 
created and 10,000 random JPEG images were added to this folder.  Each JPEG 
image is approximately 100K in size.  Additional renditions were generated on 
these images as they were added so the actual number of blobs was higher 
(around 40,000).

I then cloned that system to the staging environment and created a new "stg" 
folder, adding another 10,000 random images just like (but distinct from) 
those in the "prod" folder.

Once that was done I created a content package of "stg" and added that back to 
the production system.  So at the end each system had a "prod" and 
"stg" folder with 10,000 base images (plus renditions) in each folder.  I tried 
to make the trees as similar as possible, especially the content folders, but 
the "stg" system was using {{CompositeDataStore}} to access the "prod" folder 
read-only, which was the primary difference between the two systems.

The resulting systems were as follows:
 * Production:  Using SegmentNodeStore and OakFileDataStore, node count 792576, 
blob count 92532 (includes "prod" and "stg" folders).
 * Staging:  Using SegmentNodeStore and OakFileDataStore, node count 558080, 
blob count 40196 (only includes "stg" folder).

I wrote a JMeter test that would randomly choose between the "prod" and "stg" 
folders, then randomly choose one of the 10,000 images in that folder, and 
download it.  This step would be repeated 250,000 times in a single test run.  
The test ran with no delays between requests and metrics were collected to see 
how quickly the test would complete.

I initially ran 8 test runs on each system.  I later decided to expand this to 
20, so I did an additional 12 runs on the staging system and then took a break 
for the weekend.  Coming back, I did the additional 12 runs on the production 
system, but saw results that differed from the original runs on either system.  
I then reran 8 runs on the staging system and saw more consistent results.  
I'm reporting 20 runs on each system, done at comparable times on both, to 
attempt to show comparable results.  So some of the abnormal results from the 
original runs 11-20 on the staging system were replaced with later runs done 
at a similar time to runs 11-20 on production.

I collected the following metrics for each test run:  response time (minimum, 
maximum, and average), average requests per second, and average KB per second 
downloaded.

After collecting these metrics for 20 runs I computed min, max, and average 
values for each category.  When computing the average I threw out the extreme 
min and max values in order to avoid the average being swayed too much by an 
outlier.

Then I compared the averaged metrics from one system to those of the other.

The results are below:

Staging system (with {{CompositeDataStore}}):
|*Run*|*Average Response Time (ms)*|*Min Response Time (ms)*|*Max Response Time 
(ms)*|*Requests/Sec*|*KB/sec*|
|1|3|1|522|240.1|24882.28|
|2|3|1|587|247.3|25635.84|
|3|3|1|647|251|26012.87|
|4|3|1|705|252.2|26143.79|
|5|2|1|718|248.6|25764.68|
|6|3|1|388|280.5|29071.79|
|7|3|1|353|269.2|27901.22|
|8|3|1|581|260.1|26953.55|
|9|3|1|537|288.6|29907.38|
|10|3|1|608|294.1|30481.24|
|11|3|1|414|288.7|29924.89|
|12|3|1|167|300.4|31136.34|
|13|3|1|593|272.9|28281.77|
|14|3|1|625|271.1|28101.01|
|15|3|1|452|272.5|28240.11|
|16|3|1|465|304.8|31594.93|
|17|3|1|525|287.6|29804.54|
|18|3|1|625|276.7|28678.59|
|19|3|1|650|273.5|28343.48|
|20|3|1|548|294.8|30551.06|

Production system (without {{CompositeDataStore}}):
|*Run*|*Average Response Time (ms)*|*Min Response Time (ms)*|*Max Response Time 
(ms)*|*Requests/Sec*|*KB/sec*|
|1|3|2|739|233.2|24507.37|
|2|3|1|489|254.4|26739.06|
|3|3|1|171|260.7|27398.34|
|4|3|1|526|234|24591.95|
|5|3|1|544|284.4|29888.1|
|6|3|1|520|284.8|29933.02|
|7|3|1|181|287.8|30242.71|
|8|3|1|364|268.3|28192.92|
|9|3|1|419|297.7|31283.1|
|10|3|1|155|293.3|30820.8|
|11|3|1|638|276.5|29059.47|
|12|3|1|638|197.6|20762.37|
|13|3|1|528|254.5|26741.96|
|14|3|1|619|264.5|27797.78|
|15|3|1|169|303.3|31873.11|
|16|3|1|433|298.7|31389.12|
|17|3|1|546|278.7|29288.03|
|18|3|1|522|275.2|28923.43|
|19|2|1|530|298|31322.16|
|20|3|1|520|303.9|31933.47|

Average and minimum response times are essentially identical between both 
systems.  Comparison of the other metrics below:
|| ||Without CDS||With CDS||CDS Performance Difference||% Change||
|*Max Response Time - Min Value 
(ms)*|155|167|{color:#d04437}+12{color}|{color:#d04437}+7.8%{color}|
|*Max Response Time - Max Value (ms)*|739 
|718|{color:#14892c}-21{color}|{color:#14892c}-2.8%{color}|

[jira] [Created] (OAK-7569) Direct Binary Access

2018-06-21 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7569:
--

 Summary: Direct Binary Access
 Key: OAK-7569
 URL: https://issues.apache.org/jira/browse/OAK-7569
 Project: Jackrabbit Oak
  Issue Type: New Feature
  Components: api, blob-cloud, blob-cloud-azure, blob-plugins
Reporter: Matt Ryan
Assignee: Matt Ryan


Provide a direct binary access feature to Oak which allows an authenticated 
client to create or download blobs directly to/from the blob store, assuming 
the authenticated user has appropriate permission to do so. The primary value 
of this feature is that the I/O of transferring large binary files to or from 
the blob store can be offloaded entirely from Oak and performed directly 
between a client application and the blob store.

This feature is described in more detail [on the Oak 
wiki|https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access].
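
As a rough illustration of the mechanism (not the Oak API itself), a cloud 
blob store can hand out a time-limited pre-signed URL that the client then 
uses to transfer the binary directly.  For example, with the AWS SDK for Java 
(v1), where the bucket and key are placeholders:

{code:java}
// Sketch of the kind of pre-signed URL this feature relies on; this is
// plain AWS SDK usage, not Oak code.
import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;
import java.net.URL;
import java.util.Date;

public class PresignedDownloadSketch {
    public static URL signedGetUrl(String bucket, String blobKey) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        Date expiry = new Date(System.currentTimeMillis() + 3600_000L); // 1 hour
        GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest(bucket, blobKey, HttpMethod.GET)
                        .withExpiration(expiry);
        // The client downloads straight from the blob store with this URL,
        // so the binary bytes never stream through Oak.
        return s3.generatePresignedUrl(request);
    }
}
{code}

Oak would generate and return such a URL only after verifying the user's 
permissions, so access control stays with Oak while the I/O does not.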



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-7569) Direct Binary Access

2018-06-21 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519927#comment-16519927
 ] 

Matt Ryan commented on OAK-7569:


A pull request for this issue is available 
[here|https://github.com/apache/jackrabbit-oak/pull/88].

> Direct Binary Access
> 
>
> Key: OAK-7569
> URL: https://issues.apache.org/jira/browse/OAK-7569
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: api, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Major
>
> Provide a direct binary access feature to Oak which allows an authenticated 
> client to create or download blobs directly to/from the blob store, assuming 
> the authenticated user has appropriate permission to do so. The primary value 
> of this feature is that the I/O of transferring large binary files to or from 
> the blob store can be offloaded entirely from Oak and performed directly 
> between a client application and the blob store.
> This feature is described in more detail [on the Oak 
> wiki|https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (OAK-7569) Direct Binary Access

2018-06-21 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/OAK-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-7569:
---
Description: 
Provide a direct binary access feature to Oak which allows an authenticated 
client to create or download blobs directly to/from the blob store, assuming 
the authenticated user has appropriate permission to do so. The primary value 
of this feature is that the I/O of transferring large binary files to or from 
the blob store can be offloaded entirely from Oak and performed directly 
between a client application and the blob store.

This feature is described in more detail [on the Oak 
wiki|https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access].

This feature is similar in functionality to OAK-6575.  It adds the capability 
to also upload directly to storage via preauthorized URLs in addition to 
downloading directly from storage via preauthorized URLs.

  was:
Provide a direct binary access feature to Oak which allows an authenticated 
client to create or download blobs directly to/from the blob store, assuming 
the authenticated user has appropriate permission to do so. The primary value 
of this feature is that the I/O of transferring large binary files to or from 
the blob store can be offloaded entirely from Oak and performed directly 
between a client application and the blob store.

This feature is described in more detail [on the Oak 
wiki|https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access].


> Direct Binary Access
> 
>
> Key: OAK-7569
> URL: https://issues.apache.org/jira/browse/OAK-7569
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: api, blob-cloud, blob-cloud-azure, blob-plugins
>Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Major
>
> Provide a direct binary access feature to Oak which allows an authenticated 
> client to create or download blobs directly to/from the blob store, assuming 
> the authenticated user has appropriate permission to do so. The primary value 
> of this feature is that the I/O of transferring large binary files to or from 
> the blob store can be offloaded entirely from Oak and performed directly 
> between a client application and the blob store.
> This feature is described in more detail [on the Oak 
> wiki|https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access].
> This feature is similar in functionality to OAK-6575.  It adds the capability 
> to also upload directly to storage via preauthorized URLs in addition to 
> downloading directly from storage via preauthorized URLs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (OAK-7570) [DirectBinaryAccess][DISCUSS] Client access via DataStoreBlobStore directly

2018-06-21 Thread Matt Ryan (JIRA)
Matt Ryan created OAK-7570:
--

 Summary: [DirectBinaryAccess][DISCUSS] Client access via 
DataStoreBlobStore directly
 Key: OAK-7570
 URL: https://issues.apache.org/jira/browse/OAK-7570
 Project: Jackrabbit Oak
  Issue Type: Technical task
  Components: blob-plugins
Reporter: Matt Ryan
Assignee: Matt Ryan


Open discussion related to OAK-7569:

The original pull request proposes changes to oak-api, oak-segment-tar, 
oak-store-document, oak-core, and oak-jcr as well as oak-blob-plugins, 
oak-blob-cloud, and oak-blob-azure.  Would it be possible / better to keep the 
changes local to the oak-blob-* bundles and avoid making changes throughout the 
stack?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

