Re: Azure Segment Store
Hi, I misread the documentation in the patch. Thank you for pointing out my mistake. Best Regards Ian On 6 March 2018 at 09:53, Tomek Rękawek <tom...@apache.org> wrote: > Hi Ian, > > > On 5 Mar 2018, at 17:47, Ian Boston <i...@tfd.co.uk> wrote: > > > > I assume that the patch deals with the 50K limit[1] to the number of > blocks > > per Azure Blob store ? > > As far as I understand, it’s the limit that applies to the number of > blocks in a single blob. Block is a single write. Since the segments are > immutable (written at once), we don’t need to worry about this limit for > the segments. It’s a different case for the journal file - a single commit > leads to a single append which adds a block. However, the patch takes care > of this, by creating journal.log.001, .002, when we’re close to the limit > [1]. > > Regards, > Tomek > > [1] https://github.com/trekawek/jackrabbit-oak/blob/OAK-6922/ > oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/ > AzureJournalFile.java#L37 > > -- > Tomek Rękawek | Adobe Research | www.adobe.com > reka...@adobe.com > >
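The journal roll-over Tomek describes can be sketched as follows. This is a hypothetical simplification, not the actual AzureJournalFile code (see [1] in the message above for that): the class and constant names here are invented. Each commit appends one block to the current journal blob, and the writer switches to journal.log.001, .002, ... before Azure's per-blob block limit is reached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the journal roll-over described above: each commit
// appends one block to the current journal blob; before the per-blob block
// limit (50,000 on Azure) is reached, writing moves to the next suffix.
// Names (RollingJournal, BLOCK_LIMIT) are invented for illustration.
public class RollingJournal {
    static final int BLOCK_LIMIT = 50_000;

    private final List<String> journalNames = new ArrayList<>();
    private int blocksInCurrent = 0;
    private int suffix = 0;

    public RollingJournal() {
        journalNames.add(currentName());
    }

    private String currentName() {
        return suffix == 0 ? "journal.log" : String.format("journal.log.%03d", suffix);
    }

    // Simulates appending one commit (= one block) to the journal and
    // returns the name of the journal file that received it.
    public String append() {
        if (blocksInCurrent >= BLOCK_LIMIT) {
            suffix++;                  // roll over to journal.log.001, .002, ...
            blocksInCurrent = 0;
            journalNames.add(currentName());
        }
        blocksInCurrent++;
        return currentName();
    }

    public List<String> names() {
        return journalNames;
    }

    public static void main(String[] args) {
        RollingJournal j = new RollingJournal();
        String last = null;
        for (int i = 0; i < 120_000; i++) {
            last = j.append();
        }
        // 120,000 commits span three journal files
        System.out.println(j.names()); // [journal.log, journal.log.001, journal.log.002]
        System.out.println(last);      // journal.log.002
    }
}
```

The segments themselves never hit this limit because each is immutable and written as a single block in one operation; only the incrementally appended journal needs the roll-over.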
Re: Azure Segment Store
On 5 March 2018 at 16:04, Michael Dürig <mic...@gmail.com> wrote: > > How does it perform compared to TarMK > > a) when the entire repo doesn't fit into RAM allocated to the container ? > > b) when the working set doesn't fit into RAM allocated to the container ? > > I think this is some of the things we need to find out along the way. > Currently my thinking is to move from off heap caching (mmap) to on > heap caching (leveraging the segment cache). For that to work we > likely need better understand locality of the working set (see > https://issues.apache.org/jira/browse/OAK-5655) and rethink the > granularity of the cached items. There will likely be many more issues > coming through Jira re. this. > Agreed. All that will help minimise the IO in this case. Or are you saying that, if the IO is managed rather than left to the OS via mmap, it may be possible to use a network disk cached by the OS VFS disk cache, provided TarMK has been optimised for that type of disk ? @Tomek I assume that the patch deals with the 50K limit[1] on the number of blocks per Azure Blob store ? With a compacted TarEntry size averaging 230K, the max repo size per Azure Blob store will be about 10GB. I checked the patch but didn't see anything to indicate that the size of each tar entry was increased. Azure Blob stores are also limited to 500 IOPS (API requests/s), which is about the same as a magnetic disk. Best Regards Ian 1 https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits > > Michael > > On 2 March 2018 at 09:45, Ian Boston <i...@tfd.co.uk> wrote: > > Hi Tomek, > > Thank you for the pointers and the description in OAK-6922. It all makes > > sense and seems like a reasonable approach. I assume the description is > > upto date. > > > > How does it perform compared to TarMK > > a) when the entire repo doesn't fit into RAM allocated to the container ? > > b) when the working set doesn't fit into RAM allocated to the container ? 
> > > > Since you mentioned cost, have you done a cost based analysis of RAM vs > > attached disk, assuming that TarMK has already been highly optimised to > > cope with deployments where the working set may only just fit into RAM ? > > > > IIRC the Azure attached disks mount Azure Blobs behind a kernel block > > device driver and use local SSD to optimise caching (in read and write > > through mode). Since there are a kernel block device they also benefit > from > > the linux kernel VFS Disk Cache and support memory mapping via the page > > cache. So An Azure attached disk often behaves like a local SSD (IIUC). I > > realise that some containerisation frameworks in Azure dont yet support > > easy native Azure disk mounting (eg Mesos), but others do (eg AKS[1]) > > > > Best regards > > Ian > > > > > > 1 https://azure.microsoft.com/en-us/services/container-service/ > > https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv > > > > > > > > On 1 March 2018 at 18:40, Matt Ryan <o...@mvryan.org> wrote: > > > >> Hi Tomek, > >> > >> Some time ago (November 2016 Oakathon IIRC) some people explored a > similar > >> concept using AWS (S3) instead of Azure. If you haven’t discussed with > >> them already it may be worth doing so. IIRC Stefan Egli and I believe > >> Michael Duerig were involved and probably some others as well. > >> > >> -MR > >> > >> > >> On March 1, 2018 at 5:42:07 AM, Tomek Rekawek (reka...@adobe.com.invalid > ) > >> wrote: > >> > >> Hi Tommaso, > >> > >> so, the goal is to run the Oak in a cloud, in this case Azure. In order > to > >> do this in a scalable way (eg. multiple instances on a single VM, > >> containerized), we need to take care of provisioning the sufficient > amount > >> of space for the segmentstore. 
Mounting the physical SSD/HDD disks (in > >> Azure they’re called “Managed Disks” aka EBS in Amazon) has two > drawbacks: > >> > >> * it’s expensive, > >> * it’s complex (each disk is a separate /dev/sdX that has to be > formatted, > >> mounted, etc.) > >> > >> The point of the Azure Segment Store is to deal with these two issues, > by > >> replacing the need for a local file system space with a remote service, > >> that will be (a) cheaper and (b) easier to provision (as it’ll be > >> configured on the application layer rather than VM layer). > >> > >> Another option would be using the Azure File Storage (which mounts the > SMB > >> file system, not the “physical” disk). Howe
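The "about 10GB" estimate earlier in this thread can be reproduced with simple arithmetic. Note the assumptions behind it: that the 50,000-block limit applies per store and that one compacted tar entry maps to one block of roughly 230 KB. Tomek's reply clarifies the limit is actually per blob (and segments are written as whole blobs), so treat this as the reasoning behind the estimate rather than a real Azure bound.

```java
// Back-of-envelope check of the "about 10GB" figure, under the assumption
// (later corrected in the thread) that the 50,000 block limit applies to a
// whole store and that one ~230 KB tar entry consumes one block.
public class RepoSizeEstimate {
    public static long maxBytes(long blockLimit, long entryBytes) {
        return blockLimit * entryBytes;
    }

    public static void main(String[] args) {
        long bytes = maxBytes(50_000, 230 * 1024);                    // ~230 KB per entry
        System.out.printf("%.1f GiB%n", bytes / (double) (1L << 30)); // ~11.0 GiB
    }
}
```

50,000 x 235,520 bytes is 11,776,000,000 bytes, i.e. roughly 11 GiB, which is where the "about 10GB" figure comes from.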
Re: Azure Segment Store
Hi Tomek, Thank you for the pointers and the description in OAK-6922. It all makes sense and seems like a reasonable approach. I assume the description is up to date. How does it perform compared to TarMK a) when the entire repo doesn't fit into RAM allocated to the container ? b) when the working set doesn't fit into RAM allocated to the container ? Since you mentioned cost, have you done a cost based analysis of RAM vs attached disk, assuming that TarMK has already been highly optimised to cope with deployments where the working set may only just fit into RAM ? IIRC the Azure attached disks mount Azure Blobs behind a kernel block device driver and use local SSD to optimise caching (in read and write through mode). Since these are kernel block devices they also benefit from the linux kernel VFS Disk Cache and support memory mapping via the page cache. So an Azure attached disk often behaves like a local SSD (IIUC). I realise that some containerisation frameworks in Azure don't yet support easy native Azure disk mounting (eg Mesos), but others do (eg AKS[1]) Best regards Ian 1 https://azure.microsoft.com/en-us/services/container-service/ https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv On 1 March 2018 at 18:40, Matt Ryan wrote: > Hi Tomek, > > Some time ago (November 2016 Oakathon IIRC) some people explored a similar > concept using AWS (S3) instead of Azure. If you haven’t discussed with > them already it may be worth doing so. IIRC Stefan Egli and I believe > Michael Duerig were involved and probably some others as well. > > -MR > > > On March 1, 2018 at 5:42:07 AM, Tomek Rekawek (reka...@adobe.com.invalid) > wrote: > > Hi Tommaso, > > so, the goal is to run the Oak in a cloud, in this case Azure. In order to > do this in a scalable way (eg. multiple instances on a single VM, > containerized), we need to take care of provisioning the sufficient amount > of space for the segmentstore. 
Mounting the physical SSD/HDD disks (in > Azure they’re called “Managed Disks” aka EBS in Amazon) has two drawbacks: > > * it’s expensive, > * it’s complex (each disk is a separate /dev/sdX that has to be formatted, > mounted, etc.) > > The point of the Azure Segment Store is to deal with these two issues, by > replacing the need for a local file system space with a remote service, > that will be (a) cheaper and (b) easier to provision (as it’ll be > configured on the application layer rather than VM layer). > > Another option would be using the Azure File Storage (which mounts the SMB > file system, not the “physical” disk). However, in this case we’d have a > remote storage that emulates a local one and SegmentMK doesn’t really > expect this. Rather than that it’s better to create a full-fledged remote > storage implementation, so we can work out the issues caused by the higher > latency, etc. > > Regards, > Tomek > > -- > Tomek Rękawek | Adobe Research | www.adobe.com > reka...@adobe.com > > > On 1 Mar 2018, at 11:16, Tommaso Teofili > wrote: > > > > Hi Tomek, > > > > While I think it's an interesting feature, I'd be also interested to hear > > about the user story behind your prototype. > > > > Regards, > > Tommaso > > > > > > Il giorno gio 1 mar 2018 alle ore 10:31 Tomek Rękawek > > > ha scritto: > > > >> Hello, > >> > >> I prepared a prototype for the Azure-based Segment Store, which allows > to > >> persist all the SegmentMK-related resources (segments, journal, > manifest, > >> etc.) on a remote service, namely the Azure Blob Storage [1]. The whole > >> description of the approach, data structure, etc. as well as the patch > can > >> be found in OAK-6922. It uses the extension points introduced in the > >> OAK-6921. > >> > >> While it’s still an experimental code, I’d like to commit it to trunk > >> rather sooner than later. The patch is already pretty big and I’d like > to > >> avoid developing it “privately” on my own branch. 
It’s a new, optional > >> Maven module, which doesn’t change any existing behaviour of Oak or > >> SegmentMK. The only change it makes externally is adding a few exports > to > >> the oak-segment-tar, so it can use the SPI introduced in the OAK-6921. > We > >> may narrow these exports to a single package if you think it’d be good > for > >> the encapsulation. > >> > >> There’s a related issue OAK-7297, which introduces the new fixture for > >> benchmark and ITs. After merging it, all the Oak integration tests pass > on > >> the Azure Segment Store. > >> > >> Looking forward for the feedback. > >> > >> Regards, > >> Tomek > >> > >> [1] https://azure.microsoft.com/en-us/services/storage/blobs/ > >> > >> -- > >> Tomek Rękawek | Adobe Research | www.adobe.com > >> reka...@adobe.com > >> > >> >
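The split Tomek describes, immutable segments written at once versus a journal that grows by one appended entry per commit, suggests a small persistence contract. The sketch below is illustrative only: the interface and method names are invented here, and the real extension points are the ones introduced in OAK-6921.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of the kind of remote persistence SPI discussed in the
// thread. All names here are invented; see OAK-6921 for the real contract.
public class RemotePersistenceSketch {

    interface RemoteSegmentPersistence {
        // Segments are immutable and written at once (one blob, one write),
        // so per-blob block limits do not matter for them.
        void writeSegment(UUID id, ByteBuffer data) throws IOException;
        ByteBuffer readSegment(UUID id) throws IOException;
        // The journal gains one appended block per commit, so a real
        // implementation must roll to a new blob near the block limit.
        void appendJournalEntry(String line) throws IOException;
    }

    // Trivial in-memory stand-in, just to show the contract in use.
    static class InMemoryPersistence implements RemoteSegmentPersistence {
        private final Map<UUID, ByteBuffer> segments = new HashMap<>();
        private final StringBuilder journal = new StringBuilder();

        public void writeSegment(UUID id, ByteBuffer data) {
            segments.put(id, data.asReadOnlyBuffer()); // segments are immutable
        }
        public ByteBuffer readSegment(UUID id) {
            return segments.get(id);
        }
        public void appendJournalEntry(String line) {
            journal.append(line).append('\n');
        }
        String journalContents() {
            return journal.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryPersistence p = new InMemoryPersistence();
        UUID id = UUID.randomUUID();
        p.writeSegment(id, ByteBuffer.wrap(new byte[] {1, 2, 3}));
        p.appendJournalEntry("root rev-1");
        System.out.println(p.readSegment(id).remaining()); // 3
    }
}
```

The point of such an SPI is exactly what the thread argues: the same SegmentMK code can target the local tar files or a remote blob service, with the latency and limit handling isolated behind the persistence layer.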
Re: Oak 1.7.10 release plan
Hi, Amit has resolved the issue. 1.7.10 is no longer blocked. Best Regards Ian On 20 October 2017 at 09:57, Davide Giannella <dav...@apache.org> wrote: > On 19/10/2017 16:57, Ian Boston wrote: > > Are you able to revert the 3 commits listed in the issue ? > > If thats done, you can resolve the issue. > > Nothing else depends on these changes at this time, and should not, hence > > the blocker. > > Technically yes. However I'm not confident in doing it as I didn't > commit that stuff, nor it's clear to me if the whole team want to > actually revert the functionality or not. > > Cheers > Davide > > >
Re: Oak 1.7.10 release plan
Hi Davide, Are you able to revert the 3 commits listed in the issue ? If that's done, you can resolve the issue. Nothing else depends on these changes at this time, and should not, hence the blocker. Best Regards Ian On 19 October 2017 at 14:30, Davide Giannella wrote: > Hello team, > > I'm planning to cut Oak on 23rd October; next Monday. > > There's currently 1 blocker which will delay the release: > > https://issues.apache.org/jira/browse/OAK-6841 > Revert Changes made in OAK-6575 before 1.7.10 is released > > Please re-schedule, lower priority if it's not really blocking or resolve. > > If there are any objections please let me know. Otherwise I will > re-schedule any non-resolved issue for the next iteration. > > Thanks > Davide > > >
Re: Oak modularisation unstable roadmap ?
Hi, Moving URIProvider from Oak 1.6 oak-core (not yet patched) to JCR API would require a patch to 2.14.2 with a new feature to make this API available to Sling and Oak 1.6. I was told patching stable branches with new features was not allowed in Oak. Only unstable branches can be patched with new features. Moving URIProvider from Oak 1.7.9 oak-api to JCR API 2.15.6-SNAPSHOT is a viable option for Oak, but not for Sling until Oak releases 2.0, which is too late for the feature to make it into the next AEM release. I hope that makes sense, and I understood what you were asking. This issue is now redundant. See my other message and OAK-6841. Best Regards Ian On 18 October 2017 at 10:06, Angela Schreiber <anch...@adobe.com> wrote: > Hi Ian > > Sorry... I can't follow you here. Did I misunderstand your previous mail? > I am really curious why you felt that Jackrabbit API was not an option > because I don't recall that we had that option discussed in the lengthy > thread that preceded the contribution to oak-api. > > Kind regards > Angela > > > From: Ian Boston <i...@tfd.co.uk> > Date: Tuesday 17 October 2017 19:37 > > To: Angela <anch...@adobe.com> > Cc: "oak-dev@jackrabbit.apache.org" <oak-dev@jackrabbit.apache.org> > Subject: Re: Oak modularisation unstable roadmap ? > > Hi, > > I dont really care where the API is, however IIUC > > Oak 1.6 needs JR API 2.14.2 (based on Slings oak.txt provisioning model) > Oak 1.7 nees JR API 2.15.5 (based on my patch to make Sling work on Oak > 1.7.8) > > To make the patch work with Oak 1.6 would mean patching the JR API 2.14.2, > which is a stable branch of an earlier version and is governed by the same > rules patching Oak 1.6 was, or in other words. If patching JR API 2.14.2 > with a new feature is Ok, then by the same logic, so should patching Oak > 1.6.x > > --- > > I dont want to create extra work. 
Getting this feature in to Oak and Sling > is not simple or easy and I would much rather the patch that was merged and > released in Oak 1.7.9 is reverted before it is released in 1.7.10 or a > stable release. There were plenty of objections to the concept. I dont > think anyone was really happy with the idea. The difficulties here make me > think it was not to be. > > Perhaps it would be better done by the Oak team through its normal > planning process, working its way through an unstable release, rather than > as a wild contributed patch from someone outside the team who doesn't > really understand how Oak works. > > Best Regards > Ian > > On 17 October 2017 at 18:14, Angela Schreiber <anch...@adobe.com> wrote: > >> Hi Ian >> >> Would you mind sharing your thoughts and why you think moving it to >> Jackrabbit API is not an option? >> >> As far as I remember the Sling community has been particularly vocal >> about NOT introducing dependencies to any particular JCR implementation. >> With this history in mind it would look a lot more Sling-compatible to have >> API extensions on the same layer than JCR and that would be Jackrabbit API >> (as there is currently no easy way to extend the JCR specification). From a >> Sling point of view the Oak API is an implementation detail of a particular >> JCR implementation (namely Jackrabbit Oak). >> >> Kind regards >> Angela >> >> From: Ian Boston <i...@tfd.co.uk> >> Date: Friday 13 October 2017 18:32 >> To: Angela <anch...@adobe.com> >> Cc: "oak-dev@jackrabbit.apache.org" <oak-dev@jackrabbit.apache.org> >> Subject: Re: Oak modularisation unstable roadmap ? >> >> Hi, >> Marcel suggested this off list to me yesterday. I have thought about it >> and think its not an option. Probably ok of Oak, but not for anyone >> downstream. >> >> Taking that route means >> >> Oak 1.7.9 has been released so the patch in OAK-6575 would need to be >> reverted ASAP to avoid the oak-api package being used. 
>> Oak 1.6.5 depends on Jackrabbit API 2.14.4 IIRC, >> Oak 1.7.9 depends on Jackrabbit API 2.15.0 >> >> So a backport will still be required to JR API 2.14.6, breaking Oaks >> backport rule of no new features. It looks like JR API 2.15.x can't be used >> with Oak 1.6 since 2.15.x depends on Lucene 3.6 and 2.14 has references to >> Lucene 2.4. >> >> Coupled with, it won't be possible to test Sling until Oak 2.0 is >> released. I am not comfortable asking for a vote on a Sling release I know >> hasn't been tested with integration tests. >> >> I have asked those who are waiting for this feature if they can wait till >&g
Re: Oak modularisation unstable roadmap ?
Hi, I have done some more thinking and coding overnight and now think the best place for the API is in the Sling API, since that's where it's used. This will unblock Sling and allow Oak to choose whether or not to implement the URIProvider API. I will adjust the Sling patch and discuss on sling-dev. Best Regards Ian On 17 October 2017 at 18:37, Ian Boston <i...@tfd.co.uk> wrote: > Hi, > > I dont really care where the API is, however IIUC > > Oak 1.6 needs JR API 2.14.2 (based on Slings oak.txt provisioning model) > Oak 1.7 nees JR API 2.15.5 (based on my patch to make Sling work on Oak > 1.7.8) > > To make the patch work with Oak 1.6 would mean patching the JR API 2.14.2, > which is a stable branch of an earlier version and is governed by the same > rules patching Oak 1.6 was, or in other words. If patching JR API 2.14.2 > with a new feature is Ok, then by the same logic, so should patching Oak > 1.6.x > > --- > > I dont want to create extra work. Getting this feature in to Oak and Sling > is not simple or easy and I would much rather the patch that was merged and > released in Oak 1.7.9 is reverted before it is released in 1.7.10 or a > stable release. There were plenty of objections to the concept. I dont > think anyone was really happy with the idea. The difficulties here make me > think it was not to be. > > Perhaps it would be better done by the Oak team through its normal > planning process, working its way through an unstable release, rather than > as a wild contributed patch from someone outside the team who doesn't > really understand how Oak works. > > Best Regards > Ian > > On 17 October 2017 at 18:14, Angela Schreiber <anch...@adobe.com> wrote: > >> Hi Ian >> >> Would you mind sharing your thoughts and why you think moving it to >> Jackrabbit API is not an option? >> >> As far as I remember the Sling community has been particularly vocal >> about NOT introducing dependencies to any particular JCR implementation. 
>> With this history in mind it would look a lot more Sling-compatible to have >> API extensions on the same layer than JCR and that would be Jackrabbit API >> (as there is currently no easy way to extend the JCR specification). From a >> Sling point of view the Oak API is an implementation detail of a particular >> JCR implementation (namely Jackrabbit Oak). >> >> Kind regards >> Angela >> >> From: Ian Boston <i...@tfd.co.uk> >> Date: Friday 13 October 2017 18:32 >> To: Angela <anch...@adobe.com> >> Cc: "oak-dev@jackrabbit.apache.org" <oak-dev@jackrabbit.apache.org> >> Subject: Re: Oak modularisation unstable roadmap ? >> >> Hi, >> Marcel suggested this off list to me yesterday. I have thought about it >> and think its not an option. Probably ok of Oak, but not for anyone >> downstream. >> >> Taking that route means >> >> Oak 1.7.9 has been released so the patch in OAK-6575 would need to be >> reverted ASAP to avoid the oak-api package being used. >> Oak 1.6.5 depends on Jackrabbit API 2.14.4 IIRC, >> Oak 1.7.9 depends on Jackrabbit API 2.15.0 >> >> So a backport will still be required to JR API 2.14.6, breaking Oaks >> backport rule of no new features. It looks like JR API 2.15.x can't be used >> with Oak 1.6 since 2.15.x depends on Lucene 3.6 and 2.14 has references to >> Lucene 2.4. >> >> Coupled with, it won't be possible to test Sling until Oak 2.0 is >> released. I am not comfortable asking for a vote on a Sling release I know >> hasn't been tested with integration tests. >> >> I have asked those who are waiting for this feature if they can wait till >> Oak 2.0 is released. >> Best Regards >> Ian >> >> On 13 October 2017 at 17:09, Angela Schreiber <anch...@adobe.com> wrote: >> >>> I share Julians concerns. >>> >>> In a private conversation Marcel suggested to reconsider placing the new >>> API into oak-api and put it to Jackrabbit API instead. If there is really >>> no Oak dependency in there that would make a lot of sense to me. 
In >>> particular since the Sling community used to be quite strict about any >>> kind of implementation dependency to Jackrabbit/Oak and only want to >>> depend on JCR... >>> Jackrabbit API is the natural extension of JCR, where as Oak API is on a >>> different layer in the stack and from a Sling PoV a implementation detail >>> of a JCR implementation. >>> >>> So, I would opt for taking following up on Marcels suggestion. >>> >>> Kind regards
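For context, the API whose home is being debated in this thread is very small. The sketch below shows the general shape of such a provider; it is hedged: the actual OAK-6575 signature may differ, and the Binary type here is a stand-in defined locally so the example is self-contained.

```java
import java.net.URI;

// Sketch of the kind of one-method provider API discussed in this thread.
// The method name, parameter type and semantics are illustrative only; see
// OAK-6575 for the actual contract.
public class UriProviderSketch {

    // Stand-in for the repository's binary abstraction.
    interface Binary {
        String contentIdentity();
    }

    interface URIProvider {
        // Returns a (typically short-lived, read-only) URI for the binary,
        // or null if this provider cannot convert it.
        URI toURI(Binary binary);
    }

    public static void main(String[] args) {
        // Hypothetical provider mapping a binary to a CDN-style URL.
        URIProvider provider =
                b -> URI.create("https://cdn.example.com/" + b.contentIdentity());
        Binary binary = () -> "abc123";
        System.out.println(provider.toURI(binary)); // https://cdn.example.com/abc123
    }
}
```

Because the contract is a single method with no implementation dependencies, it could in principle live in the Oak API, the Jackrabbit API, or the Sling API, which is exactly why the thread is about placement rather than design.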
Re: Oak modularisation unstable roadmap ?
Hi, I don't really care where the API is, however IIUC Oak 1.6 needs JR API 2.14.2 (based on Sling's oak.txt provisioning model) Oak 1.7 needs JR API 2.15.5 (based on my patch to make Sling work on Oak 1.7.8) To make the patch work with Oak 1.6 would mean patching the JR API 2.14.2, which is a stable branch of an earlier version and is governed by the same rules that patching Oak 1.6 was. In other words, if patching JR API 2.14.2 with a new feature is OK, then by the same logic, so is patching Oak 1.6.x --- I don't want to create extra work. Getting this feature into Oak and Sling is not simple or easy and I would much rather the patch that was merged and released in Oak 1.7.9 is reverted before it is released in 1.7.10 or a stable release. There were plenty of objections to the concept. I don't think anyone was really happy with the idea. The difficulties here make me think it was not to be. Perhaps it would be better done by the Oak team through its normal planning process, working its way through an unstable release, rather than as a wild contributed patch from someone outside the team who doesn't really understand how Oak works. Best Regards Ian On 17 October 2017 at 18:14, Angela Schreiber <anch...@adobe.com> wrote: > Hi Ian > > Would you mind sharing your thoughts and why you think moving it to > Jackrabbit API is not an option? > > As far as I remember the Sling community has been particularly vocal about > NOT introducing dependencies to any particular JCR implementation. With > this history in mind it would look a lot more Sling-compatible to have API > extensions on the same layer than JCR and that would be Jackrabbit API (as > there is currently no easy way to extend the JCR specification). From a > Sling point of view the Oak API is an implementation detail of a particular > JCR implementation (namely Jackrabbit Oak). 
> > Kind regards > Angela > > From: Ian Boston <i...@tfd.co.uk> > Date: Friday 13 October 2017 18:32 > To: Angela <anch...@adobe.com> > Cc: "oak-dev@jackrabbit.apache.org" <oak-dev@jackrabbit.apache.org> > Subject: Re: Oak modularisation unstable roadmap ? > > Hi, > Marcel suggested this off list to me yesterday. I have thought about it > and think its not an option. Probably ok of Oak, but not for anyone > downstream. > > Taking that route means > > Oak 1.7.9 has been released so the patch in OAK-6575 would need to be > reverted ASAP to avoid the oak-api package being used. > Oak 1.6.5 depends on Jackrabbit API 2.14.4 IIRC, > Oak 1.7.9 depends on Jackrabbit API 2.15.0 > > So a backport will still be required to JR API 2.14.6, breaking Oaks > backport rule of no new features. It looks like JR API 2.15.x can't be used > with Oak 1.6 since 2.15.x depends on Lucene 3.6 and 2.14 has references to > Lucene 2.4. > > Coupled with, it won't be possible to test Sling until Oak 2.0 is > released. I am not comfortable asking for a vote on a Sling release I know > hasn't been tested with integration tests. > > I have asked those who are waiting for this feature if they can wait till > Oak 2.0 is released. > Best Regards > Ian > > On 13 October 2017 at 17:09, Angela Schreiber <anch...@adobe.com> wrote: > >> I share Julians concerns. >> >> In a private conversation Marcel suggested to reconsider placing the new >> API into oak-api and put it to Jackrabbit API instead. If there is really >> no Oak dependency in there that would make a lot of sense to me. In >> particular since the Sling community used to be quite strict about any >> kind of implementation dependency to Jackrabbit/Oak and only want to >> depend on JCR... >> Jackrabbit API is the natural extension of JCR, where as Oak API is on a >> different layer in the stack and from a Sling PoV a implementation detail >> of a JCR implementation. >> >> So, I would opt for taking following up on Marcels suggestion. 
>> >> Kind regards >> Angela >> >> On 13/10/17 17:22, "Julian Reschke" <julian.resc...@gmx.de> wrote: >> >> >On 2017-10-13 17:02, Ian Boston wrote: >> >> Hi, >> >> Thank you for the clarification. It sounds like Sling can't safely bind >> >>to >> >> 1.7.x and safely make releases that will not cause problems with >> package >> >> versions later. That blocks Sling from binding to any features not >> >> backported to a stable version of Oak. >> > >> >That blocks Sling from *releasing* something as stable which relies on >> >an unstable Oak feature. >> > >> >As far as I can tell, it doesn't mean that Sling can't experiment with
Re: Oak modularisation unstable roadmap ?
Hi, Marcel suggested this off list to me yesterday. I have thought about it and think it's not an option. Probably OK for Oak, but not for anyone downstream. Taking that route means Oak 1.7.9 has been released so the patch in OAK-6575 would need to be reverted ASAP to avoid the oak-api package being used. Oak 1.6.5 depends on Jackrabbit API 2.14.4 IIRC, Oak 1.7.9 depends on Jackrabbit API 2.15.0 So a backport will still be required to JR API 2.14.6, breaking Oak's backport rule of no new features. It looks like JR API 2.15.x can't be used with Oak 1.6 since 2.15.x depends on Lucene 3.6 and 2.14 has references to Lucene 2.4. Coupled with that, it won't be possible to test Sling until Oak 2.0 is released. I am not comfortable asking for a vote on a Sling release I know hasn't been tested with integration tests. I have asked those who are waiting for this feature if they can wait till Oak 2.0 is released. Best Regards Ian On 13 October 2017 at 17:09, Angela Schreiber <anch...@adobe.com> wrote: > I share Julians concerns. > > In a private conversation Marcel suggested to reconsider placing the new > API into oak-api and put it to Jackrabbit API instead. If there is really > no Oak dependency in there that would make a lot of sense to me. In > particular since the Sling community used to be quite strict about any > kind of implementation dependency to Jackrabbit/Oak and only want to > depend on JCR... > Jackrabbit API is the natural extension of JCR, where as Oak API is on a > different layer in the stack and from a Sling PoV a implementation detail > of a JCR implementation. > > So, I would opt for taking following up on Marcels suggestion. > > Kind regards > Angela > > On 13/10/17 17:22, "Julian Reschke" <julian.resc...@gmx.de> wrote: > > >On 2017-10-13 17:02, Ian Boston wrote: > >> Hi, > >> Thank you for the clarification. It sounds like Sling can't safely bind > >>to > >> 1.7.x and safely make releases that will not cause problems with package > >> versions later. 
That blocks Sling from binding to any features not > >> backported to a stable version of Oak. > > > >That blocks Sling from *releasing* something as stable which relies on > >an unstable Oak feature. > > > >As far as I can tell, it doesn't mean that Sling can't experiment with > >it, and even make experimental releases for Sling's downstream users. > > > >> The (obvious) reason for asking was I still need to get OAK-6575 into > >> Sling. Since that wont be possible til 1.8 is released could the 1.6 > >> backport patch I did for 1.6 be reconsidered ? > > > >I understand your pain, but forcing something into a stable release of > >Oak just because Sling's release model is incompatible with Oak's makes > >me really uncomfortable. > > > >Best regards, Julian > > > >
Re: Oak modularisation unstable roadmap ?
Hi, Thank you for the clarification. It sounds like Sling can't safely bind to 1.7.x and make releases that will not cause problems with package versions later. That blocks Sling from binding to any features not backported to a stable version of Oak. The (obvious) reason for asking was I still need to get OAK-6575 into Sling. Since that won't be possible till 1.8 is released, could the 1.6 backport patch I did be reconsidered ? Best Regards Ian On 13 October 2017 at 15:53, Angela Schreiber <anch...@adobe.com.invalid> wrote: > hi ian > > q1: i'd say no. but might require more rounds as we complete the m12n. > q2: yes. that's my understanding of the discussed we had just this week. > > one additional thing i would like to bring up: > IMHO we should consider making this an 2.0 release given the fact that the > m12n task is quite an effort. will send a separate email to get the > discussion started. > > kind regards > angela > > On 13/10/17 15:48, "Ian Boston" <i...@tfd.co.uk> wrote: > > >Hi, > >I have ported Sling to depend on Oak 1.7.8, and now that 1.7.9 is released > >I am updating the patch to port, finding 2 new bundles that were not > >required to make Sling build with Oak 1.7.8. > > > >oak-store-document > >oak-security-spi > > > >Is it too soon for Sling to depend on Oak 1.7.x ? > >Is there more modularisation to come before 1.8.0 is released ? > > > >Best Regards > >Ian > >
Oak modularisation unstable roadmap ?
Hi, I have ported Sling to depend on Oak 1.7.8, and now that 1.7.9 is released I am updating the patch; in doing so I found 2 new bundles that were not previously required to make Sling build with Oak 1.7.8: oak-store-document oak-security-spi Is it too soon for Sling to depend on Oak 1.7.x ? Is there more modularisation to come before 1.8.0 is released ? Best Regards Ian
Re: Oak Bundle versioning.
Hi, On 6 October 2017 at 15:02, Robert Munteanu <romb...@apache.org> wrote: > Hi Ian, > > Thanks for starting the discussion. I think this can be one of the big > benefits of modularising Oak and I am interested in seeing this being > done. > > As you mentioned, it becomes easier to integrate various Oak changes, > especially for consumers only depending on stable APIs. > > On Thu, 2017-10-05 at 13:33 +0100, Ian Boston wrote: > > Obviously bundles remain the release unit, and the build must include > > OSGi > > based integration tests that validate a release is viable. > > This brings about a question - what is an Oak release? If doing per- > module releases, will we also do a "product" releases? > > A product release in my view is - similar to the Sling Launchpad - a > way of saying 'these module versions are guaranteed to work together > beyond compilation and semantic versioning constraints'. > > Also of interest is how/if we want to address the issue of supporting > various module versions combinations. So if we have ( for instance ) > > - oak-api 1.7.9 > - oak-core 1.7.12 > - oak-segment-tar 1.8.0 > > will these work together? Furthermore, which versions of oak-upgrade > and oak-run are compatible with this combination? > Perhaps there needs to be an Oak Quickstart jar to define a combination of jars that work. Perhaps that is oak-run ? Best Regards Ian > > We should have these discussion first, and then (hopefully) switch to a > more modular release process. > > Thanks, > > Robert >
Oak Bundle versioning.
Hi, Currently the whole Oak source tree is synchronously versioned, which is great for Oak and its releases, but problematic downstream. I assume this has been discussed before, and hope it can be discussed again ? -- If it can, here is some supporting evidence, using Sling as an example. If it can't be discussed and the case is closed, then please ignore this thread. Currently: When a version of Sling depends on Oak 1.6, 1.4 or 1.2, and needs a feature or an API from Oak 1.8-SNAPSHOT, that feature or API must be backported and released to 1.6, 1.4 or 1.2, and probably to all versions back to the one desired. With OSGi this is not necessary since OSGi works on the version of the package, not the bundle. The bundles from Oak 1.6, 1.4 and 1.2 will all have to have the same package version, which they do when an API change is made. This adds a second constraint. Changes to all the branches probably have to be made in the same sequence to ensure that the package versions remain in sync. Not doing so will eventually create many problems downstream. If not strictly enforced there will be the same version of a package, from different bundles, containing different APIs. More coordination between pending pull requests will be required to ensure that changes get made in the right order. With one or two stable branches, that is manageable; with more it may not be. Once a package is released, it may be very hard to untangle a conflict. Alternatively: If an API bundle is stable, and has not changed since version 1.2, then the same API bundle is used through all versions. When that API bundle has a change, it is released and the new version can be used, if required, by all versions of Oak, Sling and everything downstream. Only the API bundle needs to be released, nothing else. APIs should be compatible within a well defined range, enforced by the OSGi container, builds, unit tests and integration tests. That means less blocking downstream and, from an Oak PoV, less backporting work. 
When the bundle is an implementation bundle, it's even simpler. No wholesale release of Oak is required, just the bundle, integrated with each release. Again, less work for Oak. Obviously bundles remain the release unit, and the build must include OSGi based integration tests that validate that a release is viable. I am not saying all the problems are solved by independent bundle versioning; there will be new problems. Perhaps there would be less work for Oak, and more flexibility downstream, with independent versioning. wdyt? Best Regards Ian
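The package-versus-bundle point above can be made concrete with a bnd-style sketch (bnd descriptor files allow `#` comments; the bundle names and version numbers here are invented for illustration, not taken from any actual Oak release):

```
# Exporting bundle; could come from any Oak branch:
Bundle-SymbolicName: org.apache.jackrabbit.oak-api
Bundle-Version: 1.6.0
Export-Package: org.apache.jackrabbit.oak.api;version="1.1.0"

# Downstream bundle, e.g. in Sling; it binds to the package version only:
Bundle-SymbolicName: org.example.downstream
Import-Package: org.apache.jackrabbit.oak.api;version="[1.1,2)"
```

The OSGi resolver will wire the import to any bundle exporting `org.apache.jackrabbit.oak.api` at 1.1.x, regardless of which Oak release that bundle shipped with, which is exactly why the same package version exported from different branches must contain identical APIs.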
Re: OAK-6575 - A word of caution
Hi Angela, Thank you for the clarification. To summarize, I think consensus has been reached. 1. Move forward with the patch in OAK-6575 discussed in this thread. 2. Develop a threat model so that this and future changes can be evaluated as part of the release process, ideally before the next stable release. If anyone feels consensus has not been reached, please continue this thread. Best Regards Ian On 15 September 2017 at 07:52, Angela Schreiber <anch...@adobe.com.invalid> wrote: > Hi Ian > > On 13/09/17 23:34, "Ian Boston" <i...@tfd.co.uk> wrote: > > >Hi Angela, > > > >On 13 September 2017 at 06:50, Angela Schreiber > ><anch...@adobe.com.invalid> > >wrote: > > > >> Hi Ian > >> > >> The new proposal looks a lot better to me. > >> > >> The only concern from a security perspective I could come up with is the > >> one we expressed already with the very first proposal (see initial word > >>of > >> caution mail sent by Francesco): applications built on top of Oak can up > >> to now be sure that all access to the repository content is subject to > >>the > >> same permission evaluation as configured with the repository setup. This > >> is no longer guaranteed when we offer the ability to plug application > >>code > >> that may or may not by-pass the evaluation by allowing it to directly > >> access binaries. > >> > > > >I don't think this patch bypasses Oak security, and since the API can only > >be implemented by Oak itself. I am sure any future patch would be subject > >to the same scrutiny. If it can be implemented outside Oak, then Oak has > >already been breached, something I can see no evidence of. > > > >In this case, the signed url is only issued after Oak security has granted > >access to the binary, and only returned over the JCR API to the JCR > >Session > >that made the call, in the same way that an InputStream allows the bytes > >of > >the binary to be read by that session. The URL only allows read access.
> > > >What the session does with that data, is outside the control of Oak. > >Unlike the byte[] from the that has no protection, the signed URL is > >protected. It may only be used unmodified for the purpose it was intended > >by Oak and only for a well defined period of time. In that respect, > >arguably, its is more secure than the byte[] or InputStream. > > I am not worried about the security of the approach and maybe I am even > mistaken if I think this changes our threat model. But even if it actually > changes our threat model wouldn't mean that your proposal isn't secure. > Those are two quite distinct things. > > That's why i would love us to finally reserve some time to create a threat > model. It will help us with this discussion and with similar requests in > the future. and even the threat model itself can evolve over time... but > it allows us to review features from a different angle. > > > > >> > >> While I know that this is actually the goal of the whole exercise, we > >>have > >> to be aware that this also is a change in our Oak security model. As > >>such > >> this may look like a security breach and I have been told by my > >>colleagues > >> at Adobe that the 'single-way-to-access' is a relevant security question > >> with a lot of potential customers. > >> > >> That doesn't mean that I am opposed to the patch in it's current form > >>as I > >> see the benefits from an Oak pov, I just want to highlight that we are > >> going to make a fundamental change and we should treat and document it > >> with the necessary care... maybe we should take this opportunity to > >> finally create a threat model for Oak? Doing so at this stage would > >>allow > >> us to visualise the proposed change to all parties involved. > >> > >> wdyt? > >> > > > >Agreed. > >Having a fully developed threat model which clarified all the risks for > >every aspect of Oak would, imho, be much better than not defining the > >risks > >that exist. 
Even the most secure application has risks, best exposed in a > >threat model, however brief. > > > >Unfortunately Oak now exists in a world which is distributed, where > >applications need to embrace the network. This is a fundamental change, > >which Oak has to embrace. An Oak Threat model that recognises this will be > >a great step forwards. > > > >On the other hand, if you are saying that the Oak Threat model has to be
Re: OAK-6575 - A word of caution
Hi, On 14 September 2017 at 06:05, Alex Deparvu <a...@pfalabs.com> wrote: > Hi, > > > I don't think this patch bypasses Oak security, and since the API can > only > > be implemented by Oak itself. I am sure any future patch would be subject > > to the same scrutiny. If it can be implemented outside Oak, then Oak has > > already been breached, something I can see no evidence of. > > I don't think I agree with this statement. Sure you start with a proper Oak > session, but the patch facilitates any consumer code to bypass Oak security > and generate stateless links to any resource, no session involved anymore > but it's a signed url bound by a ttl. so your trust boundary goes from > (oak-security) to (oak-security + any consumer code) and here I'm not > talking about a possible breach, but simply bugs in the code, you now > delegate the security concerns to code that might not be at the same > quality level as oak. So it looks like yes, you are bypassing Oak security > to achieve this goal. > Fair point. Any data that passes over the JCR API boundary has the same 'oak-security + any consumer code' model, which is not the oak-security model. The biggest difference here is that Oak knows this bit of data provides delegated access, whereas it has no knowledge of the sensitivity of other bits of data that pass over the boundary. > > > > I feel that Oak is weaker without the ability to offload bulk data > streaming to infrastructure designed for that purpose. > > I pasted an older comment to come back to the reasoning behind this need. > If you bind the urls to a ttl, how do you guarantee that the workflows (or > whatever process does bulk data streaming) is completed within that time > frame? Will there be a retry policy? And more specifically how to make sure > this ttl will not get bumped up to a point where it becomes a real problem > (1 min/5mins is probably fine, what if someone sets it to a few hours).
> Workflows should make the request back to Sling/AEM as they do now with the appropriate credentials. Those requests will get redirected, if appropriate, with a fresh signed URL. The signed URL should never be put into a message that can't be acted on within the TTL time period. It is quite possible that the current CloudFront Signed URL implementation will not be suitable for a Workflow to use, due to network topology, although that was out of scope for the requirements in OAK-6575. There were some suggestions re adding a getPrivateURI to the URIProvider interface. btw, thanks for your comments in Jira, I have updated the patch. Best Regards Ian > > > > alex > > > > On Wed, Sep 13, 2017 at 11:34 PM, Ian Boston <i...@tfd.co.uk> wrote: > > > Hi Angela, > > > > On 13 September 2017 at 06:50, Angela Schreiber > <anch...@adobe.com.invalid > > > > > wrote: > > > > > Hi Ian > > > > > > The new proposal looks a lot better to me. > > > > > > The only concern from a security perspective I could come up with is > the > > > one we expressed already with the very first proposal (see initial word > > of > > > caution mail sent by Francesco): applications built on top of Oak can > up > > > to now be sure that all access to the repository content is subject to > > the > > > same permission evaluation as configured with the repository setup. > This > > > is no longer guaranteed when we offer the ability to plug application > > code > > > that may or may not by-pass the evaluation by allowing it to directly > > > access binaries. > > > > > > > I don't think this patch bypasses Oak security, and since the API can > only > > be implemented by Oak itself. I am sure any future patch would be subject > > to the same scrutiny. If it can be implemented outside Oak, then Oak has > > already been breached, something I can see no evidence of.
> > > > In this case, the signed url is only issued after Oak security has > granted > > access to the binary, and only returned over the JCR API to the JCR > Session > > that made the call, in the same way that an InputStream allows the bytes > of > > the binary to be read by that session. The URL only allows read access. > > > > What the session does with that data, is outside the control of Oak. > > Unlike the byte[] from the that has no protection, the signed URL is > > protected. It may only be used unmodified for the purpose it was intended > > by Oak and only for a well defined period of time. In that respect, > > arguably, its is more secure than the byte[] or InputStream. > > > > > > > > > > > > While I kno
Re: OAK-6575 - A word of caution
Hi Angela, On 13 September 2017 at 06:50, Angela Schreiber <anch...@adobe.com.invalid> wrote: > Hi Ian > > The new proposal looks a lot better to me. > > The only concern from a security perspective I could come up with is the > one we expressed already with the very first proposal (see initial word of > caution mail sent by Francesco): applications built on top of Oak can up > to now be sure that all access to the repository content is subject to the > same permission evaluation as configured with the repository setup. This > is no longer guaranteed when we offer the ability to plug application code > that may or may not by-pass the evaluation by allowing it to directly > access binaries. > I don't think this patch bypasses Oak security, since the API can only be implemented by Oak itself. I am sure any future patch would be subject to the same scrutiny. If it can be implemented outside Oak, then Oak has already been breached, something I can see no evidence of. In this case, the signed url is only issued after Oak security has granted access to the binary, and only returned over the JCR API to the JCR Session that made the call, in the same way that an InputStream allows the bytes of the binary to be read by that session. The URL only allows read access. What the session does with that data is outside the control of Oak. Unlike the byte[] that has no protection, the signed URL is protected. It may only be used unmodified for the purpose it was intended by Oak and only for a well defined period of time. In that respect, arguably, it is more secure than the byte[] or InputStream. > > While I know that this is actually the goal of the whole exercise, we have > to be aware that this also is a change in our Oak security model. As such > this may look like a security breach and I have been told by my colleagues > at Adobe that the 'single-way-to-access' is a relevant security question > with a lot of potential customers.
> > That doesn't mean that I am opposed to the patch in it's current form as I > see the benefits from an Oak pov, I just want to highlight that we are > going to make a fundamental change and we should treat and document it > with the necessary care... maybe we should take this opportunity to > finally create a threat model for Oak? Doing so at this stage would allow > us to visualise the proposed change to all parties involved. > > wdyt? > Agreed. Having a fully developed threat model which clarified all the risks for every aspect of Oak would, imho, be much better than not defining the risks that exist. Even the most secure application has risks, best exposed in a threat model, however brief. Unfortunately Oak now exists in a world which is distributed, where applications need to embrace the network. This is a fundamental change, which Oak has to embrace. An Oak Threat model that recognises this will be a great step forwards. On the other hand, if you are saying that the Oak Threat model has to be developed and agreed before this patch can be added, then I am concerned that will take too long. Doing justice to an Oak Threat model will require resources. Best Regards Ian > > Kind regards > Angela > > > On 07/09/17 16:39, "Ian Boston" <i...@tfd.co.uk> wrote: > > >On 7 September 2017 at 14:41, Francesco Mari <mari.france...@gmail.com> > >wrote: > > > >> On Thu, Sep 7, 2017 at 11:05 AM, Ian Boston <i...@tfd.co.uk> wrote: > >> > On 7 September 2017 at 07:22, Ian Boston <i...@tfd.co.uk> wrote: > >> > > >> >> Hi, > >> >> > >> >> On 6 September 2017 at 22:43, Michael Dürig <mdue...@apache.org> > >>wrote: > >> >> > >> >>> > >> >>> > >> >>> On 06.09.17 23:08, Michael Dürig wrote: > >> >>> > >> >>>> > >> >>>> Hi, > >> >>>> > >> >>>> On 05.09.17 14:09, Ian Boston wrote: > >> >>>> > >> >>>>> Repeating the comment to on OAK-6575 here for further discussion. > >>2 > >> new > >> >>>>> Patches exploring both options.
> >> >>>>> > >> >>>> > >> >>>> I would actually prefer the original patch ( > >> >>>> https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:O > >> >>>> AK-6575?expand=1) in most parts. However I have concerns regarding > >>the > >> >>>> generality of the new OakConversionService API as mentioned in my > >> previous > >> >>>> mail. I would be more comfortable if this could be rest
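The properties described in this thread (a URL issued only after permission evaluation, read-only, unusable if modified, expiring after a TTL) can be sketched with an HMAC-signed URL. Everything below is a hypothetical illustration: the host, query parameter names and key handling are invented, and this is not the OAK-6575 or CloudFront implementation.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

// Hypothetical sketch of a TTL-bound signed URL in the spirit of the
// CloudFront approach discussed here.
class SignedUrlSketch {

    private final byte[] key;

    SignedUrlSketch(byte[] key) {
        this.key = key;
    }

    // Sign path + expiry; modifying either part invalidates the signature,
    // which is why the URL "may only be used unmodified".
    URI sign(String path, long ttlSeconds) {
        long expires = Instant.now().getEpochSecond() + ttlSeconds;
        String sig = hmac(path + ":" + expires);
        return URI.create("https://cdn.example.com" + path
                + "?Expires=" + expires + "&Signature=" + sig);
    }

    // A server fronting the blob store would re-compute the HMAC and
    // reject expired or tampered requests.
    boolean verify(String path, long expires, String sig) {
        return Instant.now().getEpochSecond() <= expires
                && hmac(path + ":" + expires).equals(sig);
    }

    private String hmac(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The URL itself carries no session, so possession of an unexpired, unmodified URL is the only credential needed, which is exactly the trust-boundary point Alex raises elsewhere in this thread.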
Re: OAK-6575 - A word of caution
On 7 September 2017 at 14:41, Francesco Mari <mari.france...@gmail.com> wrote: > On Thu, Sep 7, 2017 at 11:05 AM, Ian Boston <i...@tfd.co.uk> wrote: > > On 7 September 2017 at 07:22, Ian Boston <i...@tfd.co.uk> wrote: > > > >> Hi, > >> > >> On 6 September 2017 at 22:43, Michael Dürig <mdue...@apache.org> wrote: > >> > >>> > >>> > >>> On 06.09.17 23:08, Michael Dürig wrote: > >>> > >>>> > >>>> Hi, > >>>> > >>>> On 05.09.17 14:09, Ian Boston wrote: > >>>> > >>>>> Repeating the comment to on OAK-6575 here for further discussion. 2 > new > >>>>> Patches exploring both options. > >>>>> > >>>> > >>>> I would actually prefer the original patch ( > >>>> https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:O > >>>> AK-6575?expand=1) in most parts. However I have concerns regarding the > >>>> generality of the new OakConversionService API as mentioned in my > previous > >>>> mail. I would be more comfortable if this could be restricted to > something > >>>> that resembles more like a "URIProvider", which given a blob returns > an URI. > >>>> > >>>> On the implementation side, why do we need to introduce the adaptable > >>>> machinery? Couldn't we re-use the Whiteboard and OSGiWhiteBoard > mechanisms > >>>> instead? I think these could be used to track URIProvider instances > >>>> registered by the various blob stores. > >>>> > >>>> > >>> See https://github.com/mduerig/jackrabbit-oak/commit/2709c097b01 > >>> a006784b7011135efcbbe3ce1ba88 for a *really* quickly hacked together > and > >>> entirely untested POC. But it should get the idea across though. > >> > >> > >> > >> Thank you. > >> That makes sense. > >> I think it only needs the java/org/apache/jackrabbit/ > >> oak/blob/cloud/aws/s3/CloudFrontS3SignedUrlAdapterFactory.java and the > >> API to be inside Oak, everything else can be in Sling. > >> I'll update my patch and do a 2 options for Sling. > >> > > > > > > > > > > https://github.com/ieb/jackrabbit-oak/compare/trunk.. 
> .ieb:OAK-6575-3?expand=1 > > > > and > > > > https://github.com/apache/sling/compare/trunk...ieb:OAK-6575-3?expand=1 > > > > wdyt ? > > I like this a lot. It keeps Oak's side simple and cleanly integrates > Oak's lower-level services in Sling. > Good news. I think we should hold off committing the patch until Monday or Tuesday to give those who may be offline this week a chance to comment. In particular I have not seen a comment from Angela who I would expect to have a view as this is acl/security related. That is assuming she is back online next week. Best Regards Ian > > > Obviously the second patch needs to be discussed with Sling dev, but is > > should not be too contentious. > > > > Best Regards > > Ian > > > > > > > >> > >> I think that should address others concerns since it drops all signs of > >> any generic object to object conversion from Oak (Francesco), and > doesn't > >> require wide scale fragile changes with implied requirements being > placed > >> on how intermediate classes are connected and behave (mine). > >> > >> Best Regards > >> Ian > >> > >> > >>> > >>> Michael > >>> > >> > >> >
Re: OAK-6575 - A word of caution
On 7 September 2017 at 07:22, Ian Boston <i...@tfd.co.uk> wrote: > Hi, > > On 6 September 2017 at 22:43, Michael Dürig <mdue...@apache.org> wrote: > >> >> >> On 06.09.17 23:08, Michael Dürig wrote: >> >>> >>> Hi, >>> >>> On 05.09.17 14:09, Ian Boston wrote: >>> >>>> Repeating the comment to on OAK-6575 here for further discussion. 2 new >>>> Patches exploring both options. >>>> >>> >>> I would actually prefer the original patch ( >>> https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:O >>> AK-6575?expand=1) in most parts. However I have concerns regarding the >>> generality of the new OakConversionService API as mentioned in my previous >>> mail. I would be more comfortable if this could be restricted to something >>> that resembles more like a "URIProvider", which given a blob returns an URI. >>> >>> On the implementation side, why do we need to introduce the adaptable >>> machinery? Couldn't we re-use the Whiteboard and OSGiWhiteBoard mechanisms >>> instead? I think these could be used to track URIProvider instances >>> registered by the various blob stores. >>> >>> >> See https://github.com/mduerig/jackrabbit-oak/commit/2709c097b01 >> a006784b7011135efcbbe3ce1ba88 for a *really* quickly hacked together and >> entirely untested POC. But it should get the idea across though. > > > > Thank you. > That makes sense. > I think it only needs the java/org/apache/jackrabbit/ > oak/blob/cloud/aws/s3/CloudFrontS3SignedUrlAdapterFactory.java and the > API to be inside Oak, everything else can be in Sling. > I'll update my patch and do a 2 options for Sling. > https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:OAK-6575-3?expand=1 and https://github.com/apache/sling/compare/trunk...ieb:OAK-6575-3?expand=1 wdyt ? Obviously the second patch needs to be discussed with Sling dev, but it should not be too contentious.
Best Regards Ian > > I think that should address others concerns since it drops all signs of > any generic object to object conversion from Oak (Francesco), and doesn't > require wide scale fragile changes with implied requirements being placed > on how intermediate classes are connected and behave (mine). > > Best Regards > Ian > > >> >> Michael >> > >
Re: OAK-6575 - A word of caution
Hi, On 6 September 2017 at 22:43, Michael Dürig <mdue...@apache.org> wrote: > > > On 06.09.17 23:08, Michael Dürig wrote: > >> >> Hi, >> >> On 05.09.17 14:09, Ian Boston wrote: >> >>> Repeating the comment to on OAK-6575 here for further discussion. 2 new >>> Patches exploring both options. >>> >> >> I would actually prefer the original patch (https://github.com/ieb/jackra >> bbit-oak/compare/trunk...ieb:OAK-6575?expand=1) in most parts. However I >> have concerns regarding the generality of the new OakConversionService API >> as mentioned in my previous mail. I would be more comfortable if this could >> be restricted to something that resembles more like a "URIProvider", which >> given a blob returns an URI. >> >> On the implementation side, why do we need to introduce the adaptable >> machinery? Couldn't we re-use the Whiteboard and OSGiWhiteBoard mechanisms >> instead? I think these could be used to track URIProvider instances >> registered by the various blob stores. >> >> > See https://github.com/mduerig/jackrabbit-oak/commit/2709c097b01 > a006784b7011135efcbbe3ce1ba88 for a *really* quickly hacked together and > entirely untested POC. But it should get the idea across though. Thank you. That makes sense. I think it only needs the java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/CloudFrontS3SignedUrlAdapterFactory.java and the API to be inside Oak, everything else can be in Sling. I'll update my patch and do 2 options for Sling. I think that should address others' concerns since it drops all signs of any generic object to object conversion from Oak (Francesco), and doesn't require wide scale fragile changes with implied requirements being placed on how intermediate classes are connected and behave (mine). Best Regards Ian > > Michael >
Re: OAK-6575 - A word of caution
Hi, Thanks for looking at them. On 6 September 2017 at 12:32, Francesco Mari <mari.france...@gmail.com> wrote: > I personally prefer the second approach. Is that OAK-6575-1 or OAK-6575-2 ? I assume OAK-6575-1 since OAK-6575 was my first approach ? If you mean OAK-6575-2, then I think someone with more knowledge of Oak will need to do the work as I am not at all confident I have covered the potential class/method navigation between an OakValue and a DataStore, or if that navigation is even possible where the exposed datastore might actually be a composite datastore with the exposed part having no class based connection with the underlying S3 DataStore. (eg S3 DS cache). I am definitely not proud of OAK-6575-2, imho it's not elegant or efficient and would put up more barriers to future agility rather than remove them. > The only thing I'm not sure > about is if we want to define OakConversionService with such a > wildcard method. Assuming that OakConversionService will be called > from code running on top of the JCR API, we could provide instead more > specific conversion methods. For example, > > URI toURI(javax.jcr.Binary binary); > > What do you think about it? Is it too restrictive? Do we need a > wildcard method like currently defined in OakConversionsService? > Originally OakConversionsService would not have needed a new version of the package for each new conversion Oak supported, greatly simplifying dependencies downstream, especially where the source and target classes already exist. If a concrete method is used, the package will need to be versioned every time. I suspect OSGi rules will require a minor version number increment each time, which is going to make a downstream developer's life painful. In addition, if an implementation bundle in Oak decides it wants to optionally support a conversion, it won't need to version the Oak API to achieve that.
With concrete methods, every change, wherever it is and however experimental, will require a new version of the Oak API. This was the reason for going for a wildcard method. It allows extension without any downstream disruption, missing dependencies or out of band dependencies. I think this boils down to how much disruption Oak wants to inflict downstream to get new capabilities added, or inversely, how open Oak is to requests for API changes from downstream ? > > Moreover, I would leave PrivateURI out of the picture for the moment > since it's not clear from the patch how this is supposed to be used. > In fact, a comment in S3Backend explicitly states that is not > supported at this time. > PrivateURI was discussed on the OAK-6575 thread. It was added to the patch to illustrate how each patch would cope with extension of a new type. I propose to drop it from the final patch; however, in the second patch the disruption is quite large, so it might be worth leaving it in there so that it can be implemented without more Oak API version changes. Best Regards Ian > > Finally, I suspect that in the second patch there was too much of an > aggressive rename refactoring. "types" was renamed to "customtypes" in > a lot of unrelated places. I would definitely double-check that. > On Tue, Sep 5, 2017 at 2:09 PM, Ian Boston <i...@tfd.co.uk> wrote: > > Hi, > > > > Repeating the comment to on OAK-6575 here for further discussion. 2 new > > Patches exploring both options. > > > > https://github.com/ieb/jackrabbit-oak/compare/trunk.. > .ieb:OAK-6575-1?expand=1 > > > > This drops the OSGi AdapterManager/AdapterFactory in favour of a non OSGi > > static pattern. Implementations of the AdapterFactory self register > rather > > than rely on OSGi doing the wiring. This is probably an IoC anti pattern, > > but does avoid exposing the AdapterFactory/AdapterManager outside Oak. > > > > https://github.com/ieb/jackrabbit-oak/compare/trunk..
> .ieb:OAK-6575-2?expand=1 > > > > This drops the AdapterManager concept completely and attempts to get from > > Value to URI using mix in interfaces and instanceof. I cant be certain it > > manages to do this as there is a disconnect between Blob, Blobstore and > > DataStore implementations with no guarantee that a BlobStore as seen by > the > > Blob implementation actually implements DataStore, or the Blob that is > > exposed in the JCR Value (implemented by OakValue) actually connects to > the > > correct DataStore of it it connects to a FileDatastore cache on local > disk. > > I could only wire this as far as I did with API changes. I may have > broken > > some of the new multi node store and multi datastore code used for 0DT in > > the process. An Oak committer with global knowledge will probably be able > > to do better. > > > > > >
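The trade-off debated in this email, a generic wildcard method versus a concrete typed one, can be illustrated with a sketch. All names below are hypothetical: `Binary` stands in for `javax.jcr.Binary` so the example is self-contained, and neither interface is the final Oak API.

```java
import java.net.URI;

// Stand-in for javax.jcr.Binary, to keep the sketch self-contained.
interface Binary {}

// Wildcard shape: one method covers every future conversion, so the API
// package never needs re-versioning, but the supported (source, target)
// pairs are only discoverable at runtime.
interface OakConversionService {
    <T> T convertTo(Object source, Class<T> targetType);
}

// Concrete shape: compile-time checked and self-documenting, but each new
// conversion adds a method and forces a minor package-version bump that
// ripples to every downstream importer.
interface URIProvider {
    URI toURI(Binary binary);
}

// A toy provider showing the concrete shape in use; the URL is fake.
class FakeProvider implements URIProvider {
    public URI toURI(Binary binary) {
        return URI.create("https://signed.example.com/fake");
    }
}
```

The wildcard shape trades static type safety for downstream stability; the concrete shape does the reverse, which is the crux of the disagreement in this thread.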
Re: OAK-6575 - A word of caution
Hi, Repeating the comment on OAK-6575 here for further discussion. 2 new Patches exploring both options. https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:OAK-6575-1?expand=1 This drops the OSGi AdapterManager/AdapterFactory in favour of a non OSGi static pattern. Implementations of the AdapterFactory self register rather than rely on OSGi doing the wiring. This is probably an IoC anti pattern, but does avoid exposing the AdapterFactory/AdapterManager outside Oak. https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:OAK-6575-2?expand=1 This drops the AdapterManager concept completely and attempts to get from Value to URI using mix in interfaces and instanceof. I can't be certain it manages to do this as there is a disconnect between Blob, Blobstore and DataStore implementations, with no guarantee that a BlobStore as seen by the Blob implementation actually implements DataStore, or that the Blob that is exposed in the JCR Value (implemented by OakValue) actually connects to the correct DataStore, or if it connects to a FileDatastore cache on local disk. I could only wire this as far as I did with API changes. I may have broken some of the new multi node store and multi datastore code used for 0DT in the process. An Oak committer with global knowledge will probably be able to do better. On 5 September 2017 at 08:19, Ian Boston <i...@tfd.co.uk> wrote: > Hi, > > On 5 September 2017 at 07:55, Francesco Mari <mari.france...@gmail.com> > wrote: > >> On Mon, Sep 4, 2017 at 6:18 PM, Ian Boston <i...@tfd.co.uk> wrote: >> > Do you mean: >> > keep the OakConversionService but put all the logic to convert from a >> > Value to a URI inside that implementation using new Oak SPI/APIs if >> > necessary and drop the AdapterManager completely ? >> >> Yes. I think there is no need to provide a generic adapter-like >> implementation to solve this use case.
>> >> > This would mean something the datastore implementation implements which >> > oak-core can navigate to would have to implement a mix in interface >> with a >> > getURI() method. I am not certain what or how without trying to do it. >> > >> > Would that address your concern here ? >> >> I think it's worth trying. Thanks for bringing the conversation forward. >> > > > I will create 2 new branches. > 1 with no adapter manager relying on mixin interfaces and one with a non > OSGi adapter manager plugin pattern. > > Thanks for the input. > Best Regards > Ian > >
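The non-OSGi static pattern described for OAK-6575-1, where factories self-register instead of being wired by OSGi, can be sketched as follows. All class and method names are invented for illustration and do not come from the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the non-OSGi static pattern: AdapterFactory
// implementations self-register in a static registry.
interface AdapterFactory {
    // Return an instance of target adapted from source, or null if
    // this factory cannot perform the conversion.
    Object adapt(Object source, Class<?> target);
}

final class AdapterManager {
    private static final Map<Class<?>, List<AdapterFactory>> FACTORIES =
            new ConcurrentHashMap<>();

    // Called by each factory at load time; this is the self-registration
    // that the mail calls "probably an IoC anti pattern".
    static void register(Class<?> target, AdapterFactory factory) {
        FACTORIES.computeIfAbsent(target, k -> new ArrayList<>()).add(factory);
    }

    // The first factory that produces a non-null result wins.
    static <T> T adaptTo(Object source, Class<T> target) {
        for (AdapterFactory f : FACTORIES.getOrDefault(target, List.of())) {
            Object result = f.adapt(source, target);
            if (result != null) {
                return target.cast(result);
            }
        }
        return null;
    }
}
```

Because nothing is published as an OSGi service, nothing outside Oak can discover or extend the registry, which is the property this pattern was meant to provide.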
Re: OAK-6575 - A word of caution
Hi, On 5 September 2017 at 07:55, Francesco Mari <mari.france...@gmail.com> wrote: > On Mon, Sep 4, 2017 at 6:18 PM, Ian Boston <i...@tfd.co.uk> wrote: > > Do you mean: > > keep the OakConversionService but put all the logic to convert from a > > Value to a URI inside that implementation using new Oak SPI/APIs if > > necessary and drop the AdapterManager completely ? > > Yes. I think there is no need to provide a generic adapter-like > implementation to solve this use case. > > > This would mean something the datastore implementation implements which > > oak-core can navigate to would have to implement a mix in interface with > a > > getURI() method. I am not certain what or how without trying to do it. > > > > Would that address your concern here ? > > I think it's worth trying. Thanks for bringing the conversation forward. > I will create 2 new branches. 1 with no adapter manager relying on mixin interfaces and one with a non OSGi adapter manager plugin pattern. Thanks for the input. Best Regards Ian
Re: OAK-6575 - A word of caution
Hi, On 4 September 2017 at 16:43, Francesco Mari <mari.france...@gmail.com> wrote: > On Mon, Sep 4, 2017 at 4:57 PM, Ian Boston <i...@tfd.co.uk> wrote: > > Hi, > > IIUC There are 2 patterns: > > > > 1 Emitting a short lived signed URL as per the AWS CloudFront recommended > > method of serving private content. > > I have nothing to object to choice of AWS CloudFront. The past > technological choices were very different. At the time, we were taking > about converting a binary to an S3 bucket ID for users to manipulate. > This had, as you could understand, a bigger impact on Oak from the > point of view of security and data ownership. > agreed. > > I am still of the opinion, though, that if S3 and CloudFront are part > of the technological stack chosen by a user, it's the user's > responsibility to directly interact with it. Oak can transparently > manage binaries in S3 for you, as long as it remains transparent. If > as a user you need to manipulate buckets, or create CloudFront > distributions, or use the AWS infrastructure to replicate S3 buckets > in a different zone, then this is part of your business logic. Oak can > still be used to store metadata and S3 bucket IDs, but the management > of S3 and CloudFront it is up to you. This is just my view on the > problem. If our users desperately want Oak to transparently store data > in S3 but also to opt out when needed, we are going to provide this > functionality. Still, my point of view is that it's wrong. > agreed. Users can, to some extent already use CloudFront, but they have to make all the content public to all users to do so. If anyone uses this patch, they do so because they desperately want to have assets securely served as private content by cloudfront. To do so, they will have to go some deployment effort, generating the correct private keys and configuring the various AWS components etc etc > > > 2 An Oak internal AdapterFactory/AdapterManager pattern to avoid Oak API. > > changes. 
> > Sling adapters or a partial implementation of it is too dangerous in > its flexibility. Today we are using this system to solve only one > problem. Tomorrow we are going to end up like in Sling, where > everything might be adapted to everything else and module boundaries > will be more difficult to enforce. Sling adapters require too much > discipline to get right and are too easy to misuse. > > Moreover, as soon as you register an AdapterManager in OSGi people are > going to get curious. There is no real control in OSGi unless you use > subsystems, and we are not using them. The most widely used commercial > products based on Oak are not using OSGi subsystems either, and this > is just going to exacerbate the problem. > The Oak AdapterManager was written to work in both OSGi and non OSGi environments as guided by Chetan. A previous version of the patch used a static factory pattern (AdapterManager.getInstance()) and no OSGi at all. There are probably several other patterns, all non OSGi that would stop anyone outside Oak using it. Would a non OSGi pattern address your concerns ? > > My take on this is that it's alright to have something like a > OakConversionService, but please don't use a solution based on or > inspired to Sling adapters to implement it. It's easier, safer and > more controllable to implement this solution ad-hoc. > Do you mean: keep the OakConversionService but put all the logic to convert from a Value to a URI inside that implementation using new Oak SPI/APIs if necessary and drop the AdapterManager completely ? This would mean something the datastore implementation implements which oak-core can navigate to would have to implement a mix in interface with a getURI() method. I am not certain what or how without trying to do it. Would that address your concern here ? Best Regards ian > > > Would you be willing state your concerns for each one separately ? 
> > > > Best Regards > > Ian > > > > On 4 September 2017 at 15:43, Francesco Mari <mari.france...@gmail.com> > > wrote: > > > >> I'm in no position to veto the POC and I'm not willing to. I am well > >> aware of the importance of this feature. I expressed my concerns and > >> so did others. As the subject of this thread clearly stated, I only > >> wanted to point out that I had the feeling that we had a "reboot" of > >> the conversation for no good reason, with the unpleasant side effect > >> of proposing once again a pattern that received a lot of criticism in > >> the past. > >> > >> On Mon, Sep 4, 2017 at 4:18 PM, Bertrand Delacretaz > >> <bdelacre...@apache.org> wrote: > >> > On Mon, Sep 4, 2017 at 3:44 PM, Ian Bos
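For readers following pattern 1 above: the general shape of a short-lived signed URL can be sketched in outline. The snippet below is a simplified, self-contained illustration using an HMAC and hypothetical names; real CloudFront private content uses RSA-signed policies generated via the AWS SDK, so treat the URL format and class names here as assumptions, not the CloudFront format.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Simplified sketch of a short-lived signed URL, in the spirit of
// CloudFront private content. The real CloudFront mechanism signs a
// policy with an RSA key pair; HMAC is used here only to keep the
// example self-contained and runnable.
public class SignedUrlSketch {

    private final byte[] secret;

    public SignedUrlSketch(byte[] secret) {
        this.secret = secret;
    }

    // Returns baseUrl?expires=...&sig=..., valid until expiresEpochSeconds.
    public String sign(String baseUrl, long expiresEpochSeconds) throws Exception {
        String payload = baseUrl + "|" + expiresEpochSeconds;
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        String sig = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        return baseUrl + "?expires=" + expiresEpochSeconds + "&sig=" + sig;
    }

    // Server-side check: the signature must match and the URL must not
    // have expired. Expiry is part of the signed payload, so it cannot
    // be extended by the client.
    public boolean verify(String baseUrl, long expiresEpochSeconds, String sig,
                          long nowEpochSeconds) throws Exception {
        if (nowEpochSeconds > expiresEpochSeconds) {
            return false;
        }
        String expected = sign(baseUrl, expiresEpochSeconds);
        return expected.endsWith("&sig=" + sig);
    }

    public static void main(String[] args) throws Exception {
        SignedUrlSketch s = new SignedUrlSketch("demo-secret".getBytes(StandardCharsets.UTF_8));
        long expires = 1_000_000L;
        String url = s.sign("https://example.org/private/asset.png", expires);
        String sig = url.substring(url.indexOf("&sig=") + 5);
        // Valid before expiry, rejected after it.
        System.out.println(s.verify("https://example.org/private/asset.png", expires, sig, expires - 60));
        System.out.println(s.verify("https://example.org/private/asset.png", expires, sig, expires + 60));
    }
}
```

The key property, as in the thread above, is that the URL only grants what was signed, and only for a short window, just long enough to perform a redirect.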
Re: OAK-6575 - A word of caution
Hi, IIUC There are 2 patterns: 1 Emitting a short lived signed URL as per the AWS CloudFront recommended method of serving private content. 2 An Oak internal AdapterFactory/AdapterManager pattern to avoid Oak API changes. Would you be willing to state your concerns for each one separately ? Best Regards Ian On 4 September 2017 at 15:43, Francesco Mari <mari.france...@gmail.com> wrote: > I'm in no position to veto the POC and I'm not willing to. I am well > aware of the importance of this feature. I expressed my concerns and > so did others. As the subject of this thread clearly stated, I only > wanted to point out that I had the feeling that we had a "reboot" of > the conversation for no good reason, with the unpleasant side effect > of proposing once again a pattern that received a lot of criticism in > the past. > > On Mon, Sep 4, 2017 at 4:18 PM, Bertrand Delacretaz > <bdelacre...@apache.org> wrote: > > On Mon, Sep 4, 2017 at 3:44 PM, Ian Boston <i...@tfd.co.uk> wrote: > >> ...I feel > >> that Oak is weaker without the ability to offload bulk data streaming to > >> infrastructure designed for that purpose > > > > FWIW as an Oak user I share that feeling, IMO the use cases described > > at https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase are > > becoming more and more important. > > > > Not being a committer I don't really care about the internals, but > > please do not "throw the baby out with the bath water" if the > > internals need to change. > > > > -Bertrand >
Re: OAK-6575 - A word of caution
963. It was informed by that discussion with the intention of addressing the concerns, even though I did not remember exactly which thread it was from the sha1 or the OAK issue number. While it does introduce a pattern that would allow other conversions, as Tommaso and Francesco (and others) have pointed out it can only be utilized to create conversions by code owned by Oak, giving Oak complete control over what, if anything, can be converted. That assumes code outside Oak is not allowed to implement Oak SPI interfaces and register those implementations. This pattern was introduced at the request of other Oak committers to remove the need for any Oak API changes. It is fundamentally different from OAK-1963 because it provides a signed URL that only allows the action that the current Oak session has been granted, and only for a short period of time, just enough to perform a redirect. If Oak committers feel that the contribution can't be included, please feel free to close OAK-6575 and I will delete the GitHub branch. (I am not a committer, so have nothing binding here, other than a desire to improve Oak.) Best Regards Ian On 4 September 2017 at 14:44, Ian Boston <i...@tfd.co.uk> wrote: > Hi Francesco and Tommaso, > I was not aware of the previous discussions and will now read those > threads and issues. I submitted the issue, patch and thread in good faith, > not having a detailed knowledge of everything that has been discussed on > oak-dev. I was not trying to ignore or circumvent any previous decisions or > opinions. If OAK-6575 brings nothing new and does not address those > concerns I would not be offended to have the patch and issue rejected and > deleted and for Oak not to have this capability, although obviously I feel > that Oak is weaker without the ability to offload bulk data streaming to > infrastructure designed for that purpose. > > Let me read the links you have shared and get back to you. 
> Best Regards > Ian > > On 4 September 2017 at 14:20, Tommaso Teofili <tommaso.teof...@gmail.com> > wrote: > >> I share Francesco's concerns, the same way I shared them when we first >> discussed this way back; I tried to express my doubts on the current >> proposal in the email thread for OAK-6575 (linked also in Francesco's >> email), which got ignored; that's fine as long as the majority of us is >> happy with the current solution, probably it's just me having this not so >> good feeling here with me, that some of us want this feature to be in a >> way >> or another. >> >> Tommaso >> >> >> >> Il giorno lun 4 set 2017 alle ore 14:45 Francesco Mari < >> mari.france...@gmail.com> ha scritto: >> >> > On Mon, Sep 4, 2017 at 2:13 PM, Chetan Mehrotra >> > <chetan.mehro...@gmail.com> wrote: >> > > Adaptable pattern in itself was not much discussed at that time. >> > >> > Concerns about the adaptable pattern and its implications in data >> > encapsulation were expressed in the old thread at [1], [2], [3], [4], >> > and in other messages in the same thread. In the new thread, it was >> > pointed out [5] that solving the problem discussed in OAK-6575 is >> > orthogonal to the introduction of the adaptable pattern. Moreover, in >> > the new thread some concerns were expressed about the adaptable >> > pattern as well at [6] and [7]. 
>> > >> > [1]: >> > https://lists.apache.org/thread.html/8b0021987b824b096ea9b470a4edd5edf1a246ef10548a2343ad4668@1462438837@%3Coak-dev.jackrabbit.apache.org%3E >> > [2]: >> > https://lists.apache.org/thread.html/4c352d247da81ca6ab05abfb7c53368fb88ed3c587fb09b42b87b6ae@1462439965@%3Coak-dev.jackrabbit.apache.org%3E >> > [3]: >> > https://lists.apache.org/thread.html/fbf28d5e864adbebd677a425cc915f89cbd7e0ef85a589fb4f948b51@1462784316@%3Coak-dev.jackrabbit.apache.org%3E >> > [4]: >> > https://lists.apache.org/thread.html/77b05609b7e6f0deedbd3282f734f8c606e6a7451db0698f29082d7b@1462891501@%3Coak-dev.jackrabbit.apache.org%3E >> > [5]: >> > https://lists.apache.org/thread.html/d8da865c1f971ff4c84c9616d9f09ea9369f3e0c6db20f98fbc6e4d3@%3Coak-dev.jackrabbit.apache.org%3E >> > [6]: >> > https://lists.apache.org/thread.html/ab1d405724674ef5855af0a0a9e87255d1144fe40762ff8e9d47@%3Coak-dev.jackrabbit.apache.org%3E >> > [7]: >> > https://lists.apache.org/thread.html/acc9eeef966916791c6073e76d3baa232f48abece4ae6b31f264a8ba@%3Coak-dev.jackrabbit.apache.org%3E >> > >> > >
Re: OAK-6575 - A word of caution
Hi Francesco and Tommaso, I was not aware of the previous discussions and will now read those threads and issues. I submitted the issue, patch and thread in good faith, not having a detailed knowledge of everything that has been discussed on oak-dev. I was not trying to ignore or circumvent any previous decisions or opinions. If OAK-6575 brings nothing new and does not address those concerns I would not be offended to have the patch and issue rejected and deleted and for Oak not to have this capability, although obviously I feel that Oak is weaker without the ability to offload bulk data streaming to infrastructure designed for that purpose. Let me read the links you have shared and get back to you. Best Regards Ian On 4 September 2017 at 14:20, Tommaso Teofili wrote: > I share Francesco's concerns, the same way I shared them when we first > discussed this way back; I tried to express my doubts on the current > proposal in the email thread for OAK-6575 (linked also in Francesco's > email), which got ignored; that's fine as long as the majority of us is > happy with the current solution, probably it's just me having this not so > good feeling here with me, that some of us want this feature to be in a way > or another. > > Tommaso > > > > Il giorno lun 4 set 2017 alle ore 14:45 Francesco Mari < > mari.france...@gmail.com> ha scritto: > > > On Mon, Sep 4, 2017 at 2:13 PM, Chetan Mehrotra > > wrote: > > > Adaptable pattern in itself was not much discussed at that time. > > > > Concerns about the adaptable pattern and its implications in data > > encapsulation were expressed in the old thread at [1], [2], [3], [4], > > and in other messages in the same thread. In the new thread, it was > > pointed out [5] that solving the problem discussed in OAK-6575 is > > orthogonal to the introduction of the adaptable pattern. Moreover, in > > the new thread some concerns were expressed about the adaptable > > pattern as well at [6] and [7]. 
> > > > [1]: > > https://lists.apache.org/thread.html/8b0021987b824b096ea9b470a4edd5edf1a246ef10548a2343ad4668@1462438837@%3Coak-dev.jackrabbit.apache.org%3E > > [2]: > > https://lists.apache.org/thread.html/4c352d247da81ca6ab05abfb7c53368fb88ed3c587fb09b42b87b6ae@1462439965@%3Coak-dev.jackrabbit.apache.org%3E > > [3]: > > https://lists.apache.org/thread.html/fbf28d5e864adbebd677a425cc915f89cbd7e0ef85a589fb4f948b51@1462784316@%3Coak-dev.jackrabbit.apache.org%3E > > [4]: > > https://lists.apache.org/thread.html/77b05609b7e6f0deedbd3282f734f8c606e6a7451db0698f29082d7b@1462891501@%3Coak-dev.jackrabbit.apache.org%3E > > [5]: > > https://lists.apache.org/thread.html/d8da865c1f971ff4c84c9616d9f09ea9369f3e0c6db20f98fbc6e4d3@%3Coak-dev.jackrabbit.apache.org%3E > > [6]: > > https://lists.apache.org/thread.html/ab1d405724674ef5855af0a0a9e87255d1144fe40762ff8e9d47@%3Coak-dev.jackrabbit.apache.org%3E > > [7]: > > https://lists.apache.org/thread.html/acc9eeef966916791c6073e76d3baa232f48abece4ae6b31f264a8ba@%3Coak-dev.jackrabbit.apache.org%3E > > >
Re: OAK-6575 - Provide a secure external URL to a DataStore binary.
On 24 August 2017 at 14:42, Michael Dürig <mdue...@apache.org> wrote: > > > On 24.08.17 15:33, Chetan Mehrotra wrote: > >> Inside Oak it would have its own version of an AdapterManager, >>> AdapterFactory. the DataStore would implement an AdapterFactory and >>> register it with the AdapterManager. The OakConversionService >>> implementation would then use the AdapterManager to perform the >>> conversion. >>> If no AdapterFactory to adapt from JCR Binary to URI existed, then null >>> would be returned from the OakConversionService. >>> >>> Thats no API changes to Blob, binary or anything. No complex >>> transformation >>> through multiple layers. No instanceof required and no difference between >>> Sling and non Sling usage. >>> It does require an Oak version of the AdapterManager and AdapterFactory >>> concepts, but does not require anything to implement Adaptable. >>> >> >> Thanks for those details. Much clear now. So with this we need not add >> adaptTo to all stuff. Instead provide an OakConversionService which >> converts the Binary to provided type and then have DataStores provide >> the AdapterFactory. >> >> This would indeed avoid any new methods in existing objects and >> provide a single entry point. >> >> +1 for this approach >> > > Yay! Rough quick PoC at [1] to illustrate how it might work. 1 https://github.com/apache/jackrabbit-oak/compare/trunk...ieb:OAK-6575?expand=1 > > > Michael > > > >> Chetan Mehrotra >> >> >> On Thu, Aug 24, 2017 at 6:16 AM, Ian Boston <i...@tfd.co.uk> wrote: >> >>> Hi, >>> I am probably not helping as here as there are several layers and I think >>> they are getting confused between what I am thinking and what you are >>> thinking. >>> >>> I was thinking Oak exposed a service to convert along the lines of the >>> OSCi >>> converter service or the OakConversionService suggested earlier. Both >>> Sling >>> and other uses of Oak would use it. >>> >>> Inside Oak it would have its own version of an AdapterManager, >>> AdapterFactory. 
the DataStore would implement an AdapterFactory and >>> register it with the AdapterManager. The OakConversionService >>> implementation would then use the AdapterManager to perform the >>> conversion. >>> If no AdapterFactory to adapt from JCR Binary to URI existed, then null >>> would be returned from the OakConversionService. >>> >>> Thats no API changes to Blob, binary or anything. No complex >>> transformation >>> through multiple layers. No instanceof required and no difference between >>> Sling and non Sling usage. >>> It does require an Oak version of the AdapterManager and AdapterFactory >>> concepts, but does not require anything to implement Adaptable. >>> >>> As I showed in the PoC, all the S3 specific implementation fits inside >>> the >>> S3DataStore which already does everything required to perform the >>> conversion. It already goes from Binary -> Blob -> ContentIdentifier -> >>> S3 >>> Key -> S3 URL by virtue of >>> ValueImpl.getBlob((Value)jcrBinary).getContentIdentifier() -> convert to >>> S3key and then signed URL. >>> >>> If it would help, I can do a patch to show how it works. >>> Best Regards >>> Ian >>> >>> On 24 August 2017 at 13:05, Chetan Mehrotra <chetan.mehro...@gmail.com> >>> wrote: >>> >>> No API changes to any existing Oak APIs, >>>>> >>>> >>>> Some API needs to be exposed. Note again Oak does not depend on Sling >>>> API. Any such integration code is implemented in Sling Base module >>>> [1]. But that module would still require some API in Oak to provide >>>> such an adaptor >>>> >>>> The adaptor proposal here is for enabling layers within Oak to allow >>>> conversion of JCR Binary instance to SignedBinary. Now how this is >>>> exposed to end user depends on usage context >>>> >>>> Outside Sling >>>> ------ >>>> >>>> Check if binary instanceof Oak Adaptable. If yes then cast it and adapt >>>> it >>>> >>>> import org.apache.jackrabbit.oak.api.Adaptable >>>> >>>> Binary b = ... 
>>>> SignedBinary sb = null >>>> if (b instanceof Adaptable) { >>>>
Re: OAK-6575 - Provide a secure external URL to a DataStore binary.
Hi, I am probably not helping here as there are several layers and I think they are getting confused between what I am thinking and what you are thinking. I was thinking Oak exposed a service to convert along the lines of the OSGi converter service or the OakConversionService suggested earlier. Both Sling and other uses of Oak would use it. Inside Oak it would have its own version of an AdapterManager, AdapterFactory. The DataStore would implement an AdapterFactory and register it with the AdapterManager. The OakConversionService implementation would then use the AdapterManager to perform the conversion. If no AdapterFactory to adapt from JCR Binary to URI existed, then null would be returned from the OakConversionService. That's no API changes to Blob, Binary or anything. No complex transformation through multiple layers. No instanceof required and no difference between Sling and non Sling usage. It does require an Oak version of the AdapterManager and AdapterFactory concepts, but does not require anything to implement Adaptable. As I showed in the PoC, all the S3 specific implementation fits inside the S3DataStore which already does everything required to perform the conversion. It already goes from Binary -> Blob -> ContentIdentifier -> S3 Key -> S3 URL by virtue of ValueImpl.getBlob((Value)jcrBinary).getContentIdentifier() -> convert to S3 key and then signed URL. If it would help, I can do a patch to show how it works. Best Regards Ian On 24 August 2017 at 13:05, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > > No API changes to any existing Oak APIs, > > Some API needs to be exposed. Note again Oak does not depend on Sling > API. Any such integration code is implemented in Sling Base module > [1]. But that module would still require some API in Oak to provide > such an adaptor > > The adaptor proposal here is for enabling layers within Oak to allow > conversion of JCR Binary instance to SignedBinary. 
Now how this is > exposed to end user depends on usage context > > Outside Sling > -- > > Check if binary instanceof Oak Adaptable. If yes then cast it and adapt it > > import org.apache.jackrabbit.oak.api.Adaptable > > Binary b = ... > SignedBinary sb = null > if (b instanceof Adaptable) { >sb = ((Adaptable)b).adaptTo(SignedBinary.class); > } > Within Sling > > > Have an AdapterManager implemented in Sling JCR Base [1] which uses > above approach > > Chetan Mehrotra > [1] https://github.com/apache/sling/tree/trunk/bundles/jcr/base > > > On Thu, Aug 24, 2017 at 4:55 AM, Ian Boston <i...@tfd.co.uk> wrote: > > From the javadoc in [1] > > > > "The adaptable object may be any non-null object and is not required to > > implement the Adaptable interface." > > > > > > On 24 August 2017 at 12:54, Ian Boston <i...@tfd.co.uk> wrote: > > > >> Hi, > >> That would require javax.jcr.Binary to implement Adaptable, which it > cant. > >> (OakBinary could but it doesnt need to). > >> > >> Using Sling AdapterFactory/AdapterManger javadoc (to be replaced with > Oaks > >> internal version of the same) > >> > >> What is needed is an AdapterFactory for javax.jcr.Binary to SignedBinary > >> provided by the S3DataStore itself. > >> > >> Since javax.jcr.Binary cant extend Adaptable, its not possible to call > >> binary.adaptTo(SignedBinary.class) without a cast, hence, > >> the call is done via the AdapterManager[1] > >> > >> SignedBinary signedBinary = adapterManager.getAdapter(binary, > >> SignedBinary.class); > >> > >> --- > >> You could just jump to > >> URI uri = adapterManager.getAdapter(binary, URI.class); > >> > >> No API changes to any existing Oak APIs, > >> > >> Best Regards > >> Ian > >> > >> > >> 1 https://sling.apache.org/apidocs/sling5/org/apache/sling/api/adapter/ > >> AdapterManager.html > >> > >> > >> > >> On 24 August 2017 at 12:38, Chetan Mehrotra <chetan.mehro...@gmail.com> > >> wrote: > >> > >>> > various layers involved. 
The bit I don't understand is how the > adaptable > >>> > pattern would make those go away. To me that pattern is just another > >>> way to > >>> > implement this but it would also need to deal with all those layers. > >>> > >>> Yes this adapter support would need to be implement at all layers. > >>> > >>> So call to > >>> 1. binary.adaptTo(SignedBinary.class) //binary is JCR Binary > >>>
Re: OAK-6575 - Provide a secure external URL to a DataStore binary.
From the javadoc in [1] "The adaptable object may be any non-null object and is not required to implement the Adaptable interface." On 24 August 2017 at 12:54, Ian Boston <i...@tfd.co.uk> wrote: > Hi, > That would require javax.jcr.Binary to implement Adaptable, which it cant. > (OakBinary could but it doesnt need to). > > Using Sling AdapterFactory/AdapterManager javadoc (to be replaced with Oaks > internal version of the same) > > What is needed is an AdapterFactory for javax.jcr.Binary to SignedBinary > provided by the S3DataStore itself. > > Since javax.jcr.Binary cant extend Adaptable, its not possible to call > binary.adaptTo(SignedBinary.class) without a cast, hence, > the call is done via the AdapterManager[1] > > SignedBinary signedBinary = adapterManager.getAdapter(binary, > SignedBinary.class); > > --- > You could just jump to > URI uri = adapterManager.getAdapter(binary, URI.class); > > No API changes to any existing Oak APIs, > > Best Regards > Ian > > > 1 https://sling.apache.org/apidocs/sling5/org/apache/sling/api/adapter/ > AdapterManager.html > > > > On 24 August 2017 at 12:38, Chetan Mehrotra <chetan.mehro...@gmail.com> > wrote: > >> > various layers involved. 
which returns the SignedBinary implementation >> >> However adapter support would allow us to make this instance of check >> extensible. Otherwise we would be hardcoding instance of check to >> SignedBinary at each of the above place though those layers need not >> be aware of SignedBinary support (its specific to S3 impl) >> >> Chetan Mehrotra >> > >
Re: OAK-6575 - Provide a secure external URL to a DataStore binary.
Hi, That would require javax.jcr.Binary to implement Adaptable, which it can't. (OakBinary could but it doesn't need to). Using Sling AdapterFactory/AdapterManager javadoc (to be replaced with Oak's internal version of the same) What is needed is an AdapterFactory for javax.jcr.Binary to SignedBinary provided by the S3DataStore itself. Since javax.jcr.Binary can't extend Adaptable, it's not possible to call binary.adaptTo(SignedBinary.class) without a cast, hence, the call is done via the AdapterManager[1] SignedBinary signedBinary = adapterManager.getAdapter(binary, SignedBinary.class); --- You could just jump to URI uri = adapterManager.getAdapter(binary, URI.class); No API changes to any existing Oak APIs, Best Regards Ian 1 https://sling.apache.org/apidocs/sling5/org/apache/sling/api/adapter/AdapterManager.html On 24 August 2017 at 12:38, Chetan Mehrotra wrote: > > various layers involved. The bit I don't understand is how the adaptable > > pattern would make those go away. To me that pattern is just another way > to > > implement this but it would also need to deal with all those layers. > > Yes this adapter support would need to be implemented at all layers. > > So call to > 1. binary.adaptTo(SignedBinary.class) //binary is JCR Binary > 2. results in blob.adaptTo(SignedBinary.class) //blob is Oak Blob. > Blob interface would extend adaptable > 3. results in SegmentBlob delegating to BlobStoreBlob which > 4. delegates to BlobStore // Here just passing the BlobId > 5. which delegates to DataStoreBlobStore > 6. which delegates to S3DataStore > 7. which returns the SignedBinary implementation > > However adapter support would allow us to make this instance of check > extensible. Otherwise we would be hardcoding instance of check to > SignedBinary at each of the above place though those layers need not > be aware of SignedBinary support (it's specific to S3 impl) > > Chetan Mehrotra >
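The Oak-internal AdapterManager/AdapterFactory idea discussed in this thread can be sketched roughly as below. The class and method names are illustrative only (not the actual patch), and a String stands in for javax.jcr.Binary so the example stays self-contained; the point is that the DataStore registers a factory and callers never need javax.jcr.Binary to implement an Adaptable interface.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an Oak-internal adapter mechanism: a DataStore
// registers an AdapterFactory, and all conversions go through the
// manager, so no existing Oak API needs to change.
public class AdapterSketch {

    public interface AdapterFactory {
        // Returns an instance of target adapted from source, or null if
        // this factory cannot perform the conversion.
        <T> T adapt(Object source, Class<T> target);
    }

    public static class AdapterManager {
        private final List<AdapterFactory> factories = new ArrayList<>();

        public void register(AdapterFactory f) {
            factories.add(f);
        }

        // First factory that can convert wins; null when none can,
        // mirroring the "return null from the OakConversionService" idea.
        public <T> T getAdapter(Object source, Class<T> target) {
            for (AdapterFactory f : factories) {
                T adapted = f.adapt(source, target);
                if (adapted != null) {
                    return adapted;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        AdapterManager manager = new AdapterManager();
        // Stand-in for the S3DataStore's factory: adapts a String "blob id"
        // to a URI. Real code would adapt javax.jcr.Binary to a signed URI.
        manager.register(new AdapterFactory() {
            @Override
            public <T> T adapt(Object source, Class<T> target) {
                if (source instanceof String && target == URI.class) {
                    return target.cast(URI.create("https://signed.example/" + source));
                }
                return null;
            }
        });
        System.out.println(manager.getAdapter("blob-id", URI.class));
        System.out.println(manager.getAdapter("blob-id", Integer.class)); // null: no factory
    }
}
```

Because only Oak-owned code registers factories, Oak keeps complete control over which conversions exist, which is the property argued for later in the thread.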
Re: OAK-6575 - Provide a secure external URL to a DataStore binary.
Hi, On 24 August 2017 at 10:20, Julian Sedding <jsedd...@gmail.com> wrote: > Hi > > On Thu, Aug 24, 2017 at 9:27 AM, Ian Boston <i...@tfd.co.uk> wrote: > > On 24 August 2017 at 08:18, Michael Dürig <mdue...@apache.org> wrote: > > > >> > >> > >> URI uri = ((OakValueFactory) valueFactory).getSignedURI(binProp); > >> > >> > > +1 > > > > One point > > Users in Sling dont know abou Oak, they know about JCR. > > I think this issue should be solved in two steps: > > 1. Figure out how to surface a signed URL from the DataStore to the > level of the JCR (or Oak) API. > 2. Provide OSGi glue inside Sling, possibly exposing the signed URL it > via adaptTo(). > > > > > URI uri = ((OakValueFactory) > > valueFactory).getSignedURI(jcrNode.getProperty("jcr:data")); > > > > No new APIs, let OakValueFactory work it out and return null if it cant > do > > it. It should also handle a null parameter. > > (I assume OakValueFactory already exists) > > > > If you want to make it extensible > > > > T convertTo(Object source, Class target); > > > > used as > > > > URI uri = ((OakValueFactory) > > valueFactory). convertTo(jcrNode.getProperty("jcr:data"), URI.class); > > There is an upcoming OSGi Spec for a Converter service (RFC 215 Object > Conversion, also usable outside of OSGI)[0]. It has an implementation > in Felix, but afaik no releases so far. > > A generic Converter would certainly help with decoupling. Basically > the S3-DataStore could register an appropriate conversion, hiding all > implementation details. > Sounds like a good fit. +1 Best Regards Ian > > Regards > Julian > > [0] https://github.com/osgi/design/blob/05cd5cf03d4b6f8a512886eae472a6 > b6fde594b0/rfcs/rfc0215/rfc-0215-object-conversion.pdf > > > > > The user doesnt know or need to know the URI is signed, it needs a URI > that > > can be resolved. > > Oak wants it to be signed. 
> > > > Best Regards > > Ian > > > > > > > >> Michael > >> > >> > >> > >> > >>> A rough sketch of any alternative proposal would be helpful to decide > >>> how to move forward > >>> > >>> Chetan Mehrotra > >>> > >>> >
Re: OAK-6575 - Provide a secure external URL to a DataStore binary.
On 24 August 2017 at 09:16, Michael Dürig <mdue...@apache.org> wrote: > > > On 24.08.17 09:27, Ian Boston wrote: > >> On 24 August 2017 at 08:18, Michael Dürig <mdue...@apache.org> wrote: >> >> >>> >>> URI uri = ((OakValueFactory) valueFactory).getSignedURI(binProp); >>> >>> >>> +1 >> >> One point >> Users in Sling don't know about Oak, they know about JCR. >> >> URI uri = ((OakValueFactory) >> valueFactory).getSignedURI(jcrNode.getProperty("jcr:data")); >> >> No new APIs, let OakValueFactory work it out and return null if it can't do >> it. It should also handle a null parameter. >> (I assume OakValueFactory already exists) >> > > No, OakValueFactory does not exist as API (yet). But adding it would be > more inline with how we approached the Oak API traditionally. > > If it doesn't exist then perhaps Oak could add a Service interface that deals with conversions, rather than expose a second adaptable pattern in Sling, or require type casting and instanceof. How Oak implements those conversions is up to Oak. e.g. public interface OakConversionService { <T> T convertTo(Object source, Class<T> target); } implemented as an OSGi service, used as e.g. @Reference private OakConversionService conversionService; public void doFilter(...) { ... URI u = conversionService.convertTo(jcrBinary, URI.class); } Best Regards Ian > I'm not against introducing the adaptable pattern but would like to > understand whether there is concrete enough use cases beyond the current > one to warrant it. > > Michael > > > >> If you want to make it extensible >> >> <T> T convertTo(Object source, Class<T> target); >> >> used as >> >> URI uri = ((OakValueFactory) >> valueFactory).convertTo(jcrNode.getProperty("jcr:data"), URI.class); >> >> The user doesn't know or need to know the URI is signed, it needs a URI >> that >> can be resolved. >> Oak wants it to be signed. 
>> >> Best Regards >> Ian >> >> >> >> Michael >>> >>> >>> >>> >>> A rough sketch of any alternative proposal would be helpful to decide >>>> how to move forward >>>> >>>> Chetan Mehrotra >>>> >>>> >>>> >>
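The OakConversionService proposed in this thread could be fleshed out along these lines. All names are illustrative (this is not the Oak API); the sketch shows the single entry point, null handling for unsupported conversions and null inputs, and how a store implementation would supply the actual conversion logic.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the OakConversionService idea: one conversion entry point,
// with the conversion logic contributed by the store implementation.
// Names and shapes are assumptions for illustration, not Oak's API.
public class ConversionServiceSketch {

    public interface OakConversionService {
        // Returns null when no conversion to the target type is available,
        // or when source is null, as proposed in the thread.
        <T> T convertTo(Object source, Class<T> target);
    }

    public static class SimpleConversionService implements OakConversionService {
        private final Map<Class<?>, Function<Object, ?>> converters = new HashMap<>();

        // A DataStore implementation would call this to contribute its
        // conversion (e.g. content id -> signed URI).
        public <T> void register(Class<T> target, Function<Object, T> fn) {
            converters.put(target, fn);
        }

        @Override
        public <T> T convertTo(Object source, Class<T> target) {
            if (source == null) {
                return null; // handle null parameters gracefully
            }
            Function<Object, ?> fn = converters.get(target);
            return fn == null ? null : target.cast(fn.apply(source));
        }
    }

    public static void main(String[] args) {
        SimpleConversionService service = new SimpleConversionService();
        // Stand-in for the S3DataStore contribution: map a content id to a URI.
        service.register(URI.class,
                source -> URI.create("https://signed.example/" + source));
        System.out.println(service.convertTo("content-id", URI.class));
        System.out.println(service.convertTo(null, URI.class));          // null
        System.out.println(service.convertTo("content-id", Long.class)); // null: unsupported
    }
}
```

In an OSGi deployment the service would be injected with @Reference as Ian's snippet shows, but nothing in the shape above depends on OSGi, which matches the "works in both OSGi and non-OSGi environments" requirement mentioned earlier in the digest.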
Re: OAK-6575 - Provide a secure external URL to a DataStore binary.
The datastore should understand how to go from Blob -> URI. In the case of S3 it does and uses Blob.getContentId(). If the datastore doesn't know how to do it, then it's not supported by the datastore. You might need a DataStore.getSignedURI(Blob b) method. On 24 August 2017 at 08:27, Chetan Mehrotra wrote: > > Fair point. So this is more about dynamic adaptability than future > > extendibility. But AFIU this could still be achieved without the full > > adaptable machinery: > > > > if (binProp instanceOf SignableBin) { > > URI uri = ((SignableBin) binProp).getSignedURI(); > > if (uri != null) { > > // resolve URI etc. > > } > > } > > This would be tricky. The current logic is like below. > > 1. Oak JCR BinaryImpl holds a ValueImpl > 2. ValueImpl -> PropertyState -> Blob > 3. From Blob following paths are possible >- Blob -> SegmentBlob -> BlobStoreBlob -> DataRecord -> S3DataRecord >- Blob -> ArrayBasedBlob >- Blob ... MongoBlob > > So at JCR level where we have a PropertyState we cannot determine if > the Blob provided by it can provide a signed binary without adding > such instance of check at each place. Hence the adaptor based proposal > > Chetan Mehrotra >
Re: OAK-6575 - Provide a secure external URL to a DataStore binary.
On 24 August 2017 at 08:18, Michael Dürig wrote: > > > URI uri = ((OakValueFactory) valueFactory).getSignedURI(binProp); > > +1 One point Users in Sling don't know about Oak, they know about JCR. URI uri = ((OakValueFactory) valueFactory).getSignedURI(jcrNode.getProperty("jcr:data")); No new APIs, let OakValueFactory work it out and return null if it can't do it. It should also handle a null parameter. (I assume OakValueFactory already exists) If you want to make it extensible <T> T convertTo(Object source, Class<T> target); used as URI uri = ((OakValueFactory) valueFactory).convertTo(jcrNode.getProperty("jcr:data"), URI.class); The user doesn't know or need to know the URI is signed, it needs a URI that can be resolved. Oak wants it to be signed. Best Regards Ian > Michael > > > > >> A rough sketch of any alternative proposal would be helpful to decide >> how to move forward >> >> Chetan Mehrotra >> >>
Re: Oak Metrics.
Hi, On 15 June 2017 at 11:42, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > On Thu, Jun 15, 2017 at 2:55 PM, Ian Boston <i...@tfd.co.uk> wrote: > > Are the new Oak Metrics documented somewhere ? > > No because so far no one else asked for it and only I was making use > of them! Now that you asked would try to document it > perfect, thanks. > > > NRT_REFRESH_TIME > > That measures the time to refresh the index reader (see NRTIndex > class). Probably we should have a way to add description at time of > Metric creation. > Elsewhere I am trying to encourage doing that in JavaDoc using a @metric custom javadoc tag. I don't know if that will cause problems for Oak's build, but it should be possible to extract the documentation at a later date if required. IIUC the tag doesn't need to be formally implemented. In the case of repository stats all that would be required is the tag added to the existing javadoc. When the metric name is the class name, it becomes trivial to locate the documentation. Best Regards Ian > > Chetan Mehrotra >
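The @metric custom javadoc tag suggested above might look like this. The enum shape is only a sketch (not Oak's actual RepositoryStatistics class); the description of NRT_REFRESH_TIME is taken from Chetan's reply in this thread, and the tag contents are hypothetical.

```java
// Illustration of the @metric javadoc-tag idea: the documentation lives
// next to the constant that names the metric, so when the metric name
// matches the constant name, finding its description is a simple lookup.
public class MetricDocSketch {

    public enum Metric {
        /**
         * Time taken to refresh the Lucene index reader for near-real-time
         * search (see the NRTIndex class).
         *
         * @metric timer, milliseconds
         */
        NRT_REFRESH_TIME;
    }

    public static void main(String[] args) {
        // A metric name reported at runtime maps straight back to the
        // documented constant.
        System.out.println(Metric.valueOf("NRT_REFRESH_TIME").name());
    }
}
```

The javadoc tool will warn about an unknown tag unless it is declared (e.g. via the standard doclet's -tag option), but the source compiles unchanged, which is why the tag need not be formally implemented.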
Oak Metrics.
Hi, Are the new Oak Metrics documented somewhere ? I see that [1] exists. Google returns 2 hits for NRT_REFRESH_TIME, oddly neither relevant. (Now there will be 1 relevant hit: this thread.) Is there one page where I can look for a description of all the metrics in Oak ? Best Regards Ian 1 https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-api/src/main/java/org/apache/jackrabbit/api/stats/RepositoryStatistics.java
Re: minimize the impact when creating a new index (or re-indexing)
Hi, Assuming the MongoDB instance is performing well and does not show any slow queries in the mongodb logs, running the index operation on many cores, each core handling one index writer, should parallelise the operation. IIRC this is theoretically possible, and might have been implemented in the latest versions of Oak (Chetan?). If you are in AWS then an X1 instance will give you 128 cores and up to 2 TB of RAM, for the duration of the re-index. Other cloud vendors have equivalent VMs. Whatever the instance is, the Oak cluster leader should be allocated to this instance as IIRC only the Oak cluster leader performs the index operation. The single threaded index writer is a feature/limitation of the way Lucene works, but Oak has many independent indexes. Your deployment may not have 128 indexes, so it may not be able to use all the cores of the largest instance. If, however, the MongoDB cluster is showing any signs of slow queries in the logs (> 100 ms), or any level of read IO, then however many cores over however many VMs won't speed the process up and may slow it down. To be certain of no bottleneck in MongoDB, ensure the VM has more memory than the disk size of the database. The latest version of MongoDB supported by Oak, running WiredTiger, will greatly reduce memory pressure and IO as it doesn't use memory mapping as the primary DB to disk mechanism, and compresses the data as it writes. The instance running Oak must also be sized correctly. I suspect you will be running a persistent cache which must be sized to give optimum performance and minimise IO, which then also requires sufficient memory. For the period of the re-index, the largest AEM instance you can afford will minimise IO. Big VMs (in AWS at least) have more network bandwidth which also helps. Finally disks. 
Don't use HDD, only use SSD, ensure that there are sufficient IOPS available at all times, and enable all the Oak indexing optimisation switches (copyOnRead, copyOnWrite etc). IO generally kills performance, and if the VMs have not been configured correctly (THP off, readahead low, XFS or noatime ext4 disks) then that IO will be amplified. If you have done all of this, then you might have to wait for OAK-6246 (I see Chetan just responded), but if you haven't, please do check that you are running as fast as possible with no constrained resources. HTH; if it's been said before, sorry for the noise and please ignore. Best Regards Ian On 9 June 2017 at 07:49, Alvaro Cabrerizo wrote: > Thanks Chetan, > > Sorry, but that part is out of my reach. There is an IT team in charge of > managing the infrastructure and make optimizations, so It is difficult to > get that information. Basically what is was looking for is the way > to parallelize the indexing process. On the other hand, reducing the > indexing time would be fine (it was previously reduced from 7 to 2 days), > but I think that traversing more than 1 nodes is a pretty tough > operation and I'm not sure if there is much we can do. Anyway, any pointer > related to indexing optimization or any advice on how to design the repo > (e.g. use different paths to isolate different groups of assets, use > different nodetypes to differentiate content type, create different > repositories [is that possible?] for different groups of uses...) is > welcome. > > Regards. > > On Thu, Jun 8, 2017 at 12:44 PM, Chetan Mehrotra < > chetan.mehro...@gmail.com> > wrote: > > > On Thu, Jun 8, 2017 at 4:04 PM, Alvaro Cabrerizo > > wrote: > > > It is a DocumentNodeStore based instance. We don't extract data from > > binary > > > files, just indexing metadata stored on nodes. > > > > In that case 48 hrs is a long time. 
Can you share some details around > > how many nodes are being indexed as part of that index and the repo > > size in terms of Mongo stats if possible? > > > > Chetan Mehrotra > > >
Re: codahale metrics Jmx reporter
On 30 May 2017 at 07:35, Chetan Mehrotra wrote: > On Tue, May 30, 2017 at 11:53 AM, Andrei Kalfas > wrote: > > Looks to me that there is a dependency on oak functionality. > > Ian can confirm but I believe thats not required now (the component > does not get activated) and was only a temporary workaround. Oak > publishes the MetricRegistry instance in OSGi service registry and > then any component can look up that service and configure a reporter > for it > Correct. Older versions of Oak didn't publish the MetricRegistry as a service, so it had to be extracted from Oak forcibly ;-). Once older versions of Oak disappear, the dependency goes. I could have done it with the class name and no dependency, but that was more effort, and no one would have found the code to delete. Best Regards Ian > > Chetan Mehrotra >
Re: codahale metrics Jmx reporter
Hi, Here are some reporters that work with Sling/Oak/AEM [1]. They all look for components registered as implementing MetricRegistry and then aggregate the data, pumping it out to a reporter. They are all implemented as independent bundles.

TBH I would avoid pumping the metrics into JMX, as JMX was designed for management, not metrics. It might be able to cope with trivial metrics sets, but will likely start to consume unreasonable JVM resources with a production set of metrics. Most of the reporters in [1] are simple wrappers around other 3rd-party Metrics reporters. If you have a target not included in that list, it's trivial to follow the same pattern. HTH Best Regards Ian 1 https://github.com/ieb/statsd-reporter-osgi https://github.com/ieb/prometheus-reporter-osgi https://github.com/ieb/influxdb-reporter-osgi https://github.com/ieb/gmond-osgi https://github.com/ieb/tsdb-reporter-osgi On 29 May 2017 at 12:48, Andrei Kalfas wrote: > Hi, > > > By default this is the only mode. > > What would you guys rather prefer, have a different component peeks into > the metrics registry or change oak-core to deal with multiple reporters - > Jmx should be the default one. > > Thanks, > Andrei > >
Re: MongoMK failover behaviour.
Hi, On 4 May 2017 at 15:19, Marcel Reutegger <mreut...@adobe.com> wrote: > Hi, > > On 04/05/17 14:57, Ian Boston wrote: > >> Before 120 seconds, should the MongoDB Java driver route read queries to a >> secondary and use the new primary without any action by Oak (eg closing a >> connection and opening a new one ) ? >> > > Yes, the MongoDB Java driver automatically routes queries based on their > required read preference. The failover is automatic and the driver should > direct queries to the new primary once available. Connection pooling is > done by the driver. Oak does not manage those. > Thanks. It looks like there might be a problem with the MongoDB deployment in the case I am looking at, either due to performance or misconfiguration. Dropping a primary results in read queries failing, and after 120s the Oak repositories shut down as they are not able to write. All of that points to the MongoDB driver config or the MongoDB instances, not Oak. Best Regards Ian > > Regards > Marcel >
Re: MongoMK failover behaviour.
Hi, On 4 May 2017 at 11:26, Marcel Reutegger <mreut...@adobe.com> wrote: > Hi, > > On 04/05/17 12:02, Ian Boston wrote: > >> What is the expected behaviour when a Oak MongoMK experiences a MongoDB >> primary failure. >> >> I am looking at an instance that appears to try and retry reads repeatedly >> from the MongoDB primary and after 60s or more reports the Oak Discovery >> lease has been lost, resulting in many minutes of retries there eventually >> shutting down the repository. >> >> I don't currently have the MongoDB logs to share. Just wondering what to >> expect at this stage ? >> > > Oak will stop the oak-core bundle if a MongoDB primary is unavailable for > more than 110 seconds. The 110 seconds are based on the 120 seconds lease > timeout and a lease update interval of 10 seconds. > Yes, that happens after 120s. Before 120 seconds, should the MongoDB Java driver route read queries to a secondary and use the new primary without any action by Oak (eg closing a connection and opening a new one ) ? Best Regards Ian > > When this happens, all reads and writes to the repository will fail. > Though, in an OSGi environment services depending on oak-core should stop > as well. You will need to restart the system or affected bundles once the > primary is available again. See also discussion in OAK-3397 and OAK-3250. > > Regards > Marcel >
MongoMK failover behaviour.
Hi, What is the expected behaviour when an Oak MongoMK experiences a MongoDB primary failure ? I am looking at an instance that appears to retry reads repeatedly from the MongoDB primary, and after 60s or more reports that the Oak discovery lease has been lost, resulting in many minutes of retries before eventually shutting down the repository. I don't currently have the MongoDB logs to share; just wondering what to expect at this stage. I am starting from the assumption that Oak works perfectly in this regard. Best Regards Ian
Re: Cassandra as NodeStore
Hi Juanjo, For Cassandra to work as a DocumentStore for Oak it needs to be configured with a quorum high enough to ensure that all writes are sequentially eventually consistent. That might kill Cassandra's write performance. RDB backends have this behaviour because they are single instance; in MongoDB the primary of a shard is likewise a write singleton.

I am not 100% certain what will happen if the backend isn't sequentially eventually consistent, but I think individual cluster instances will see a partial, sparse view of the current head state, which could feed back when that instance writes, causing an unpredictable final state. E.g. property A gets changed on instance B; instance C sees the same head revision as instance B, but is missing the changes to property A. Instance C writes data based on no changes to A, or not... depending on the current state of A on C. Moments later A would have a different state when its change gets filled in by the non sequentially-eventually-consistent behaviour of Cassandra.

There might be a way of encoding the revision into the document ID, which could avoid this, but I think that would lead to billions of documents in Cassandra prematurely. I would be interested to hear if you get it to work. HTH Best Regards Ian On 29 March 2017 at 09:48, Chetan Mehrotra wrote: > Adding to that below are few features I think any new store would have > to support > > 1. Sorted primary key access - For now required to find children of > any parent path > 2. secondary indexes apart from _id field > 3. compare-and-set (CAS) operations for sub fields > > Chetan Mehrotra > > > On Wed, Mar 29, 2017 at 1:59 PM, Marcel Reutegger > wrote: > > Hi Juanjo, > > > > I don't know Cassandra that well, but I'd say this is feasible. Though, > you > > would probably not implement a NodeStore but a backend for the > > DocumentNodeStore. That is, you need to implement a DocumentStore [0]. > There > > are currently implementations for MongoDB [1] and RDB [2]. 
> > > > Consistency is indeed important and Oak requires rater strict guarantees. > > > > Regards > > Marcel > > > > [0] > > http://svn.apache.org/repos/asf/jackrabbit/oak/tags/ > jackrabbit-oak-1.6.0/oak-core/src/main/java/org/apache/ > jackrabbit/oak/plugins/document/DocumentStore.java > > [1] > > http://svn.apache.org/repos/asf/jackrabbit/oak/tags/ > jackrabbit-oak-1.6.0/oak-core/src/main/java/org/apache/ > jackrabbit/oak/plugins/document/mongo/MongoDocumentStore.java > > [2] > > http://svn.apache.org/repos/asf/jackrabbit/oak/tags/ > jackrabbit-oak-1.6.0/oak-core/src/main/java/org/apache/ > jackrabbit/oak/plugins/document/rdb/RDBDocumentStore.java > > > > > > On 28/03/17 19:34, Juan José Vázquez Delgado wrote: > >> > >> Hello guys, I'm currently assessing Oak as an alternative for content > >> management on my cloud product. However, I already have a Cassandra > >> cluster > >> as the main persistence technology and to go additionally with Mongo > would > >> turn out in more complexity in terms of manteinance and support. > >> > >> So, have you ever consider Cassandra as an alternative to Mongo for node > >> storing?. I'd be willing to tackle the implementation of such a plugin > but > >> I'd like to know if you find any drawbacks in advance. Perhaps you've > >> already tried it and stumbled across with blocking issues. For instance, > >> I'd be concern with Cassandra's eventual consistency. > >> > >> Thanks in adance for considering this. > >> > >> Regards, > >> > >> Juanjo > >> > > >
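The quorum requirement alluded to above is the classic Dynamo-style overlap rule: a read is guaranteed to see the latest acknowledged write only when the read and write replica counts overlap, i.e. R + W > RF. A minimal plain-Java sketch (the class and values are hypothetical, not an Oak or Cassandra API):

```java
// Dynamo-style quorum overlap rule: reads see the latest acknowledged write
// only when readReplicas + writeReplicas > replicationFactor.
class QuorumCheck {
    static boolean overlaps(int replicationFactor, int writeReplicas, int readReplicas) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        // QUORUM/QUORUM with RF=3: 2 + 2 > 3, so reads overlap writes.
        System.out.println(QuorumCheck.overlaps(3, 2, 2)); // true
        // ONE/ONE with RF=3: 1 + 1 <= 3, so stale reads are possible --
        // the "partial sparse view" failure mode described in the mail.
        System.out.println(QuorumCheck.overlaps(3, 1, 1)); // false
    }
}
```

Running Cassandra at QUORUM for every DocumentStore read and write is what would hurt its write throughput, which is the performance concern raised above.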
Re: Timeout on Oak Queries ?
I have created https://issues.apache.org/jira/browse/OAK-5978 to track. Best Regards Ian On 23 March 2017 at 14:03, Davide Giannella <dav...@apache.org> wrote: > On 23/03/2017 11:12, Ian Boston wrote: > > Hi, > > Is it possible to configure a maximum execution time for Oak queries ? > > > > Other "database" systems often monitor the time a query is taking and > kill > > if if it exceeds a time limit to avoid long running queries causing > outages. > > > > > > I'm aware of "timeouts" on node reads but not on time. So we cancel a > query that iterated over a certain amount of nodes or as of 1.6 may hit > the traversal index https://issues.apache.org/jira/browse/OAK-4888. > > Not aware on time constraints. I think though it will be a good feature > to add if not there. > > Davide > > >
Timeout on Oak Queries ?
Hi, Is it possible to configure a maximum execution time for Oak queries ? Other "database" systems often monitor the time a query is taking and kill it if it exceeds a time limit, to avoid long-running queries causing outages. Best Regards Ian
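For what it's worth, the kind of time-based kill switch being asked about can be approximated at the application layer with a plain `ExecutorService` timeout. This is only a sketch (class and names are made up; per this thread, Oak itself has no such query setting):

```java
import java.util.concurrent.*;

class QueryTimeoutSketch {
    // Runs a (simulated) long query and cancels it if it exceeds the limit.
    static boolean runWithTimeout(long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Integer> query = pool.submit(() -> {
            Thread.sleep(5_000);   // stand-in for a long-running query
            return 42;
        });
        try {
            query.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;           // finished within the limit
        } catch (TimeoutException e) {
            query.cancel(true);    // interrupt the long-running "query"
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(QueryTimeoutSketch.runWithTimeout(100)); // false: killed
    }
}
```

Note this only stops the caller waiting; the underlying work is cancelled only if it responds to interruption, which is why a limit inside the query engine itself (as requested above) would be preferable.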
Re: Oak 1.0.29 vs 1.4.10 memory mapping.
Hi, Based on the page-fault behaviour, I think the areas mapped and reported by pmap are being actively accessed by the JVM. The number of page faults for Oak 1.4.11 is well over 2x the number of page faults for Oak 1.0.29 on the same VM, with the same DB, when performing an oak-run offline compaction. This is on the same VM with the same repository in the same state. The tar files are not the same, but one copy of the tar files is 32GB in both instances; 1.4.11 maps 64GB as mentioned before. I don't know if it's the behaviour seen in OAK-4274. I have seen similar in the past. I was not confident that a GC cycle did unmap, but it would be logical. Best Regards Ian On 23 March 2017 at 09:07, Francesco Mari <mari.france...@gmail.com> wrote: > You might be hitting OAK-4274, which I discovered quite some time ago. > I'm not aware of a way to resolve this issue at the moment. > > 2017-03-22 16:47 GMT+01:00 Alex Parvulescu <alex.parvule...@gmail.com>: > > Hi, > > > > To give more background this came about during an investigation into a > slow > > offline compaction but it may affect any running FileStore as well (to be > > verified). > > I don't think it's related to oak-run itself, but more with the way we > map > > files, and so far it looks like a bug (there is no reasonable explanation > > for mapping each tar file twice). > > > > Took a quick look at the TarReader but there are not many changes in this > > area 1.0 vs. 1.4 branches. > > If no one has better ideas, I'll create an oak issue and investigate > this a > > bit further. > > > > thanks, > > alex > > > > > > On Wed, Mar 22, 2017 at 4:28 PM, Ian Boston <i...@tfd.co.uk> wrote: > > > >> Hi, > >> I am looking at Oak-run and I see 2x the mapped memory between 1.0.29 > and > >> 1.4.10. It looks like in 1.0.29 each segment file is mapped into memory > >> once, but in 1.4.10 its mapped into memory 2x. > >> > >> Is this expected ? > >> > >> Its not great for page faults. > >> Best Regards > >> Ian > >> >
Re: Metrics support in Oak
Hi, IIRC (a) is doable and the preferred way of naming metrics. Other systems that use Metrics typically use the package or class name, sometimes an API class name, in the same way that loggers do. This makes it much easier to process and report on blocks of functionality at the reporting stage. For instance, when the metrics are ingested into InfluxDB with Grafana as a front end, they can be filtered effectively on the metric name.

Some background (mostly for the wider community): Oak's MetricRegistry is deployed as a service into Sling with the name "oak". Sling has its own MetricRegistry exposed as a service with the name "sling". The reporting tools aggregate all the MetricRegistries, prefixing them with their service name. Hence the Oak MetricRegistry metrics will all be prefixed with "oak-" when reported. That means Oak doesn't need to differentiate itself from other metrics, but (a) is a good idea to avoid 100s of metrics all in one namespace. MetricRegistries are designed to scale to 1000s.

Anyone using a MetricRegistry service should bind to the "sling" registry service or create their own and register it with a unique name, as is done here [1]. That's the runtime instrumentation bundle, service named "woven". +1 to (a) Best Regards Ian 1 https://github.com/ieb/slingmetrics/blob/master/src/main/java/org/apache/sling/metrics/impl/MetricsActivator.java#L79 On 21 March 2017 at 12:53, Michael Dürig wrote: > > Hi, > > AFAICS Oak's Metrics support exposes all Stats in a flat namespace under > "org.apache.jackrabbit.oak". I don't think this is desirable. We should (a) > either come up with a way to expose them by grouping related ones together > or at least (b) arrive at a consensus on how we construct the names of the > individual Stats in an unambiguous and standard way. Currently we have > different approaches in the various component resulting in a confusing list > of items. > > My preference would be (a), but I don't know if this is doable. 
> > > Michael >
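A plain-Java illustration of option (a): build metric names hierarchically from the owning class, the way logger names are built. Dropwizard's `MetricRegistry.name(...)` does essentially this; the helper below is a hypothetical stand-in so the sketch stays self-contained:

```java
class MetricNames {
    // Joins the owning class name and name parts with dots, logger-style,
    // so reporters can filter whole blocks of functionality by prefix.
    static String name(Class<?> owner, String... parts) {
        StringBuilder sb = new StringBuilder(owner.getName());
        for (String part : parts) {
            sb.append('.').append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A JDK class stands in for the instrumented Oak class here.
        System.out.println(MetricNames.name(java.util.ArrayList.class, "writeDuration"));
        // prints: java.util.ArrayList.writeDuration
    }
}
```

With names like this, a Grafana/InfluxDB query can match on the `org.apache.jackrabbit.oak.query.` prefix and report on a whole component at once, which is the grouping option (a) asks for.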
Oak 1.0.29 vs 1.4.10 memory mapping.
Hi, I am looking at oak-run and I see 2x the mapped memory between 1.0.29 and 1.4.10. It looks like in 1.0.29 each segment file is mapped into memory once, but in 1.4.10 it's mapped into memory twice. Is this expected ? It's not great for page faults. Best Regards Ian
Re: Mongo timeouts
Hi Arek, Have you checked the MongoDB logs to determine what is taking too long ? Generally the default timeouts used by Oak on the MongoDB connections are correct to run Oak on MongoMK successfully. The stack trace you shared looks like a slow query in MongoDB rather than a protocol timeout. As a guideline, if you see MongoDB reporting slow queries in the log, then Oak won't run successfully and the back pressure it produces will cause many other things to go wrong inside Oak.

There are three common causes of slow MongoDB performance and timeouts:
1. Incorrect or unsuitable MongoDB OS/VM deployment or configuration (review the 10gen documentation and fix).
2. Missing MongoDB indexes (look for the latest version of Oak, which has many fixes relative to earlier versions).
3. IO latency under the VM layer (rare, but has been reported in AWS and other cloud providers).

If you want to analyse your MongoDB log files, good tooling can be found here: https://github.com/rueckstiess/mtools. HTH Best Regards Ian On 17 March 2017 at 10:37, Arek Kita wrote: > Hi, > > I have a problem with Mongo [0]. It causes timeouts. This might be due > to MongoDB is not present at all (inaccessible) or there are rather a > connectivity issues. > > How I can check and fine tune mongo connection parameters? I've > reviewed Oak documentation [1] and codebase [2,3] but I haven't found > anything related. I thought I could use some JVM system properties. > > I have finally found a way via MongoDB driver [4] but I wanted to > double check if placing such options is a good way for Oak via OSGi > config mongouri property? > > WDYT? Do you have any experience in fine-tuning MongoDB parameters? 
> > Cheers, > Arek > > [0] https://gist.githubusercontent.com/kitarek/ > 4cc2611fb68d9f167bdf34fa4526072c/raw/bc9d116eb25e803b519e264e89672a > c91303d09f/stacktrace.txt > [1] http://jackrabbit.apache.org/oak/docs/osgi_config.html > [2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak- > core/src/main/java/org/apache/jackrabbit/oak/plugins/document/ > DocumentNodeStoreService.java > [3] https://github.com/apache/jackrabbit-oak/blob/trunk/oak- > core/src/main/java/org/apache/jackrabbit/oak/plugins/ > document/DocumentNodeStore.java > [4] https://docs.mongodb.com/manual/reference/connection-string/ >
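For reference, connection tuning via the mongouri OSGi property mentioned in [4] would look something like the sketch below. Host names and values here are made up; `replicaSet`, `connectTimeoutMS`, `socketTimeoutMS` and `maxPoolSize` are standard MongoDB connection-string options:

    mongodb://mongo1:27017,mongo2:27017,mongo3:27017/oak?replicaSet=rs0&connectTimeoutMS=5000&socketTimeoutMS=60000&maxPoolSize=100

Note that raising timeouts only hides the symptom; per the advice above, the slow queries themselves need to be found and fixed first.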
Hybrid indexing and soft commits.
Hi, IIUC the Hybrid indexing on the master operates in parallel with the master index writer, performing the same task but repeatedly throwing its work away when the master provides an update. IIUC, it effectively performs many soft commits to achieve NRT behaviour.

I wonder if there is an opportunity to use the Hybrid indexer on the master instance and every n seconds (or even minutes) perform a hard commit, that hard commit being the output of the master index writer, committed by Oak to the DS. This would avoid duplicating the work, and follows the pattern used by Solr and ES, where an indexing update is written to a WAL, soft committed, and periodically hard committed. The WAL comes free as part of Oak, so if the soft commits are lost, the index and WAL start from the last hard commit.

To be clear: I am only talking about de-duplicating the effort performed on the master node by the hybrid indexer and the master index writer. I am not talking about anything performed on slave index-reader instances, which also have a hybrid indexer; those hybrid indexers will still work as they do now. wdyt? Best Regards Ian
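The proposed cycle, reduced to a toy sketch (all names hypothetical; this is not Oak's indexer API): updates are soft committed for NRT visibility, every n seconds the accumulated batch becomes one hard commit, and anything soft-only at crash time is replayed from the WAL (Oak's journal) after the last hard commit.

```java
import java.util.ArrayList;
import java.util.List;

class PeriodicCommitter {
    private final List<String> soft = new ArrayList<>(); // NRT-visible, volatile
    private final List<String> hard = new ArrayList<>(); // durable in the DS

    void index(String updatedPath) {  // soft commit: visible to NRT readers
        soft.add(updatedPath);
    }

    void hardCommit() {               // runs every n seconds on the master
        hard.addAll(soft);
        soft.clear();
    }

    List<String> durable() {          // what survives a crash; the WAL
        return hard;                  // replays everything after this point
    }
}
```

The point of the proposal is that the hard commit is the same work the master index writer already does, so the hybrid indexer's parallel effort on the master would no longer be thrown away.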
Re: IndexEditorProvider behaviour question.
Hi, Thanks for looking at this, sounds like you are on the case already. if I see anything else I'll let you know. Best Regards Ian On 15 September 2016 at 05:33, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > Note that so far LuceneIndexEditor was used only for async indexing > case and hence invoked only on leader node every 5 sec. So performance > aspects here were not that critical. However with recent work on > Hybrid indexes they would be used in critical path and hence such > aspects are important > > On Wed, Sep 14, 2016 at 3:10 PM, Ian Boston <i...@tfd.co.uk> wrote: > > A and B mean that the work of creating the tree and working out the > changes > > in a tree will be duplicated roughly n times, where n is the number of > > index definitions. > > Here note that diff would be performed only once at any level and > IndexUpdate would then pass them to various editors. However > construction of trees can be avoided and I have opened OAK-4806 for > that now. Oak issue has details around why Tree was used also. > > Also with multiple index editors performance does decrease. See > OAK-1273. If we switch to Hybrid Index then this aspects improves a > bit as instead of having 50 different property indexes (with 50 editor > instance for each commit) we can have a single editor with 50 property > definition. This can be seen in benchmark in Hybrid Index (OAk-4412) > by changing the numOfIndexes > > If you see any other area of improvement say around unnecessary object > generation then let us know! > > Chetan Mehrotra >
IndexEditorProvider behaviour question.
Hi, The behaviour of calls to the IndexEditorProvider appears to be suboptimal. Has this area been looked at before? I am working from a complete lack of historical knowledge about the area, so probably don't know the full picture. Based on logging the calls into IndexEditorProvider.getIndexEditor(), and reading the LuceneIndexEditorProvider, this is what I have observed:

A. Every commit results in 1 call to IndexEditorProvider.getIndexEditor() per index definition (perhaps 100 in a full system).
B. Each IndexEditor then gets called, building a tree of IndexEditors which work out changes to update their index.
C. IndexEditors sometimes filter subtrees based on the index definition, but this seems to be the exception rather than the rule.
D. IndexEditorProviders produce a subtree based on type (i.e. a property index definition doesn't generate an IndexEditor for lucene indexes and vice versa).

A and B mean that the work of creating the tree and working out the changes in a tree will be duplicated roughly n times, where n is the number of index definitions (D means it's not n*p, where p is the number of IndexEditorProviders). I haven't looked at how much C reduces the cost in reality.

Has anyone looked at building the tree once, and passing the fully built tree to indexers? Even if the computational effort is not great, the number of objects being created and passing through GC seems higher than it needs to be. As I said, I have no historical knowledge, so if doing this doesn't improve things and the reasons are recorded, just say (ideally with a pointer) so I can read and understand more. Best Regards Ian
Re: CommitHooks as OSGi Components.
On 12 September 2016 at 10:45, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > On Mon, Sep 12, 2016 at 3:12 PM, Ian Boston <i...@tfd.co.uk> wrote: > > but if the information that connect a sessionID/userID to the > > paths that are modified is available through some other route, I might be > > able to use something else. > > A regular Observer should work for that case. Just register an > instance with service registry and it would be picked up and for non > external event CommitInfo would be present > Perfect, thanks. I should have spotted that. Best Regards Ian > > Chetan Mehrotra >
Re: CommitHooks as OSGi Components.
Hi, On 12 September 2016 at 09:43, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > On Mon, Sep 12, 2016 at 2:08 PM, Ian Boston <i...@tfd.co.uk> wrote: > > Unfortunately the IndexProvider route doesn't appear give me the > > information I am after (CommitInfo). > > Any details around intended usage? CommitInfo is now exposed via > OAK-4642 to IndexEditorProvider > I would like it to work with older versions of Oak, pre 1.5.8 or 1.6. The use case is to capture commit info and pump it into a dataset for visualisation along with other activity information. CommitInfo seems to be what I need, but if the information that connects a sessionID/userID to the paths that are modified is available through some other route, I might be able to use something else. Best Regards Ian > > Chetan Mehrotra >
Re: CommitHooks as OSGi Components.
Hi, Thank you for the pointers. Unfortunately the IndexProvider route doesn't appear to give me the information I am after (CommitInfo). Since I need this to work in an independent bundle, patching the repository manager isn't an option. I am currently looking to see if there are any other services exposed that might give me a route in. Best Regards Ian On 12 September 2016 at 08:38, Michael Dürig <mdue...@apache.org> wrote: > > Hi, > > No it isn't. Commit hooks haven't been designed for this type of > dynamicity and generality. Exposing them at this layer has been considered > way to dangerous and a breach of modularity. > > What has been done in the past for use cases requiring commit hook > functionality on one hand and some part of dynamicity on the other, was to > to specialise the use case. Index editors are one example here. > > Michael > > > On 9.9.16 6:04 , Ian Boston wrote: > >> Hi, >> Is it possible write a CommitHook as an OSGI Component/Service and for Oak >> to pick it up ? >> The Component starts and gets registered as a service, but Oak doesn't >> appear to pick it up. >> If its not possible to add a CommitHook in this way, what is the best way >> of doing it from outside the oak-core bundle ? >> Best Regards >> Ian >> >>
CommitHooks as OSGi Components.
Hi, Is it possible to write a CommitHook as an OSGi Component/Service and have Oak pick it up ? The Component starts and gets registered as a service, but Oak doesn't appear to pick it up. If it's not possible to add a CommitHook in this way, what is the best way of doing it from outside the oak-core bundle ? Best Regards Ian
Re: Seekable access to a Binary
Hi, On 6 September 2016 at 11:34, Bertrand Delacretaz wrote: > Hi, > > On Tue, Sep 6, 2016 at 9:49 AM, Marcel Reutegger > wrote: > > ...we'd still have to add > > Jackrabbit API to support it. E.g. something like: > > > > valueFactory.createBinary(existingBinary, appendThisInputStream); ... > > And maybe a way to mark the binary as "in progress" to avoid > applications using half-uploaded binaries? > Yes, that's also needed where an incremental upload is being performed. AWS and the Google Data API both have the concept of a session ID when performing incremental uploads, to avoid conflicts between multiple clients operating on the same resource. The current impl in Sling assumes only one upload is being performed per resource. If there are two, a 500 will be issued and the client will probably reset the state, breaking the other upload session. @Marcel I'll document the use case on the wiki. Thanks for the pointer. Best Regards Ian > > Maybe just a boolean property convention that application developers > are supposed to take into account, as I don't think JCR Sessions work > in that use case. > > -Bertrand >
Seekable access to a Binary
Hi, Is it possible to write to an Oak Binary via the JCR API at an offset ? I am asking because I am working on the Sling upload mechanism to make it streamable, in an attempt to eliminate the many duplicate IO operations. A whole-body upload works and, depending on the DS being used, shows good improvements in speed resulting from less IO. The Sling Chunked Upload protocol, documented at [1], generates 3x the IO compared against a streamed upload, and more compared to a non-streamed upload. IIUC the protocol was implemented that way because the only way to update a Binary is to re-write it from scratch with a fresh InputStream every time.

Is there an alternative, more efficient way to achieve this that would not require the Binary to be read from the DS, updated, and written back to the DS ? E.g.

    valueFactory.createBinary(inputStream, startingAtByteOffsetLong);

or

    OutputStream binaryOutputStream = node.getOutputStream();
    binaryOutputStream.seek(startingAtByteOffsetLong);
    IOUtils.copy(inputStream, binaryOutputStream);
    node.getSession().save();

The Sling issue being worked on is [2]. Best Regards Ian 1 https://cwiki.apache.org/confluence/display/SLING/Chunked+File+Upload+Support 2 https://issues.apache.org/jira/browse/SLING-6027
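The kind of seekable write being asked for does exist in plain java.nio for local files; the sketch below shows the analogous operation with `FileChannel.position()` (per this thread, Oak's Binary offers no such API, and the class here is a made-up illustration):

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SeekableWriteSketch {
    // Writes `patch` into an existing payload at the given byte offset --
    // the operation the proposed Binary API would perform against the DS.
    static String overwriteAt(String initial, long offset, String patch) throws Exception {
        Path f = Files.createTempFile("binary", ".bin");
        Files.write(f, initial.getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(f, StandardOpenOption.WRITE)) {
            ch.position(offset);  // the seek that the JCR Binary API lacks
            ch.write(ByteBuffer.wrap(patch.getBytes(StandardCharsets.UTF_8)));
        }
        String result = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
        Files.delete(f);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(overwriteAt("hello world", 6, "there")); // hello there
    }
}
```

With such an API, each chunk of a chunked upload could be written directly at its offset, instead of re-streaming the whole Binary per chunk, which is where the 3x IO comes from.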
Re: Oak Indexing. Was Re: Property index replacement / evolution
Hi, On 11 August 2016 at 13:03, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > On Thu, Aug 11, 2016 at 5:19 PM, Ian Boston <i...@tfd.co.uk> wrote: > > correct. > > Documents are shared by ID so all updates hit the same shard. > > That may result in network traffic if the shard is not local. > > Focusing on ordering part as that is the most critical aspect compared > to other. (BAckup and Restore with sharded index is a separate problem > to discuss but later) > > So even if there is a single master for a given path how would it > order the changes. Given local changes only give partial view of end > state. > In theory, the index should be driven by the eventual consistency of the source repository, eventually reaching the same consistent state, and updating on each state change. That probably means the queue should only contain pointers to Documents and only index the Document as retrieved. I don't know if that can ever work. > > Also in such a setup would each query need to consider multiple shards > for final result or each node would "eventually" sync index changes > from other nodes (complete replication) and query would only use local > index > > For me ensuring consistency in how index updates are sent to ES wrt > Oak view of changes was kind of blocking feature to enable > parallelization of indexing process. It needs to be ensured that for > concurrent commit end result in index is in sync with repository > state. > Agreed; me too, on various attempts. > > Current single thread async index update avoid all such race condition. > Perhaps this is the "root" of the problem. The only way to index Oak consistently is with a single thread globally, as is done now. That's still possible with ES: run a single thread on the master that indexes into a co-located ES cluster. If the full-text extraction is distributed, then the master only needs the resources to write the local shard. 
It's not as good as parallelising the queue, but given the structure of Oak it might be the only way. Even so, future revisions will be in the index long before Oak has synced the root document. The current implementation doesn't have to think about this, as the indexing is single-threaded globally *and* each segment update is committed first by a hard Lucene commit and second by a root document sync, guaranteeing the sequential update nature. BTW, how does Hybrid manage to parallelise the indexing and maintain consistency ? Best Regards Ian > > Chetan Mehrotra >
Re: Oak Indexing. Was Re: Property index replacement / evolution
On 11 August 2016 at 11:10, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston <i...@tfd.co.uk> wrote: > > Both Solr Cloud and ES address this by sharding and > > replicating the indexes, so that all commits are soft, instant and real > > time. That introduces problems. > ... > > Both Solr Cloud and ES address this by sharding and > > replicating the indexes, so that all commits are soft, instant and real > > time. > > This would really be useful. However I have couple of aspects to clear > > Index Update Gurantee > > > Lets say if commit succeeds and then we update the index and index > update fails for some reason. Then would that update be missed or > there can be some mechanism to recover. I am not very sure about WAL > here that may be the answer here but still confirming. > For ES (I don't know how the Solr Cloud WAL behaves), an update isn't acknowledged until it's written to the WAL, so if something fails before that, it's up to how the client-side queue of updates is managed. If it's written to the WAL, whatever happens it will be indexed eventually, provided the WAL is available. Think of the WAL as equivalent to the Oak journal, IIUC. The WAL is present on all replicas, so provided one replica per shard is available, no data is lost. > > In Oak with the way async index update works based on checkpoint its > ensured that index would "eventually" contain the right data and no > update would be lost. if there is a failure in index update then that > would fail and next cycle would start again from same base state > Sounds like the same level of guarantee, depending on how the client side is implemented. Typically I didn't bother with a queue between the application and the ES client because the ES client was so fast. 
> > Order of index update > - > > Lets say I have 2 cluster nodes where same node is being performed > > Original state /a {x:1} > > Cluster Node N1 - /a {x:1, y:2} > Cluster Node N2 - /a {x:1, z:3} > > End State /a {x:1, y:2, z:3} > > At Oak level both the commits would succeed as there is no conflict. > However N1 and N2 would not be seeing each other updates immediately > and that would depend on background read. So in this case how would > index update would look like. > > 1. Would index update for specific paths go to some master which would > order the update > correct. Documents are sharded by ID so all updates hit the same shard. That may result in network traffic if the shard is not local. > 2. Or it would end up with either of {x:1, y:2} or {x:1, z:3} > > Here current async index update logic ensures that it sees the > eventually expected order of changes and hence would be consistent > with repository state. > Backup and Restore > --- > > Would the backup now involve backup of ES index files from each > cluster node. Or assuming full replication it would involve backup of > files from any one of the nodes. Would the back be in sync with last > changes done in repository (assuming sudden shutdown where changes got > committed to repository but not yet to any index) > > Here current approach of storing index files as part of MVCC storage > ensures that index state is consistent to some "checkpointed" state in > repository. And post restart it would eventually catch up with the > current repository state and hence would not require complete rebuild > of index in case of unclean shutdowns > If the revision is present in the document, then I assume it can be filtered at query time. However, there may be problems here, as one might have to find some way of indexing the revision history of a document, like the format in MongoDB...
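The "sharded by ID" routing mentioned above can be illustrated with a small sketch (illustrative only, not ES internals): because the shard is a deterministic function of the document ID, the concurrent updates to /a from N1 and N2 are guaranteed to arrive at the same shard, which can then serialise them.

```java
// Illustrative sketch of ID-based shard routing (not actual ES code).
public class ShardRouter {
    static int shardFor(String docId, int numShards) {
        // Math.floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int n = 5;
        // updates to /a from N1 and N2 route identically, so one shard orders them
        System.out.println(shardFor("/a", n) == shardFor("/a", n)); // true
    }
}
```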
I did wonder if a better solution was to use ES as the primary storage; then all the property indexes would be present by default, with no need for any Lucene index plugin. But I stopped thinking about that because of the 1s root document sync, as my interest was real time. Best Regards Ian > > > Chetan Mehrotra >
Re: Oak Indexing. Was Re: Property index replacement / evolution
Hi, There is no need to have several different plugins to deal with standalone, small scale cluster and large scale cluster deployments. It might be desirable for some reason, but it's not necessary. I have pushed the code I was working on before I got distracted to a GitHub repo. [1] is where the co-located ES cluster starts. If the property es-server-url is defined, an external ES cluster is used. The repo is WIP and incomplete; in it you will see 2 attempts to port the Lucene plugin, of which take2 is the second. As I said, I stopped when it became apparent there was a 1s latency imposed by Oak. I think you enlightened me to that behavior on oak-dev. I don't know how to co-locate a Solr Cloud cluster in the same way, given it needs ZooKeeper. (I don't know enough about Solr Cloud TBH.) If Oak can't stomach using ES as a library, it could, with enough time and resources, re-implement the pattern or something close. Best Regards Ian 1 https://github.com/ieb/oak-es/blob/master/src/main/java/org/apache/jackrabbit/oak/plusing/index/es/index/ESServer.java#L27 On 11 August 2016 at 09:58, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > Couple of points around the motivation, target usecase around Hybrid > Indexing and Oak indexing in general. > > Based on my understanding of various deployments. Any application > based on Oak has 2 type of query requirements > > QR1. Application Query - These mostly involve some property > restrictions and are invoked by code itself to perform some operation. > The property involved here in most cases would be sparse i.e. present > in small subset of whole repository content. Such queries need to be > very fast and they might be invoked very frequently. Such queries > should also be more accurate and result should not lag repository > state much. > > QR2. User provided query - These queries would consist of both or > either of property restriction and fulltext constraints. The target > nodes may form majority part of overall repository content.
Such > queries need to be fast but given user driven need not be very fast. > > Note that speed criteria is very subjective and relative here. > > Further Oak needs to support deployments > > 1. On single setup - For dev, prod on SegmentNodeStore > 2. Cluster Setup on premise > 3. Deployment in some DataCenter > > So Oak should enable deployments where for smaller setups it does not > require any thirdparty system while still allow plugging in a dedicate > system like ES/Solr if need arises. So both usecases need to be > supported. > > And further even if it has access to such third party server it might > be fine to rely on embedded Lucene for #QR1 and just delegate queries > under #QR2 to remote. This would ensure that query results are still > fast for usage falling under #QR1. > > Hybrid Index Usecase > - > > So far for #QR1 we only had property indexes and to an extent Lucene > based property index where results lag repository state and lag might > be significant depending on load. > > Hybrid index aim to support queries under #QR1 and can be seen as > replacement for existing non unique property indexes. Such indexes > would have lower storage requirement and would not put much load on > remote storage for execution. Its not meant as a replacement for > ES/Solr but then intends to address different type of usage > > Very large Indexes > - > > For deployments having very large repository Solr or ES based indexes > would be preferable and there oak-solr can be used (some day oak-es!) > > So in brief Oak should be self sufficient for smaller deployment and > still allow plugging in Solr/ES for large deployment and there also > provide a choice to admin to configure a sub set of index for such > usage depending on the size. 
> > > > > > > Chetan Mehrotra > > > On Thu, Aug 11, 2016 at 1:59 PM, Ian Boston <i...@tfd.co.uk> wrote: > > Hi, > > > > On 11 August 2016 at 09:14, Michael Marth <mma...@adobe.com> wrote: > > > >> Hi Ian, > >> > >> No worries - good discussion. > >> > >> I should point out though that my reply to Davide was based on a > >> comparison of the current design vs the Jackrabbit 2 design (in which > >> indexes were stored locally). Maybe I misunderstood Davide’s comment. > >> > >> I will split my answer to your mail in 2 parts: > >> > >> > >> > > >> >Full text extraction should be separated from indexing, as the DS blobs > >> are > >> >immutable, so is the full text. There is code to do this in the Oak > >> >indexer, but it's not used to write to the DS
Re: Property index replacement / evolution
Hi, On 8 August 2016 at 15:39, Vikas Saurabh <vikas.saur...@gmail.com> wrote: > Hi Ian, > > On Mon, Aug 8, 2016 at 3:41 PM, Ian Boston <i...@tfd.co.uk> wrote: > > > > If every successful commit writes the root node, due to every update > > updating a sync prop index, this leaves me wondering how the delayed sync > > reduces the writes to the root node ? > > > > I thought the justification of the 1s sync operation was to reduce the > > writes to the root node to n/s where n is the number of instances in the > > cluster, however based on what you are telling me the rate is (m+n)/s > where > > m is the total commits per second of the whole cluster. I understand that > > the update to test for a conflicted commit may not be the same as the > > update of _lastRevs, but in MongoDB both update the MongoDB document. > > > > I'm not sure of the exact numbers around how MongoDB would perform for > lots of edits to the same document. There's a bit of difference > between _lastRev write and commit-root conditional update - > commit-root update is a change on a sub-document... so, something like > 'set "_revision.rX"="c" on _id=0:/ iff "_conflict.rX"' doesn't exist. > While last rev updates change the same key across commits from the > same cluster node - something like 'set "_lastRevs.r0-0-X"="rY-0-X" '. > I think the idea is to avoid any conflict on MongoDB's update > statements. I'm not sure if such edits (edits to same doc but at a > different sub-doc/key) degrade performance badly. > You are correct that a conditional update won't cost as much as a non-conditional update if no write is performed. And if no write is performed, neither is replication, so the cost is low. However, AFAIK a MongoDB document is a single document stored against a single _id key: _conflict.rX and _lastRevs are all part of the same BSON object. So every write, even conditionally to a sub-document, will make the root document hot, and since MongoDB shards on _id, that makes 1 MongoDB shard hot.
Every Oak commit will result in an update op to the MongoDB primary holding the root document. This isn't specific to MongoMK; it probably impacts all DocumentMK implementations. OAK-4638 and OAK-4412 will need to eliminate all sync property indexes to change this behaviour (item 3 at the start of the thread). Alternatively, move the indexes so that a sync property index update doesn't perform a conditional change to the global root document ? (A new thread would be required to discuss this if it is worth talking about.) > Thanks, > Vikas > PS: I wonder if we should open a different thread as it seems to be > digressing from the subject :) > I'll try not to digress. Best Regards Ian
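The conditional commit-root update discussed above can be sketched with a hedged, in-memory model (the map stands in for the root MongoDB document; the key names follow the examples in the quoted mail, but this is not DocumentMK code): the commit marker is only written when no conflict marker exists, and a failed condition performs no write at all, hence no replication cost.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of a conditional commit-root update (not DocumentMK code).
public class CommitRootSketch {
    final Map<String, String> rootDoc = new HashMap<>(); // flattened sub-document keys

    boolean commit(String rev) {
        if (rootDoc.containsKey("_conflict." + rev)) {
            return false;                     // condition failed: no write, no replication
        }
        rootDoc.put("_revision." + rev, "c"); // the actual write makes the root doc hot
        return true;
    }

    public static void main(String[] args) {
        CommitRootSketch root = new CommitRootSketch();
        System.out.println(root.commit("r1"));  // true: committed
        root.rootDoc.put("_conflict.r2", "x");  // simulate a conflicting concurrent change
        System.out.println(root.commit("r2"));  // false: conflicted, nothing written
    }
}
```

Even with the cheap failure path, every *successful* commit still writes a key of the same document, which is the hotspot described above.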
Re: Property index replacement / evolution
Hi Vikas, On 8 August 2016 at 14:13, Vikas Saurabh <vikas.saur...@gmail.com> wrote: > Hi Ian, > > On Sun, Aug 7, 2016 at 10:01 AM, Ian Boston <i...@tfd.co.uk> wrote: > > Also, IIRC, the root document is not persisted on every commit, but > > synchronized periodically (once every second) similar to fsync on a disk. > > So the indexes (in fact all Oak Documents) are synchronous on the local > Oak > > instance and are synchronous on remote Oak instances but with a minimum > > data latency of the root document sync rate (1s). IIUC the 1 second sync > > period is a performance optimisation as the root document must be updated > > by every commit and hence is a global singleton in an Oak cluster, and > > already hot as you point out in 3. > > > > Just to clarify a bit. There are potentially 2 updates that can modify > root document. > With every commit, oak (document mk) defines a document to be > commit-root. That's root of the sub-tree which changes. A commit is > successful if commit-root could be conditionally updated (condition to > see if the commit conflicted with something else or not). With > synchronous prop indices, commit root usually is at root - so each > successful commit would write to root. That's what Michael was > pointing to in point3. > The other update is about asynchronous update of _lastRevs - _lastRevs > control visibility horizon. For local nodes, a pending list of updates > is kept in memory so local sessions/builders get to see committed > changes. These are pushed to persistence mongo during background > update which defaults at 1 s interval. So, other cluster nodes don't > see changes immediately. > Thanks for the explanation. I learnt something more. If every successful commit writes the root node, due to every update updating a sync prop index, this leaves me wondering how the delayed sync reduces the writes to the root node ? 
I thought the justification of the 1s sync operation was to reduce the writes to the root node to n/s where n is the number of instances in the cluster, however based on what you are telling me the rate is (m+n)/s where m is the total commits per second of the whole cluster. I understand that the update to test for a conflicted commit may not be the same as the update of _lastRevs, but in MongoDB both update the same MongoDB document. Best Regards Ian > > Thanks, > Vikas >
Re: Property index replacement / evolution
Hi, For TarMK, none of this is an issue as TarMK is all in memory on 1 JVM with local disk. Scaling up by throwing RAM and IO at the problem is a viable option, as far as it's safe/sensible to do so. But TarMK doesn't cluster, and if it did cluster, this would probably be an issue. I think, but could easily be wrong, that in the case of MongoDB all modifications to indexes generated by a commit are persisted in a single batch request (i.e. 1 MongoDB statement). The time taken to process that request depends on the size of the request. Large requests can take seconds on large databases. It's not the distance between Oak and the database that matters, as only 1 MongoDB statement is used; it's the processing time of that statement in MongoDB that matters. With MongoDB set up correctly to not lose data, this statement must be written to a majority of replicas before processing can continue. MongoDB replication is sequential. Also, IIRC, the root document is not persisted on every commit, but synchronized periodically (once every second), similar to fsync on a disk. So the indexes (in fact all Oak Documents) are synchronous on the local Oak instance and are synchronous on remote Oak instances but with a minimum data latency of the root document sync rate (1s). IIUC the 1 second sync period is a performance optimisation, as the root document must be updated by every commit and hence is a global singleton in an Oak cluster, and already hot as you point out in 3. I have been involved on the periphery of OAK-4638 and OAK-4412. For me, the main benefit is reducing the number of documents stored in the database. While it is true that the number of documents stored in the database doesn't matter for small numbers, with every document being counted inside Oak, and every document having an impact on database performance, having around 66% of the documents not contributing to repository content storage reduces the ultimate capacity limit of an Oak repository by the same amount.
That is, by 2/3rds. With many applications being built on top of Oak exploiting the deep content structure that Oak encourages and makes so easy, this limit rapidly becomes a reality. What limit ? A limit at which one of the components ceases to work. I don't know which one and when, but it's there. A repository containing 100M content items may need 1E10 documents due to both the application implementation and synchronous indexing. Perhaps the application should fix itself, but so should Oak. Quite apart from all that, it is embarrassingly wasteful to be using Oak documents in this way for non-TarMK repos, rather like implementing Lucene in SQL. To recap: addressing 1 and 2 is a requirement to reduce waste, increase performance of the update operations and increase data scalability. 3 is not an issue; the pressure is already there without any indexes. Every write has to update the root document for that update to become visible, by design. I am not a core Oak developer, just an observer, so if I got anything wrong, please someone correct me and I will learn from the experience. Best Regards Ian On 5 August 2016 at 18:04, Michael Marth wrote: > Hi, > > I have noticed OAK-4638 and OAK-4412 – which both deal with particular > problematic aspects of property indexes. I realise that both issues deal > with slightly different problems and hence come to different suggested > solutions. > But still I felt it would be good to take a holistic view on the different > problems with property indexes. Maybe there is a unified approach we can > take. > > To my knowledge there are 3 areas where property indexes are problematic > or not ideal: > > 1. Number of nodes: Property indexes can create a large number of nodes. > For properties that are very common the number of index nodes can be almost > as large as the number of the content nodes. A large number of nodes is not > necessarily a problem in itself, but if the underlying persistence is e.g. > MongoDB then those index nodes (i.e.
MongoDB documents) cause pressure on > MongoDB’s mmap architecture which in turn affects reading content nodes. > > 2. Write performance: when the persistence (i.e. MongoDB) and Oak are “far > away from each other” (i.e. high network latency or low throughput) then > synchronous property indexes affect the write throughput as they may cause > the payload to double in size. > > 3. I have no data on this one – but think it might be a topic: property > index updates usually cause commits to have / as the commit root. This > results in pressure on the root document. > > Please correct me if I got anything wrong or inaccurate in the above. > > My point is, however, that at the very least we should have clarity which > one of the items above we intend to tackle with Oak improvements. Ideally > we would have a unified approach. > (I realize that property indexes come in various flavours like unique > index or not, which makes the discussion more complex) > > my2c >
Re: Does Oak core check the repository version ?
Hi, On 4 July 2016 at 09:53, Marcel Reutegger <mreut...@adobe.com> wrote: > Hi, > > On 30/06/16 13:10, "ianbos...@gmail.com on behalf of Ian Boston" wrote: > >I have heard reports of a case of the wrong version of oak-run causing > >problems in a repository. I dont have the details, but it sounds like the > >core not starting on a repo unless it was in a safe known range might be a > >usefull safety check. Could be good to enable it to force starting via a > >system property. > > > >Should I open an OAK issue to track this requirement ? > > Yes, please open an issue. The DocumentNodeStore does not > have such a check right now. > https://issues.apache.org/jira/browse/OAK-4529 Best Regards Ian > > Regards > Marcel > >
Does Oak core check the repository version ?
Hi, Does Oak core check the persisted repository version to make certain it fits in a range that is compatible with the code being run ? If it doesn't already, I think it should to avoid something like the wrong version of oak-run being used potentially damaging the repository. Best Regards Ian
Re: API proposal for - Expose URL for Blob source (OAK-1963)
Hi, On 11 May 2016 at 14:21, Marius Petria wrote: > Hi, > > I would add another use case in the same area, even if it is more > problematic from the point of view of security. To better support load > spikes an application could return 302 redirects to (signed) S3 urls such > that binaries are fetched directly from S3. > Perhaps that question exposes the underlying requirement for some downstream users. This is a question, not a statement: If the application using Oak exposed a RESTful API that had all the same functionality as [1], and was able to perform at the scale of S3, and had the same security semantics as Oak, would applications that need direct access to S3 or a File based datastore be able to use that API in preference ? Is this really about issues with scalability and performance rather than a fundamental need to drill deep into the internals of Oak ? If so, shouldn't the scalability and performance be fixed ? (assuming it's a real concern) > > (if this can already be done or you think is not really related to the > other two please disregard). > AFAIK this is not possible at the moment. If it was, deployments could use nginx X-Sendfile and other request offloading mechanisms. Best Regards Ian 1 http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectOps.html > > Marius > > > > On 5/11/16, 1:41 PM, "Angela Schreiber" wrote: > > >Hi Chetan > > > >IMHO your original mail didn't write down the fundamental analysis > >but instead presented the solution for every the 2 case I was > >lacking the information _why_ this is needed. > > > >Both have been answered in private conversions only (1 today in > >the oak call and 2 in a private discussion with tom). And > >having heard didn't make me more confident that the solution > >you propose is the right thing to do.
> > > >Kind regards > >Angela > > > >On 11/05/16 12:17, "Chetan Mehrotra" wrote: > > > >>Hi Angela, > >> > >>On Tue, May 10, 2016 at 9:49 PM, Angela Schreiber > >>wrote: > >> > >>> Quite frankly I would very much appreciate if took the time to collect > >>> and write down the required (i.e. currently known and expected) > >>> functionality. > >>> > >>> Then look at the requirements and look what is wrong with the current > >>> API that we can't meet those requirements: > >>> - is it just missing API extensions that can be added with moderate > >>>effort? > >>> - are there fundamental problems with the current API that we needed to > >>> address? > >>> - maybe we even have intrinsic issues with the way we think about the > >>>role > >>> of the repo? > >>> > >>> IMHO, sticking to kludges might look promising on a short term but > >>> I am convinced that we are better off with a fundamental analysis of > >>> the problems... after all the Binary topic comes up on a regular basis. > >>> That leaves me with the impression that yet another tiny extra and > >>> adaptables won't really address the core issues. > >>> > >> > >>Makes sense. > >> > >>Have a look in of the initial mail in the thread at [1] which talks about > >>the 2 usecase I know of. The image rendition usecase manifest itself in > >>one > >>form or other, basically providing access to Native programs via file > path > >>reference. > >> > >>The approach proposed so far would be able to address them and hence > >>closer > >>to "is it just missing API extensions that can be added with moderate > >>effort?". If there are any other approach we can address both of the > >>referred usecases then we implement them. > >> > >>Let me know if more details are required. If required I can put it up on > a > >>wiki page also. > >> > >>Chetan Mehrotra > >>[1] > >> > http://markmail.org/thread/6mq4je75p64c5nyn#query:+page:1+mid:zv5dzsgmoegu > >>pd7l+state:results > > >
Re: API proposal for - Expose URL for Blob source (OAK-1963)
Hi Angela, On 10 May 2016 at 17:19, Angela Schreiber wrote: > Hi Ian > > >Fair enough, provided there is a solution that addresses the issue Chetan > >is trying to address. > > That's what we are all looking for :) > > >The alternative, for some applications, seems to store the binary data > >outside Oak, which defeats the purpose completely. > > You mean with the current setup, right? > yes. > > That might well be... while I haven't been involved with a concrete > case I wouldn't categorically reject that this might in some cases > even be the right solution. > But maybe I am biased due to the fact that we also have a big > community that effectively stores and manages their user/group > accounts outside the repository and where I am seeing plenty of > trouble with the conception that those accounts _must_ be synced > (i.e. copied) into the repo. > > So, I'd definitely like to understand why you think that this > "completely defeats the purpose". I agree that it's not always > desirable but nevertheless there might be valid use-cases. > If the purpose of Oak is to provide a content repository to store metadata and assets, and the application built on top of Oak, in order to achieve its scalability targets, has to store its asset data (blobs) outside Oak, that defeats the purpose of supporting the storage of assets within Oak. Oak should support the storage of assets within Oak while meeting the scalability requirements of the application. Since those requirements are non-trivial and hard to quantify, that means horizontal scalability limited only by the available budget to purchase VMs or hardware. You can argue that horizontal scalability is not really required. I can share use cases, not exactly the same ones Chetan is working on, where it is. Sorry I can't share them on list.
> > >I don't have a perfect handle on the issue he is trying to address or what > >would be an acceptable solution, but I suspect the only solution that is > >not vulnerable by design will be a solution that abstracts all the required > >functionality behind an Oak API (ie no S3Object, File object or anything > >that could leak) and then provide all the required functionality with an > >acceptable level of performance in the implementation. That is doable, but > >a lot more work. > > Not sure about that :-) > Quite frankly I would very much appreciate if took the time to collect > and write down the required (i.e. currently known and expected) > functionality. > In the context of what I said above, for AWS deployment that means wrapping [1] so nothing can leak and supporting almost everything expressed by [2] via an Oak API/jar in a way that enables horizontal scalability. > > Then look at the requirements and look what is wrong with the current > API that we can't meet those requirements: > - is it just missing API extensions that can be added with moderate effort? > - are there fundamental problems with the current API that we needed to > address? > - maybe we even have intrinsic issues with the way we think about the role > of the repo? > > IMHO, sticking to kludges might look promising on a short term but > I am convinced that we are better off with a fundamental analysis of > the problems... after all the Binary topic comes up on a regular basis. > That leaves me with the impression that yet another tiny extra and > adaptables won't really address the core issues. > I agree. It comes up time and again because the applications are being asked to do something Oak does not currently support, so developers look for a workaround. It should be done properly, once and for all. imvho, that is a lot of work upfront, but since I am not the one doing the work it's not right for me to estimate or suggest anyone do it.
Best Regards Ian 1 http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/S3Object.html 2 http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectOps.html > Kind regards > Angela > > > > > > > > >Best Regards > >Ian > > > > > >> > >> Kind regards > >> Angela > >> > >> > > >> >Best Regards > >> >Ian > >> > > >> > > >> >On 3 May 2016 at 15:36, Chetan Mehrotra > >> wrote: > >> > > >> >> Hi Team, > >> >> > >> >> For OAK-1963 we need to allow access to actaul Blob location say in > >>form > >> >> File instance or S3 object id etc. This access is need to perform > >> >>optimized > >> >> IO operation around binary object e.g. > >> >> > >> >> 1. The File object can be used to spool the file content with zero > >>copy > >> >> using NIO by accessing the File Channel directly [1] > >> >> > >> >> 2. Client code can efficiently replicate a binary stored in S3 by > >>having > >> >> direct access to S3 object using copy operation > >> >> > >> >> To allow such access we would need a new API in the form of > >> >> AdaptableBinary. > >> >> > >> >> API > >> >> === > >> >> > >> >> public interface AdaptableBinary { > >> >> > >> >> /** > >> >> * Adapts the
Re: API proposal for - Expose URL for Blob source (OAK-1963)
On 10 May 2016 at 15:02, Angela Schreiber <anch...@adobe.com> wrote: > Hi Ian > > On 04/05/16 18:37, "Ian Boston" <i...@tfd.co.uk> wrote: > >[...] The locations will certainly probably leak > >outside the context of an Oak session so the API contract should make it > >clear that the code using a direct location needs to behave responsibly. > > See my reply to Chetan, who was referring to > SlingRepository.loginAdministrative > which always had a pretty clear API contract wrt responsible usage. > > As a matter of fact (and I guess you are aware of this) it turned into a > total nightmare with developers using it just everywhere, ignoring not > only > the API contract but also all concerns raised for years. This can even > been seen in Apache Sling code base itself. > So, I am quite pessimistic about responsible usage and API contract > and definitely prefer an API implementation that effectively enforces > the contract. > > Vulnerable by design is IMHO a bad guideline for introducing new APIs. > From my experiences they backfire usually sooner than later and need > to be abandoned again... so, I'd rather aim for a properly secured > solution right from the beginning. > Fair enough, provided there is a solution that addresses the issue Chetan is trying to address. The alternative, for some applications, seems to be to store the binary data outside Oak, which defeats the purpose completely. I don't have a perfect handle on the issue he is trying to address or what would be an acceptable solution, but I suspect the only solution that is not vulnerable by design will be a solution that abstracts all the required functionality behind an Oak API (ie no S3Object, File object or anything that could leak) and then provides all the required functionality with an acceptable level of performance in the implementation. That is doable, but a lot more work.
Best Regards Ian > > Kind regards > Angela > > > > >Best Regards > >Ian > > > > > >On 3 May 2016 at 15:36, Chetan Mehrotra <chetan.mehro...@gmail.com> > wrote: > > > >> Hi Team, > >> > >> For OAK-1963 we need to allow access to actual Blob location say in form > >> File instance or S3 object id etc. This access is needed to perform > >>optimized > >> IO operation around binary object e.g. > >> > >> 1. The File object can be used to spool the file content with zero copy > >> using NIO by accessing the File Channel directly [1] > >> > >> 2. Client code can efficiently replicate a binary stored in S3 by having > >> direct access to S3 object using copy operation > >> > >> To allow such access we would need a new API in the form of > >> AdaptableBinary. > >> > >> API > >> === > >> > >> public interface AdaptableBinary { > >> > >> /** > >> * Adapts the binary to another type like File, URL etc > >> * > >> * @param <AdapterType> The generic type to which this binary is > >> adapted > >> *to > >> * @param type The Class object of the target type, such as > >> *File.class > >> * @return The adapter target or null if the binary > >>cannot > >> * adapt to the requested type > >> */ > >> <AdapterType> AdapterType adaptTo(Class<AdapterType> type); > >> } > >> > >> Usage > >> = > >> > >> Binary binProp = node.getProperty("jcr:data").getBinary(); > >> > >> //Check if Binary is of type AdaptableBinary > >> if (binProp instanceof AdaptableBinary){ > >> AdaptableBinary adaptableBinary = (AdaptableBinary) binProp; > >> > >> //Adapt it to File instance > >> File file = adaptableBinary.adaptTo(File.class); > >> } > >> > >> > >> > >> The Binary instance returned by Oak > >> i.e. org.apache.jackrabbit.oak.plugins.value.BinaryImpl would then > >> implement this interface and calling code can then check the type and > >>cast > >> it and then adapt it > >> > >> Key Points > >> > >> > >> 1. Depending on backing BlobStore the binary can be adapted to various > >> types. For FileDataStore it can be adapted to File.
For S3DataStore it > >>can > >> either be adapted to URL or some S3DataStore specific type. > >> > >> 2. Security - Thomas suggested that for better security the ability to > >> adapt should be restricted based on session permissions. So if the user > >>has > >> required permission then only adaptation would work otherwise null > >>would be > >> returned. > >> > >> 3. Adaptation proposal is based on Sling Adaptable [2] > >> > >> 4. This API is for now exposed only at JCR level. Not sure should we do > >>it > >> at Oak level as Blob instance are currently not bound to any session. So > >> proposal is to place this in 'org.apache.jackrabbit.oak.api' package > >> > >> Kindly provide your feedback! Also any suggestion/guidance around how > >>the > >> access control be implemented > >> > >> Chetan Mehrotra > >> [1] http://www.ibm.com/developerworks/library/j-zerocopy/ > >> [2] > >> > >> > >> > https://sling.apache.org/apidocs/sling5/org/apache/sling/api/adapter/Adap > >>table.html > >> > >
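A minimal, self-contained sketch of the adaptTo pattern proposed above (class names are illustrative, not Oak's actual implementation; the generic signature follows the Sling Adaptable style the proposal references):

```java
import java.io.File;

// Illustrative sketch of the proposed adaptable-binary pattern.
interface AdaptableBinary {
    // returns null if the adaptation to the requested type is unsupported
    <T> T adaptTo(Class<T> type);
}

// A file-backed binary can adapt to File; anything else yields null,
// mirroring how an S3-backed binary would adapt only to S3-specific types.
class FileBackedBinary implements AdaptableBinary {
    private final File file;

    FileBackedBinary(File file) {
        this.file = file;
    }

    @Override
    public <T> T adaptTo(Class<T> type) {
        if (type == File.class) {
            return type.cast(file);
        }
        return null; // e.g. no URL/S3 representation for a local file
    }
}

public class AdaptableDemo {
    public static void main(String[] args) {
        AdaptableBinary bin = new FileBackedBinary(new File("/tmp/blob"));
        File f = bin.adaptTo(File.class);
        System.out.println(f != null);                 // true
        System.out.println(bin.adaptTo(String.class)); // null: unsupported target
    }
}
```

The null-on-unsupported contract is what lets calling code probe for capabilities without the store leaking types it does not want to expose.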
Re: API proposal for - Expose URL for Blob source (OAK-1963)
Hi, By processing independently I meant async, outside the callback, e.g. inside a Mesos+Fenzo cluster [1], with processors not running Oak. Best Regards Ian 1 http://techblog.netflix.com/2015/08/fenzo-oss-scheduler-for-apache-mesos.html On 10 May 2016 at 06:02, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > On Mon, May 9, 2016 at 8:27 PM, Ian Boston <i...@tfd.co.uk> wrote: > > > I thought the consumers of this api want things like the absolute path of > > the File in the BlobStore, or the bucket and key of the S3 Object, so > that > > they could transmit it and use it for processing independently of Oak > > outside the callback ? > > > > Most cases can still be done, just do it within the callback > > blobStore.process("xxx", new BlobProcessor(){ > void process(AdaptableBlob blob){ > File file = blob.adaptTo(File.class); > transformImage(file); > } > }); > > Doing this within callback would allow Oak to enforce some safeguards (more > on that in next mail) and still allows the user to perform optimal binary > processing > > Chetan Mehrotra >
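Chetan's callback idea can be sketched as follows (illustrative names, not an actual Oak API): the raw File is only reachable inside the callback, so the store retains control before and after the processing and the reference never leaks out of the call.

```java
import java.io.File;

// Sketch of callback-confined blob access (hypothetical class, not Oak code).
public class CallbackBlobStore {
    interface BlobProcessor {
        void process(File blob);
    }

    void process(String blobId, BlobProcessor processor) {
        File file = resolve(blobId); // hypothetical lookup of the backing file
        // safeguards could go here: permission check, read-only view, lease...
        processor.process(file);
        // ...and here: release the lease, audit the access, etc.
    }

    private File resolve(String blobId) {
        return new File("/datastore/" + blobId); // placeholder path scheme
    }

    public static void main(String[] args) {
        StringBuilder seen = new StringBuilder();
        new CallbackBlobStore().process("xxx",
                blob -> seen.append(blob.getName()));
        System.out.println(seen); // xxx
    }
}
```

The design choice being illustrated: confining the resource to the callback scope is what lets the store enforce safeguards, which a returned File or URL cannot do once it has escaped.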
Re: API proposal for - Expose URL for Blob source (OAK-1963)
and provide the required details. > >> This way we "outsource" the problem. Would that be acceptable? > >> > > > > That's actually two acceptable solutions. Any custom code is outside of > > Oak's liability. However, I'd prefer an approach where we come up with a > > blob store implementation that supports whatever the use case is here. > But > > without leaking internals. > > > > Michael > > > > > > > > > >> Chetan Mehrotra > >> > >> On Mon, May 9, 2016 at 2:28 PM, Michael Dürig <mdue...@apache.org> > wrote: > >> > >> > >>> Hi, > >>> > >>> I very much share Francesco's concerns here. Unconditionally exposing > >>> access to operating system resources underlying Oak's inner workings is > >>> troublesome for various reasons: > >>> > >>> - who owns the resource? Who coordinates (concurrent) access to it and > >>> how? What are the correctness and performance implications here (races, > >>> deadlock, corruptions, JCR semantics)? > >>> > >>> - it limits implementation freedom and hinders further evolution > >>> (chunking, de-duplication, content based addressing, compression, gc, > >>> etc.) > >>> for data stores. > >>> > >>> - bypassing JCR's security model > >>> > >>> Pretty much all of this has been discussed in the scope of > >>> https://issues.apache.org/jira/browse/JCR-3534 and > >>> https://issues.apache.org/jira/browse/OAK-834. So I suggest reviewing > >>> those discussions before we jump to conclusions. > >>> > >>> > >>> Also what is the use case requiring such a vast API surface? Can't we > >>> come > >>> up with an API that allows the blobs to stay under control of Oak? If > >>> not, > >>> this is probably an indication that those blobs shouldn't go into Oak > but > >>> just references to it as Francesco already proposed. Anything else is > >>> neither fish nor fowl: you can't have the JCR goodies but at the same > >>> time > >>> access underlying resources at will.
> >>> > >>> Michael > >>> > >>> > >>> > >>> > >>> On 5.5.16 11:00 , Francesco Mari wrote: > >>> > >>> This proposal introduces a huge leak of abstractions and has deep > >>>> security > >>>> implications. > >>>> > >>>> I guess that the reason for this proposal is that some users of Oak > >>>> would > >>>> like to perform some operations on binaries in a more performant way > by > >>>> leveraging the way those binaries are stored. If this is the case, I > >>>> suggest those users to evaluate an applicative solution implemented on > >>>> top > >>>> of the JCR API. > >>>> > >>>> If a user needs to store some important binary data (files, images, > >>>> etc.) > >>>> in an S3 bucket or on the file system for performance reasons, this > >>>> shouldn't affect how Oak handles blobs internally. If some assets are > of > >>>> special interest for the user, then the user should bypass Oak and > take > >>>> care of the storage of those assets directly. Oak can be used to store > >>>> *references* to those assets, that can be used in user code to > >>>> manipulate > >>>> the assets in his own business logic. > >>>> > >>>> If the scenario I outlined is not what inspired this proposal, I would > >>>> like > >>>> to know more about the reasons why this proposal was brought up. Which > >>>> problems are we going to solve with this API? Is there a more concrete > >>>> use > >>>> case that we can use as a driving example? > >>>> > >>>> 2016-05-05 10:06 GMT+02:00 Davide Giannella <dav...@apache.org>: > >>>> > >>>> On 04/05/2016 17:37, Ian Boston wrote: > >>>> > >>>>> > >>>>> Hi, > >>>>>> If the File or URL is writable, will writing to the location cause > >>>>>> issues > >>>>>> for Oak ? > >>>>>> IIRC some Oak DS implementations use a digest of the content to > >>>>>> determine > >>>>>> the location in the DS, so changing the content via Oak will change > >>>>>> the > >>>>>> location, but changing the content via the File or URL wont. 
If I > >>>>>> didn't > >>>>>> remember correctly, then ignore the concern. Fully supportive of > the > >>>>>> approach, as a consumer of Oak. The locations will certainly > probably > >>>>>> > >>>>>> leak > >>>>> > >>>>> outside the context of an Oak session so the API contract should make > >>>>>> it > >>>>>> clear that the code using a direct location needs to behave > >>>>>> responsibly. > >>>>>> > >>>>>> > >>>>>> It's a reasonable concern and I'm not in the details of the > >>>>> implementation. It's worth to keep in mind though and remember if we > >>>>> want to adapt to URL or File that maybe we'll have to come up with > some > >>>>> sort of read-only version of such. > >>>>> > >>>>> For the File class, IIRC, we could force/use the setReadOnly(), > >>>>> setWritable() methods. I remember those to be quite expensive in time > >>>>> though. > >>>>> > >>>>> Davide > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >> >
Re: API proposal for - Expose URL for Blob source (OAK-1963)
Hi, If the File or URL is writable, will writing to the location cause issues for Oak ? IIRC some Oak DS implementations use a digest of the content to determine the location in the DS, so changing the content via Oak will change the location, but changing the content via the File or URL won't. If I didn't remember correctly, then ignore the concern. Fully supportive of the approach, as a consumer of Oak. The locations will almost certainly leak outside the context of an Oak session so the API contract should make it clear that the code using a direct location needs to behave responsibly. Best Regards Ian On 3 May 2016 at 15:36, Chetan Mehrotra wrote: > Hi Team, > > For OAK-1963 we need to allow access to actual Blob location say in form > File instance or S3 object id etc. This access is needed to perform optimized > IO operations around binary objects e.g. > > 1. The File object can be used to spool the file content with zero copy > using NIO by accessing the File Channel directly [1] > > 2. Client code can efficiently replicate a binary stored in S3 by having > direct access to S3 object using copy operation > > To allow such access we would need a new API in the form of > AdaptableBinary. > > API > === > > public interface AdaptableBinary { > > /** > * Adapts the binary to another type like File, URL etc > * > * @param <AdapterType> The generic type to which this binary is > adapted > * to > * @param type The Class object of the target type, such as > * File.class > * @return The adapter target or null if the binary cannot > * adapt to the requested type > */ > <AdapterType> AdapterType adaptTo(Class<AdapterType> type); > } > > Usage > ===== > > Binary binProp = node.getProperty("jcr:data").getBinary(); > > //Check if Binary is of type AdaptableBinary > if (binProp instanceof AdaptableBinary){ > AdaptableBinary adaptableBinary = (AdaptableBinary) binProp; > > //Adapt it to File instance > File file = adaptableBinary.adaptTo(File.class); > } > > > > The Binary instance returned by Oak > i.e.
org.apache.jackrabbit.oak.plugins.value.BinaryImpl would then > implement this interface and calling code can then check the type and cast > it and then adapt it > > Key Points > > > 1. Depending on backing BlobStore the binary can be adapted to various > types. For FileDataStore it can be adapted to File. For S3DataStore it can > either be adapted to URL or some S3DataStore specific type. > > 2. Security - Thomas suggested that for better security the ability to > adapt should be restricted based on session permissions. So if the user has > required permission then only adaptation would work otherwise null would be > returned. > > 3. Adaptation proposal is based on Sling Adaptable [2] > > 4. This API is for now exposed only at JCR level. Not sure should we do it > at Oak level as Blob instance are currently not bound to any session. So > proposal is to place this in 'org.apache.jackrabbit.oak.api' package > > Kindly provide your feedback! Also any suggestion/guidance around how the > access control be implemented > > Chetan Mehrotra > [1] http://www.ibm.com/developerworks/library/j-zerocopy/ > [2] > > https://sling.apache.org/apidocs/sling5/org/apache/sling/api/adapter/Adaptable.html >
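The mail archive appears to have stripped the generic type parameters from the quoted `adaptTo` signature. A self-contained reconstruction is sketched below, assuming Sling-Adaptable-style generics as referenced in [2]; `FileBackedBinary` is a hypothetical illustration, not Oak's `BinaryImpl`.

```java
import java.io.File;

// Reconstructed contract; generics assumed to mirror Sling's Adaptable.
interface AdaptableBinary {
    <AdapterType> AdapterType adaptTo(Class<AdapterType> type);
}

// Hypothetical binary backed by a FileDataStore entry -- illustration only.
class FileBackedBinary implements AdaptableBinary {
    private final File file;

    FileBackedBinary(File file) {
        this.file = file;
    }

    @Override
    public <AdapterType> AdapterType adaptTo(Class<AdapterType> type) {
        // Only File is supported here; an S3-backed binary might
        // adapt to URL or an S3 object id instead, returning null
        // for unsupported targets (or when permissions deny access).
        return type == File.class ? type.cast(file) : null;
    }
}
```

Returning `null` for unsupported (or disallowed) targets is what lets the permission check Thomas suggested degrade gracefully: callers must already handle the null case.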
Re: DocumentStore question.
Hi, Thank you for the detailed explanation. I can now see how this works with a consistent root document as the slow node effectively waits till its time is ahead of the last root commit and it is clear to commit. This ensures that all commits are sequential based on the revision timestamp. Presumably, having a cluster node running behind real time will result in lower throughput, making it critical to run NTP on all cluster nodes to eliminate as much clock drift as possible ? Also, does the current revision model behave with an eventually consistent storage mechanism, or does Oak require that the underlying storage is immediately consistent in nature ? Best Regards Ian On 16 February 2016 at 10:36, Marcel Reutegger <mreut...@adobe.com> wrote: > Hi, > > On 16/02/16 09:56, "ianbos...@gmail.com<mailto:ianbos...@gmail.com> on > behalf of Ian Boston" wrote: > So, IIUC, (based on Revision.compareTo(Revision) used by > StableRevisionComparitor. > > yes. > > If one instance within a cluster has a clock that is lagging the others, > and all instances are making changes at the same time, then the changes > that the other instances make will be used, even the the lagging instance > makes changes after (in real synchronised time) the others ? > > no, either cluster node has equal chances of getting its > change in, but the other cluster node's change will be rejected. > > Let's assume we have two cluster nodes A and B and cluster node > A's clock is lagging 5 seconds. Now both cluster nodes try to > to set a property P on document D. One of the cluster nodes will be > first to update document D. No matter, which cluster node is first, > the second cluster node will see the previous change when it attempts > the commit and will consider the change as not yet visible and > in conflict with its own changes. The change of the second cluster > node will therefore be rolled back. 
> > The behaviour of the cluster nodes will be different when external > changes are pulled in from other cluster nodes. The background > read operation of the DocumentNodeStore reads the most recent > root document and compares the _lastRev entries of the other cluster > nodes with its own clock (the _lastRev entries are the most recent > commits visible to other cluster nodes). Here we have two cases: > > a) Cluster node A was successful in committing its change on P > > Cluster node A wrote a _lastRev on the root document for this > change: r75-0-a. Cluster node B picks up that change and compares > the revision with its own clock, which corresponds to r80-0-b > (for readability, assuming for now the timestamp is a decimal > and in seconds instead of milliseconds). Cluster node B will > consider r75-0-a as visible from now on, because the timestamp > of r80-0-b is newer than r75-0-a. From this point on cluster > node B can overwrite P again because it is able to see the most > recent value set by A with r75-0-a. > > b) Cluster node B was successful in committing its change on P > > Cluster node B wrote a _lastRev on the root document for this > change: r80-0-b. Cluster node A picks up that change and compares > the revision with its own clock, which corresponds to r75-0-a. > Cluster node A will still not consider r80-0-b as visible, > because its own clock is considered behind. It will wait until > its clock has passed r80-0-b. This ensures that a new change by A > overwriting B's previous value of P will have a newer timestamp > than the previously made visible change of B. > > This means: > > 1) all changes considered visible can be compared with the > StableRevisionComparator without the need to take clock > differences into account. > > 2) a change will conflict if it is not the most recent > revision (using StableRevisionComparator) or the other > change is not yet visible but already committed.
> > > I can see that this won't matter for the majority of nodes, as collisions > are rare, but won't the lagging instance be always overridden in the root > document _revisions list ? > > Depending on usage, collisions are actually not that rare ;) > > The _revisions map on the root document contains just > the commit entry. A cluster node cannot overwrite the > entry of another cluster node, because they use unique > revisions for commits. Each cluster node generates revisions > with a unique clusterId suffix. > > Are there any plans to maintain a clock difference vector for the cluster ? > > Oak 1.0.x and 1.2.x still have something like this. See > RevisionComparator. However, it only maintains the clock > differences for the past 60 minutes. > > Oak 1.4 introduced a RevisionVector, which is inspired by > version vectors [0]. > > Regards > Marcel > > [0] > https://issues.apache.org/jira/browse/OAK-3646?focusedCommentId=15028698=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15028698 >
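Marcel's two rules above (ordering by timestamp, then counter, then clusterId, and visibility gated on the local clock) can be modelled in a few lines. This is an illustrative simplification, not the actual `StableRevisionComparator` or `DocumentNodeStore` code.

```java
// Simplified model of an Oak revision (e.g. r75-0-a): timestamp,
// counter, clusterId, ordered the way StableRevisionComparator orders them.
class Rev implements Comparable<Rev> {
    final long timestamp;
    final int counter;
    final int clusterId;

    Rev(long timestamp, int counter, int clusterId) {
        this.timestamp = timestamp;
        this.counter = counter;
        this.clusterId = clusterId;
    }

    @Override
    public int compareTo(Rev o) {
        if (timestamp != o.timestamp) return Long.compare(timestamp, o.timestamp);
        if (counter != o.counter) return Integer.compare(counter, o.counter);
        return Integer.compare(clusterId, o.clusterId);
    }

    // An external revision only becomes visible once the local clock has
    // passed its timestamp (case b in the mail: lagging node A waits for r80).
    static boolean isVisible(Rev external, long localClockMillis) {
        return external.timestamp <= localClockMillis;
    }
}
```

With this rule, any change a node makes after an external change becomes visible is guaranteed a higher timestamp, which is why no clock-difference table is needed.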
Re: DocumentStore question.
On 15 February 2016 at 14:49, Marcel Reutegger <mreut...@adobe.com> wrote: > Hi, > > On 12/02/16 17:11, "ianbos...@gmail.com on behalf of Ian Boston" wrote: > >Is there an assumption that the revisions listed in _revisions are > >ordered ? > > There is no requirement that entries in the _revisions map > are ordered at the storage layer, but the DocumentStore > will order them when it reads the entries. The entries > are sorted according to the timestamp of the revision, > then revision counter and finally clusterId. > > >If not, then how is the order of the revisions determined, given that > >the clocks on each node in a cluster will have different offsets ? > > Oak 1.0.x and 1.2.x maintain a revision table (in RevisionComparator) > for each cluster node, which allows it to compare revisions across > cluster nodes even when there are clock differences. At least for > the 60 minutes timeframe covered by the RevisionComparator. > > Oak 1.4 uses revision vectors and does not maintain a revision > table anymore. See OAK-3646. At the same time it also simplifies > how revisions are compared and how changes are pulled in from > other cluster nodes. The background read operation ensures that > external changes made visible all have a lower revision timestamp > than the local clock. This ensures that all local changes from that > point on will have a higher revision timestamp than externally > visible changes. This part was also backported to 1.0 and 1.2. > See OAK-3388. > So, IIUC (based on Revision.compareTo(Revision) used by StableRevisionComparator): If one instance within a cluster has a clock that is lagging the others, and all instances are making changes at the same time, then the changes that the other instances make will be used, even if the lagging instance makes changes after (in real synchronised time) the others ?
I can see that this won't matter for the majority of nodes, as collisions are rare, but won't the lagging instance always be overridden in the root document _revisions list ? Are there any plans to maintain a clock difference vector for the cluster ? Best Regards Ian > > Regards > Marcel > >
DocumentStore question.
Hi, I am looking at [1], and am probably confused. Is there an assumption that the revisions listed in _revisions are ordered ? If not, then how is the order of the revisions determined, given that the clocks on each node in a cluster will have different offsets ? Best Regards Ian 1 http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html
cache and index backup and restore ?
Hi, Having done a cold backup of a MongoMK instance with a FS Datastore, is there any advantage in also backing up the local disk copy of the Lucene index (normally in repository/index/** ) and the persistent cache file (repository/cache/**) so that it can be restored on more than one Oak instance in the cluster, or do both those subtrees get zapped when the new instance starts ? Also, if I tar up everything to restore multiple times, is there anything I need to edit on disk to make the instances distinct ? IIRC there was a sling.id at one point, but that might have been JR2 rather than Oak. Best Regards Ian
Re: OAK-3884 for UDBBroadcaster
Hi, Sorry. I misunderstood your question. The Serverfault question referenced in the Oak issue was about broadcasting to the loopback address, which can't be done. The patch looked like it was attempting to do that. Best Regards Ian On 14 January 2016 at 20:48, Philipp Suter <su...@adobe.com> wrote: > The patch is for unit tests “only”. It executes the unit tests on the > first interface address that has the BROADCAST option already configured. > Such the loopback interface is not taken into account. > > The test will fail if there is no interface address that has the BROADCAST > option configured. > > All of that could also be solved differently, e.g. with a virtual network. > I am not sure if that exists for unit testing. > > Cheers, > > Philipp > > > On 14/01/16 19:41, "ianbos...@gmail.com on behalf of Ian Boston" < > ianbos...@gmail.com on behalf of i...@tfd.co.uk> wrote: > > >Hi, > > > >Does the patch work ? > > > >According to the answer in the serverfault article referenced in OAK-3884 > >it should not > > > >I tried the pattern referenced on OSX using nc and it doesn't work. The > >original poster seems to think it works, but those answering disagree and > >the posted wasn't able to tell them which kernel it worked on. > > > > > >"The "solution" you are asking for does not exist (or at least *should not > >work*) on any platform unless the loopback interface supports BROADCAST > (as > >can be determined by the flags field in ifconfig) -- The platform(s) you > >are talking about *do not advertise support for broadcasting on the > >loopback interface*, therefore you cannot do what you're asking." > > > > > >There are some other less complimentary comments. > > > >It might be possible, with root access to the test machine, to setup > >several tun interfaces, connected to a bridge to create a virtual network > >on the same machine. 
You can do the same with multiple docker hosts on the > >same machine but all of that requires some setup that a Java process > >isnt going to be able to do. > > > > > > > > > > > >For a non loopback network you should not try and work out the broadcast > >address. IIRC you should set the broadcast flag on the UDP packet. (I > >assume UDB == UDP ?). > > > >I assume you are doing something like: > > > > socket = new DatagramSocket(, InetAddress.getByName("0.0.0.0")); > > socket.setBroadcast(true); > > > > > >Hosts on the same subnet will have the same network mast, otherwise they > >are not on the same subnet. All sorts of things will start to fail. eg if > >some are /24 and some are /25 all the broadcasts on the /25 subnet will be > >directed at the .127 host on the /24 subnet. (I haven't tried to see what > >a switch does with 2 overlapping and misconfigured subnets.). > > > > > > > >HTH > >Best Regards > >Ian > > > > > > > > > > > >On 14 January 2016 at 17:30, Philipp Suter <su...@adobe.com> wrote: > > > >> Hi > >> > >> I added a small patch to https://issues.apache.org/jira/browse/OAK-3884 > >> that could fix the broadcast unit tests for UDBBroadcaster. It seems the > >> loopback interface is not allowing broadcasting on *NIX systems. The > >> broadcasting IP has to be found dynamically for a test execution. > >> > >> Interesting next step: How could this be configured dynamically within a > >> clustered set-up? It needs an agreement among all cluster members to use > >> the same network mask. > >> > >> Cheers, > >> Philipp > >> > >> > >> >
Re: OAK-3884 for UDBBroadcaster
Hi, Does the patch work ? According to the answer in the serverfault article referenced in OAK-3884 it should not. I tried the pattern referenced on OSX using nc and it doesn't work. The original poster seems to think it works, but those answering disagree and the poster wasn't able to tell them which kernel it worked on. "The "solution" you are asking for does not exist (or at least *should not work*) on any platform unless the loopback interface supports BROADCAST (as can be determined by the flags field in ifconfig) -- The platform(s) you are talking about *do not advertise support for broadcasting on the loopback interface*, therefore you cannot do what you're asking." There are some other less complimentary comments. It might be possible, with root access to the test machine, to set up several tun interfaces, connected to a bridge, to create a virtual network on the same machine. You can do the same with multiple docker hosts on the same machine, but all of that requires some setup that a Java process isn't going to be able to do. For a non-loopback network you should not try and work out the broadcast address. IIRC you should set the broadcast flag on the UDP packet. (I assume UDB == UDP ?). I assume you are doing something like: socket = new DatagramSocket(, InetAddress.getByName("0.0.0.0")); socket.setBroadcast(true); Hosts on the same subnet will have the same network mask, otherwise they are not on the same subnet. All sorts of things will start to fail, eg if some are /24 and some are /25 all the broadcasts on the /25 subnet will be directed at the .127 host on the /24 subnet. (I haven't tried to see what a switch does with 2 overlapping and misconfigured subnets.) HTH Best Regards Ian On 14 January 2016 at 17:30, Philipp Suter wrote: > Hi > > I added a small patch to https://issues.apache.org/jira/browse/OAK-3884 > that could fix the broadcast unit tests for UDBBroadcaster. It seems the > loopback interface is not allowing broadcasting on *NIX systems.
The > broadcasting IP has to be found dynamically for a test execution. > > Interesting next step: How could this be configured dynamically within a > clustered set-up? It needs an agreement among all cluster members to use > the same network mask. > > Cheers, > Philipp > > >
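The pattern Ian describes (bind to the wildcard address, then set the broadcast flag on the socket rather than guessing a subnet-directed broadcast address) looks roughly like this; the port numbers and payload are illustrative, and whether the send actually reaches peers still depends on an interface with BROADCAST support, as discussed above.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class BroadcastSender {

    // Bind to the wildcard address and enable SO_BROADCAST.
    // Port 0 asks the OS for any free port.
    static DatagramSocket openBroadcastSocket(int port) throws Exception {
        DatagramSocket socket =
                new DatagramSocket(port, InetAddress.getByName("0.0.0.0"));
        socket.setBroadcast(true);
        return socket;
    }

    public static void main(String[] args) throws Exception {
        DatagramSocket socket = openBroadcastSocket(0);
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        // 255.255.255.255 is the limited broadcast address; a directed
        // broadcast such as 192.168.1.255 would target one /24 subnet,
        // which is where the mismatched-netmask problems Ian mentions arise.
        DatagramPacket packet = new DatagramPacket(
                payload, payload.length,
                InetAddress.getByName("255.255.255.255"), 9876);
        socket.send(packet); // fails on interfaces without BROADCAST support
        socket.close();
    }
}
```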
Re: Multiplexing Document Store
On 3 December 2015 at 04:15, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > Hi Ian, > > On Wed, Dec 2, 2015 at 9:24 PM, Ian Boston <i...@tfd.co.uk> wrote: > > Hence all MutableTrees get their > > NodeBuilder from the root DocumentNodeState that is the DocumentNodeStore > > owning the root node and not the MultiplexingDocumentNodeStore. > > Some confusion here. What we have is a MultiplexDocumentStore and > there is no MultiplexingDocumentNodeStore. All the above objects > DocumentNodeState, DocumentNodeBuilder refer to DocumentNodeStore (not > DocumentStore). > I meant MultiplexDocumentStore. So many stores and so many layers make it hard for someone who doesn't work in Oak every day to find their way around. > > Use of MultiplexDocumentStore is an implementation detail of > DocumentNodeStore hence these objects need not be affected and aware > of multiplexing logic. May be I am missing something here? > Looking at the code again, I don't think you are missing anything. DocumentNodeStore gets its store from the DocumentMK.builder which is a singleton. I need to re-run my tests of 30 October to find out why I was seeing the behaviour I was seeing, however IIRC you have recently been looking at /jcr:system etc and the branch I was working on may now be obsolete ? If it is, please let me know as there is no point in duplication or wasted effort. Best Regards Ian > > Chetan Mehrotra >
Re: Multiplexing Document Store
Hi Robert, On 5 November 2015 at 22:58, Robert Munteanu <romb...@apache.org> wrote: > Hi Ian, > > On Fri, 2015-10-30 at 15:38 +0000, Ian Boston wrote: > > Hi, > > I am trying to enhance a multiplexing document store written > > initially by > > Robert Munteanu to support multiplexing of content under > > /jcr:system/** in > > particular the version store and the permissions store. I have a > > scheme > > that should theoretically work, encoding the target store in the > > entry name > > of the map key. > > > > However, it seems that DocumentNodeState (and hence > > DocumentNodeBuilder) > > objects created by a DocumentNodeStore get a reference to that > > DocumentNodeStore, bypassing any multiplexing document node store. > > This is > > all ok if all the calls relate to content in the same > > DocumentNodeStore, > > but as soon as anything performs a call into DocumentNodeStore that > > relates > > to a path not within that DocumentNodeStore the multiplexing breaks > > down. > > Code that reads and writes to /jcr:system/** does this. > > > > I have tried hacking the code to ensure that the reference to > > DocumentNodeStore is replaced by the MultiplexingDocumentNodeStore, > > however > > when I do that, MultiplexingDocumentNodeStore gets calls that have > > insufficient context to route to the correct DocumentNodeStore. I > > could > > hack some more, but if I do, anything that works is unlikely to be > > acceptable to Oak as a patch. > > Well, I can relate, as the DocumentStore multiplexing implementation > does have to be a little creative at times to find out the proper > store. > > Could you list (some of) the places where the > MultiplexingDocumentNodeStore does not have enough information to know > where to route the operation? Might be helpful as a starting point. > If I look at the DocumentStore API, it doesn't look too bad. Every method that targets a specific store, contains a key or uses an UpdateOp which contains a primary key. 
Assuming the store can always be derived from that primary key, everything should work. What's less clear are the implementation-specific methods that are used within oak-core. I am reasonably certain that RDBDocumentStore and MongoDocumentStore make assumptions and call protected methods. I say "should work" assuming that every reference to a DocumentStore implementation can be replaced with a reference to the MultiplexingDocumentStore, so that all calls go via the multiplexer and none go direct to the DocumentStore implementation. To do it properly will be a big patch to oak-core, and I haven't started to look at oak-lucene. There isn't a great deal of point in preparing that patch if the Oak committers don't want this. Best Regards Ian > > Thanks, > > Robert >
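Ian's assumption that "the store can always be derived from that primary key" amounts to longest-prefix routing over mount paths. The sketch below is illustrative, not the actual multiplexing code (real DocumentMK keys also carry a depth prefix, e.g. `2:/jcr:system/x`, which would need to be stripped before matching).

```java
import java.util.Map;
import java.util.TreeMap;

// Routes a document path to the store mounted at the longest matching
// prefix, so a /jcr:system mount shadows the root mount for that subtree.
class PrefixRouter<S> {
    private final TreeMap<String, S> mounts = new TreeMap<>();

    void mount(String pathPrefix, S store) {
        mounts.put(pathPrefix, store);
    }

    S storeFor(String path) {
        S best = null;
        int bestLen = -1;
        for (Map.Entry<String, S> e : mounts.entrySet()) {
            String prefix = e.getKey();
            boolean matches = path.equals(prefix)
                    || path.startsWith(prefix.equals("/") ? "/" : prefix + "/");
            if (matches && prefix.length() > bestLen) {
                best = e.getValue();
                bestLen = prefix.length();
            }
        }
        return best;
    }
}
```

The hard part, as the thread notes, is not this lookup but making sure every caller goes through the router rather than holding a direct reference to one concrete DocumentStore.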
Re: Lucene auto-tune of cost
On 4 November 2015 at 00:45, Davide Giannella wrote: > Hello Team, > > Lucene index is always asynchronous and the async index could lag behind > by definition. > > Sometimes we could have the same query better served by a property > index, or traversing for example. In case the async index is lagging > behind it could be that the traversing index is better suited to return > the information as it will be more up to date. > > As we know we run an async update every 5 seconds, we could come up with > some algorithm to be used in the cost computation that auto-corrects the > cost, increasing it the more time has passed since the last full > execution of the async index. > > WDYT? > Going down the property index route for a DocumentMK instance will bloat the DocumentStore further. That already consumes 60% of a production repository and, like many in-DB inverted indexes, is not an efficient storage structure. It's probably ok for TarMK. Traversals are a problem for production. They will create random outages under any sort of concurrent load. --- If the way the indexing was performed is changed, it could make the index NRT or real time depending on your point of view. eg. Local indexes, each Oak index in the cluster becoming a shard with replication to cover instance unavailability. No more indexing cycles, soft commits with each instance using a FS Directory and an update queue replacing the async indexing queue. Query by map reduce. It might have to copy on write to seed new instances where the number of instances falls below 3. Best Regards Ian > > Davide >
Re: Lucene auto-tune of cost
Hi, Slightly off topic response: With the current indexing scheme (IIUC): One factor is that with shared index files, indexing can only be performed on a cluster leader, and for updates the Lucene segments must be written to the repository to be read by other instances in the cluster. That means a hard Lucene commit. If the indexing is sync, then that will mean a large number of hard Lucene commits, which generally leads to latency, lots of IO, or lots of segments. Hence async is more efficient. If all Lucene indexing is performed locally and the segments are not shared, sync indexing works without issue, as updates can be written to a write-ahead log, then added to the index with a soft commit, with the WAL adjusted on periodic hard commits. Local indexing is viable using the current scheme in a standalone environment. Text extraction should ideally happen as a one-time operation on immutable content bodies, the result being stored as metadata of the content body. IMHO it should be a separate operation from index update, which should only deal with indexing properties, including an already-tokenized stream. Tokenizing can be extremely resource expensive, especially with bad content, like vector remastered PDFs, hence it should not block index updates. Best Regards Ian On 4 November 2015 at 10:37, Julian Sedding <jsedd...@gmail.com> wrote: > Slightly off topic: why is/should Lucene Indexes always be async? I > understand that requirement for a full-text index, which may need to > do (slow) text-extraction. However, updates on a Lucene-based property > index are usually very fast. So it is not obvious to me why they > should not be synchronous. > > Thanks for any enlightening replies!
> > Regards > Julian > > On Wed, Nov 4, 2015 at 9:49 AM, Ian Boston <i...@tfd.co.uk> wrote: > > On 4 November 2015 at 00:45, Davide Giannella <dav...@apache.org> wrote: > > > >> Hello Team, > >> > >> Lucene index is always asynchronous and the async index could lag behind > >> by definition. > >> > >> Sometimes we could have the same query better served by a property > >> index, or traversing for example. In case the async index is lagging > >> behind it could be that the traversing index is better suited to return > >> the information as it will be more updated. > >> > >> As we know we run an async update every 5 seconds, we could come up with > >> some algorithm to be used on the cost computing, that auto correct with > >> some math the cost, increasing it the more the time passed since the > >> last full execution of async index. > >> > >> WDYT? > >> > > > > > > Going down the property index route, for a DocumentMK instance will bloat > > the DocumentStore further. That already consumes 60% of a production > > repository and like many in DB inverted indexes is not an efficient > storage > > structure. It's probably ok for TarMK. > > > > Traversals are a problem for production. They will create random outages > > under any sort of concurrent load. > > > > --- > > If the way the indexing was performed is changed, it could make the index > > NRT or real time depending on your point of view. eg. Local indexes, each > > Oak index in the cluster becoming a shard with replication to cover > > instance unavailability. No more indexing cycles, soft commits with each > > instance using a FS Directory and a update queue replacing the async > > indexing queue. Query by map reduce. It might have to copy on write to > seed > > new instances where the number of instances falls below 3. > > > > > > > > Best Regards > > Ian > > > > > > > >> > >> Davide > >> >
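The local-index scheme Ian sketches (write-ahead log, cheap soft commits for visibility, periodic expensive hard commits for durability) can be modelled abstractly. This is a conceptual sketch of the soft/hard commit distinction, not Lucene code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Updates land in a write-ahead log immediately, become searchable on a
// soft commit (cheap, in-memory), and the WAL is truncated only on a
// periodic hard commit (expensive IO to durable storage).
class SoftCommitIndex {
    private final List<String> wal = new ArrayList<>();
    private final Map<String, Boolean> searchable = new HashMap<>();
    private final Map<String, Boolean> persisted = new HashMap<>();

    void update(String doc) {
        wal.add(doc);
    }

    void softCommit() { // make pending updates visible to queries
        for (String doc : wal) {
            searchable.put(doc, true);
        }
    }

    void hardCommit() { // flush to durable storage, truncate the WAL
        softCommit();
        persisted.putAll(searchable);
        wal.clear();
    }

    boolean isSearchable(String doc) {
        return searchable.getOrDefault(doc, false);
    }

    int pendingWalEntries() {
        return wal.size();
    }
}
```

The key property is that soft commits can run every few milliseconds without IO, which is what makes sync or near-real-time indexing cheap when the index is local and unshared.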
Multiplexing Document Store
Hi, I am trying to enhance a multiplexing document store written initially by Robert Munteanu to support multiplexing of content under /jcr:system/** in particular the version store and the permissions store. I have a scheme that should theoretically work, encoding the target store in the entry name of the map key. However, it seems that DocumentNodeState (and hence DocumentNodeBuilder) objects created by a DocumentNodeStore get a reference to that DocumentNodeStore, bypassing any multiplexing document node store. This is all ok if all the calls relate to content in the same DocumentNodeStore, but as soon as anything performs a call into DocumentNodeStore that relates to a path not within that DocumentNodeStore the multiplexing breaks down. Code that reads and writes to /jcr:system/** does this. I have tried hacking the code to ensure that the reference to DocumentNodeStore is replaced by the MultiplexingDocumentNodeStore, however when I do that, MultiplexingDocumentNodeStore gets calls that have insufficient context to route to the correct DocumentNodeStore. I could hack some more, but if I do, anything that works is unlikely to be acceptable to Oak as a patch. Any suggestions ? I haven't even started to look at /oak:index and the code that writes/reads from there. Best Regards Ian
Re: [discuss] Near real time search to account for latency in background indexing
Hi Chetan, The overall approach looks ok. Some questions about indexing. How will you deal with JVM failure ? and related. How frequently will commits to the persisted index be performed ? I assume that switching to use ElasticSearch, which delivers NRT reliably in the 0.1s range, has been rejected as an option ? If it has, you may find yourself implementing much of the core of ElasticSearch to make NRT work properly in a cluster. Best Regards Ian On 24 July 2015 at 08:09, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Fri, Jul 24, 2015 at 12:15 PM, Michael Marth mma...@adobe.com wrote: From your description I am not sure how the indexing would be triggered for local changes. Probably not through the Async Indexer (this would not gain us much, right?). Would this be a Commit Hook? My thought was to use an Observer so as to not add cost to the commit call. The Observer would listen only for local changes and would invoke IndexUpdate on the diff Chetan Mehrotra
Re: [discuss] Near real time search to account for latency in background indexing
Hi, On 24 July 2015 at 09:06, Chetan Mehrotra chetan.mehro...@gmail.com wrote: Hi Ian, To be clear the in memory index is purely ephemeral and is not meant to be persisted. It just complements the persistent index to allow access to recently added/modified entries. So now to your queries: How will you deal with JVM failure ? Do nothing. The index as explained is transient. Current AsyncIndex would anyway be performing the usual indexing and is resilient enough How frequently will commits to the persisted index be performed ? This index lives separately. Persisted index managed by AsyncIndex works as is ok, so there is a hard commit to the persisted index on every update so nothing gets lost on JVM failure I assume that switching to use ElasticSearch, which delivers NRT reliably in the 0.1s range has been rejected as an option ? No. The problem here is a bit different. Lucene indexes are being used for all sorts of indexing currently in Oak. In many cases it's being used purely as a property index. ES makes sense mostly for a global fulltext index and would be an overkill for smaller, more focused property index types of use cases. Well ES is primarily used as a property index. In fact it doesn't have any built in full text digesters, which is why people who want that look first at Solr until they hit the commit and segment ship latency issues with Solr Cloud. The commercial uses of ES (elasticsearch.com) only index properties. As for complexity, running ES in OSGi is not complex as it will run embedded OOTB with no configuration and no ES server setup. Generally one class is required. Server setup is only required if you want to run a dedicated ES cluster, and even then it's no more complex than a connection URL. If it has, you may find yourself implementing much of the core of ElasticSearch to make NRT work properly in a cluster. Again, the use case here is not to support NRT as is. Current indexing would work as is and this transient index would complement it.
Ok, thanks for the clarification, I misunderstood the subject line. NRT search (sub 0.1s latency) normally needs a write ahead log to work in production, to avoid data loss and/or high hard commit volumes killing latency and creating merge/too-many-files issues as the number of segments grows. Best Regards Ian Chetan Mehrotra On Fri, Jul 24, 2015 at 1:01 PM, Ian Boston i...@tfd.co.uk wrote: Hi Chetan, The overall approach looks ok. Some questions about indexing. How will you deal with JVM failure ? and related. How frequently will commits to the persisted index be performed ? I assume that switching to use ElasticSearch, which delivers NRT reliably in the 0.1s range, has been rejected as an option ? If it has, you may find yourself implementing much of the core of ElasticSearch to make NRT work properly in a cluster. Best Regards Ian On 24 July 2015 at 08:09, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Fri, Jul 24, 2015 at 12:15 PM, Michael Marth mma...@adobe.com wrote: From your description I am not sure how the indexing would be triggered for local changes. Probably not through the Async Indexer (this would not gain us much, right?). Would this be a Commit Hook? My thought was to use an Observer so as to not add cost to the commit call. The Observer would listen only for local changes and would invoke IndexUpdate on the diff Chetan Mehrotra
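The write-ahead-log pattern Ian refers to can be sketched in a few lines: every update is appended to a durable log before it touches the soft-committed in-memory index, so a crash loses nothing that was acknowledged, and recovery is a replay of the log. This is an illustrative stand-in in plain Java (class and method names made up); Solr's transaction log and ES's translog are far more involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a write-ahead log in front of an in-memory index.
public class WalIndex {
    private final Path log;
    private final Map<String, String> index = new HashMap<>(); // id -> doc

    public WalIndex(Path log) throws IOException {
        this.log = log;
        replay(); // crash recovery: re-execute everything in the log
    }

    /** ids and docs are assumed to contain no tabs or newlines. */
    public void add(String id, String doc) throws IOException {
        // 1. durable append first ...
        Files.write(log, List.of(id + "\t" + doc),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // 2. ... then the cheap soft-committed in-memory update
        index.put(id, doc);
    }

    public String get(String id) { return index.get(id); }

    private void replay() throws IOException {
        if (!Files.exists(log)) return;
        for (String line : Files.readAllLines(log)) {
            int tab = line.indexOf('\t');
            index.put(line.substring(0, tab), line.substring(tab + 1));
        }
    }

    /** Demonstrates recovery: a second instance rebuilt from the log alone. */
    public static String demo() {
        try {
            Path p = Files.createTempFile("wal", ".log");
            WalIndex w = new WalIndex(p);
            w.add("1", "hello");
            WalIndex recovered = new WalIndex(p); // as a fresh JVM would
            Files.delete(p);
            return recovered.get("1");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key ordering is append-then-apply: without the durable step 1, a hard commit on every update is the only way to avoid loss, which is exactly the latency/segment problem described above.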
Re: [discuss] Near real time search to account for latency in background indexing
Hi Tommaso, My knowledge of Solr is not anything like as deep as yours. I would like to check what I know is correct, to avoid sharing the wrong information. In [1] the first test does not commit and is backed by a RAMDirectory shared by the reader and writer. Does that mean that Lucene natively only supports NRT inside a single JVM, and if the JVM dies anything not hard committed (if the RAMDirectory was backed by a FileDirectory) would be lost or is recovery of soft commits and pre-soft commits now handled automatically in Lucene4. Last time I looked at the source code was shortly before Lucene 4.0 was released which was some years back. Best Regards Ian On 24 July 2015 at 09:49, Tommaso Teofili tommaso.teof...@gmail.com wrote: I think the proposal makes sense; in the end NRT is something that is inherently supported by Lucene (see an example [1]) and, as Ian mentioned, something that has been similarly implemented in ES and Solr. I think it'd be possible though to make use of Lucene's NRT capability by changing a bit the code that creates an IndexReader [2] to use DirectoryReader#open(IndexWriter,boolean) [3]. My 2 cents, Tommaso [1] : https://gist.github.com/mocobeta/4640263 [2] : https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/IndexNode.java#L94 [3] : https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/index/DirectoryReader.html#open(org.apache.lucene.index.IndexWriter , boolean) 2015-07-24 10:23 GMT+02:00 Michael Marth mma...@adobe.com: The reason I preferred using Lucene is that current property index only support single condition evaluation. I did not know this. That’s a strong argument in favour of using Lucene.
Re: [discuss] Near real time search to account for latency in background indexing
Hi, On 24 July 2015 at 11:11, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi Ian, 2015-07-24 11:11 GMT+02:00 Ian Boston i...@tfd.co.uk: Hi Tommaso, My knowledge of Solr is not anything like as deep as yours. I would like to check what I know is correct, to avoid sharing the wrong information. In [1] the first test does not commit and is backed by a RAMDirectory shared by the reader and writer. Does that mean that Lucene natively only supports NRT inside a single JVM, and if the JVM dies anything not hard committed (if the RAMDirectory was backed by a FileDirectory) would be lost yes, exactly. or is recovery of soft commits and pre-soft commits now handled automatically in Lucene4. no, that's not part of Lucene as far as I know; that's Solr allowing soft commits [1], which use Lucene's NRT capabilities and a transaction log [2], if JVM crashes soft commits are recovered from the transaction log and re-executed once Solr is restarted. Thank you for the information, especially [2]. My knowledge was out of date. The transaction log did not exist in SolrCloud when I last looked at the source code and was only available as a commercial component from Lucidworks. For reference, the only difference between SolrCloud4 as described in [2] and ES appears to be that SolrCloud4 works on source documents whereas ES works on digested versions of a document (ie properties only). At some point prior to releasing the code I looked at, SolrCloud4 must have switched from segment based replication in favour of document based replication. Best Regards Ian Last time I looked at the source code was shortly before Lucene 4.0 was released which was some years back. 
Regards, Tommaso [1] : http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 [2] : http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best Regards Ian On 24 July 2015 at 09:49, Tommaso Teofili tommaso.teof...@gmail.com wrote: I think the proposal makes sense; in the end NRT is something that is inherently supported by Lucene (see an example [1]) and, as Ian mentioned, something that has been similarly implemented in ES and Solr. I think it'd be possible though to make use of Lucene's NRT capability by changing a bit the code that creates an IndexReader [2] to use DirectoryReader#open(IndexWriter,boolean) [3]. My 2 cents, Tommaso [1] : https://gist.github.com/mocobeta/4640263 [2] : https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/IndexNode.java#L94 [3] : https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/index/DirectoryReader.html#open(org.apache.lucene.index.IndexWriter , boolean) 2015-07-24 10:23 GMT+02:00 Michael Marth mma...@adobe.com: The reason I preferred using Lucene is that current property index only support single condition evaluation. I did not know this. That’s a strong argument in favour of using Lucene.
Re: /oak:index (DocumentNodeStore)
Hi Marcel, Thanks for the response, that makes sense. I assume that there are already 64 indexes in /oak:index before any custom ones are added, which makes it impossible to remove /oak:index for MongoDB. With that many it's going to be impractical for all RDBMSs. Would there be any benefit in moving /oak:index out of the main document collection, so that any MongoDB indexes in the collection of no relevance to /oak:index don't get bloated ? or, more generally Is there a different way of storing the data in /oak:index so that it doesn't result in so many MongoDB documents ? Best Regards Ian On 9 July 2015 at 08:15, Marcel Reutegger mreut...@adobe.com wrote: Hi Ian, there are mainly two reasons why we cannot use DocumentStore based indexes for this purpose: - MongoDB only supports a limited number of indexes (64 per collection) and applications usually have a need for more indexes. - Data in Oak is multi-versioned. It must be possible to query nodes at a specific revision of the tree. Lucene indexes are more efficient, but are only updated asynchronously. Whether this is acceptable usually depends on application requirements. Experience so far shows that many indexes can be asynchronous, because there was no hard requirement for synchronous index updates. Regards Marcel On 08/07/15 18:18, ianbos...@gmail.com on behalf of Ian Boston wrote: Hi, I am confused about how /oak:index works and why it is needed in a MongoDB setting which has native database indexes that appear to cover the same functionality. Could the Oak Query engine use DB indexes directly for all indexes that are built into Oak, and Lucene indexes for all custom indexes ? I am asking this because in MongoDB I observe that 60% of the size of the nodes collection is attributable to /oak:index, and that the 60% increases every non-sparse MongoDB index by about 3x. An _id + _modified compound index in MongoDB comes out at about 70GB for 100M documents (in part due to the size of _id).
Without the duplication in /oak:index it could be closer to 25GB. Disk space is cheap, but MongoDB working set RAM is not cheap, and neither is page fault IO. I fully understand why TarMK needs /oak:index, but I can't understand (conceptually) the need to implement an index inside a database table. It's like trying to implement an inverted index in an RDBMS table which, as everyone who has ever tried (or used) that approach knows, doesn't scale nearly as far as Lucene bitmaps. Could /oak:index be replaced by something that doesn't generate Documents/db rows as fast as it does ? Best Regards Ian
Re: /oak:index (DocumentNodeStore)
On 9 July 2015 at 09:16, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Thu, Jul 9, 2015 at 12:45 PM, Marcel Reutegger mreut...@adobe.com wrote: - Data in Oak is multi-versioned. It must be possible to query nodes at a specific revision of the tree. To add - That also makes it difficult to use Mongo indexes, as the index itself is versioned. So instead of just indexing property 'foo' you need to index it for every revision Won't compound indexes work ? { _id : 1, _modified: 1, _revision: 1 } ? They are bigger: _id alone is 211 bytes per entry on average, _modified + _id is 233, and _revision + _modified + _id is probably close to 400 bytes as _revision is a string. I guess the only way of telling is to generate the index on a test database and see what impact it has. Best Regards Ian Chetan Mehrotra
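For a rough sense of scale, the per-entry figures above multiply out as follows. This is back-of-envelope arithmetic only, using the averages quoted in this thread, and ignores B-tree overhead, padding, and compression.

```java
// Back-of-envelope MongoDB index size: average bytes per entry times entries.
public class IndexSizeEstimate {
    public static long bytes(long docs, int bytesPerEntry) {
        return docs * (long) bytesPerEntry;
    }

    public static long gib(long docs, int bytesPerEntry) {
        return bytes(docs, bytesPerEntry) / (1L << 30);
    }
}
```

At 100M documents this puts the thread's 233-byte { _id, _modified } entry at roughly 21 GiB of raw entries, and the ~400-byte three-field compound at roughly 37 GiB, before any engine overhead.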
Re: /oak:index (DocumentNodeStore)
Hi, On 9 July 2015 at 10:33, Thomas Mueller muel...@adobe.com wrote: Hi, Using MongoDB indexes directly doesn't work because of the MVCC model. What we could do is add special collections (basically one collection per index). This would require some work, which would then need to be repeated for the RDBMK. It would be quite some work. ok, understood. I observe that 60% of the size of the nodes collection is attributable to /oak:index Could you try to find out which index(es) are responsible for that? Marcel and Chetan have been working on the repository I was observing. I am sure they can point you to the details offline, if you are not aware of it already. They were able to remove about 25% of the 60% under /oak:index, but IIUC most of the remainder is not local customisation, and perhaps 40% of what remains must be synchronous, which indicates a 1:2 ratio between real content nodes and MongoDB documents before any MongoDB indexes are considered. That ratio was the motivation for asking the question. Chetan thought I should discuss it on oak-dev. Marcel and Chetan, who are far more knowledgeable than I in this area, have already executed 0) and 1) below. Best Regards Ian There would be multiple ways to reduce the number of nodes: 0) remove unused indexes 1) convert some indexes to Lucene property indexes 2) convert to unique index if possible (as this uses less space) 3) add a feature to only index a subset of the keys (only index what we need) 4) convert the last x levels of the index structure to a property instead of a node 3) and 4) would require changes in Oak. For 4), the change should reduce the number of nodes, but might cause merge conflicts (not sure).
With level = 1, it would be:

/content/products/a @color=red
/content/products/b @color=red

/oak:index/color/red/content
/oak:index/color/red/content/products @a = true, @b = true

instead of

/oak:index/color/red/content
/oak:index/color/red/content/products
/oak:index/color/red/content/products/a @match = true
/oak:index/color/red/content/products/b @match = true

With level > 1, it would require some escaping magic, but we could save some more nodes; basically it would be:

level = 2: /oak:index/color/red/content @products_a = true, @products_b = true
level = 3: /oak:index/color/red @content_products_a = true, @content_products_b = true

Regards, Thomas On 08/07/15 18:18, Ian Boston i...@tfd.co.uk wrote: Hi, I am confused about how /oak:index works and why it is needed in a MongoDB setting which has native database indexes that appear to cover the same functionality. Could the Oak Query engine use DB indexes directly for all indexes that are built into Oak, and Lucene indexes for all custom indexes ? I am asking this because in MongoDB I observe that 60% of the size of the nodes collection is attributable to /oak:index, and that the 60% increases every non-sparse MongoDB index by about 3x. An _id + _modified compound index in MongoDB comes out at about 70GB for 100M documents (in part due to the size of _id). Without the duplication in /oak:index it could be closer to 25GB. Disk space is cheap, but MongoDB working set RAM is not cheap, and neither is page fault IO. I fully understand why TarMK needs /oak:index, but I can't understand (conceptually) the need to implement an index inside a database table. It's like trying to implement an inverted index in an RDBMS table which, as everyone who has ever tried (or used) that approach knows, doesn't scale nearly as far as Lucene bitmaps. Could /oak:index be replaced by something that doesn't generate Documents/db rows as fast as it does ? Best Regards Ian
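Thomas's option 4) — folding the last `level` path segments of an index entry into a property name — can be expressed as a small path transformation. The sketch below reproduces his examples; it ignores the "escaping magic" he mentions ('_' occurring in real node names), and the class name is invented.

```java
// Sketch of collapsing the last `level` segments of an indexed path into a
// property name, as in the examples above (level=1 -> @a, level=2 -> @products_a).
public class CollapsedIndexEntry {
    public final String nodePath;     // relative node path kept under the index
    public final String propertyName; // folded tail segments, '_'-joined

    private CollapsedIndexEntry(String nodePath, String propertyName) {
        this.nodePath = nodePath;
        this.propertyName = propertyName;
    }

    public static CollapsedIndexEntry collapse(String indexedPath, int level) {
        String[] segments = indexedPath.substring(1).split("/");
        int keep = Math.max(0, segments.length - level);
        StringBuilder node = new StringBuilder();
        for (int i = 0; i < keep; i++) node.append('/').append(segments[i]);
        StringBuilder prop = new StringBuilder();
        for (int i = keep; i < segments.length; i++) {
            if (prop.length() > 0) prop.append('_');
            prop.append(segments[i]);
        }
        return new CollapsedIndexEntry(
                node.length() == 0 ? "/" : node.toString(), prop.toString());
    }
}
```

With level = 1, /content/products/a becomes node /content/products with property "a"; with level = 2, node /content with property "products_a" — matching the layouts above, at the cost of the noted merge-conflict risk when many siblings write to the same index node.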
/oak:index (DocumentNodeStore)
Hi, I am confused about how /oak:index works and why it is needed in a MongoDB setting which has native database indexes that appear to cover the same functionality. Could the Oak Query engine use DB indexes directly for all indexes that are built into Oak, and Lucene indexes for all custom indexes ? I am asking this because in MongoDB I observe that 60% of the size of the nodes collection is attributable to /oak:index, and that the 60% increases every non-sparse MongoDB index by about 3x. An _id + _modified compound index in MongoDB comes out at about 70GB for 100M documents (in part due to the size of _id). Without the duplication in /oak:index it could be closer to 25GB. Disk space is cheap, but MongoDB working set RAM is not cheap, and neither is page fault IO. I fully understand why TarMK needs /oak:index, but I can't understand (conceptually) the need to implement an index inside a database table. It's like trying to implement an inverted index in an RDBMS table which, as everyone who has ever tried (or used) that approach knows, doesn't scale nearly as far as Lucene bitmaps. Could /oak:index be replaced by something that doesn't generate Documents/db rows as fast as it does ? Best Regards Ian
Re: Observation: External vs local - Load distribution
Hi, +1 for unbounded, let the GC take care of it and log periodically when its size becomes significant so that anyone wondering why their JVM is consuming so much GC time gets a clue as to the cause, without having to perform heap dumps, thread dumps or jvm probes. (but ideally all queues would have simple, efficient, metrics that can be monitored all the time in production, not just by someone connecting a JMX console or Web Console) Best Regards Ian On 17 June 2015 at 09:56, Chetan Mehrotra chetan.mehro...@gmail.com wrote: Just ensure that your Observer is fast as its invoked the critical path. This would probably end up with a design similar to Background Observer. May be better option would be to allow BO have non bounded queue. Chetan Mehrotra On Wed, Jun 17, 2015 at 2:05 PM, Carsten Ziegeler cziege...@apache.org wrote: Ok, just to recap. In Sling we can implement the Observer interface (and not use the BackgroundObserver base class). This will give us reliably user id for all local events. Does anyone see a problem with this approach? Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
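The "unbounded but observable" queue suggested above can be sketched as a thin wrapper that logs when the size crosses doubling thresholds, so a bloated observation queue shows up in the logs rather than only in a heap dump. Names and the threshold scheme are illustrative, not Oak's BackgroundObserver.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: an unbounded queue that cheaply reports its own growth.
public class MonitoredQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicLong size = new AtomicLong();
    private final AtomicLong nextWarnAt = new AtomicLong(1_000);

    public void add(T item) {
        queue.add(item);
        long s = size.incrementAndGet();
        long threshold = nextWarnAt.get();
        // log once per doubling, so a runaway queue costs O(log n) log lines
        if (s >= threshold && nextWarnAt.compareAndSet(threshold, threshold * 2)) {
            System.err.println("observation queue size reached " + s); // stand-in for a logger/metric
        }
    }

    public T poll() {
        T item = queue.poll();
        if (item != null) size.decrementAndGet();
        return item;
    }

    public long size() { return size.get(); }
}
```

The doubling threshold keeps the monitoring itself cheap enough to leave on in production, which is the point being made above.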
MongoDB collections in MongoDocumentStore
Hi, Is there a fundamental reason why data stored in MongoDB for MongoDocumentStore can't be stored in more than the 3 MongoDB collections currently used ? I am thinking that the collection name is a fn(key). What problems would that cause elsewhere ? Best Regards Ian
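A "collection name is a fn(key)" scheme could look like the sketch below: hash the top path segment of the DocumentStore key so that a whole top-level subtree lands in one collection. This is purely illustrative (invented names, assumed "<depth>:<path>" key format as used elsewhere in this thread); the thread goes on to discuss why the query side makes this unattractive.

```java
// Sketch: route a DocumentStore key ("<depth>:<path>", e.g. "2:/a/b")
// to one of N collections by its top path segment.
public class CollectionRouter {
    public static String collectionFor(String key, int buckets) {
        String path = key.substring(key.indexOf(':') + 1);
        String[] segments = path.split("/");
        // first real segment decides the bucket, so a subtree stays together
        // and immediate-children range queries still hit one collection
        String top = segments.length > 1 ? segments[1] : "";
        int bucket = Math.floorMod(top.hashCode(), buckets);
        return "nodes_" + bucket;
    }
}
```

Keeping a subtree in one bucket matters: splitting children of one parent across collections would force every child query to fan out over all of them.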
Re: MongoDB collections in MongoDocumentStore
Hi Norberto, Thank you for the feedback on the questions. I see you work as an Evangelist for MongoDB, so you will probably know the answers, and can save me time. I agree it's not worth doing anything about concurrency even if logs indicate there is contention on locks in 2.6, as the added complexity would make reads worse. If an upgrade to 3.0 has been done, anything collection based is a waste of time due to the availability of WiredTiger. Could you confirm that separating one large collection into a number of smaller collections will not reduce the size of the indexes that have to be consulted for queries of the form that Chetan shared earlier ? I'll try and clarify that question. DocumentNodeStore has 1 collection containing all Document nodes. Some queries are only interested in a key space representing a certain part of the nodes collection, eg n:/largelystatic/**. If those Documents were stored in nodes_x, and count(nodes_x) = 0.001*count(nodes), would there be any performance advantage or does MongoDB, under the covers, treat all collections as a single massive collection from an index and query point of view ? If you have any pointers to how 2.6 scales relative to collection size, number of collections and index size that would help me understand more about its behaviour. Best Regards Ian On 12 June 2015 at 17:08, Norberto Leite norbe...@norbertoleite.com wrote: Hi Ian, Your proposal would not be very efficient. The concurrency control mechanism that 2.6 offers (the current supported version), although not negligible, would not be that beneficial on the write load. The reading part, which we can assume is the bulk of the workload that JCR will be doing, is not affected by that. One needs to consider that every time you would be reading from the JCR you either would be providing a complex M/R operation, which is designed to span out to the full amount of documents existing in a given collection, and would need to recur over all affected collections.
Not very effective. The existing mechanism is way more simple and more efficient. With the upcoming support for wired tiger, the concurrency control (potential issue) becomes totally irrelevant. Also don't forget that you cannot predict the number of child nodes that a given system would implement to define their content tree. If you do have a very nested (on specific level) number of documents you would need to treat that collection separately(when needing to scale just shard that collection and not the others) bringing in more operational complexity. What can be a good discussion point would be to separate the blobs collection into its own database given the flexibility that JCR offers when treating these 2 different data types. Actually, this reminded me that I was pending on submitting a jira request on this matter https://issues.apache.org/jira/browse/OAK-2984. As Chetan is mentioning, sharding comes into play once we have to scale the write throughput of the system. N. On Fri, Jun 12, 2015 at 4:15 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Fri, Jun 12, 2015 at 7:32 PM, Ian Boston i...@tfd.co.uk wrote: Initially I was thinking about the locking behaviour but I realises 2.6.* is still locking at a database level, and that only changes to at a collection level 3.0 with MMAPv1 and row if you switch to WiredTiger [1]. I initially thought the same and then we benchmarked the throughput by placing the BlobStore in a separate database (OAK-1153). But did not observed any significant gains. So that approach was not pursued further. If we have some benchmark which can demonstrate that write throughput increases if we _shard_ node collection into separate database on same server then we can look further there Chetan Mehrotra
Re: MongoDB collections in MongoDocumentStore
On 12 June 2015 at 14:13, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Fri, Jun 12, 2015 at 5:20 PM, Ian Boston i...@tfd.co.uk wrote: Are all queries expected to query all keys within a collection as it is now, or is there some logical structure to the querying ? Not sure if I get your question. The queries are always for immediate children. For 1:/a the query is like $query: { _id: { $gt: "2:/a/", $lt: "2:/a0" } } So, knowing that /a and all its children were in collection nodes_a, you would only need to query nodes_a ? But if /a was stored in nodes_root and its children were stored in nodes_[a-z] (26 collections), then you would need to map reduce over all 26 collections ? Initially I was thinking about the locking behaviour but I realised 2.6.* is still locking at the database level, and that only changes to collection level in 3.0 with MMAPv1, and row level if you switch to WiredTiger [1]. Even so, would increasing the number of collections have an impact on query costs, ie put /oak:index in its own collection and isolate its indexes ? Best Regards Ian 1 http://www.wiredtiger.com/ Chetan Mehrotra
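Chetan's `$gt "2:/a/"` / `$lt "2:/a0"` bounds work because '0' (0x30) is the character immediately after '/' (0x2F) in ASCII, so the open range covers exactly the keys prefixed "2:/a/", i.e. the immediate children at depth 2. A plain-Java sketch of the bound construction (helper names invented):

```java
// Sketch of the key-range trick used by the MongoDocumentStore child query.
public class ChildRangeBounds {
    /** Lower/upper bounds selecting keys of immediate children of a path. */
    public static String[] boundsFor(String parentPath, int childDepth) {
        String lower = childDepth + ":" + parentPath + "/";
        // '0' is the character right after '/', so every key starting with
        // "<depth>:<path>/" sorts strictly below this upper bound
        String upper = childDepth + ":" + parentPath + "0";
        return new String[] { lower, upper };
    }

    /** Open-interval membership, matching $gt/$lt semantics on strings. */
    public static boolean inRange(String key, String[] bounds) {
        return key.compareTo(bounds[0]) > 0 && key.compareTo(bounds[1]) < 0;
    }
}
```

Note the depth prefix already excludes grandchildren, so a single range scan on _id answers the child query — which is exactly what fan-out over many collections would break.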
Re: MongoDB collections in MongoDocumentStore
Hi Norberto, Thank you. That saved me a lot of time, and I learnt something in the process. So in your opinion, is there anything that can or should be done in the DocumentNodeStore from a schema point of view to improve the read or write performance of Oak on MongoDB, without resorting to sharding or upgrading to 3.0 and WiredTiger ? I am interested in JCR nodes, not including blobs. Best Regards Ian On 12 June 2015 at 18:54, Norberto Leite norbe...@norbertoleite.com wrote: Hi Ian, indexes are bound per collection. That means that if you have a large collection that index will be correspondingly large. In the case of *_id*, which is the primary key of all collections on MongoDB, this is proportional to the number of documents that you contain per collection. Having a large data set spread across different collections makes those indexes individually smaller but in combination larger (we need to account for the overhead of each index entry and some header information that composes the indexes). Also take into account that every time you switch between collections to perform different queries (there are no joins in MongoDB) you will need to reload to memory the index structure of all individual collections affected by your query, which comes with some penalties if you do not have enough space in RAM for the full amount. That said, in MongoDB all information is handled using one single big file per database (although spread across different extents on disk) on the MMAPv1 storage engine (the current default for both 3.0 and 2.6). With WiredTiger this is broken down into individual files per collection and per index structure.
Bottom line is, there would be a marginal benefit to insert rates if you break the JCR nodes collection into different collections, due to the fact that per insert you would have smaller index and data structures to traverse and update, but a lot more inefficiency on the query part, since you would be page faulting more often for the traversals required on both indexes and collection data. So yes, Chetan is right by stating that the actual size occupied by the indexes would not be smaller, it would actually increase. What is important to mention is that sharding takes care of this by spreading the load between instances, and this reflects immediately both on the size of the data that each individual shard would have to handle (smaller data collections = smaller indexes) and allows parallel workloads while retrieving back the query requests. Another aspect to consider is that fragmentation of the data set will affect reads and writes in the long term. I'm going to be delivering a talk soon at http://www.connectcon.ch/2015/en.html where I address this (if you are interested in attending), on how to handle and detect these situations on JCR implementations. To complete the description, the concurrency control mechanism (often referred to as locking) is more granular in the 3.0 MMAPv1 implementation, going from database level to collection level. N. On Fri, Jun 12, 2015 at 7:31 PM, Ian Boston i...@tfd.co.uk wrote: Hi Norberto, Thank you for the feedback on the questions. I see you work as an Evangelist for MongoDB, so you will probably know the answers, and can save me time. I agree it's not worth doing anything about concurrency even if logs indicate there is contention on locks in 2.6, as the added complexity would make reads worse. If an upgrade to 3.0 has been done, anything collection based is a waste of time due to the availability of WiredTiger.
Could you confirm that separating one large collection into a number of smaller collections will not reduce the size of the indexes that have to be consulted for queries of the form that Chetan shared earlier ? I'll try and clarify that question. DocumentNodeStore has 1 collection containing all Documents nodes. Some queries are only interested in a key space representing a certain part of the nodes collection, eg n:/largelystatic/**. If those Documents were stored in nodes_x, and count(nodes_x) = 0.001*count(nodes), would there be any performance advantage or does MongoDB, under the covers, treat all collections as a single massive collection from an index and query point of view ? If you have any pointer to how 2.6 scale relative to collection size, number of collections and index size that would help me understand more about its behaviour. Best Regards Ian On 12 June 2015 at 17:08, Norberto Leite norbe...@norbertoleite.com wrote: Hi Ian, Your proposal would not be very efficient. The concurrency control mechanism that 2.6 offers (current supported version), although not neglectable, would not be that beneficial on the write load. On the reading part, which we can assume is the gross workload that JCR will be doing, is not affected by that. One needs to consider that every time you would
Re: Semantics of the document key at the DocumentStore level
Hi, On 10 June 2015 at 09:41, Robert Munteanu romb...@apache.org wrote: On Tue, 2015-06-09 at 17:01 +0200, Julian Reschke wrote: On 2015-06-09 16:41, Ian Boston wrote: Hi, Should the opaque String key be abstracted into a DocumentKey interface so that how the key is interpreted, and how it might be associated with a certain type of storage, can be abstracted as well, rather than relying on some out of band specification of the key to be serialised and parsed at every transition ? Best Regards Ian ... Absolutely. That would also be relevant for some parts of the MongoDocumentStore implementation which currently make assumptions about ID structure for cache invalidation purposes. If your suggestion aims towards something like @@ -59,7 +59,7 @@ public interface DocumentStore { * @return the document, or null if not found */ @CheckForNull -<T extends Document> T find(Collection<T> collection, String key); +<T extends Document> T find(Collection<T> collection, DocumentKey key); /** * Get the document with the {@code key}. The implementation may serve the that would be a very invasive change which will propagate throughout the Oak code base and also break backwards compatibility (unless we keep a set of deprecated methods alongside), neither of which seems very nice. agreed. Very invasive and not very nice, however the interface does not appear to be exported from the core bundle so deprecation might not be required. Nothing outside the core bundle should be binding to it or implementing it. Did I read META-INF/MANIFEST.MF from the built jar correctly ? One of the advantages of changing the interface is it will ensure no unexpected use of the key in string form, as such code won't compile. btw, supporting multi tenancy probably does require some exported API changes, even if zero downtime does not. I have resorted to making those changes (in a fork) to see if a PoC can work, rather than avoiding the changes completely, as it leads to a cleaner solution.
Hopefully the changes will be acceptable, if the PoC works.

> Another possible approach is to have a DocumentKey class (or
> alternatively an interface backed by a DocumentKeyFactory class, if
> you prefer), which can be used as follows:
>
>     DocumentKey docKey = DocumentKey.from(key);
>     log.info("Path is {}, id is {}", docKey.getPath(), docKey.getId());

That could also work, and does centralise the out-of-band interpretation
of the key.

> Robert

Best Regards
Ian
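[Editor's note: the DocumentKey idea discussed in this thread could be sketched roughly as below. This is a minimal illustrative sketch, not the actual Oak API: the class name comes from the thread, but the "depth:path" id layout it parses, and the getDepth() accessor, are assumptions based on how path-based DocumentStore ids are commonly described.]

```java
// Hedged sketch of the proposed DocumentKey: parse the opaque key once at
// the API boundary instead of re-splitting the string at every transition.
// Assumes a "depth:path" id layout; names are illustrative, not Oak's API.
public final class DocumentKey {
    private final String id;
    private final String path;
    private final int depth;

    private DocumentKey(String id, int depth, String path) {
        this.id = id;
        this.depth = depth;
        this.path = path;
    }

    /** Central place where the out-of-band key format is interpreted. */
    public static DocumentKey from(String key) {
        int colon = key.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a path-based id: " + key);
        }
        int depth = Integer.parseInt(key.substring(0, colon));
        return new DocumentKey(key, depth, key.substring(colon + 1));
    }

    public String getId() { return id; }
    public String getPath() { return path; }
    public int getDepth() { return depth; }

    public static void main(String[] args) {
        DocumentKey docKey = DocumentKey.from("2:/content/foo");
        System.out.println(docKey.getPath()); // /content/foo
        System.out.println(docKey.getDepth()); // 2
    }
}
```

With this in place, code that today does `key.split(":")` in several places would instead receive an already-parsed key, and a storage-specific subclass or factory could change the interpretation without touching callers.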
Re: ImmutableTreeTest.testHiddenExists, confusion.
Hi,

Thanks for the clarification. If ImmutableTree is not exposed, the unit
test makes sense. I have fixed my modified code and the test passes. Only
a few more to go.

Best Regards
Ian

On 3 June 2015 at 12:43, Angela Schreiber anch...@adobe.com wrote:
> hi ian
>
> here is my take on it:
>
> - the ImmutableTree is just a tiny wrapper around the regular,
>   non-secured NodeState, for the sake of allowing use of the hierarchy
>   operations that are missing in the NodeState interface. those
>   immutable (read-only) trees are meant to be used for oak-internal,
>   low-level usages where keeping track of the parent and the path is
>   required, although the operations are meant to be performed on the
>   un-secured NodeState, which usually requires access to hidden items
>   as well.
>
> - the MutableTree is the actual implementation of the Tree interface
>   that is exposed when operating on the Oak API s.str. This one *must
>   never* expose the hidden items.
>
> hope that helps
> angela
>
> On 03/06/15 12:50, Ian Boston i...@tfd.co.uk wrote:
> > Hi,
> >
> > I am confused how hidden trees work in MutableTree and ImmutableTree.
> > The ImmutableTreeTest.testHiddenExists test asserts that a hidden
> > node from an ImmutableTree will return true from exists(), and yet
> > the same node from a MutableTree is hard coded to return false from
> > exists(). Is this correct? If so, why?
> >
> > Some checks I added to the test to investigate are below. Assume I
> > have a node ':hidden' which I believe has been committed to the
> > NodeStore in use.
> >
> > --- check1 ---
> >
> >     NodeState testNode2 = store.getRoot().getChildNode(":hidden");
> >     boolean testNode2Exists = testNode2.exists();
> >
> > testNode2 is a DocumentNodeState { path: '/:hidden',
> > rev: 'r14db8ebf37d-0-1', properties: '{}' }
> > testNode2Exists == true
> >
> > --- check2 ---
> >
> >     Tree mutableTree = session.getLatestRoot().getTree("/");
> >     Tree hiddenTree = mutableTree.getChild(":hidden");
> >     boolean hiddenTreeExists = hiddenTree.exists();
> >
> > hiddenTree is a HiddenTree: /:hidden : {}
> > hiddenTreeExists == false
> >
> > hiddenTree is a HiddenTree because mutableTree.getChild() looks at
> > the name and hard codes a HiddenTree for anything beginning with ':'.
> >
> > --- check3 ---
> >
> >     ImmutableTree immutable = new ImmutableTree(state);
> >     ImmutableTree hiddenChild = immutable.getChild(":hidden");
> >     boolean childExists = hiddenChild.exists();
> >
> > hiddenChild is an ImmutableTree, childExists == true
> >
> > The unit test does:
> >
> >     @Test
> >     public void testHiddenExists() {
> >         ImmutableTree hidden = immutable.getChild(":hidden");
> >         assertTrue(hidden.exists());
> >     }
> >
> > BTW: In the modified NodeStore I am working on, check3 childExists
> > returns false, failing the test, hence the question. I would like to
> > understand why the test is correct so I can find the bug in my code.
> >
> > Best Regards
> > Ian
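[Editor's note: the visibility rule Angela describes can be sketched as two boolean checks. This is an illustrative sketch only; the method names are hypothetical stand-ins for the behaviour of MutableTree and ImmutableTree described in the thread, and the ':' prefix convention for hidden names is the one the thread itself uses.]

```java
// Hedged sketch of why check2 and check3 disagree: a secured, mutable tree
// decides visibility from the name alone (anything starting with ':' is
// hidden), while the immutable, un-secured wrapper simply reflects the
// underlying NodeState. Names are illustrative, not the actual Oak classes.
public class HiddenNameSketch {

    /** Mirrors the ':' prefix convention for hidden (oak-internal) items. */
    static boolean isHidden(String name) {
        return !name.isEmpty() && name.charAt(0) == ':';
    }

    /** What a secured, MutableTree-style child lookup reports. */
    static boolean mutableTreeChildExists(String name, boolean stateExists) {
        // Hidden names never "exist" through the exposed Tree API.
        return !isHidden(name) && stateExists;
    }

    /** What an immutable, un-secured wrapper reports for the same child. */
    static boolean immutableTreeChildExists(String name, boolean stateExists) {
        // No name filtering: existence follows the raw NodeState.
        return stateExists;
    }

    public static void main(String[] args) {
        System.out.println(mutableTreeChildExists(":hidden", true));   // false
        System.out.println(immutableTreeChildExists(":hidden", true)); // true
    }
}
```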
ImmutableTreeTest.testHiddenExists, confusion.
Hi,

I am confused how hidden trees work in MutableTree and ImmutableTree. The
ImmutableTreeTest.testHiddenExists test asserts that a hidden node from an
ImmutableTree will return true from exists(), and yet the same node from a
MutableTree is hard coded to return false from exists(). Is this correct?
If so, why?

Some checks I added to the test to investigate are below. Assume I have a
node ':hidden' which I believe has been committed to the NodeStore in use.

--- check1 ---

    NodeState testNode2 = store.getRoot().getChildNode(":hidden");
    boolean testNode2Exists = testNode2.exists();

testNode2 is a DocumentNodeState { path: '/:hidden',
rev: 'r14db8ebf37d-0-1', properties: '{}' }
testNode2Exists == true

--- check2 ---

    Tree mutableTree = session.getLatestRoot().getTree("/");
    Tree hiddenTree = mutableTree.getChild(":hidden");
    boolean hiddenTreeExists = hiddenTree.exists();

hiddenTree is a HiddenTree: /:hidden : {}
hiddenTreeExists == false

hiddenTree is a HiddenTree because mutableTree.getChild() looks at the
name and hard codes a HiddenTree for anything beginning with ':'.

--- check3 ---

    ImmutableTree immutable = new ImmutableTree(state);
    ImmutableTree hiddenChild = immutable.getChild(":hidden");
    boolean childExists = hiddenChild.exists();

hiddenChild is an ImmutableTree, childExists == true

The unit test does:

    @Test
    public void testHiddenExists() {
        ImmutableTree hidden = immutable.getChild(":hidden");
        assertTrue(hidden.exists());
    }

BTW: In the modified NodeStore I am working on, check3 childExists returns
false, failing the test, hence the question. I would like to understand
why the test is correct so I can find the bug in my code.

Best Regards
Ian
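[Editor's note: the HiddenTree hard-coding observed in check2 can be modelled with a couple of tiny classes. This is a hedged, illustrative sketch under assumed names, not the actual Oak implementation: it only shows the pattern of a name-based placeholder whose exists() always returns false.]

```java
// Hedged sketch of check2's behaviour: a MutableTree-style getChild()
// inspects the name and returns a placeholder HiddenTree (exists() == false)
// for anything beginning with ':', before the node state is even consulted.
public class HiddenTreeSketch {

    interface Tree {
        boolean exists();
    }

    /** Placeholder returned for hidden names; never reports existence. */
    static final class HiddenTree implements Tree {
        @Override public boolean exists() { return false; }
    }

    /** Backed by a committed node state; existence follows the state. */
    static final class BackedTree implements Tree {
        @Override public boolean exists() { return true; }
    }

    /** Mimics the getChild() hard-coding: the name decides first. */
    static Tree getChild(String name) {
        if (name.startsWith(":")) {
            return new HiddenTree();
        }
        return new BackedTree();
    }

    public static void main(String[] args) {
        System.out.println(getChild(":hidden").exists()); // false
        System.out.println(getChild("content").exists()); // true
    }
}
```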