Re: Test failures in o.a.j.o.spi.whiteboard.WhiteboardUtilsTest
On Tue, Nov 24, 2015 at 4:36 PM, Francesco Mari <mari.france...@gmail.com> wrote:
> Maven home: /usr/local/Cellar/maven32/3.2.5/libexec
> Java version: 1.8.0_65, vendor: Oracle Corporation

I am on JDK 1.7.0_55 and there it passes. Will try it on JDK 8.

Chetan Mehrotra
Re: Test failures in o.a.j.o.spi.whiteboard.WhiteboardUtilsTest
Looks like on JDK 8 the MBean interface has to be public for MBean registration to work. Done that with 1716110.

@Francesco - Can you try again with updated trunk?

Chetan Mehrotra

On Tue, Nov 24, 2015 at 4:58 PM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
> On Tue, Nov 24, 2015 at 4:36 PM, Francesco Mari
> <mari.france...@gmail.com> wrote:
>> Maven home: /usr/local/Cellar/maven32/3.2.5/libexec
>> Java version: 1.8.0_65, vendor: Oracle Corporation
>
> I am on JDK 1.7.0_55 and there it passes. Would try it on JDK 8
>
> Chetan Mehrotra
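The JDK 8 behaviour described above can be reproduced with plain javax.management (part of the JDK). The `StatusMBean` interface and the `ObjectName` below are made up for illustration; the point is only that a standard MBean's management interface must be public for registration to succeed:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanVisibilityDemo {

    // A standard MBean: the management interface must be PUBLIC,
    // otherwise registration fails with NotCompliantMBeanException,
    // notably on JDK 8 where the compliance check became stricter.
    public interface StatusMBean {
        String getStatus();
    }

    public static class Status implements StatusMBean {
        @Override
        public String getStatus() {
            return "OK";
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example:type=Status");
        server.registerMBean(new Status(), name);
        System.out.println(server.isRegistered(name)); // expected: true
    }
}
```

Declaring `StatusMBean` package-private instead of public is the kind of setup that passed on JDK 7 but fails on JDK 8, which matches the fix referenced in the commit above.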
Re: Threading Question
Have a look at the webapp example [1] for the suggested setup. The repository should be created once and then reused.

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-examples/webapp

On Wed, Nov 18, 2015 at 4:02 AM, David Marginian <da...@davidmarginian.com> wrote:
> https://jackrabbit.apache.org/oak/docs/construct.html
>
> In a threaded environment (servlet, etc.) is it ok/recommended to create the
> repository/nodestore once and store as instance variables and then use the
> repository to create a new session per request? I know that sessions should
> not be shared across threads but I wasn't sure about repositories.
>
> Thanks!
Re: API proposal for - Expose URL for Blob source (OAK-1963)
I have started a new mail thread, "Usecases around Binary handling in Oak", so as to first collect the kinds of use cases we need to support. Once we decide that, we can discuss the possible solutions. So let's continue the discussion on that thread.

Chetan Mehrotra

On Tue, May 17, 2016 at 12:31 PM, Angela Schreiber <anch...@adobe.com> wrote:
> Hi Oak-Devs
>
> Just for the record: This topic has been discussed in an Adobe
> internal Oak-coordination call last Wednesday.
>
> Michael Marth first provided some background information and
> we discussed the various concerns mentioned in this thread
> and tried to identify the core issue(s).
>
> Marcel, Michael Duerig and Thomas proposed alternative approaches
> on how to address the original issues that lead to the API
> proposal, which all would avoid leaking out information about
> the internal blob handling.
>
> Unfortunately we ran out of time and didn't conclude the call
> with an agreement on how to proceed.
>
> From my perception the concerns raised here could not be resolved
> by the additional information.
>
> I would suggest that we try to continue the discussion here
> on the list. Maybe with a summary of the alternative proposals?
>
> Kind regards
> Angela
>
> On 11/05/16 15:38, "Ian Boston" <i...@tfd.co.uk> wrote:
>
> >Hi,
> >
> >On 11 May 2016 at 14:21, Marius Petria <mpet...@adobe.com> wrote:
> >
> >> Hi,
> >>
> >> I would add another use case in the same area, even if it is more
> >> problematic from the point of view of security. To better support load
> >> spikes an application could return 302 redirects to (signed) S3 urls
> >> such that binaries are fetched directly from S3.
> >>
> >
> >Perhaps that question exposes the underlying requirement for some
> >downstream users.
> >
> >This is a question, not a statement:
> >
> >If the application using Oak exposed a RESTful API that had all the same
> >functionality as [1], and was able to perform at the scale of S3, and had
> >the same security semantics as Oak, would applications that need
> >direct access to S3 or a file-based datastore be able to use that API in
> >preference?
> >
> >Is this really about issues with scalability and performance rather than a
> >fundamental need to drill deep into the internals of Oak? If so, shouldn't
> >the scalability and performance be fixed? (assuming it's a real concern)
> >
> >> (if this can already be done or you think it is not really related to the
> >> other two please disregard).
> >
> >AFAIK this is not possible at the moment. If it was, deployments could use
> >nginX X-SendFile and other request offloading mechanisms.
> >
> >Best Regards
> >Ian
> >
> >1 http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectOps.html
> >
> >> Marius
> >>
> >> On 5/11/16, 1:41 PM, "Angela Schreiber" <anch...@adobe.com> wrote:
> >>
> >> >Hi Chetan
> >> >
> >> >IMHO your original mail didn't write down the fundamental analysis
> >> >but instead presented the solution; for each of the 2 cases I was
> >> >lacking the information _why_ this is needed.
> >> >
> >> >Both have been answered in private conversations only (1 today in
> >> >the Oak call and 2 in a private discussion with Tom). And
> >> >having heard them didn't make me more confident that the solution
> >> >you propose is the right thing to do.
> >> >
> >> >Kind regards
> >> >Angela
> >> >
> >> >On 11/05/16 12:17, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:
> >> >
> >> >>Hi Angela,
> >> >>
> >> >>On Tue, May 10, 2016 at 9:49 PM, Angela Schreiber <anch...@adobe.com>
> >> >>wrote:
> >> >>
> >> >>> Quite frankly I would very much appreciate if we took the time to
> >> >>> collect and write down the required (i.e. currently known and expected)
> >> >>> functionality.
> >> >>>
> >> >>> Then look at the requirements and look at what is wrong with the current
> >> >>> API that we can't meet those requirements:
> >> >>> - is it just missing API extensions that can be added with moderate effort?
Re: svn commit: r1724598 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/api/ main/java/org/apache/jackrabbit/oak/plugins/document/rdb/ main/java/org/apache/jackrabbit/oak
On Thu, Jan 14, 2016 at 6:40 PM, <resc...@apache.org> wrote:
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/api/Blob.java
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDocumentStore.java
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/value/BinaryImpl.java

I see some changes to Blob/BinaryImpl. Are those changes related to this issue? Most likely they are just noise, but I wanted to confirm.

Chetan Mehrotra
Re: svn commit: r1725250 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/ oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/atomic/ oak-core/src/test/java/org/apach
Hi Davide,

On Mon, Jan 18, 2016 at 5:46 PM, <dav...@apache.org> wrote:
> + */
> +    public AtomicCounterEditorProvider() {
> +        clusterSupplier = new Supplier<Clusterable>() {
> +            @Override
> +            public Clusterable get() {
> +                return cluster.get();
> +            }
> +        };
> +        schedulerSupplier = new Supplier<ScheduledExecutorService>() {
> +            @Override
> +            public ScheduledExecutorService get() {
> +                return scheduler.get();
> +            }
> +        };
> +        storeSupplier = new Supplier<NodeStore>() {
> +            @Override
> +            public NodeStore get() {
> +                return store.get();
> +            }
> +        };
> +        wbSupplier = new Supplier<Whiteboard>() {
> +            @Override
> +            public Whiteboard get() {
> +                return whiteboard.get();
> +            }
> +        };
> +    }

Just curious about the use of the above approach. Is it for keeping the dependencies non-static, or for using final instance variables? If you mark the references as static then all those bind and unbind methods would not be required, as by the time the component is active the dependencies would be set.

Chetan Mehrotra
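The indirection being asked about can be sketched standalone. This is not the Oak class itself: `String` stands in for the actual service types (Clusterable etc.), and `java.util.function.Supplier` stands in for the Guava `Supplier` used in the commit. The point is that a final supplier captured at construction time still sees values bound later:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class SupplierIndirectionDemo {

    // Mutable slot that OSGi bind/unbind methods would update at runtime.
    private final AtomicReference<String> cluster = new AtomicReference<>();

    // Final supplier created once in the constructor; it always reads the
    // *current* value of the reference, so late binding still works.
    private final Supplier<String> clusterSupplier = new Supplier<String>() {
        @Override
        public String get() {
            return cluster.get();
        }
    };

    // Called by the container on bind (static references would make this
    // unnecessary, which is the question raised in the reply above).
    public void bindCluster(String c) {
        cluster.set(c);
    }

    public String describe() {
        return String.valueOf(clusterSupplier.get());
    }

    public static void main(String[] args) {
        SupplierIndirectionDemo demo = new SupplierIndirectionDemo();
        System.out.println(demo.describe()); // "null" before binding
        demo.bindCluster("cluster-1");
        System.out.println(demo.describe()); // "cluster-1" after binding
    }
}
```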
Re: [Oak origin/1.4] Apache Jackrabbit Oak matrix - Build # 992 - Still Failing
Thanks for the link. Will follow up on the issue and have it fixed in the branches.

Chetan Mehrotra

On Mon, Jun 27, 2016 at 5:11 PM, Julian Reschke <julian.resc...@gmx.de> wrote:
> On 2016-06-27 13:31, Chetan Mehrotra wrote:
>>
>> On Sat, Jun 25, 2016 at 10:24 AM, Apache Jenkins Server
>> <jenk...@builds.apache.org> wrote:
>>>
>>> Caused by: java.lang.IllegalArgumentException: No enum constant
>>> org.apache.jackrabbit.oak.commons.FixturesHelper.Fixture.SEGMENT_TAR
>>> at java.lang.Enum.valueOf(Enum.java:238)
>>> at
>>> org.apache.jackrabbit.oak.commons.FixturesHelper$Fixture.valueOf(FixturesHelper.java:45)
>>> at
>>> org.apache.jackrabbit.oak.commons.FixturesHelper.<clinit>(FixturesHelper.java:58)
>>
>> The tests are failing due to the above issue. Is this related to the presence
>> of the new segment-tar module in trunk but not in the branch?
>>
>> Chetan Mehrotra
>
> -> <https://issues.apache.org/jira/browse/OAK-4475>
Re: [VOTE] Release Apache Jackrabbit Oak 1.4.4
On Mon, Jun 27, 2016 at 10:43 AM, Amit Jain <am...@apache.org> wrote:

[X] +1 Release this package as Apache Jackrabbit Oak 1.4.4

Chetan Mehrotra
Re: svn commit: r1750601 - in /jackrabbit/oak/trunk: oak-segment-tar/ oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/ oak-segment-tar/src/test/java/org/apache/jackrabbit/oak/segment/
On Wed, Jun 29, 2016 at 1:25 PM, Francesco Mari <mari.france...@gmail.com> wrote:
> oak-segment-tar should be releasable at any time. If I had to launch a quick patch release this morning, I would have to either revert your commit or postpone my release until Oak is released.

Given the current release frequency on trunk (2 weeks) I do not think it should be a big problem, and holding off commits breaks continuity and increases work. But then that might just be an issue for me! For now I have reverted the changes from oak-segment-tar.

Chetan Mehrotra
Re: svn commit: r1750601 - in /jackrabbit/oak/trunk: oak-segment-tar/ oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/ oak-segment-tar/src/test/java/org/apache/jackrabbit/oak/segment/
Hi Francesco,

On Wed, Jun 29, 2016 at 12:49 PM, Francesco Mari <mari.france...@gmail.com> wrote:
> Please do not change the "oak.version" property to a snapshot version. If
> your change relies on code that is only available in the latest snapshot of
> Oak, please revert this commit and hold it back until a proper release of
> Oak is performed.

I can do that, but I want to understand the impact if we switch to a SNAPSHOT version. For example, in the past when we made some changes in Jackrabbit that were needed in Oak, we switched to a snapshot version of JR2 and later reverted to the released version once the JR2 release was done. That has worked fine so far and we did not have to hold back the feature work. So I want to understand why it should be different here.

Chetan Mehrotra
Re: svn commit: r1728341 - /jackrabbit/oak/trunk/oak-segment/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentGraph.java
On Fri, Feb 5, 2016 at 2:54 PM, Michael Dürig <mdue...@apache.org> wrote:
> There's always another library ;-)

For utility stuff, well, almost!

Chetan Mehrotra
Re: svn commit: r1727311 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/osgi/OsgiWhiteboard.java test/java/org/apache/jackrabbit/oak/osgi/OsgiWhiteboardTest.java
On Fri, Jan 29, 2016 at 4:08 PM, Michael Dürig <mdue...@apache.org> wrote:
>
> Shouldn't we make this volatile?

Ack. Will do that.

Chetan Mehrotra
Re: svn commit: r1728341 - /jackrabbit/oak/trunk/oak-segment/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentGraph.java
On Wed, Feb 3, 2016 at 10:17 PM, <mdue...@apache.org> wrote:
> +    private static String toString(Throwable e) {
> +        StringWriter sw = new StringWriter();
> +        PrintWriter pw = new PrintWriter(sw, true);
> +        try {
> +            e.printStackTrace(pw);
> +            return sw.toString();
> +        } finally {
> +            pw.close();
> +        }
> +    }
> +

Maybe use com.google.common.base.Throwables#getStackTraceAsString

Chetan Mehrotra
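Guava's `Throwables.getStackTraceAsString` does essentially what the quoted helper does. A self-contained stdlib equivalent (so the sketch needs no Guava on the classpath) looks like this:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceString {

    // Equivalent of com.google.common.base.Throwables#getStackTraceAsString:
    // render a Throwable's full stack trace into a String.
    static String stackTraceAsString(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        String trace = stackTraceAsString(new IllegalStateException("boom"));
        // The rendered trace starts with the exception's toString() line.
        System.out.println(trace.startsWith("java.lang.IllegalStateException: boom")); // expected: true
    }
}
```

Using the Guava one-liner in SegmentGraph, as suggested above, avoids carrying this boilerplate in the codebase.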
Re: testing blob equality
On Mon, Feb 29, 2016 at 6:42 PM, Tomek Rekawek <reka...@adobe.com> wrote:
> I wonder if we can switch the order of length and identity comparison in
> AbstractBlob#equal() method. Is there any case in which the
> getContentIdentity() method will be slower than length()?

That can be switched, but I am afraid it would not work as expected. In JackrabbitNodeState#createBlob, determining the contentIdentity involves determining the length.

You can give org.apache.jackrabbit.oak.upgrade.blob.LengthCachingDataStore a try (see OAK-2882 for details).

Chetan Mehrotra
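The trade-off under discussion can be shown with a minimal stand-in; everything below is a simplified sketch, not Oak's `AbstractBlob` (the real method also falls back to streaming and comparing the actual bytes). It shows the proposed ordering, identity before length, together with the caveat from the reply that computing the identity may itself require computing the length:

```java
import java.util.Objects;

public class BlobEqualityDemo {

    // Minimal stand-in for Oak's Blob interface.
    interface Blob {
        long length();               // may be expensive (e.g. a backend call)
        String getContentIdentity(); // may be null if unknown
    }

    // Proposed ordering from the thread: compare identities first, and only
    // fall back to length when an identity is unavailable. Caveat: if
    // computing the identity internally computes the length (as in
    // JackrabbitNodeState#createBlob), the reordering saves nothing.
    static boolean maybeEqual(Blob a, Blob b) {
        String idA = a.getContentIdentity();
        String idB = b.getContentIdentity();
        if (idA != null && idB != null) {
            return Objects.equals(idA, idB);
        }
        // Inconclusive: equal lengths alone do not prove equal content,
        // so the real implementation would go on to compare streams.
        return a.length() == b.length();
    }

    static Blob blob(final long len, final String id) {
        return new Blob() {
            public long length() { return len; }
            public String getContentIdentity() { return id; }
        };
    }

    public static void main(String[] args) {
        System.out.println(maybeEqual(blob(10, "id-1"), blob(10, "id-1"))); // true
        System.out.println(maybeEqual(blob(10, "id-1"), blob(10, "id-2"))); // false
    }
}
```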
Re: R: info about jackrabbitoak.
On Wed, Feb 24, 2016 at 2:46 PM, Ancona Francesco <francesco.anc...@siav.it> wrote:
> that the project depends on felix (osgi) dependency.

It does not depend on the Felix framework, only on some modules from the Felix project. There is a webapp example [1] where you can deploy the war on Tomcat/a web container and have the code in the war access the repository instance.

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-examples/webapp
Re: Issue using the text extraction with lucene
On Sat, Jan 23, 2016 at 9:34 PM, Stephan Becker <stephan.bec...@netcentric.biz> wrote:
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.commons.csv.CSVFormat.withIgnoreSurroundingSpaces()Lorg/apache/commons/csv/CSVFormat;

Looks like tika-app-1.11 is using commons-csv 1.0 [1] while Oak uses 1.1, and CSVFormat.withIgnoreSurroundingSpaces was added in v1.1. We tested it earlier with Tika 1.6. So you can try adding the commons-csv jar as the first one on the classpath:

java -cp commons-csv-1.1.jar:tika-app-1.11.jar:oak-run-1.2.4.jar

Chetan Mehrotra

[1] http://svn.apache.org/viewvc/tika/tags/1.11-rc1/tika-parsers/pom.xml?view=markup#l328
Re: Issue using the text extraction with lucene
On Sun, Jan 24, 2016 at 2:28 AM, Stephan Becker <stephan.bec...@netcentric.biz> wrote:
> How does it then further extract the
> text from added documents?

Currently the extracted-text support does not handle updates, i.e. it only contains the text that was extracted at the time the tool was run. Text for binaries added later would not be included. The primary aim was to speed up indexing time during migration.

Chetan Mehrotra
Re: Restructure docs
On Wed, Jan 20, 2016 at 2:46 PM, Davide Giannella <dav...@apache.org> wrote:
> When you change/add/remove an item from the left-hand menu, you'll have
> to redeploy the whole site as it will be hardcoded within the html of
> each page. Deploying the whole website is a long process. Therefore
> limiting the changes over there make things faster.

I mostly do partial commits, i.e. only the modified page, and it has worked well. Changing the left-side menu is not a very frequent task, and for that I think doing a full deploy of the site is fine for now.

Chetan Mehrotra
Re: JUnit tests with FileDataStore
To make use of the FileDataStore you would need to configure a SegmentNodeStore, as MemoryNodeStore does not allow plugging in a custom BlobStore. Have a look at the snippet [1] for a possible approach.

Chetan Mehrotra
[1] https://gist.github.com/chetanmeh/6242d0a7fe421955d456

On Wed, Jan 27, 2016 at 6:42 AM, Tobias Bocanegra <tri...@apache.org> wrote:
> Hi,
>
> I have some tests in filevault that I want to run with the
> FileDataStore, but I couldn't figure out how to setup the repository
> correctly here [0]. I also looked at the tests in oak, but I couldn't
> find a valid reference.
>
> The reason for this is to test the binary references, which afaik only
> work with the FileDataStore.
> At least my test [1] works with jackrabbit, but not for oak.
>
> thanks.
> regards, toby
>
> [0] https://github.com/apache/jackrabbit-filevault/blob/trunk/vault-core/src/test/java/org/apache/jackrabbit/vault/packaging/integration/IntegrationTestBase.java#L118-L120
> [1] https://github.com/apache/jackrabbit-filevault/blob/trunk/vault-core/src/test/java/org/apache/jackrabbit/vault/packaging/integration/TestBinarylessExport.java
Re: parent pom env.OAK_INTEGRATION_TESTING
On Tue, Mar 22, 2016 at 9:49 PM, Davide Giannella <dav...@apache.org> wrote:
> I can't really recall why and if we use this.

It's referred to in the main README.md, so as to allow a developer to always enable running of the integration tests.

Chetan Mehrotra
Re: oak-resilience
Cool stuff Tomek! This was something which was discussed in the last Oakathon, so it is great to have a way to do resilience testing programmatically. Will give it a try.

Chetan Mehrotra

On Mon, Mar 7, 2016 at 1:49 PM, Stefan Egli <stefane...@apache.org> wrote:
> Hi Tomek,
>
> Would also be interesting to see the effect on the leases and thus
> discovery-lite under high memory load and network problems.
>
> Cheers,
> Stefan
>
> On 04/03/16 11:13, "Tomek Rekawek" <reka...@adobe.com> wrote:
>
>> Hello,
>>
>> For some time I've worked on a little project called oak-resilience. It
>> aims to be a resilience testing framework for Oak. It uses
>> virtualisation to run Java code in a controlled environment, that can be
>> spoilt in different ways, by:
>>
>> * resetting the machine,
>> * filling the JVM memory,
>> * filling the disk,
>> * breaking or deteriorating the network.
>>
>> I described the currently supported features in the README file [1].
>>
>> Now, once I have a hammer I'm looking for a nail. Could you share your
>> thoughts on areas/features in Oak which may benefit from being
>> systematically tested for resilience in the way described above?
>>
>> Best regards,
>> Tomek
>>
>> [1] https://github.com/trekawek/jackrabbit-oak/tree/resilience/oak-resilience
>>
>> --
>> Tomek Rękawek | Adobe Research | www.adobe.com
>> reka...@adobe.com
Re: svn commit: r1737349 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBConnectionHandler.java
On Fri, Apr 1, 2016 at 6:40 PM, Julian Reschke <julian.resc...@gmx.de> wrote:
> Did you benchmark System.currentTimeMillis() as opposed to checking the log
> level?

Well, the time taken by a single isDebugEnabled check would always be less than System.currentTimeMillis() plus isDebugEnabled! In this case it does not matter much anyway, as the remote call would have much more overhead. The suggestion was more about having a consistent way of doing such things, not a hard requirement per se ...

Chetan Mehrotra
Re: svn commit: r1737349 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBConnectionHandler.java
Hi Julian,

On Fri, Apr 1, 2016 at 5:19 PM, <resc...@apache.org> wrote:
> +    @Nonnull
> +    private Connection getConnection() throws IllegalStateException, SQLException {
> +        long ts = System.currentTimeMillis();
> +        Connection c = getDataSource().getConnection();
> +        if (LOG.isDebugEnabled()) {
> +            long elapsed = System.currentTimeMillis() - ts;
> +            if (elapsed >= 100) {
> +                LOG.debug("Obtaining a new connection from " + this.ds + " took " + elapsed + "ms");
> +            }
> +        }
> +        return c;
> +    }

You can also use PerfLogger here, which is also used in other places in DocumentNodeStore:

---
final PerfLogger PERFLOG = new PerfLogger(
    LoggerFactory.getLogger(DocumentNodeStore.class.getName() + ".perf"));

final long start = PERFLOG.start();
Connection c = getDataSource().getConnection();
PERFLOG.end(start, 100, "Obtaining a new connection from {}", ds);
---

This would also avoid the call to System.currentTimeMillis() if the debug log is not enabled.

Chetan Mehrotra
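The idea behind PerfLogger can be sketched standalone; this is not Oak's actual org.apache.jackrabbit.oak.commons.PerfLogger implementation, just a minimal illustration of how the clock call is skipped when the log level is off and how the threshold is applied in end():

```java
public class PerfLoggerSketch {

    // Stand-in for slf4j's isDebugEnabled(); in Oak this would come from
    // the logger passed to PerfLogger's constructor.
    static boolean debugEnabled = true;

    // Returns -1 when debug logging is off, so no clock call is made.
    static long start() {
        return debugEnabled ? System.currentTimeMillis() : -1;
    }

    // Logs only if debug was on at start() AND the elapsed time crossed
    // the threshold (100ms in the snippet quoted above).
    static void end(long start, long thresholdMs, String message) {
        if (start < 0) {
            return; // debug was off; nothing was measured
        }
        long elapsed = System.currentTimeMillis() - start;
        if (elapsed >= thresholdMs) {
            System.out.println(message + " took " + elapsed + "ms");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long t = start();
        Thread.sleep(5); // stands in for getDataSource().getConnection()
        end(t, 1, "Obtaining a new connection");
    }
}
```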
Re: [VOTE] Please vote for the final name of oak-segment-next
Missed sending a nomination on the earlier thread. If it is not too late, then one more proposal:

oak-segment-v2

This is somewhat similar to the names used in Mongo, mmapv1 and mmapv2.

Chetan Mehrotra

On Tue, Apr 26, 2016 at 2:32 PM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> oak-segment-store +1
>
> Regards,
> Tommaso
>
> On Mon, Apr 25, 2016 at 16:52, Vikas Saurabh <vikas.saur...@gmail.com> wrote:
>
> > oak-embedded-store +1
> >
> > Thanks,
> > Vikas
Re: API proposal for - Expose URL for Blob source (OAK-1963)
To highlight - as mentioned earlier, the user of the proposed API is tying itself to implementation details of Oak, and if those change later then that code would also need to be changed. Or, as Ian summed it up:

> if the API is introduced it should create an out of band agreement with
> the consumers of the API to act responsibly.

The method is to be used for those important cases where you do rely on implementation detail to get optimal performance in very specific scenarios. It's like DocumentNodeStore making use of some Mongo-specific API to perform some important critical operation to achieve better performance, after checking that the underlying DocumentStore is Mongo based.

I have seen the discussion of JCR-3534 and other related issues, but I still do not see any conclusion on how to answer such queries where direct access to blobs is required for performance. This issue is not about exposing the blob reference for remote access, but more about an optimal path for in-VM access.

> who owns the resource? Who coordinates (concurrent) access to it and how? What are the correctness and performance implications here (races, deadlock, corruptions, JCR semantics)?

The client code would need to be implemented in a proper way. It's more like implementing a CommitHook: if implemented in an incorrect way it would cause issues, deadlocks etc. But then we assume that anyone implementing that interface would take proper care in the implementation.

> it limits implementation freedom and hinders further evolution (chunking, de-duplication, content based addressing, compression, gc, etc.) for data stores.

As mentioned earlier, some part of the API indicates a closer dependency on how things work (like an SPI, or a ConsumerType API in OSGi terms). By using such an API the client code definitely ties itself to Oak implementation detail, but it should not limit how the Oak implementation detail evolves. When it changes, the client code needs to adapt itself accordingly. Oak can express that by incrementing the minor version of the exported package to indicate the change in behavior.

> bypassing JCR's security model

I do not yet see the attack vector which we need to defend against differently here. Again, the blob URL is not being exposed, say, as part of WebDAV or any other remote call. So I would like to understand the security concern better here (unless it is defending against malicious or badly implemented client code, which we discussed above).

> Can't we come up with an API that allows the blobs to stay under control of Oak?

The code needs to work either at the OS level, say with a file handle, or with, say, an S3 object. So I do not see a way in which it can work without having access to those details.

FWIW there is code out there which reverse engineers the blobId to access the actual binary. People do it so as to get decent throughput in image rendition logic for large scale deployments. The proposal here was to formalize that approach by providing a proper API. If we do not provide such an API then the only way for them would be to continue relying on reverse engineering the blobId!

> If not, this is probably an indication that those blobs shouldn't go into Oak but just references to it as Francesco already proposed. Anything else is neither fish nor fowl: you can't have the JCR goodies but at the same time access underlying resources at will.

That's a fine argument to make. But the users here have a real problem to solve which we should not ignore. Oak based systems are being proposed for large asset deployments where one of the primary requirements is asset handling/processing of 100s of TB of binary data. So we would then have to recommend for such cases to not use the JCR Binary abstraction and manage the binaries on your own. That would then solve both problems (though it might break lots of tooling built on top of the JCR API to manage those binaries)!

Thinking more - another approach that I can suggest is that people implement their own BlobStore (maybe by extending ours) and provide this API there, i.e. an API which takes a blob id and provides the required details. This way we "outsource" the problem. Would that be acceptable?

Chetan Mehrotra

On Mon, May 9, 2016 at 2:28 PM, Michael Dürig <mdue...@apache.org> wrote:
>
> Hi,
>
> I very much share Francesco's concerns here. Unconditionally exposing
> access to operation system resources underlying Oak's inner working is
> troublesome for various reasons:
>
> - who owns the resource? Who coordinates (concurrent) access to it and
> how? What are the correctness and performance implications here (races,
> deadlock, corruptions, JCR semantics)?
>
> - it limits implementation freedom and hinders further evolution
> (chunking, de-duplication, content based addressing, compression, gc, etc.)
> for data stores.
>
> - bypassing JCR's security model
>
> Pretty much all of this has been discussed in the scope of
> https://issues.apache.org/jira/browse/JCR-3534 and
> https://is
Re: API proposal for - Expose URL for Blob source (OAK-1963)
Had an offline discussion with Michael on this and explained the usecase requirements in more detail. One concern that was raised is that such a generic adaptTo API is too inviting for improper use, and Oak does not have any context around when this URL is exposed and for how long it is used. So instead of having a generic adaptTo API at the JCR level we can have a BlobProcessor callback (Approach #B). Below is more of a strawman proposal; once we have a consensus we can go over the details.

interface BlobProcessor {
    void process(AdaptableBlob blob);
}

Where AdaptableBlob is:

public interface AdaptableBlob {
    <AdapterType> AdapterType adaptTo(Class<AdapterType> type);
}

The BlobProcessor instance can be passed via the BlobStore API. So the client would look up a BlobStore service (i.e. use the Oak level API) and pass it the ContentIdentity of the JCR Binary, aka the blobId:

interface BlobStore {
    void process(String blobId, BlobProcessor processor);
}

The approach ensures:

1. That any blob handle exposed is only guaranteed for the duration of the 'process' invocation.
2. There is no guarantee on the utility of the blob handle (File, S3 object) beyond the callback. So one should not keep the passed File handle for later use.

Hopefully this should address some of the concerns raised in this thread. Looking forward to feedback :)

Chetan Mehrotra

On Mon, May 9, 2016 at 6:24 PM, Michael Dürig <mdue...@apache.org> wrote:
>
> On 9.5.16 11:43, Chetan Mehrotra wrote:
>
>> To highlight - As mentioned earlier the user of proposed api is tying
>> itself to implementation details of Oak and if this changes later then that
>> code would also need to be changed. Or as Ian summed it up
>>
>>> if the API is introduced it should create an out of band agreement with
>>> the consumers of the API to act responsibly.
>
> So what does "to act responsibly" actually mean? Are we even in a
> position to precisely specify this? Experience tells me that we only find
> out about those semantics after the fact when dealing with painful and
> expensive customer escalations.
>
> And even if we could, it would tie Oak into very tight constraints on how
> it has to behave and how not. Constraints that would turn out prohibitively
> expensive for future evolution. Furthermore a huge amount of resources
> would be required to formalise such constraints via test coverage to guard
> against regressions.
>
>> The method is to be used for those important case where you do rely on
>> implementation detail to get optimal performance in very specific
>> scenarios. Its like DocumentNodeStore making use of some Mongo specific API
>> to perform some important critical operation to achieve better performance
>> by checking if the underlying DocumentStore is Mongo based.
>
> Right, but the Mongo specific API is a (hopefully) well thought through
> API where as with your proposal there are a lot of open questions and
> concerns as per my last mail.
>
> Mongo (and any other COTS DB) for good reasons also don't give you direct
> access to its internal file handles.
>
>> I have seen discussion of JCR-3534 and other related issue but still do not
>> see any conclusion on how to answer such queries where direct access to
>> blobs is required for performance aspect. This issue is not about exposing
>> the blob reference for remote access but more about optimal path for in VM
>> access
>
> One bottom line of the discussions in that issue is that we came to a
> conclusion after clarifying the specifics of the use case. Something I'm
> still missing here. The case you brought forward is too general to serve as
> a guideline for a solution. Quite to the contrary, to me it looks like a
> solution to some problem (I'm trying to understand).
>
>>> who owns the resource? Who coordinates (concurrent) access to it and how?
>>> What are the correctness and performance implications here (races,
>>> deadlock, corruptions, JCR semantics)?
>>
>> The client code would need to be implemented in a proper way. Its more like
>> implementing a CommitHook. If implemented in incorrect way it would cause
>> issues deadlocks etc. But then we assume that any one implementing that
>> interface would take proper care in implementation.
>
> But a commit hook is an internal SPI. It is not advertised to the whole
> world as a public API.
>
>>> it limits implementation freedom and hinders further evolution
>>> (chunking, de-duplication, content based addressing, compression, gc, etc.)
>>> for data stores.
>>
>> As mentioned earlier. Some part of API indicates a closer dependency on how things work (like SPI, or ConsumerType API in OSGi terms).
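The strawman from the message above can be exercised standalone. Everything below is hypothetical glue written only to show the scoping guarantee (the handle is meaningful only inside process): the DemoBlobStore and its /datastore/ file layout are made up, and the interfaces follow the strawman with explicit generics:

```java
import java.io.File;

public class BlobCallbackDemo {

    interface AdaptableBlob {
        <T> T adaptTo(Class<T> type);
    }

    interface BlobProcessor {
        void process(AdaptableBlob blob);
    }

    // Hypothetical BlobStore per the strawman: resolves a blobId and hands
    // the adaptable handle to the callback for the duration of the call.
    static class DemoBlobStore {
        void process(String blobId, BlobProcessor processor) {
            final File handle = new File("/datastore/" + blobId); // made-up layout
            processor.process(new AdaptableBlob() {
                @Override
                public <T> T adaptTo(Class<T> type) {
                    return type == File.class ? type.cast(handle) : null;
                }
            });
            // After this point the handle carries no guarantees; per point 2
            // above, clients must not keep it for later use.
        }
    }

    public static void main(String[] args) {
        new DemoBlobStore().process("abc123", new BlobProcessor() {
            @Override
            public void process(AdaptableBlob blob) {
                File f = blob.adaptTo(File.class);
                System.out.println(f.getName());              // abc123
                System.out.println(blob.adaptTo(String.class)); // null: unsupported type
            }
        });
    }
}
```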
Re: API proposal for - Expose URL for Blob source (OAK-1963)
> what guarantees do/can we give re. this file handle within this context. Can it suddenly go away (e.g. because of gc or internal re-organisation)? How do we establish, test and maintain (e.g. from regressions) such guarantees?

Logically it should not go away suddenly. So the GC logic should be aware of such "inUse" instances (there is already such support for inUse cases). Such a requirement can be validated via an integration testcase.

> and more concerningly, how do we protect Oak from data corruption by misbehaving clients? E.g. clients writing on that handle or removing it? Again, if this is public API we need ways to test this.

Not sure what you mean by misbehaving client - is it malicious (by design) or badly written code? For the latter, yes, that might pose a problem, but we can have some defense. I would expect the code making use of the API to behave properly. In addition, as proposed above [1], for FileDataStore we can provide a symlinked file reference which exposes a read-only file handle. For S3DataStore the code would need access to the AWS credentials to perform any write operation, which should be a sufficient defense.

> In an earlier mail you quite fittingly compared this to commit hooks, which for good reason are an internal SPI.

A bit of nitpicking here ;) As per the Jcr class [2] one can provide a CommitHook instance, so I am not sure we can term it internal. However, the point I wanted to emphasize is that Oak does provide some critical extension points, and with misbehaving code one can shoot oneself in the foot; as an implementation, only so much can be done.

regards
Chetan

[1] http://markmail.org/thread/6mq4je75p64c5nyn#query:+page:1+mid:237kzuhor5y3tpli+state:results
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/Jcr.java#L190

Chetan Mehrotra
Re: API proposal for - Expose URL for Blob source (OAK-1963)
Hi Angela,

On Tue, May 10, 2016 at 9:49 PM, Angela Schreiber <anch...@adobe.com> wrote:
> Quite frankly I would very much appreciate if we took the time to collect
> and write down the required (i.e. currently known and expected)
> functionality.
>
> Then look at the requirements and look at what is wrong with the current
> API that we can't meet those requirements:
> - is it just missing API extensions that can be added with moderate effort?
> - are there fundamental problems with the current API that we needed to
> address?
> - maybe we even have intrinsic issues with the way we think about the role
> of the repo?
>
> IMHO, sticking to kludges might look promising on a short term but
> I am convinced that we are better off with a fundamental analysis of
> the problems... after all the Binary topic comes up on a regular basis.
> That leaves me with the impression that yet another tiny extra and
> adaptables won't really address the core issues.

Makes sense. Have a look at one of the initial mails in the thread [1], which talks about the 2 use cases I know of. The image rendition use case manifests itself in one form or another; basically it is about providing access to native programs via a file path reference.

The approach proposed so far would be able to address them, and hence is closer to "is it just missing API extensions that can be added with moderate effort?". If there is any other approach with which we can address both of the referred use cases then we can implement that. Let me know if more details are required. If required I can also put it up on a wiki page.

Chetan Mehrotra
[1] http://markmail.org/thread/6mq4je75p64c5nyn#query:+page:1+mid:zv5dzsgmoegupd7l+state:results
API proposal for - Expose URL for Blob source (OAK-1963)
Hi Team,

For OAK-1963 we need to allow access to the actual Blob location, say in the form of a File instance or an S3 object id etc. This access is needed to perform optimized IO operations around the binary object, e.g.

1. The File object can be used to spool the file content with zero copy using NIO by accessing the FileChannel directly [1]
2. Client code can efficiently replicate a binary stored in S3 by having direct access to the S3 object using a copy operation

To allow such access we would need a new API in the form of AdaptableBinary.

API
===

public interface AdaptableBinary {

    /**
     * Adapts the binary to another type like File, URL etc.
     *
     * @param <AdapterType> The generic type to which this binary is adapted to
     * @param type The Class object of the target type, such as File.class
     * @return The adapter target or null if the binary cannot
     *         adapt to the requested type
     */
    <AdapterType> AdapterType adaptTo(Class<AdapterType> type);
}

Usage
=====

Binary binProp = node.getProperty("jcr:data").getBinary();

// Check if Binary is of type AdaptableBinary
if (binProp instanceof AdaptableBinary) {
    AdaptableBinary adaptableBinary = (AdaptableBinary) binProp;
    // Adapt it to a File instance
    File file = adaptableBinary.adaptTo(File.class);
}

The Binary instance returned by Oak, i.e. org.apache.jackrabbit.oak.plugins.value.BinaryImpl, would then implement this interface, and calling code can check the type, cast it and then adapt it.

Key Points

1. Depending on the backing BlobStore the binary can be adapted to various types. For FileDataStore it can be adapted to File. For S3DataStore it can either be adapted to URL or some S3DataStore specific type.
2. Security - Thomas suggested that for better security the ability to adapt should be restricted based on session permissions. So adaptation would only work if the user has the required permission; otherwise null would be returned.
3. The adaptation proposal is based on Sling Adaptable [2]
4. This API is for now exposed only at the JCR level.
Not sure if we should do it at the Oak level, as Blob instances are currently not bound to any session. So the proposal is to place this in the 'org.apache.jackrabbit.oak.api' package.

Kindly provide your feedback! Also any suggestions/guidance around how the access control should be implemented.

Chetan Mehrotra

[1] http://www.ibm.com/developerworks/library/j-zerocopy/
[2] https://sling.apache.org/apidocs/sling5/org/apache/sling/api/adapter/Adaptable.html
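To make the proposed adaptTo contract concrete, here is a minimal self-contained sketch. The FileBackedBinary class is invented for illustration - it is not Oak's BinaryImpl - and only shows how type-safe adaptation with a null fallback would behave.

```java
import java.io.File;

interface AdaptableBinary {
    /** Returns the adapter target, or null if the binary cannot adapt to the type. */
    <AdapterType> AdapterType adaptTo(Class<AdapterType> type);
}

// Toy stand-in for a FileDataStore backed binary.
class FileBackedBinary implements AdaptableBinary {
    private final File file;

    FileBackedBinary(File file) {
        this.file = file;
    }

    @Override
    public <AdapterType> AdapterType adaptTo(Class<AdapterType> type) {
        // A file backed binary can expose its File handle; any other
        // requested type is unsupported and yields null.
        if (type == File.class) {
            return type.cast(file);
        }
        return null;
    }
}
```

A caller would then do `File f = binary.adaptTo(File.class)` and check for null, exactly as in the usage snippet in the proposal.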
Re: API proposal for - Expose URL for Blob source (OAK-1963)
On Wed, May 4, 2016 at 10:07 PM, Ian Boston <i...@tfd.co.uk> wrote:
> If the File or URL is writable, will writing to the location cause issues
> for Oak ?

Yes, that would cause problems. The expectation here is that code using a direct location needs to behave responsibly.

Chetan Mehrotra
Re: API proposal for - Expose URL for Blob source (OAK-1963)
On Thu, May 5, 2016 at 4:38 PM, Francesco Mari <mari.france...@gmail.com> wrote:
> The security concern is quite easy to explain: it's a bypass of our
> security model. Imagine that, using a session with the appropriate
> privileges, a user accesses a Blob and adapts it to a file handle, an S3
> bucket or a URL. This code passes this reference to another piece of code
> that modifies the data directly even if - in the same deployment - it
> shouldn't be able to access the Blob instance to begin with.

How is this different from the case where code obtains a Node via an admin session and passes that Node instance to other code which, say, deletes important content via it? In the end we have to trust the client code to do the correct thing when given the appropriate rights. So in the current proposal the code can only adapt the binary if the session has the expected permissions. Past that we need to trust the code to behave properly.

> In both the use case, the customer is coupling the data with the most
> appropriate storage solution for his business case. In this case, customer
> code - and not Oak - should be responsible for the management of that data.

Well, then it means the customer implements their very own DataStore-like solution, and the application code does not make use of JCR Binary but instead uses another service to resolve the references. This would greatly reduce the usefulness of JCR for asset-heavy applications which use JCR to manage binary content along with its metadata.

Chetan Mehrotra
Re: API proposal for - Expose URL for Blob source (OAK-1963)
> This proposal introduces a huge leak of abstractions and has deep security implications.

I understand the leak of abstractions concern. However, I would like to understand the security concern a bit more. One way I can think of it causing a security concern is if you have some malicious code running in the same JVM which can then do bad things with the file handle. Do note that the file handle would not get exposed via any remoting API we currently support. Now in this case, if malicious code is already running in the same JVM then security is breached anyway, and that code can make use of reflection to access internal details. So if there is any other possible security concern, I would like to discuss it.

Coming to the usecases:

Usecase A - Image rendition generation - We have some bigger deployments where lots of images get uploaded to the repository and there are some conversions (rendition generation) which are performed by OS specific native executables. Such programs work directly on file handles. Without this change we currently need to first spool the file content into some temporary location and then pass that to the other program. This adds unnecessary overhead, which can be avoided when a FileDataStore is used, where we can provide direct access to the file.

Usecase B - Efficient replication across regions in S3 - This is for an AEM based setup which is running on Oak with S3DataStore. There we have a global deployment where the author instance is running in one region and binary content is to be distributed to publish instances running in different regions. The DataStore size is huge, say 100TB, and for efficient operation we need to use binary-less replication. In most cases only a very small subset of the binary content would need to be present in other regions. The current way (via a shared DataStore) to support that would involve synchronizing the S3 bucket across all such regions, which would increase the storage cost considerably.
Instead of that, the plan is to replicate the specific assets via an S3 copy operation. This would ensure that big assets can be copied efficiently at the S3 level, and that requires direct access to the S3 object.

Again, in all such cases one can always resort to the currently supported approach, i.e. copy over all the content via an InputStream into some temporary store and then use that. But that would add considerable overhead when assets are of 100MB size or more. So the proposed approach would allow client code to do this efficiently, depending on the underlying storage capability.

> To me sounds like breaching the JCR and NodeState layers to directly
> manipulate NodeStore binaries (from the DataStore), e.g. to perform smart
> replication across different instances, but imho the right way to address
> that is extending one of the current DataStore implementations or create a
> new one.

The originally proposed approach in OAK-1963 was like that, i.e. introduce this access method on BlobStore, working on a reference. But in that case client code would need to deal with the BlobStore API. In either case, access to the actual binary storage data would be required.

Chetan Mehrotra

On Thu, May 5, 2016 at 2:49 PM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> +1 to Francesco's concerns, exposing the location of a binary at the
> application level doesn't sound good from a security perspective.
> To me sounds like breaching the JCR and NodeState layers to directly
> manipulate NodeStore binaries (from the DataStore), e.g. to perform smart
> replication across different instances, but imho the right way to address
> that is extending one of the current DataStore implementations or create a
> new one.
> I am also concerned that this Adaptable pattern would open room for other
> such hacks into the stack.
> > My 2 cents, > Tommaso > > > Il giorno gio 5 mag 2016 alle ore 11:00 Francesco Mari < > mari.france...@gmail.com> ha scritto: > > > This proposal introduces a huge leak of abstractions and has deep > security > > implications. > > > > I guess that the reason for this proposal is that some users of Oak would > > like to perform some operations on binaries in a more performant way by > > leveraging the way those binaries are stored. If this is the case, I > > suggest those users to evaluate an applicative solution implemented on > top > > of the JCR API. > > > > If a user needs to store some important binary data (files, images, etc.) > > in an S3 bucket or on the file system for performance reasons, this > > shouldn't affect how Oak handles blobs internally. If some assets are of > > special interest for the user, then the user should bypass Oak and take > > care of the storage of those assets directly. Oak can
Re: API proposal for - Expose URL for Blob source (OAK-1963)
On Thu, May 5, 2016 at 5:07 PM, Francesco Mari <mari.france...@gmail.com> wrote:
>
> This is a totally different thing. The change to the node will be committed
> with the privileges of the session that retrieved the node. If the session
> doesn't have enough privileges to delete that node, the node won't be
> deleted. There is no escape from the security model.

Bad code, when passed a node backed by an admin session, can still do bad things, as the admin session has all the privileges. In the same way, if bad code is passed a file handle then it can cause issues. So I am still not sure about the attack vector which we are defending against.

Chetan Mehrotra
Re: API proposal for - Expose URL for Blob source (OAK-1963)
Some more points around the proposed callback based approach:

1. Possible security, or enforcing read-only access to the exposed file - The file provided within the BlobProcessor callback can be a symlink created with an OS user account which only has read-only access. The symlink can be removed once the callback returns.

2. S3DataStore security concern - For S3DataStore we would only be exposing the S3 object identifier, and the client code would still need the AWS credentials to connect to the bucket and perform the required copy operation.

3. Possibility of further optimization in S3DataStore processing - Currently when reading a binary from S3DataStore the binary content is *always* spooled to some local temporary file (in the local cache) and then an InputStream is opened on that file. So even if the code needs to read only the initial few bytes of the stream, the whole file has to be read. This happens because with the current JCR Binary API we are not in control of the lifetime of the exposed InputStream. So if, say, we expose the InputStream, we cannot determine until when the backing S3 SDK resources need to be held.

Also, the current S3DataStore always creates a local copy - with a callback based approach we can safely expose this file, which would allow the layers above to avoid spooling the content again locally for processing. And with the callback boundary we can later do the required cleanup.

Chetan Mehrotra

On Mon, May 9, 2016 at 7:15 PM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
> Had an offline discussion with Michael on this and explained the usecase
> requirement in more details. One concern that has been raised is that such
> a generic adaptTo API is too inviting for improper use and Oak does not
> have any context around when this url is exposed for what time it is used.
>
> So instead of having a generic adaptTo API at JCR level we can have a
> BlobProcessor callback (Approach #B). Below is more of a strawman proposal.
> Once we have a consensus then we can go over the details
>
> interface BlobProcessor {
>     void process(AdaptableBlob blob);
> }
>
> Where AdaptableBlob is
>
> public interface AdaptableBlob {
>     <AdapterType> AdapterType adaptTo(Class<AdapterType> type);
> }
>
> The BlobProcessor instance can be passed via the BlobStore API. So the client
> would look for a BlobStore service (so use the Oak level API) and pass it
> the ContentIdentity of the JCR Binary aka blobId
>
> interface BlobStore {
>     void process(String blobId, BlobProcessor processor);
> }
>
> The approach ensures
>
> 1. That any blob handle exposed is only guaranteed for the duration
> of the 'process' invocation
> 2. There is no guarantee on the utility of the blob handle (File, S3 Object)
> beyond the callback. So one should not collect the passed File handle for
> later use
>
> Hopefully this should address some of the concerns raised in this thread.
> Looking forward to feedback :)
>
> Chetan Mehrotra
>
> On Mon, May 9, 2016 at 6:24 PM, Michael Dürig <mdue...@apache.org> wrote:
>>
>> On 9.5.16 11:43 , Chetan Mehrotra wrote:
>>
>>> To highlight - As mentioned earlier the user of proposed api is tying
>>> itself to implementation details of Oak and if this changes later then that
>>> code would also need to be changed. Or as Ian summed it up
>>>
>>>> if the API is introduced it should create an out of band agreement with
>>>> the consumers of the API to act responsibly.
>>
>> So what does "to act responsibly" actually mean? Are we even in a
>> position to precisely specify this? Experience tells me that we only find
>> out about those semantics after the fact when dealing with painful and
>> expensive customer escalations.
>>
>> And even if we could, it would tie Oak into very tight constraints on how
>> it has to behave and how not. Constraints that would turn out prohibitively
>> expensive for future evolution.
Furthermore a huge amount of resources >> would be required to formalise such constraints via test coverage to guard >> against regressions. >> >> >> >>> The method is to be used for those important case where you do rely on >>> implementation detail to get optimal performance in very specific >>> scenarios. Its like DocumentNodeStore making use of some Mongo specific >>> API >>> to perform some important critical operation to achieve better >>> performance >>> by checking if the underlying DocumentStore is Mongo based. >>> >> >> Right, but the Mongo specific API is a (hopefully) well thought through >> API where as with your proposal there are a lot of open questions and >> concerns as per my last mail. >
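A toy sketch of the callback-scoped guarantee in the strawman above: the handle handed to the BlobProcessor is invalidated as soon as process() returns, so collecting it for later use fails fast. The in-memory store and the invalidation flag are illustrative assumptions, not the real BlobStore SPI.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

interface AdaptableBlob {
    <AdapterType> AdapterType adaptTo(Class<AdapterType> type);
}

interface BlobProcessor {
    void process(AdaptableBlob blob);
}

class CallbackScopedBlobStore {
    private final Map<String, File> blobs = new HashMap<>();

    void put(String blobId, File file) {
        blobs.put(blobId, file);
    }

    void process(String blobId, BlobProcessor processor) {
        File file = blobs.get(blobId);
        AtomicBoolean inCallback = new AtomicBoolean(true);
        AdaptableBlob blob = new AdaptableBlob() {
            @Override
            public <AdapterType> AdapterType adaptTo(Class<AdapterType> type) {
                if (!inCallback.get()) {
                    // Guarantee from the proposal: the handle is only valid
                    // for the duration of the process() invocation.
                    throw new IllegalStateException("blob handle used outside process()");
                }
                return type == File.class ? type.cast(file) : null;
            }
        };
        try {
            processor.process(blob);
        } finally {
            inCallback.set(false);
        }
    }
}
```

The callback boundary is what lets the store do cleanup (e.g. remove a symlink) deterministically after `process` returns.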
Re: [VOTE] Release Apache Jackrabbit Oak 1.2.14
On Wed, Apr 20, 2016 at 10:25 AM, Amit Jain <am...@apache.org> wrote: > [ ] +1 Release this package as Apache Jackrabbit Oak 1.2.14 All checks ok Chetan Mehrotra
Re: Way to capture metadata related to commit as part of CommitInfo from within CommitHook
On Wed, Aug 3, 2016 at 8:57 PM, Michael Dürig <mdue...@apache.org> wrote:
> I would suggest to add a new, internal mechanism to CommitInfo for your
> purpose.

So introduce a new CommitAttributes instance which would be returned by CommitInfo ... ?

Chetan Mehrotra
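A strawman of what that could look like, assuming the existing CommitInfo.info map stays immutable while a separate mutable CommitAttributes holder is added for hooks to write to and observers to read from. All names besides CommitInfo/CommitAttributes are invented for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Mutable side-channel that commit hooks can fill in during commit traversal.
class CommitAttributes {
    private final Map<String, Object> attrs = new HashMap<>();

    void set(String name, Object value) { attrs.put(name, value); }
    Object get(String name) { return attrs.get(name); }
}

// Simplified stand-in for CommitInfo: the info map stays immutable as today,
// while getAttributes() exposes the new mutable holder.
class CommitInfoSketch {
    private final Map<String, Object> info;
    private final CommitAttributes attributes = new CommitAttributes();

    CommitInfoSketch(Map<String, Object> info) {
        this.info = Collections.unmodifiableMap(new HashMap<>(info));
    }

    Map<String, Object> getInfo() { return info; }
    CommitAttributes getAttributes() { return attributes; }
}
```

A hook would call `getAttributes().set("affectedNodeTypes", ...)` while traversing, and an Observer could read the same attribute after the commit.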
Re: Using same index definition for both async and sync indexing
On Wed, Aug 3, 2016 at 7:52 PM, Alex Parvulescu <alex.parvule...@gmail.com> wrote: > sounds interesting, this looks like a good option. > Now comes the hard part ... what should be the name of this new interface ;) ContextualIndexEditorProvider? Chetan Mehrotra
Re: Way to capture metadata related to commit as part of CommitInfo from within CommitHook
That would depend on the CommitHook impl, which the client code would not be aware of. And the commit hook itself would only know the details as the commit traversal is done. So it needs to be some mutable state.

Chetan Mehrotra

On Wed, Aug 3, 2016 at 8:27 PM, Michael Dürig <mdue...@apache.org> wrote:
>
> Couldn't we keep the map immutable and instead add some "WhateverCollector"
> instances as values? E.g. add a AffectedNodeTypeCollector right from the
> beginning?
>
> Michael
>
> On 3.8.16 4:06 , Chetan Mehrotra wrote:
>>
>> So would it be ok to make the map within CommitInfo mutable ?
>> Chetan Mehrotra
>>
>> On Wed, Aug 3, 2016 at 7:29 PM, Michael Dürig <mdue...@apache.org> wrote:
>>>
>>>> #A -Probably we can introduce a new type CommitAttributes which can be
>>>> attached to CommitInfo and which can be modified by the CommitHooks.
>>>> The CommitAttributes can then later be accessed by Observer
>>>
>>> This is already present via the CommitInfo.info map. It is even used in a
>>> similar way. See CommitInfo.getPath() and its usages. AFAIU the only part
>>> where your cases would differ is that the information is assembled by some
>>> commit hooks instead of being provided at the point the commit was
>>> initiated.
>>>
>>> Michael
Re: Way to capture metadata related to commit as part of CommitInfo from within CommitHook
Opened OAK-4640 to track this Chetan Mehrotra On Wed, Aug 3, 2016 at 9:36 PM, Michael Dürig <mdue...@apache.org> wrote: > > > On 3.8.16 5:58 , Chetan Mehrotra wrote: >> >> On Wed, Aug 3, 2016 at 8:57 PM, Michael Dürig <mdue...@apache.org> wrote: >>> >>> I would suggest to add an new, internal mechanism to CommitInfo for your >>> purpose. >> >> >> So introduce a new CommitAttributes instance which would be returned >> by CommitInfo ... ? > > > Probably the best of all ugly solutions yes ;-) (Meaning I don't have a > better idea neither...) > > Michael > >> >> Chetan Mehrotra >> >
Re: Using same index definition for both async and sync indexing
Opened OAK-4641 for this enhancement Chetan Mehrotra On Wed, Aug 3, 2016 at 8:00 PM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > On Wed, Aug 3, 2016 at 7:52 PM, Alex Parvulescu > <alex.parvule...@gmail.com> wrote: >> sounds interesting, this looks like a good option. >> > > Now comes the hard part ... what should be the name of this new > interface ;) ContextualIndexEditorProvider? > > Chetan Mehrotra
Re: Provide a way to pass indexing related state to IndexEditorProvider (OAK-4642)
I have updated OAK-4642 with one more option.

===
O4 - Similar to O2, but here instead of modifying the existing IndexUpdateCallback we can introduce a new interface ContextualCallback which extends IndexUpdateCallback and provides access to the IndexingContext. An editor provider implementation can then check if the callback implements this new interface, cast it, and access the context. So only those clients which are interested in the new capability make use of it.
===

So provide your feedback there or in this thread.

Chetan Mehrotra

On Thu, Aug 4, 2016 at 12:35 PM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
> Hi Team,
>
> As a follow up to previous mail around "Using same index definition
> for both async and sync indexing" wanted to discuss the next step. We
> need to provide a way to pass indexing related state to
> IndexEditorProvider (OAK-4642)
>
> Over the period of time I have seen need for extra state like
>
> 1. reindexing - Currently the index implementation use some heuristic
> like check before root state being empty to determine if they are
> running in reindexing mode
> 2. indexing mode - sync or async
> 3. index path of the index (see OAK-4152)
> 4. CommitInfo (see OAK-4640)
>
> For #1 and #3 we have done some kind of workaround but it would be
> better to have a first class support for that.
>
> So we would need to introduce some sort of IndexingContext and have
> the api for IndexEditorProvider like below
>
> =
> @CheckForNull
> Editor getIndexEditor(
>     @Nonnull String type, @Nonnull NodeBuilder definition,
>     @Nonnull NodeState root,
>     @Nonnull IndexingContext context) throws CommitFailedException;
> =
>
> To introduce such a change I see 3 options
>
> * O1 - Introduce a new interface which takes an {{IndexingContext}}
> instance which provide access to such datapoints. This would require
> some broader change
> ** Wherever the IndexEditorProvider is invoked it would need to check
> if the instance implements new interface.
If yes then new method needs > to be used > > Overall it introduces noise. > > * O2 - Here we can introduce such data points as part of callback > interface. With this we would need to implement such methods in places > where code constructs the callback > > * O3 - Make a backward incompatible change and just modify the > existing interface and adapt the various implementation > > I am in favour of going for O3 and make this backward compatible change > > Thoughts? > > Chetan Mehrotra
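A minimal sketch of option O4, assuming IndexUpdateCallback as a single-method interface and an IndexingContext carrying the datapoints listed in the original proposal (index path, reindexing flag, sync/async mode); the exact method names here are my assumption.

```java
interface IndexUpdateCallback {
    void indexUpdate();
}

// Assumed shape of the context carrying the extra indexing state.
interface IndexingContext {
    String getIndexPath();
    boolean isReindexing();
    boolean isAsync();
}

// O4: new sub-interface; existing IndexUpdateCallback implementations stay untouched.
interface ContextualCallback extends IndexUpdateCallback {
    IndexingContext getIndexingContext();
}

class ExampleEditorProvider {
    // Only clients interested in the new capability opt in via instanceof;
    // older callbacks keep working unchanged.
    String describe(IndexUpdateCallback callback) {
        if (callback instanceof ContextualCallback) {
            IndexingContext ctx = ((ContextualCallback) callback).getIndexingContext();
            return ctx.getIndexPath() + (ctx.isAsync() ? " (async)" : " (sync)");
        }
        return "no context available";
    }
}
```

This keeps the change backward compatible, which is the main difference from option O3.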
Provide a way to pass indexing related state to IndexEditorProvider (OAK-4642)
Hi Team,

As a follow-up to the previous mail around "Using same index definition for both async and sync indexing", I wanted to discuss the next step. We need to provide a way to pass indexing related state to the IndexEditorProvider (OAK-4642).

Over time I have seen the need for extra state like

1. reindexing - Currently the index implementations use some heuristics, like checking whether the before root state is empty, to determine if they are running in reindexing mode
2. indexing mode - sync or async
3. index path of the index (see OAK-4152)
4. CommitInfo (see OAK-4640)

For #1 and #3 we have done some kind of workaround, but it would be better to have first class support for that.

So we would need to introduce some sort of IndexingContext and have the API for IndexEditorProvider like below

=
@CheckForNull
Editor getIndexEditor(
    @Nonnull String type, @Nonnull NodeBuilder definition,
    @Nonnull NodeState root,
    @Nonnull IndexingContext context) throws CommitFailedException;
=

To introduce such a change I see 3 options:

* O1 - Introduce a new interface which takes an {{IndexingContext}} instance which provides access to such datapoints. This would require some broader change.
** Wherever the IndexEditorProvider is invoked it would need to check if the instance implements the new interface. If yes then the new method needs to be used.

Overall it introduces noise.

* O2 - Here we can introduce such data points as part of the callback interface. With this we would need to implement such methods in the places where code constructs the callback.

* O3 - Make a backward incompatible change: just modify the existing interface and adapt the various implementations.

I am in favour of going for O3 and making this backward incompatible change.

Thoughts?

Chetan Mehrotra
Re: Oak Indexing. Was Re: Property index replacement / evolution
Couple of points around the motivation and target usecases of Hybrid Indexing, and Oak indexing in general, based on my understanding of various deployments.

Any application based on Oak has 2 types of query requirements:

QR1. Application queries - These mostly involve some property restrictions and are invoked by the code itself to perform some operation. The properties involved here are in most cases sparse, i.e. present in a small subset of the whole repository content. Such queries need to be very fast and they might be invoked very frequently. Such queries should also be more accurate, and the result should not lag the repository state much.

QR2. User provided queries - These queries would consist of both or either of property restrictions and fulltext constraints. The target nodes may form the majority of the overall repository content. Such queries need to be fast but, being user driven, need not be very fast. Note that the speed criteria is very subjective and relative here.

Further, Oak needs to support deployments

1. On a single setup - for dev, prod on SegmentNodeStore
2. In a cluster setup on premise
3. In some data center

So Oak should enable deployments where smaller setups do not require any third-party system, while still allowing plugging in a dedicated system like ES/Solr if the need arises. Both usecases need to be supported. And further, even with access to such a third-party server it might be fine to rely on embedded Lucene for #QR1 and just delegate queries under #QR2 to the remote one. This would ensure that query results are still fast for usage falling under #QR1.

Hybrid Index usecase - So far for #QR1 we only had property indexes and, to an extent, the Lucene based property index, where results lag the repository state and the lag might be significant depending on load. The Hybrid index aims to support queries under #QR1 and can be seen as a replacement for the existing non unique property indexes.
Such indexes would have lower storage requirements and would not put much load on remote storage for execution. It's not meant as a replacement for ES/Solr; it intends to address a different type of usage.

Very large indexes - For deployments having a very large repository, Solr or ES based indexes would be preferable, and there oak-solr can be used (some day oak-es!).

So in brief: Oak should be self sufficient for smaller deployments and still allow plugging in Solr/ES for large deployments, and there also provide a choice to the admin to configure a subset of indexes for such usage depending on the size.

Chetan Mehrotra

On Thu, Aug 11, 2016 at 1:59 PM, Ian Boston <i...@tfd.co.uk> wrote:
> Hi,
>
> On 11 August 2016 at 09:14, Michael Marth <mma...@adobe.com> wrote:
>
>> Hi Ian,
>>
>> No worries - good discussion.
>>
>> I should point out though that my reply to Davide was based on a
>> comparison of the current design vs the Jackrabbit 2 design (in which
>> indexes were stored locally). Maybe I misunderstood Davide’s comment.
>>
>> I will split my answer to your mail in 2 parts:
>>
>>> Full text extraction should be separated from indexing, as the DS blobs are
>>> immutable, so is the full text. There is code to do this in the Oak
>>> indexer, but it's not used to write to the DS at present. It should be done
>>> in a Job, distributed to all nodes, run only once per item. Full text
>>> extraction is hugely expensive.
>>
>> My understanding is that Oak currently:
>> A) runs full text extraction in a separate thread (separate from the
>> “other” indexer)
>> B) runs it only once per cluster
>> If that is correct then the difference to what you mention above would be
>> that you would like the FT indexing not be pinned to one instance but
>> rather be distributed, say round-robin.
>> Right?
>
> Yes.
>
>>> Building the same index on every node doesn't scale for the reasons you
>>> point out, and eventually hits a brick wall.
>> >http://lucene.apache.org/core/6_1_0/core/org/apache/ >> lucene/codecs/lucene60/package-summary.html#Limitations. >> >(Int32 on Document ID per index). One of the reasons for the Hybrid >> >approach was the number of Oak documents in some repositories will exceed >> >that limit. >> >> I am not sure what you are arguing for with this comment… >> It sounds like an argument in favour of the current design - which is >> probably not what you mean… Could you explain, please? >> > > I didn't communicate that very well. > > Currently Lucene (6.1) has a limit of Int32 to the number of documents it > can store in an index, IIUC There is a long term desire to increase that > but using Int64 but no long term commitment as its probably significant > work given arrays in Java are indexed with Int32. > > The Hybrid approach doesn't help the potential Lucene brick wall, but one > motivation for looking at it was the number of Oak Documents including > those under /oak:index which is, in some cases, approaching that limit. > > > >> >> >> Thanks! >> Michael >>
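The QR1/QR2 split described above boils down to a routing decision between a local (embedded Lucene) index and an optional remote (Solr/ES) one. A toy sketch of that decision, entirely illustrative and not an Oak API:

```java
class QueryRouterSketch {
    enum Target { LOCAL, REMOTE }

    private final boolean remoteAvailable;

    QueryRouterSketch(boolean remoteAvailable) {
        this.remoteAvailable = remoteAvailable;
    }

    Target route(boolean hasFulltextConstraint) {
        // Keep application (QR1) property queries fast and accurate on the
        // local index; user-driven fulltext (QR2) can tolerate a remote
        // round trip when a remote index is configured.
        if (hasFulltextConstraint && remoteAvailable) {
            return Target.REMOTE;
        }
        return Target.LOCAL;
    }
}
```

On a small single-node setup (no remote configured), everything stays local, matching the "self sufficient for smaller deployments" goal.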
Re: Oak Indexing. Was Re: Property index replacement / evolution
On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston <i...@tfd.co.uk> wrote:
> Both Solr Cloud and ES address this by sharding and
> replicating the indexes, so that all commits are soft, instant and real
> time. That introduces problems. ...

This would really be useful. However I have a couple of aspects to clear up.

Index Update Guarantee
----------------------
Let's say a commit succeeds, and then the index update fails for some reason. Would that update be missed, or can there be some mechanism to recover? I am not very sure about the WAL here - that may be the answer - but still confirming.

In Oak, with the way the async index update works based on checkpoints, it's ensured that the index would "eventually" contain the right data and no update would be lost. If there is a failure in the index update then that cycle would fail and the next cycle would start again from the same base state.

Order of index updates
----------------------
Let's say I have 2 cluster nodes where the same node is being modified.

Original state
/a {x:1}

Cluster Node N1 - /a {x:1, y:2}
Cluster Node N2 - /a {x:1, z:3}

End State
/a {x:1, y:2, z:3}

At the Oak level both commits would succeed as there is no conflict. However, N1 and N2 would not be seeing each other's updates immediately - that depends on the background read. So in this case, what would the index update look like?

1. Would index updates for specific paths go to some master which would order the updates, or
2. Would it end up with either of {x:1, y:2} or {x:1, z:3}?

Here the current async index update logic ensures that it sees the eventually expected order of changes and hence would be consistent with the repository state.

Backup and Restore
------------------
Would the backup now involve a backup of the ES index files from each cluster node? Or, assuming full replication, would it involve a backup of the files from any one of the nodes?
Would the backup be in sync with the last changes done in the repository (assuming a sudden shutdown where changes got committed to the repository but not yet to any index)?

Here the current approach of storing index files as part of MVCC storage ensures that the index state is consistent with some "checkpointed" state in the repository. And post restart it would eventually catch up with the current repository state, and hence would not require a complete rebuild of the index in case of unclean shutdowns.

Chetan Mehrotra
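The checkpoint-based guarantee described above can be illustrated with a toy cycle: the checkpoint only advances when the whole index update succeeds, so a failed cycle is simply retried from the same base state. This is a simplified model, not Oak's actual AsyncIndexUpdate.

```java
import java.util.ArrayList;
import java.util.List;

class AsyncIndexCycleSketch {
    int checkpoint = 0;                      // revision up to which the index is current
    final List<String> index = new ArrayList<>();

    /** Runs one indexing cycle over content revisions [checkpoint, head). */
    void runCycle(List<String> contentByRevision, boolean failThisCycle) {
        List<String> pending = new ArrayList<>(
                contentByRevision.subList(checkpoint, contentByRevision.size()));
        if (failThisCycle) {
            // Nothing was applied and the checkpoint is unchanged, so the
            // next cycle retries from the same base state.
            return;
        }
        index.addAll(pending);
        checkpoint = contentByRevision.size(); // advance only on success
    }
}
```

Because the index content is derived from a checkpointed repository state, a restored backup only needs to catch up, never rebuild from scratch.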
Re: svn commit: r1752601 - in /jackrabbit/oak/trunk/oak-segment-tar: pom.xml src/main/java/org/apache/jackrabbit/oak/segment/SegmentWriter.java
On Thu, Jul 14, 2016 at 2:04 PM, <f...@apache.org> wrote:
>
> +commons-math3

commons-math is a 2.1 MB jar. Would it be possible to avoid embedding it whole and only embed/copy the parts that are needed? (See [1] for an example.)

Chetan Mehrotra

[1] https://issues.apache.org/jira/browse/SLING-2361
Re: Specifying threadpool name for periodic scheduled jobs (OAK-4563)
On Tue, Jul 19, 2016 at 12:54 PM, Michael Dürig <mdue...@apache.org> wrote: > For blocking or time intensive tasks I would go for a dedicated thread pool. So wrt current issue that means option #B ? Chetan Mehrotra
Re: Specifying threadpool name for periodic scheduled jobs (OAK-4563)
On Tue, Jul 19, 2016 at 1:44 PM, Stefan Egli <stefane...@apache.org> wrote:
> I'd go for #A to limit cross-effects between oak and other layers.

Note that for #4 there can be multiple tasks scheduled. So if a system has 100 JCR Listeners then there would be 1 task per listener to manage the time series stats. These should be quick and non-blocking though.

All the other tasks are much more critical for the repository to function properly. Hence the thought is to go for #B, where we have a dedicated pool for those 'n' tasks, where n is much smaller, i.e. the number of async lanes + 2 from DocumentNodeStore so far. So it's easy to size.

Chetan Mehrotra
Re: Specifying threadpool name for periodic scheduled jobs (OAK-4563)
On Tue, Jul 19, 2016 at 1:21 PM, Michael Dürig <mdue...@apache.org> wrote:
> Not sure as I'm confused by your description of that option. I don't
> understand which of 1, 2, 3 and 4 would run in the "default pool" and which
> should run in its own dedicated pool.

#1, #2 and #3 would run in a dedicated pool, all sharing that same pool, whose name would be 'oak'. Also see OAK-4563 for the patch.

For #4 the default pool would be used, as those are non-blocking and short tasks.

Chetan Mehrotra
Specifying threadpool name for periodic scheduled jobs (OAK-4563)
Hi Team,

While running Oak in Sling we rely on the Sling Scheduler [1] to execute periodic jobs. By default the Sling Scheduler uses a pool of 5 threads to run all such periodic jobs in the system. Recently we saw an issue (OAK-4563) where, for some reason, the pool got exhausted for a long time; that prevented the async indexing job from running for a long time and hence affected query results.

To address that, Sling now provides a new option (SLING-5831) where one can specify the pool name to be used to execute a specific job. So we can specify a custom pool to be used for Oak related jobs.

Currently in Oak we use the following types of periodic jobs:

1. Async indexing (cluster singleton)
2. Document Store - Journal GC (cluster singleton)
3. Document Store - LastRevRecovery
4. Statistics collection - for time series data updates in ChangeProcessor, SegmentNodeStore GCMonitor

Now should we use

A - one single pool for all of the above, or
B - the dedicated pool only for 1-3, leaving the default pool (of 5) for #4. So even if #2 and #3 are running they would not hamper #1, assuming #4 is not that critical to run and may consist of lots of jobs.

My suggestion would be to go for #B.

Chetan Mehrotra

[1] https://sling.apache.org/documentation/bundles/scheduler-service-commons-scheduler.html
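Option #B can be illustrated with plain java.util.concurrent (the Sling scheduler itself is not shown): critical jobs get a small dedicated pool named 'oak' so exhaustion of the shared default pool cannot starve them. Pool sizes and periods here are placeholders.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DedicatedPoolSketch {
    // Dedicated pool for the critical jobs (#1-#3: async indexing,
    // journal GC, lastRev recovery).
    final ScheduledExecutorService oakPool =
            Executors.newScheduledThreadPool(3, r -> new Thread(r, "oak"));

    // Default pool for the many cheap, non-blocking stat tasks (#4).
    final ScheduledExecutorService defaultPool =
            Executors.newScheduledThreadPool(5);

    void schedule(Runnable asyncIndexing, Runnable statsUpdate) {
        oakPool.scheduleWithFixedDelay(asyncIndexing, 0, 5, TimeUnit.SECONDS);
        defaultPool.scheduleWithFixedDelay(statsUpdate, 0, 1, TimeUnit.SECONDS);
    }

    void shutdown() {
        oakPool.shutdownNow();
        defaultPool.shutdownNow();
    }
}
```

With this split, even if all 5 default-pool threads are blocked by stat tasks, the async indexing job still gets a thread from the 'oak' pool.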
Re: Why is nt:resource referencable?
On Wed, Jul 20, 2016 at 2:49 PM, Bertrand Delacretaz <bdelacre...@apache.org> wrote:
> but the JCR spec (JSR 283 10 August 2009) only has
>
> [nt:resource] > mix:mimeType, mix:lastModified
>   primaryitem jcr:data
>   - jcr:data (BINARY) mandatory

That's interesting. I did not know it is not mandated in JCR 2.0. However, it looks like for backward compatibility we need to support it. See [1] where this was changed.

@Marcel - I did not understand JCR-2170 properly. But is there any chance we can switch to the newer version of nt:resource, not modify existing nodes, and let the new definition be enforced only on new nodes?

Chetan Mehrotra

[1] https://issues.apache.org/jira/browse/JCR-2170?focusedCommentId=12754941=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12754941
Re: Why is nt:resource referencable?
On Wed, Jul 20, 2016 at 4:04 PM, Marcel Reutegger <mreut...@adobe.com> wrote:
> Maybe we would keep the jcr:uuid property on the referenceable node and add
> the mixin?

What if we do not add any mixin and just have the jcr:uuid property present? The node would be indexed anyway, so search would still work. I am not sure if API semantics require that nodes looked up by UUID have to be referenceable. For now I think oak:Resource is the safest way, but just exploring other options if possible!

Chetan Mehrotra
Re: multilingual content and indexing
On Tue, Jul 12, 2016 at 3:53 PM, Lukas Kahwe Smith <sm...@pooteeweet.org> wrote:
>> Alternatively, you can create different index definitions for each subtree
>> (see [1]), e.g. Using the “includedPaths” property. This would lead to
>> smaller indexes at the downside that you would have to create an index
>> definition if you add a new language tree.

Another way would be to have your index definition under each language node:

/content/en/oak:index/fooIndex
/content/jp/oak:index/fooIndex

and have each index's analyzer configured for the respective language.

Chetan Mehrotra
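A rough sketch of such per-language definitions (the index name, properties, and analyzer classes here are illustrative, not copied from a real setup):

```
/content/en/oak:index/fooIndex
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - type = "lucene"
  - async = "async"
  + analyzers
    + default
      - class = "org.apache.lucene.analysis.en.EnglishAnalyzer"

/content/jp/oak:index/fooIndex
  same structure, but with a Japanese analyzer such as
  org.apache.lucene.analysis.ja.JapaneseAnalyzer
```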
Re: [proposal] New oak:Resource nodetype as alternative to nt:resource
Thanks for the feedback. Opened OAK-4567 to track the change.

On Mon, Jul 18, 2016 at 12:14 PM, Angela Schreiber <anch...@adobe.com> wrote:
> Additionally or alternatively we could create a separate method (e.g.
> putOakFile or putOakResource or something explicitly mentioning the non-referenceable
> nature of the content) that uses 'oak:Resource' and state that it requires the
> node type to be registered and will fail otherwise... that would be as easy
> to use as 'putFile', which is IMO important.

@Angela - What about Justin's later suggestion around changing the current putFile implementation: have it use oak:Resource if present, otherwise fall back to nt:resource? This can lead to a compatibility issue though, as the javadoc of putFile says it would use nt:resource.

Chetan Mehrotra
[proposal] New oak:Resource nodetype as alternative to nt:resource
In most cases where code uses JcrUtils.putFile [1] it leads to creation of the content structure below:

+ foo.jpg (nt:file)
  + jcr:content (nt:resource)
    - jcr:data

Due to the usage of nt:resource, each nt:file node creates an entry in the uuid index, as nt:resource is referenceable [2]. So if a system has 1M nt:file nodes then we would have 1M entries in /oak:index/uuid, as in most cases the files are created via [1] and hence all such files are referenceable.

The nodetype definition for nt:file [3] does not mandate that jcr:content be nt:resource. So should we register a new oak:Resource nodetype which is the same as nt:resource but not referenceable? This would be similar to oak:Unstructured.

Also, what should we do for [1]? Should we provide an overloaded method which also accepts a nodetype for the jcr:content node, as it cannot use oak:Resource?

Chetan Mehrotra

[1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-jcr-commons/src/main/java/org/apache/jackrabbit/commons/JcrUtils.java#L1062
[2] [nt:resource] > mix:lastModified, mix:mimeType, mix:referenceable
      primaryitem jcr:data
      - jcr:data (binary) mandatory
[3] [nt:file] > nt:hierarchyNode
      primaryitem jcr:content
      + jcr:content (nt:base) mandatory
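A sketch of how such a definition could look — simply the nt:resource CND from [2] with mix:referenceable dropped (the exact registered definition would be decided on the issue):

```
[oak:Resource] > mix:lastModified, mix:mimeType
  primaryitem jcr:data
  - jcr:data (binary) mandatory
```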
[multiplex] - Review the proposed SPI interface MountInfoProvider and Mount for OAK-3404
Hi Team,

As we start integrating the work related to multiplexing support into trunk, I would like your thoughts on the new SPI interface MountInfoProvider [1] being proposed as part of OAK-3404. This would be used by various parts of Oak to determine the mount information. Kindly provide your feedback on the issue.

Chetan Mehrotra

[1] https://github.com/rombert/jackrabbit-oak/tree/features/docstore-multiplex/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/mount
Re: [Oak origin/1.4] Apache Jackrabbit Oak matrix - Build # 992 - Still Failing
On Sat, Jun 25, 2016 at 10:24 AM, Apache Jenkins Server <jenk...@builds.apache.org> wrote:
> Caused by: java.lang.IllegalArgumentException: No enum constant
> org.apache.jackrabbit.oak.commons.FixturesHelper.Fixture.SEGMENT_TAR
> at java.lang.Enum.valueOf(Enum.java:238)
> at
> org.apache.jackrabbit.oak.commons.FixturesHelper$Fixture.valueOf(FixturesHelper.java:45)
> at
> org.apache.jackrabbit.oak.commons.FixturesHelper.(FixturesHelper.java:58)

The tests are failing due to the above issue. Is this related to the presence of the new segment-tar module in trunk but not in the branch?

Chetan Mehrotra
Re: Way to capture metadata related to commit as part of CommitInfo from within CommitHook
So would it be ok to make the map within CommitInfo mutable?

Chetan Mehrotra

On Wed, Aug 3, 2016 at 7:29 PM, Michael Dürig <mdue...@apache.org> wrote:
>
>> #A -Probably we can introduce a new type CommitAttributes which can be
>> attached to CommitInfo and which can be modified by the CommitHooks.
>> The CommitAttributes can then later be accessed by Observer
>
> This is already present via the CommitInfo.info map. It is even used in a
> similar way. See CommitInfo.getPath() and its usages. AFAIU the only part
> where your cases would differ is that the information is assembled by some
> commit hooks instead of being provided at the point the commit was
> initiated.
>
> Michael
OAK-4475 - CI failing on branches due to unknown fixture SEGMENT_TAR
Hi Team,

Some time back the build was failing for branches because of usage of the new trunk-only fixture SEGMENT_TAR. As this fixture was not present on the branch it caused the build to fail. My initial attempt to fix this was to ignore the exception when FixturesHelper resolves an enum like SEGMENT_TAR on a branch [1]. With this the build completes fine, but I have a hunch that the current fix leads to all fixtures getting activated, which would waste time.

A - Which solution to use
==
So we have 2 options:
1. Treat SEGMENT_TAR as SEGMENT_MK for branches - this would cause tests to run twice against SEGMENT_MK
2. Create a separate build profile for branches

B - Use of the nsfixtures system property
==
However, before doing that I am trying to understand how the fixtures get set. From the CI logs the command that gets fired is
---
/home/jenkins/tools/maven/apache-maven-3.2.1/bin/mvn -Dnsfixtures=DOCUMENT_NS -Dlabel=Ubuntu -Djdk=jdk1.8.0_11 -Dprofile=integrationTesting clean verify -PintegrationTesting -Dsurefire.skip.ut=true -Prdb-derby -DREMOVEMErdb.jdbc-
---
It sets the system property 'nsfixtures' to the required fixture. However, in our parent pom we rely on the system property 'fixtures', which defaults to SEGMENT_MK, and nowhere do we override 'fixtures' in our CI.

Looking at all this it appears to me that currently all tests are only running against the SEGMENT_MK fixture and other fixtures are not getting used. But then the exception should not have occurred with the usage of SEGMENT_TAR, so I am missing some connection here in the build process.

From my test it appears that if we specify a system property on the mvn command line and the same property is configured in maven-surefire-plugin, then the property specified on the command line is used and the one in pom.xml is ignored. That would explain why the settings in pom.xml are not used for fixtures.

So which option should we opt for in #A? My vote would be for A1!
Chetan Mehrotra

[1] https://github.com/apache/jackrabbit-oak/commit/319433e9400429592065d4b3997dd31f93b6c549
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-parent/pom.xml#L289 - the maven-failsafe-plugin configuration, passing ${test.opts}, ${known.issues}, ${mongo.host}, ${mongo.port}, ${mongo.db}, ${mongo.db2}, ${fixtures} and ${project.build.directory}/derby.log
Re: normalising the rdb database schema
Hi Tomek,

I like the idea of revisiting our current schema based on usage so far. However, a couple of points around potential issues with such a normalized approach:

- This approach would lead to a thin and long table. As noted in [1], in a small repo with ~14 M nodes we have ~26 M properties. With multiple revisions (GC takes some time) this can go higher. This would then increase the memory requirement for the id index, and memory consumption increases further with an id+key+revision index. For any DB to perform optimally the index should fit in RAM. So such a design would possibly reduce the maximum repository size which can be supported (compared to the current one) for a given amount of memory.

- The read for a specific id can still be done in 1 remote call. But that would involve a select across multiple rows, which might increase the time taken, as it would involve 'm' index lookups and then 'm' reads of row data for any node having 'n' properties (m > n, assuming multiple revisions per property are present).

Maybe we should explore the JSON support being introduced in multiple databases: DB2 [2], SQL Server [3], Oracle [4], Postgres [5], MySQL [6]. The problem here is that we would need DB-specific implementations, and it also increases the testing effort!

> we can better use the database features, as now the DBE is aware about the
> document internal structure (it’s not a blob anymore). Eg. we can fetch only
> a few properties.

In most cases the properties stored in the blob part of a DB row are read as a whole.
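To make the index-size concern concrete, here is a purely hypothetical DDL sketch of such a normalized layout (all table and column names invented):

```sql
-- One row per (document id, property name, revision) instead of one
-- blob-valued row per document.
CREATE TABLE NODE_PROPS (
  ID   VARCHAR(512) NOT NULL,  -- document id, e.g. '2:/content/foo'
  PROP VARCHAR(256) NOT NULL,  -- property name, e.g. 'jcr:primaryType'
  REV  VARCHAR(64)  NOT NULL,  -- revision the value was written in
  VAL  BLOB,
  PRIMARY KEY (ID, PROP, REV)  -- the id+key+revision index that should fit in RAM
);

-- Reading one document is still a single remote call, but it touches
-- 'm' index entries and 'm' rows for a node with 'n' properties
-- (m > n once multiple revisions per property are present).
SELECT PROP, REV, VAL FROM NODE_PROPS WHERE ID = ?;
```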
Chetan Mehrotra [1] https://issues.apache.org/jira/browse/OAK-4471 [2] http://www.ibm.com/developerworks/data/library/techarticle/dm-1306nosqlforjson1/ [3] https://msdn.microsoft.com/en-in/library/dn921897.aspx [4] https://docs.oracle.com/database/121/ADXDB/json.htm [5] https://www.postgresql.org/docs/9.3/static/functions-json.html [6] https://dev.mysql.com/doc/refman/5.7/en/json.html On Wed, Aug 17, 2016 at 7:19 AM, Michael Marth <mma...@adobe.com> wrote: > Hi Tomek, > > I like the idea (agree with Vikas’ comments / cautions as well). > > You are hinting at expected performance differences (maybe faster or slower > than the current approach). That would probably be worthwhile to investigate > in order to assess your idea. > > One more (hypothetical at this point) advantage of your approach: we could > utilise DB-native indexes as a replacement for property indexes. > > Cheers > Michael > > > > On 16/08/16 07:42, "Tomek Rekawek" <reka...@adobe.com> wrote: > >>Hi Vikas, >> >>thanks for the reply. >> >>> On 16 Aug 2016, at 14:38, Vikas Saurabh <vikas.saur...@gmail.com> wrote: >> >>> * It'd incur a very heavy migration impact on upgrade or RDB setups - >>> that, most probably, would translate to us having to support both >>> schemas. I don't feel that it'd easy to flip the switch for existing >>> setups. >> >>That’s true. I think we should take a similar approach here as with the >>segment / segment-tar implementations (and we can use oak-upgrade to convert >>between them). At least for now. >> >>> * DocumentNodeStore implementation very freely touches prop:rev=value >>> for a given id… […] I think this would get >>> expensive for index (_id+propName+rev) maintenance. >> >>Indeed, probably we’ll have to analyse the indexing capabilities offered by >>different database engines more closely, choosing the one that offers good >>writing speed. >> >>Best regards, >>Tomek >> >>-- >>Tomek Rękawek | Adobe Research | www.adobe.com >>reka...@adobe.com
Re: svn commit: r1781064 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/index-management.md
Hi Thomas,

On Tue, Jan 31, 2017 at 5:07 PM, <thom...@apache.org> wrote:
> +The following script created the index externalId:
> +
> +// create-externalId.txt
> +// create a unique index on externalId
> +{"print": "check if externalId already exists"}
> +{"xpath": "/jcr:root/oak:index/externalId"}
> +{"if": "$resultSize", "=": 1, "print": "index externalId already exists"}
> +{"$create": []}
> +{"if": "$resultSize", "=": 0, "$create": [1]}
> +{"for": "$create", "do": [
> +{"print": "does not exist; creating..."},
> +{"addNode": "/oak:index/externalId", "node": {
> +"jcr:primaryType": "oak:QueryIndexDefinition",
> +"{Name}propertyNames": ["rep:externalId"],
> +"type": "property",
> +"unique": true
> +}},
> +{"session": "save"},
> +{"print": "done; index is now:"},
> +{"xpath": "/jcr:root/oak:index/externalId", "depth": 2}
> +]}
> +exit

Scripting in JSON looks interesting! However, I would like to understand the approach here and how it should be used. The Oak Console already uses Groovy and can execute such scripts via the ":load" construct. We already use this with customer setups to execute scripts hosted on GitHub [1], [2]. So I am not sure why we need this approach and how it meets the requirement for OAK-5324.

Chetan Mehrotra

[1] https://gist.github.com/stillalex/e7067bcb86c89bef66c8
[2] https://gist.github.com/chetanmeh/d7588d96a839dd2d26760913e4055215
Close a CI issue when resolving it as duplicate
It appears that CI issues which are resolved as duplicate but not closed are still updated upon each successful build. So to reduce the noise it would be good to also close the issue when resolving a CI issue as a duplicate.

Chetan Mehrotra
Re: Storing all documents under Root - severe slowness on start-up
Can you provide a thread dump around startup time where you see Oak is reading all child nodes?

Chetan Mehrotra

On Fri, Feb 24, 2017 at 2:26 AM, Eugene Prystupa <eugene.pryst...@gmail.com> wrote:
> Thanks, Michael.
>
> I should have included more details in the original email.
> We are on 1.4.10 version of Jackrabbit Oak, we are using Mongo backend.
>
> On Thu, Feb 23, 2017 at 3:40 PM, Michael Dürig <mdue...@apache.org> wrote:
>>
>> On 23.02.17 19:11, Eugene Prystupa wrote:
>>> We are seeing severe delays on start-up (20 minutes+) when repository is
>>> created (new Jcr(oak).createRepository()).
>>
>> Regardless of the content structure, 20 min. seems off. What back-end are
>> you on? Which version of Oak is this?
>>
>> Michael
>
> --
> Thanks,
> Eugene
Re: Merging OAK-5784 into 1.6.1
Changes look fine; however one aspect might cause an issue:

RestrictionImpl#hashCode -> PropertyValues#hashCode -> PropertyStateValue#hashCode

private String getInternalString() {
    StringBuilder sb = new StringBuilder();
    Iterator<String> iterator = getValue(Type.STRINGS).iterator();
    while (iterator.hasNext()) {
        sb.append(iterator.next());
        if (iterator.hasNext()) {
            sb.append(",");
        }
    }
    return sb.toString();
}

@Override
public int hashCode() {
    return getType().tag() ^ getInternalString().hashCode();
}

Here it tries to get the value as STRINGS, which leads to PropertyState#getValue(Type.STRINGS), which would lead to a Binary getting coerced to a String in Conversions#convert(Blob), which would lead to a load of the whole binary. Now I am not sure if the PropertyState in RestrictionImpl is applicable to Binary properties also. Probably PropertyStateValue#hashCode should take care of Binary properties; that's why PropertyState#hashCode does not take the value into account.

Chetan Mehrotra

On Fri, Feb 24, 2017 at 2:34 PM, Angela Schreiber <anch...@adobe.com> wrote:
> hi oak-devs
>
> i would like to merge another improvement into the 1.6.1 branch:
> https://issues.apache.org/jira/browse/OAK-5784
>
> in addition to additional tests i run the AceCreationTest benchmark and
> attached the results to the issue.
> however, having some extra pair of eyes would be appreciated in order to
> limit the risk of regressions.
>
> thanks
> angela
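The concern can be illustrated with a toy model (all names below are invented, not the actual Oak classes): any hashCode that routes through a string conversion of the value forces a lazily loaded binary to be materialized.

```java
public class LazyBlobHashCode {

    /** Toy stand-in for a lazily loaded Blob. */
    static class LazyBlob {
        boolean loaded = false;

        // Materializes the whole binary, as a BINARY-to-String coercion would.
        String asString() {
            loaded = true;
            return "...binary content...";
        }
    }

    /** Toy property value whose hashCode is built from the string form. */
    static class PropertyValue {
        final LazyBlob blob = new LazyBlob();

        @Override
        public int hashCode() {
            // Same shape as a hashCode that converts the value to a string
            // just to compute the hash.
            return blob.asString().hashCode();
        }
    }

    public static void main(String[] args) {
        PropertyValue v = new PropertyValue();
        v.hashCode();
        // Merely hashing the value forced the binary to load.
        System.out.println("binary loaded: " + v.blob.loaded);
    }
}
```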
Re: Supporting "resumable" operations on a large tree
Hi Thomas,

On Fri, Feb 24, 2017 at 1:09 PM, Thomas Mueller <muel...@adobe.com> wrote:
> 9) Sorting of path is needed, so that the repository can be processed bit
> by bit by bit. For that, the following logic is used, recursively: read at
> most 1000 child nodes. If there are more than 1000, then this subtree is
> never split but processed in one step (so many child nodes can still lead
> to large transactions, unfortunately). If less than 1000 child nodes, then
> the names of all child nodes are read, and processed in sorted order
> (sorted by node name).

This should work! So we can implement a "paginated tree traversal" via the above approach, and a similar approach can be used for Lucene indexes. It would be good to record this in OAK-2556 (or better, a new issue) and we can look into implementing it in those parts which perform such large transactions (async index reindex, sync index reindex, content migration in sidegrade), etc.

Chetan Mehrotra
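A toy sketch of that logic over an in-memory tree (the 1000 threshold is scaled down to 4, and all names are invented; the real implementation would of course walk NodeState/Tree instances):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PaginatedTraversal {

    static final int LIMIT = 4; // stands in for the 1000 of the proposal

    /**
     * Splits work into resumable batches: a node with more than LIMIT
     * children is processed as one (potentially large) batch; otherwise
     * its children are visited one by one in sorted name order, so the
     * traversal can be resumed from the last processed path.
     */
    static void collectBatches(Map<String, List<String>> tree, String path,
                               List<String> batches) {
        List<String> children = tree.getOrDefault(path, Collections.emptyList());
        if (children.size() > LIMIT) {
            batches.add(path + "/* (" + children.size() + " children, one batch)");
            return;
        }
        batches.add(path);
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted); // sorted order makes progress well-defined
        for (String name : sorted) {
            collectBatches(tree, "/".equals(path) ? "/" + name : path + "/" + name, batches);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/", Arrays.asList("b", "a"));
        tree.put("/a", Arrays.asList("x", "y", "z", "w", "v")); // 5 > LIMIT
        tree.put("/b", Arrays.asList("c"));
        List<String> batches = new ArrayList<>();
        collectBatches(tree, "/", batches);
        // -> [/, /a/* (5 children, one batch), /b, /b/c]
        System.out.println(batches);
    }
}
```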
Re: Merging OAK-5784 into 1.6.1
On Fri, Feb 24, 2017 at 4:10 PM, Angela Schreiber <anch...@adobe.com> wrote:
> maybe this is
> another indication that we should think about having an implementation
> with plugins.memory and deal with the binary topic there.

+1. Then we can go with the current fix (and also merge it to 1.6) and later backport the change to the 1.6 branch.

Chetan Mehrotra
Re: CommitEditors looking for specific child node like oak:index, rep:cugPolicy leads to lots of redundant remote calls
I realized now that I recently logged an issue for this, OAK-5511, which mentions a similar approach. So let's move this discussion there.

Chetan Mehrotra

On Thu, Feb 23, 2017 at 7:06 PM, Thomas Mueller <muel...@adobe.com> wrote:
> Hi,
>
>>I like Marcel proposal for "enforcing" use of mixin on parent node to
>>indicate that it can have a child node of 'oak:index'. So we can
>>leverage mxin 'mix:indexable' (OAK-3725) to mark such parent nodes
>>(like root) and IndexUpdate would only look for 'oak:index' node if
>>current node has that mixin.
>
> Ah I didn't know about OAK-3725.
>
> I'm a bit worried that we mix different aspects together, not sure which
> is better.
>
> "oak:Indexable" is visible, so it can be added and _removed_ by the user.
> So when trying to remove that mixin, we would need to check there is no
> oak:index child node with nodetype oak:QueryIndexDefinition. We need to
> check the nodetype hierarchy. On the other hand, possibly we can enforce
> that the parent node of oak:index is oak:Indexable (can we?)
>
> I'm not saying with a hidden property ":hasOakIndex"
> (automatically set and removed) it would be painless. For example when
> moving an oak:index node to a new parent, the setting has to be changed at
> both the original and the new parents.
>
> Regards,
> Thomas
Re: CommitEditors looking for specific child node like oak:index, rep:cugPolicy leads to lots of redundant remote calls
I like Marcel's proposal for "enforcing" the use of a mixin on the parent node to indicate that it can have a child node 'oak:index'. So we can leverage the mixin 'mix:indexable' (OAK-3725) to mark such parent nodes (like root), and IndexUpdate would only look for an 'oak:index' node if the current node has that mixin. This would avoid the extra calls. For new setups we can enforce this, and for upgrades we can migrate existing content by using the nodetype index to update all such "indexable" nodes.

On Thu, Feb 23, 2017 at 4:47 PM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
> On Wed, Feb 22, 2017 at 8:21 PM, Davide Giannella <dav...@apache.org> wrote:
>> Did you mean for ALL the nodes, or only specific nodes?
>>
>> Any way you're suggesting something like the following flow:
>>
>> 1) user call nodebuilder.child(":index")
>> 2) lookup in hidden property
>> 3) if not there, leverage the existing code
>>
>> If so I guess the property has been already fetched and it does not
>> require roundtrips towards the DB. Am I right?
>
> Currently the lookup is being done for ALL nodes. So IndexUpdate class
> does following on each changed node
>
> --
> @Override
> public void enter(NodeState before, NodeState after)
>         throws CommitFailedException {
>     collectIndexEditors(builder.getChildNode(INDEX_DEFINITIONS_NAME),
>         before);
> --
>
> Which transalates into checking if the current node has a child node
> 'oak:index' and this leads to redudant calls.
>
> Chetan Mehrotra
Re: CommitEditors looking for specific child node like oak:index, rep:cugPolicy leads to lots of redundant remote calls
On Wed, Feb 22, 2017 at 8:21 PM, Davide Giannella <dav...@apache.org> wrote:
> Did you mean for ALL the nodes, or only specific nodes?
>
> Any way you're suggesting something like the following flow:
>
> 1) user call nodebuilder.child(":index")
> 2) lookup in hidden property
> 3) if not there, leverage the existing code
>
> If so I guess the property has been already fetched and it does not
> require roundtrips towards the DB. Am I right?

Currently the lookup is being done for ALL nodes. The IndexUpdate class does the following on each changed node:

--
@Override
public void enter(NodeState before, NodeState after)
        throws CommitFailedException {
    collectIndexEditors(builder.getChildNode(INDEX_DEFINITIONS_NAME), before);
--

which translates into checking if the current node has a child node 'oak:index', and this leads to redundant calls.

Chetan Mehrotra
Re: [DISCUSS] Which I/O statistics should the FileStore expose?
Hi Francesco,

As Julian mentioned, it would be good to collect stats as Metrics. Have a look at DocumentStoreStats, which collects some stats around operations performed by DocumentStore implementations.

Chetan Mehrotra

On Tue, Feb 14, 2017 at 12:37 AM, Julian Sedding <jsedd...@gmail.com> wrote:
> Hi Francesco
>
> I believe you should implement an IOMonitor using the metrics in the
> org.apache.jackrabbit.oak.stats package. These can be backed by
> swappable StatisticsProvider implementations. I believe by default
> it's a NOOP implementation. However, I believe that if the
> MetricStatisticsProvider implementation is used, it automatically
> exposes the metrics via JMX. So all you need to do is feed the correct
> data into a suitable metric. I believe Chetan contributed these, so he
> will know more about the details.
>
> Regards
> Julian
>
> On Mon, Feb 13, 2017 at 6:21 PM, Francesco Mari
> <mari.france...@gmail.com> wrote:
>> Hi all,
>>
>> The recently introduced IOMonitor allows the FileStore to trigger I/O
>> events. Callback methods from IOMonitor can be implemented to receive
>> information about segment reads and writes.
>>
>> A trivial implementation of IOMonitor is able to track the following raw
>> data.
>>
>> - The number of segments read and write operations.
>> - The duration in nanoseconds of every read and write.
>> - The number of bytes read or written by each operation.
>>
>> We are about to expose this kind of information from an MBean - for
>> the sake of discussion, let's call it IOMonitorMBean. I'm currently in
>> favour of starting small and exposing the following statistics:
>>
>> - The duration of the latest write (long).
>> - The duration of the latest read (long).
>> - The number of write operations (long).
>> - The number of read operations (long).
>>
>> I would like your opinion about what's the most useful way to present
>> this data through an MBean. Should just raw data be exposed?
>> Is it appropriate for IOMonitorMBean to perform some kind of aggregation,
>> like sum and average? Should richer data be returned from the MBean,
>> like tabular data?
>>
>> Please keep in mind that this data is supposed to be consumed by a
>> monitoring solution, and not by a human reader.
Re: [DISCUSS] Which I/O statistics should the FileStore expose?
On Tue, Feb 14, 2017 at 1:15 PM, Francesco Mari <mari.france...@gmail.com> wrote:
> What could be gained
> by adding Metrics to the trivial implementation of IOMonitorMBean
> described above?

The metrics created here are automatically registered in JMX (see MetricStatisticsProvider) and are also accessible over a web UI, for example when running in Sling [1]. The JMX beans can then be read by an external monitoring agent.

> In example, how the methods in DocumentStoreStats returning time series as
> CompositeData play with other monitoring solutions like ElasticSearch/Kibana?

That is just for convenience, i.e. for those setups which do not have any external monitoring installed, the time series provide some insight into past stats via JMX.

Chetan Mehrotra

[1] https://sling.apache.org/documentation/bundles/metrics.html#webconsole-plugin
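For reference, the underlying JMX pattern that MetricStatisticsProvider automates can be sketched with the plain JDK (the bean, object name, and attribute below are invented for illustration): register a standard MBean and read its attribute back the way a monitoring agent would.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxCounterExample {

    // Standard MBean convention: interface named <Impl>MBean; it must be
    // public for registration to work.
    public interface IOStatsMBean {
        long getReadCount();
    }

    public static class IOStats implements IOStatsMBean {
        private volatile long readCount;
        public void markRead() { readCount++; }
        @Override public long getReadCount() { return readCount; }
    }

    public static long registerAndRead() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example:type=IOStats"); // hypothetical name
        IOStats stats = new IOStats();
        server.registerMBean(stats, name);
        try {
            stats.markRead();
            stats.markRead();
            // A monitoring agent would read the attribute the same way.
            return (Long) server.getAttribute(name, "ReadCount");
        } finally {
            server.unregisterMBean(name);
        }
    }

    public static void main(String[] args) throws Exception {
        // -> reads seen via JMX: 2
        System.out.println("reads seen via JMX: " + registerAndRead());
    }
}
```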
Collect data for test failure in issue itself
It would be helpful if, while renaming the Hudson-created Jira issue, we also attach the relevant unit-test.log from the module whose test failed and record the test failure message as a comment. This simplifies later analysis, as CI only retains reports for the past few builds (not sure of the number). For example, for some of the older issues the links to CI now result in 404s (see OAK-5263 for example).

Chetan Mehrotra
Re: svn commit: r1779324 - in /jackrabbit/oak/trunk/oak-segment-tar: ./ src/test/java/org/apache/jackrabbit/oak/segment/standby/ src/test/java/org/apache/jackrabbit/oak/segment/test/
Hi Francesco,

On Wed, Jan 18, 2017 at 7:01 PM, <f...@apache.org> wrote:
> +package org.apache.jackrabbit.oak.segment.test;
> +
> +import java.net.ServerSocket;
> +
> +import org.junit.rules.ExternalResource;
> +
> +public class TemporaryPort extends ExternalResource {
> +
> +    private int port;
> +
> +    @Override
> +    protected void before() throws Throwable {
> +        try (ServerSocket socket = new ServerSocket(0)) {
> +            port = socket.getLocalPort();
> +        }
> +    }
> +
> +    public int getPort() {
> +        return port;
> +    }
> +
> +}

This looks useful and could be used in other places too, like in [1]. It would be good if we could move it to oak-commons, into the org.apache.jackrabbit.oak.commons.junit package.

Chetan Mehrotra

[1] https://issues.apache.org/jira/browse/OAK-5441?focusedCommentId=15823491=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15823491
Specify versions for maven plugins used in build for ensuring stable builds (OAK-5455)
Hi Team,

While checking the test failure reports I realized that we do not specify versions for some important Maven plugins, which can make build behaviour environment dependent. Should this be something we change at this stage of the release, i.e.:

1. At least pin the currently used versions of the plugins
2. Update the versions to the latest where possible

Opened OAK-5455 to track this.

Chetan Mehrotra
RepositorySidegrade and commit hooks
Hi,

Does RepositorySidegrade run all the commit hooks required for getting a consistent JCR-level state, like the permission editor, property editor, etc.? I can see such hooks configured for RepositoryUpgrade but am not seeing any such hooks configured for RepositorySidegrade.

Probably we should also configure the same set of hooks?

Chetan Mehrotra
Re: Help with unit tests for JMX stats for S3DataStore
Hi Matt,

It would be easier if you could open an issue and provide your patch there, so that one can have a better understanding of what needs to be tested. In general you can use MemoryDocumentStore (the default used by the DocumentMK builder) and then possibly use Sling OSGi mocks to pick up the registered MBean services. For an example, have a look at SegmentNodeStoreServiceTest, which uses OSGi mocks to activate the service and then picks up the registered services to do the assertions.

Chetan Mehrotra

On Fri, Aug 19, 2016 at 6:14 AM, Matt Ryan <o...@mvryan.org> wrote:
> Hi,
>
> I’m working on a patch for Oak that would add some JMX stats for
> S3DataStore. I’m adding code to register a new Mbean in
> DocumentNodeStoreService (also SegmentNodeStoreService, but let’s just
> worry about the first one for now).
>
> I wanted to create some unit tests to verify that my new JMX stats are
> available via JMX. The idea I had would be that I would simply instantiate
> a DocumentNodeStoreService, create an S3DataStore, wrap it in a
> DataStoreBlobStore, and bind that in the DocumentNodeStoreService. Then
> with a JMX connection I could check that my Mbean had been registered,
> which it should have been by this time.
>
> This was all going relatively fine until I hit a roadblock in
> DocumentNodeStoreService::registerNodeStore(). The DocumentMKBuilder uses
> a DocumentNodeStore object that I need to mock in order to do the test, and
> I cannot mock DocumentNodeStore because it is a final class. I tried
> working around that, but ended up hitting another road block in the
> DocumentNodeStore constructor where I then needed to mock a NodeDocument -
> again, can’t mock it because it is a final class.
>
> I realize it is theoretically possible to mock final classes using
> PowerMock, although by this point I am starting to wonder if all this
> effort is a good way to use my time or if I should just test my code
> manually.
>
> Is it important that DocumentNodeStore be a final class? If not, how would
> we feel about me simply making the class non-final? If so, what
> suggestions do you have to help me unit test this thing? I feel that it
> should be easier to unit test new code than this, so maybe I’m missing
> something.
>
> Thanks
>
> -Matt Ryan
Re: RepositorySidegrade and commit hooks
Thanks Tomek for confirmation. Opened OAK-4684 to track that Chetan Mehrotra On Fri, Aug 19, 2016 at 3:52 PM, Tomek Rekawek <reka...@adobe.com> wrote: > Hi Chetan, > > yes, it seems that this has been overlooked in the OAK-3239 (porting the > —include-paths support from RepositoryUpgrade). Feel free to create an issue > / commit a patch or let me know if you want me to do it. > > Best regards, > Tomek > > -- > Tomek Rękawek | Adobe Research | www.adobe.com > reka...@adobe.com > >> On 19 Aug 2016, at 10:38, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: >> >> For complete migration yes all bits are there. However people also use >> this for partial incremental migration from source system to target >> system. In that case include paths are provide for those paths whose >> content need to be updated. In such a case it can happen that derived >> content for those paths (property index, permission store entries) do >> not get updated and that would result in inconsistent state >> Chetan Mehrotra >> >> >> On Fri, Aug 19, 2016 at 1:59 PM, Alex Parvulescu >> <alex.parvule...@gmail.com> wrote: >>> Hi, >>> >>> I don't think any extra hooks are needed here. Sidegrade is just a change >>> in persistence format, all the bits should be there already in the old >>> repository. >>> >>> best, >>> alex >>> >>> On Fri, Aug 19, 2016 at 6:45 AM, Chetan Mehrotra <chetan.mehro...@gmail.com> >>> wrote: >>> >>>> Hi, >>>> >>>> Does RepositorySidegrade runs all the commit hooks required for >>>> getting a consistent JCR level state like permission editor, property >>>> editor etc >>>> >>>> I can such hooks configured for RepositoryUpgrade but not seeing any >>>> such hook configured for RepositorySidegrade >>>> >>>> Probably we should also configure same set of hooks? >>>> >>>> Chetan Mehrotra >>>> >
Re: Using same index definition for both async and sync indexing
On Wed, Aug 3, 2016 at 2:23 PM, Alex Parvulescu <alex.parvule...@gmail.com> wrote:
> extend the current index definition
> for the 'async' property and allow multiple values.

That should work and looks like a natural extension of the flag. It's just that having an empty value in the array does not look good (it might confuse people in the UI), so we can have a marker value to indicate empty.

> What about overloading the 'IndexUpdateCallback' with a 'isSync()' method
> coming from the 'IndexUpdate' component. This will reduce the change
> footprint and only components that need to know this information will use
> it.

That can be done. Going forward we also need to pass in CommitInfo or something like that (see the other mail). Another option can be to have a new interface for IndexEditorProvider (along the same lines as AdvancedQueryIndex vs QueryIndex). The editor implementing the new interface would have the extra params passed in, and there we could introduce something like IndexingContext, which folds in IndexUpdateCallback, the indexing mode, the index path, CommitInfo, etc.

Chetan Mehrotra
Re: [observation] pure internal or external listeners
On Fri, Sep 2, 2016 at 4:00 PM, Stefan Egli <stefane...@apache.org> wrote: > If we > separate listeners into purely internal vs external, then a queue as a whole > is either purely internal or external and we no longer have this issue. Not sure how this would work. The observation queue is made up of ContentChange entries, each a tuple of [root NodeState, CommitInfo (null for external)]:

NS1-L --- NS2-L --- NS3 --- NS4-L --- NS5-L --- NS6-L
  a        /a/b      /a/c    /a/c      /a/b      /a/b /a/d

(-L marks a local change; the second row lists the paths changed in each commit)

So if we dedicate a queue for local changes only, what would happen? If we drop NS3 then while diffing [NS2-L, NS4-L] /a/c would be reported as "added" and "local". Now say we have a listener which listens for locally added nt:file nodes so that it can start some processing job for them. Such a listener would then think it is a locally added node and would start a duplicate job.

In general I believe:

Listener for external change -- listeners which are listening for external changes are maintaining some state and purge/refresh it upon detecting a change in interested paths. They would work fine if multiple content change occurrences are merged, [NS4-L, NS5-L] + [NS5-L, NS6-L] = [NS4, NS6] (external), as they would still detect the change. An example of this is LuceneIndexObserver, which sets the queue size to 5 and does not care whether a change is local or not. It is just interested in whether the index node was updated.

Listener for local change -- such a listener is more particular about the type of change and is doing some persisted state change, e.g. registering a job or invoking some third-party service to update a value. This listener is only interested in local changes as it knows the same listener is also active on other cluster nodes (homogeneous cluster setup), so if a node gets added it only needs to react on the cluster node where it got added. For such listeners it needs to be ensured that mixed content changes are not compacted.
So it is fine to merge [NS4-L, NS5-L] + [NS5-L, NS6-L] = [NS4, NS6] (can be treated as local, with loss of the user identity which caused the change), but not [NS2-L, NS3] + [NS3, NS4-L] = [NS2-L, NS4-L] (cannot be treated as local). Just thinking out loud here to understand the problem space better :) Chetan Mehrotra
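The merge rule being described can be captured in a small self-contained model (illustrative names only, not Oak's actual BackgroundObserver implementation): a merged span may be treated as local only if every commit it covers was local.

```java
public class ChangeMergeSketch {
    // A span of the observation queue: [from, to] with a flag telling whether
    // every commit inside the span was local (i.e. carried a CommitInfo).
    static final class ContentChange {
        final String from, to;
        final boolean local;
        ContentChange(String from, String to, boolean local) {
            this.from = from; this.to = to; this.local = local;
        }
    }

    // Merge two adjacent spans. The merged span can only be treated as local
    // if both inputs were purely local; any external commit in between taints it.
    static ContentChange merge(ContentChange a, ContentChange b) {
        if (!a.to.equals(b.from)) throw new IllegalArgumentException("not adjacent");
        return new ContentChange(a.from, b.to, a.local && b.local);
    }

    public static void main(String[] args) {
        // [NS4-L, NS5-L] + [NS5-L, NS6-L]: still local, only user identity is lost
        ContentChange allLocal = merge(new ContentChange("NS4", "NS5", true),
                                       new ContentChange("NS5", "NS6", true));
        // [NS2-L, NS3] + [NS3, NS4-L]: cannot be treated as local
        ContentChange mixed = merge(new ContentChange("NS2", "NS3", false),
                                    new ContentChange("NS3", "NS4", true));
        System.out.println(allLocal.local + " " + mixed.local); // prints "true false"
    }
}
```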
Re: CommitHooks as OSGi Components.
On Mon, Sep 12, 2016 at 3:12 PM, Ian Boston <i...@tfd.co.uk> wrote: > but if the information that connect a sessionID/userID to the > paths that are modified is available through some other route, I might be > able to use something else. A regular Observer should work for that case. Just register an instance with the service registry and it would be picked up; for non-external events the CommitInfo would be present. Chetan Mehrotra
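As a rough illustration of the Observer route, here is a minimal self-contained analog of the contract (Oak's real interface is org.apache.jackrabbit.oak.spi.commit.Observer, which takes a NodeState root; the classes below are simplified stand-ins). The key point is the convention that CommitInfo is null for external events:

```java
import java.util.ArrayList;
import java.util.List;

public class ObserverSketch {
    // Stand-in for Oak's CommitInfo: present for local commits, null for external ones.
    static final class CommitInfo {
        final String sessionId, userId;
        CommitInfo(String sessionId, String userId) {
            this.sessionId = sessionId;
            this.userId = userId;
        }
    }

    // Stand-in for Oak's Observer: contentChanged(root, info).
    interface Observer {
        void contentChanged(String root, CommitInfo info);
    }

    // A listener that records who changed what, skipping external events.
    static final class AuditingObserver implements Observer {
        final List<String> audit = new ArrayList<>();
        @Override public void contentChanged(String root, CommitInfo info) {
            if (info == null) {
                return; // external change: no session/user information available
            }
            audit.add(info.userId + " -> " + root);
        }
    }

    public static void main(String[] args) {
        AuditingObserver obs = new AuditingObserver();
        obs.contentChanged("/content/a", new CommitInfo("s1", "alice"));
        obs.contentChanged("/content/b", null); // external event, ignored
        System.out.println(obs.audit); // prints "[alice -> /content/a]"
    }
}
```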
Re: CommitHooks as OSGi Components.
On Mon, Sep 12, 2016 at 2:08 PM, Ian Boston <i...@tfd.co.uk> wrote: > Unfortunately the IndexProvider route doesn't appear give me the > information I am after (CommitInfo). Any details around intended usage? CommitInfo is now exposed via OAK-4642 to IndexEditorProvider Chetan Mehrotra
Re: Minimum JDK version
I think Marcel created OAK-4791 for the same. So that should take care of enforcing this constraint. Chetan Mehrotra On Mon, Sep 12, 2016 at 4:40 PM, Stefan Seifert <sseif...@pro-vision.de> wrote: > in sling we use the animal sniffer plugin for exactly this purpose [1]. > it checks that the compiled codes only uses signatures available in the > configured jdk. > > stefan > > [1] http://www.mojohaus.org/animal-sniffer/animal-sniffer-maven-plugin/ > >>-----Original Message----- >>From: Tomek Rekawek [mailto:reka...@adobe.com] >>Sent: Monday, September 12, 2016 1:06 PM >>To: oak-dev@jackrabbit.apache.org >>Subject: Re: Minimum JDK version >> >>Hi, >> >>the interesting thing here is that we actually compile the code with -source and -target=1.6 in these branches [1][2]. However, the javac still >>uses the rt.jar coming from the current JDK and it does contain the >>java.nio package. It seems that the only way to check the API usage >>correctness is to switch to JDK 1.6. >> >>Or maybe there’s some way to validate whether the used packages matches >>selected JDK version (eg. via some plugin reading the @since javadocs in >>API classes)? >> >>Regards, >>Tomek >> >>[1] https://github.com/apache/jackrabbit-oak/blob/1.4/oak- >>parent/pom.xml#L97 >>[2] https://github.com/apache/jackrabbit-oak/blob/1.2/oak- >>parent/pom.xml#L95 >> >> >>-- >>Tomek Rękawek | Adobe Research | www.adobe.com >>reka...@adobe.com >> >>> On 12 Sep 2016, at 11:42, Davide Giannella <dav...@apache.org> wrote: >>> >>> Hello team, >>> >>> following the recent mishap about JDK version and releases highlighted >>> two main issues: >>> >>> cannot find jenkins for anything that is not 1.6 >>> >>> we should enforce the build to build with the minimum required JDK. >>> >>> Now for the second point, this is easily achievable. What we have to >>> decide is whether we want this enforcement done on all the builds, or >>> only during releases build and checks. >>> >>> I'm for having it enforced on all the builds. 
>>> >>> Thoughts? >>> >>> Davide >>> >>> >
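For reference, the animal sniffer setup Stefan mentions boils down to a pom fragment along these lines (plugin and signature versions here are illustrative; the java16 signature artifact describes the JDK 6 API):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>animal-sniffer-maven-plugin</artifactId>
  <version>1.15</version>
  <configuration>
    <signature>
      <!-- the API baseline to check compiled classes against -->
      <groupId>org.codehaus.mojo.signature</groupId>
      <artifactId>java16</artifactId>
      <version>1.1</version>
    </signature>
  </configuration>
  <executions>
    <execution>
      <id>check-jdk-api</id>
      <phase>verify</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this in oak-parent, a build on JDK 8 would still fail if code references an API (e.g. java.nio additions) that is missing from the configured baseline.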
Re: Infinite loop
Looks like the index would need to be reindexed. It would be better to contact Adobe Support as closer analysis would be required. Chetan Mehrotra On Thu, Sep 15, 2016 at 6:32 PM, Thiago Sanches <tsi...@gmail.com> wrote: > I removed the index folder but the error persists. I tried to remove the > "/crx/packmgr/service.jsp/file" node (that was causing the error before) > But still failing... > > 15.09.2016 12:58:31.417 *DEBUG* [pool-7-thread-3] > org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate [async] The index > update is still failing > org.apache.jackrabbit.oak.api.CommitFailedException: OakLucene0005: Failed > to remove the index entries of the removed subtree > /crx/packmgr/service.jsp/file > > I know that we removed and maybe caused some other issue, but I don't know > why the cause of the first error (before the node deletion). > > On Thu, Sep 15, 2016 at 8:51 AM, Thiago Sanches <tsi...@gmail.com> wrote: > >> Hello Chetan, good morning. >> >> Yes, there is a lot of ".bak" inside segmentstore folder with the creation >> time close to the "force" restart. >> I'll try to remove the index folder. >> >> Thanks for your help. >> >> On Thu, Sep 15, 2016 at 1:44 AM, Chetan Mehrotra < >> chetan.mehro...@gmail.com> wrote: >> >>> On Thu, Sep 15, 2016 at 2:17 AM, Thiago Sanches <tsi...@gmail.com> wrote: >>> > This issue start to appers after some problemas with disk space and some >>> > force restarts on AEM. >>> >>> Do you see presence of ".bak" files in segmentstore folder post system >>> restart after unclean shutdown with creation time close to subsequent >>> restart times? Can you try cleaning local index folder >>> (repository/index) and restart to see if its resolved. If not would >>> suggest to followup on Adobe Support portal >>> >>> >>> Chetan Mehrotra >>> >> >>
Re: Faster reference binary handling
I think fixes have been done in this area recently. However, it would be good to have an integration test for the reference check scenario to ensure that it does not unnecessarily download the blobs. Chetan Mehrotra On Fri, Sep 16, 2016 at 11:56 AM, Thomas Mueller <muel...@adobe.com> wrote: > Hi, > > Possibly the binary is downloaded from S3 in this case. We have seen > similar performance issues with datastore GC when using the S3 datastore. > > It should be possible to verify this with full thread dumps. Plus we would > see where exactly the download occurs. Maybe it is checking the length or > so. > >> this API requires Oak to always retrieve the binary value from the DS > > I think the problem is in the S3 datastore implementation, and not the > API. But lets see. > > Regards, > Thomas > > > On 15/09/16 18:04, "Tommaso Teofili" <tommaso.teof...@gmail.com> wrote: > >>Hi all, >> >>while working with Oak S3 DS I have witnessed slowness (no numbers, just >>'slow' from a user perspective) in persisting a binary using its >>reference; >>although this may be related to some environment specific issue I wondered >>about the reference binary handling we introduced in JCR-3534 [1]. >>In fact the implementation there requires to do something like >> >>ReferenceBinary ref = new SimpleReferenceBinary(referenceString); >>Binary referencedBinary = >>session.getValueFactory().createValue(ref).getBinary(); >>node.setProperty("foo", referencedBinary); >> >>on the "installation" side. >>Despite all possible issues in the implementation it seems this API >>requires Oak to always retrieve the binary value from the DS and then >>store >>its value into the node whereas it'd be much better to avoid having to >>read >>the value but instead bind it to that referenced binary. >> >>ReferenceBinary ref = new SimpleReferenceBinary(referenceString); >>if (ref.isValid()) { // referenced binary exists in the DS >> node.setProperty("foo", ref, Type.BINARY); // set a string with binary >>type !? 
>>} >> >>I am not sure if the above code could make sense, probably not, but at >>least wanted to point out the problem as to seek for possible >>enhancements. >> >>Regards, >>Tommaso >> >>[1] : https://issues.apache.org/jira/browse/JCR-3534 >
Re: Possibility of making nt:resource unreferenceable
On Fri, Oct 7, 2016 at 11:34 AM, Carsten Ziegeler <cziege...@apache.org> wrote: > Whenever a nt:resource child node of a nt:file node is created, it is > silently changed to oak:resource. I like this! This can be done via an Editor which does the transformation upon addition of a new node, something which can be easily enabled/disabled if the need arises. With this we would not have to make changes in many places like WebDav, Vault, the Sling Post Servlet, or any custom code creating nt:file, say via JcrUtil.putFile. Chetan Mehrotra
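In spirit, such an Editor would perform a transformation like the following self-contained sketch (a toy node model, not Oak's actual Editor API; oak:Resource is the non-referenceable type from OAK-4567):

```java
import java.util.HashMap;
import java.util.Map;

public class ResourceTypeRewriteSketch {
    // Toy node: a primary type plus named children, enough to model nt:file/jcr:content.
    static final class Node {
        String primaryType;
        final Map<String, Node> children = new HashMap<>();
        Node(String primaryType) { this.primaryType = primaryType; }
    }

    // On node addition: silently swap the referenceable nt:resource content
    // child of an nt:file for the non-referenceable oak:Resource type.
    static void onNodeAdded(Node parent, Node added) {
        if ("nt:file".equals(parent.primaryType)
                && "nt:resource".equals(added.primaryType)) {
            added.primaryType = "oak:Resource";
        }
    }

    public static void main(String[] args) {
        Node file = new Node("nt:file");
        Node content = new Node("nt:resource");
        file.children.put("jcr:content", content);
        onNodeAdded(file, content);
        System.out.println(content.primaryType); // prints "oak:Resource"
    }
}
```

Since the check only fires for nt:resource children of nt:file, callers that deliberately use nt:resource elsewhere would be unaffected.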
Re: CommitHooks as OSGi Components.
On Thu, Sep 15, 2016 at 1:15 AM, Rob Ryan <rr...@adobe.com> wrote: > Last I heard even local events can be subject to loss of the user id if so > many events are being processed that ‘compaction’ is used to mitigate the > load. Is this still the case? > > Please don’t point people toward the availability of the user id from events > (without full disclaimers) if it will not *always* be available. That is the case for a JCR-level ObservationListener, which makes use of BackgroundObserver. In Ian's case he is building directly on top of Observer and hence can control the compaction aspect. Chetan Mehrotra
Re: IndexEditorProvider behaviour question.
Note that so far LuceneIndexEditor was used only for the async indexing case and hence invoked only on the leader node every 5 sec, so performance aspects here were not that critical. However, with the recent work on hybrid indexes it would be used in the critical path, and hence such aspects are important. On Wed, Sep 14, 2016 at 3:10 PM, Ian Boston <i...@tfd.co.uk> wrote: > A and B mean that the work of creating the tree and working out the changes > in a tree will be duplicated roughly n times, where n is the number of > index definitions. Note that the diff would be performed only once at any level; IndexUpdate would then pass the changes to the various editors. However, construction of trees can be avoided and I have opened OAK-4806 for that now. The Oak issue also has details on why Tree was used. With multiple index editors performance does decrease; see OAK-1273. If we switch to the hybrid index then this aspect improves a bit, as instead of having 50 different property indexes (with 50 editor instances for each commit) we can have a single editor with 50 property definitions. This can be seen in the hybrid index benchmark (OAK-4412) by changing numOfIndexes. If you see any other area of improvement, say around unnecessary object generation, then let us know! Chetan Mehrotra
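The "diff once, fan out to all editors" behaviour of IndexUpdate can be pictured with a minimal sketch (toy interfaces; Oak's IndexUpdate and Editor APIs are richer):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexFanOutSketch {
    interface Editor {
        void propertyAdded(String path, String name);
    }

    // Walk the change set once and hand each callback to every registered editor,
    // instead of re-diffing the content once per index definition.
    static void dispatch(List<String[]> changes, List<Editor> editors, AtomicInteger diffCount) {
        diffCount.incrementAndGet(); // one traversal, regardless of editor count
        for (String[] c : changes) {
            for (Editor e : editors) {
                e.propertyAdded(c[0], c[1]);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger diffs = new AtomicInteger();
        AtomicInteger callbacks = new AtomicInteger();
        Editor e1 = (p, n) -> callbacks.incrementAndGet();
        Editor e2 = (p, n) -> callbacks.incrementAndGet();
        dispatch(Arrays.asList(new String[]{"/a", "foo"}, new String[]{"/b", "bar"}),
                 Arrays.asList(e1, e2), diffs);
        System.out.println(diffs.get() + " diff, " + callbacks.get() + " editor callbacks");
        // prints "1 diff, 4 editor callbacks"
    }
}
```

The per-editor callback cost still grows with the number of editors, which is why folding 50 property definitions into one editor (the hybrid index idea) helps even though the diff itself is shared.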
Re: Infinite loop
On Thu, Sep 15, 2016 at 2:17 AM, Thiago Sanches <tsi...@gmail.com> wrote: > This issue start to appers after some problemas with disk space and some > force restarts on AEM. Do you see presence of ".bak" files in segmentstore folder post system restart after unclean shutdown with creation time close to subsequent restart times? Can you try cleaning local index folder (repository/index) and restart to see if its resolved. If not would suggest to followup on Adobe Support portal Chetan Mehrotra
Re: [VOTE] Require JDK7 for Oak 1.4
+1 Chetan Mehrotra On Mon, Sep 19, 2016 at 12:41 PM, Marcel Reutegger <mreut...@adobe.com> wrote: > +1 > > Regards > Marcel > > > On 16/09/16 17:16, Julian Reschke wrote: >> >> On 2016-09-16 17:11, Davide Giannella wrote: >>> >>> ... >> >> >> OK then. >> >> [ ] +1 Yes, require JDK7 for Oak 1.4 >> [ ] -1 No, continue to support JDK6 >> >> This majority vote is open for at least 72 hours. >> >> Best regards, Julian >> >> >
Re: Stopping long running traversal queries
You can specify a traversal limit via QueryEngineSettingsMBean. This would be applicable to any running query. Chetan Mehrotra On Wed, Sep 21, 2016 at 6:26 AM, Pantula Rajesh <praj...@adobe.com> wrote: > Hi All, > > Is there a way to stop long running traversal queries? I was looking if there > is any JMX bean which can stop such queries. > > Regards, > Rajesh
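As a hedged sketch of driving such a setting over JMX, the following registers a stand-in MBean and flips its attributes through the platform MBean server. The ObjectName and the LimitReads/FailTraversal attribute names for Oak's actual QueryEngineSettings MBean are assumptions here; check your instance's JMX console for the real ones.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class QuerySettingsJmxSketch {
    // Stand-in for Oak's query settings MBean (attribute names assumed).
    public interface SettingsMBean {
        long getLimitReads();
        void setLimitReads(long limit);
        boolean getFailTraversal();
        void setFailTraversal(boolean fail);
    }

    public static class Settings implements SettingsMBean {
        private long limitReads = Long.MAX_VALUE;
        private boolean failTraversal = false;
        public long getLimitReads() { return limitReads; }
        public void setLimitReads(long l) { limitReads = l; }
        public boolean getFailTraversal() { return failTraversal; }
        public void setFailTraversal(boolean f) { failTraversal = f; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("sketch.oak:type=QueryEngineSettings,name=demo");
        server.registerMBean(new StandardMBean(new Settings(), SettingsMBean.class), name);
        // Cap reads and make traversing queries fail instead of scanning the repository.
        server.setAttribute(name, new Attribute("LimitReads", 100000L));
        server.setAttribute(name, new Attribute("FailTraversal", true));
        System.out.println(server.getAttribute(name, "LimitReads"));
    }
}
```

In practice the same setAttribute calls can be issued from jconsole or any remote JMX client against the running Oak instance.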
Possibility of making nt:resource unreferenceable
Hi Team, Some time back we discussed the requirement for oak:Resource as a non-referenceable replacement for nt:resource (OAK-4567). This topic was also discussed on the DL [1] and at that time it was decided that changing the defaults (making nt:resource non-referenceable) is not possible, and hence applications should switch to other nodetypes while creating nt:file instances. Towards that end I started a discussion on the Sling side as part of SLING-6090; see [2] for the discussion thread. However, the team there is of the view that this would require changes in many places and wants us to think again about changing the defaults. So the question here is === Can we change the defaults for the nt:resource nodetype to be non-referenceable? This has also been proposed for JCR 2.0 [3]; JR2 and Oak though use the nodetype definition from JCR 1.0. === To reiterate, I am just aiming for a solution here which enables a user to use a more optimal nodetype and get the best performance out of the underlying repository. Hopefully we can converge on some agreement here :) Chetan Mehrotra [1] http://markmail.org/thread/uj2ht4jwdrck7eja [2] http://markmail.org/thread/77xvjxtx42euhss4 [3] https://java.net/jira/browse/JSR_283-428
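For context, a CND-style sketch of the difference (from memory, not the authoritative definitions; see OAK-4567 for the real oak:Resource nodetype): nt:resource as shipped extends mix:referenceable, which forces a jcr:uuid and UUID-lookup bookkeeping onto every file body, while oak:Resource drops exactly that supertype.

```
// nt:resource (as used today): referenceable
[nt:resource] > mix:referenceable, mix:mimeType, mix:lastModified
  primaryitem jcr:data
  - jcr:data (BINARY) mandatory

// oak:Resource (OAK-4567): same shape, minus mix:referenceable
[oak:Resource] > mix:mimeType, mix:lastModified
  primaryitem jcr:data
  - jcr:data (BINARY) mandatory
```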
Re: Oak 1.5.13 release plan
I would like to have OAK-4975 included. Marked that issue as blocker. I hope to resolve that today itself Chetan Mehrotra On Thu, Oct 20, 2016 at 7:07 PM, Davide Giannella <dav...@apache.org> wrote: > Hello team, > > I'm planning to cut Oak 1.5.13 on Monday 24th. > > If there are any objections please let me know. Otherwise I will > re-schedule any non-resolved issue for the next iteration. > > Thanks > Davide > >
Issues waiting for changes in DocumentStore API
We currently have a few open issues which depend on updating the DocumentStore API:

OAK-3878 - Avoid caching of NodeDocument while iterating in BlobReferenceIterator
OAK-3001 - Simplify JournalGarbageCollector using a dedicated timestamp property

It would be good if we can decide now what the API should be, so that these issues can be addressed in the 1.6 release. Maybe we go for a use-case-specific API? Chetan Mehrotra
Re: Oak 1.5.13 release plan
On Mon, Oct 24, 2016 at 4:53 PM, Julian Reschke <julian.resc...@gmx.de> wrote: > Chetan: I see that you marked OAK-3036 as "blocker" for this release -- but > then, do we have a plan to resolve it in a timely manner? Missed that. Moved it to the next release and will try to get it resolved by then! Chetan Mehrotra
[REVIEW] Configuration required for node bundling config for DocumentNodeStore - OAK-1312
Hi Team,

Work for OAK-1312 is now in trunk. To enable this feature the user has to provision some config as content in the repository. The config needs to be created under '/jcr:system/rep:documentStore/bundlor' [1].

Example

  jcr:system
    rep:documentStore
      bundlor
        app:Asset{pattern = [jcr:content/metadata, jcr:content/renditions, jcr:content/renditions/**, jcr:content]}
        nt:file{pattern = [jcr:content]}

Key points

* This config is only required when the system is using DocumentNodeStore
* Any change here would be picked up via observation
* The config is supposed to be changed only by a system admin, so it needs to be secured (OAK-4959)
* The config can be changed anytime and would impact only newly created nodes

Open Questions

Bootstrap default config --- Should we ship with a default config for nt:file (maybe others like rep:AccessControllable)? If yes, then how to do that? One way can be to introduce a new 'WhiteboardRepositoryInitializer' and then DocumentNodeStore can register one which bootstraps a default config.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1312?focusedCommentId=15387241=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15387241
Re: svn commit: r1765583 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/api/jmx/ oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/property/strategy/ oak-cor
On Thu, Oct 20, 2016 at 6:08 PM, Julian Sedding <jsedd...@gmail.com> wrote: > I think we could get away with increasing this to 4.1.0 if we can > annotate QueryEngineSettingsMBean with @ProviderType. Makes sense. Opened OAK-4977 for that Chetan Mehrotra
Build failing due to compilation errors in oak-segment-tar
Build is failing locally and in CI [1] due to compilation error in oak-segment-tar. Looks like SegmentGCStatus class is not checked in [ERROR] /home/chetanm/git/apache/jackrabbit-oak/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStoreGCMonitor.java:[32,51] error: cannot find symbol [ERROR] symbol: class SegmentGCStatus [ERROR] location: package org.apache.jackrabbit.oak.segment.compaction Chetan Mehrotra [1] https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1296/jdk=JDK%201.8%20(latest),nsfixtures=SEGMENT_MK,profile=unittesting/console
Re: Build failing due to compilation errors in oak-segment-tar
Added the missing file in r1770910. @Francesco/Andrei: can you check whether it is the intended file? With this, compilation passes on my setup. Chetan Mehrotra On Wed, Nov 23, 2016 at 10:42 AM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > Build is failing locally and in CI [1] due to compilation error in > oak-segment-tar. Looks like SegmentGCStatus class is not checked in > > [ERROR] > /home/chetanm/git/apache/jackrabbit-oak/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStoreGCMonitor.java:[32,51] > error: cannot find symbol > [ERROR] symbol: class SegmentGCStatus > [ERROR] location: package org.apache.jackrabbit.oak.segment.compaction > > > Chetan Mehrotra > [1] > https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1296/jdk=JDK%201.8%20(latest),nsfixtures=SEGMENT_MK,profile=unittesting/console
Re: oak-lucene shaded
Hi Torgeir, We would not be able shade Lucene classes as they are exported and meant to be used by certain SPI implementations. So as of now there is no solution for using a different Lucene version in non OSGi world Chetan Mehrotra On Wed, Nov 23, 2016 at 7:15 PM, Torgeir Veimo <torgeir.ve...@gmail.com> wrote: > Second version, this pom file can be put in a separate directly as a self > contained maven artifact and includes oak-lucene remotely. > > > > > > http://maven.apache.org/POM/4.0.0; xmlns:xsi=" > http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation=" > http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd > "> > 4.0.0 > > no.karriere > 0.1-SNAPSHOT > oak-lucene-shaded > Oak Lucene (shaded) > Oak Lucene integration subproject > > > 1.5 > 4.7.1 > 1.4.6 > > > > > > > org.apache.maven.plugins > maven-source-plugin > 3.0.1 > > > generate-sources-for-shade-plugin > package > > jar-no-fork > > > > > > org.apache.maven.plugins > maven-shade-plugin > 3.0.0-SNAPSHOT > > > package > > shade > > > > > > false > true > true > > implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> > > > > org.apache.lucene > > org.shaded.apache.lucene > > > org.tartarus.snowball > > org.shaded.tartarus.snowball > > > > > > org.apache.jackrabbit:oak-core > > org.apache.jackrabbit:oak-commons > > org.apache.jackrabbit:oak-blob > > com.google.guava:guava > > commons-codec:commons-codec > commons-io:commons-io > javax.jcr:jcr > > org.apache.jackrabbit:jackrabbit-api > > org.apache.jackrabbit:jackrabbit-jcr-commons > > org.apache.tika:tika-core > org.slf4j:slf4j-api > > > > > > > > > > > > org.apache.jackrabbit > oak-core > ${oak.version} > > > org.apache.jackrabbit > oak-lucene > ${oak.version} > > > org.apache.tika > tika-core > ${tika.version} > > > > > org.apache.lucene > lucene-core > ${lucene.version} > > > org.apache.lucene > lucene-analyzers-common > ${lucene.version} > > > org.apache.lucene > lucene-queryparser > 
${lucene.version} > > > org.apache.lucene > lucene-queries > ${lucene.version} > > > org.apache.lucene > lucene-suggest > ${lucene.version} > > > org.apache.lucene > lucene-highlighter > ${lucene.version} > > > org.apache.lucene > lucene-memory > ${lucene.version} > > > org.apache.lucene > lucene-misc > ${lucene.version} > > > org.apache.lucene > lucene-facet > ${lucene.version} > > > > org.apache.tika > tika-parsers >
Re: Frequent failures in standby test
Per https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1298/ the tests failed again, but mostly on JDK 1.7; the tests on JDK 1.8 appear to have passed. Chetan Mehrotra On Tue, Nov 22, 2016 at 12:48 PM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > They are from oak-segment-tar. See > https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1295/#showFailuresLink > Chetan Mehrotra > > > On Tue, Nov 22, 2016 at 12:42 PM, Francesco Mari > <mari.france...@gmail.com> wrote: >> Are those from oak-tarmk-standby or oak-segment-tar? >> >> 2016-11-22 6:11 GMT+01:00 Chetan Mehrotra <chetan.mehro...@gmail.com>: >>> Hi Team, >>> >>> Since last 4-6 builds I am seeing a recurring failure of few test in >>> standby module >>> >>> * FailoverIPRangeIT >>> * ExternalPrivateStoreIT >>> * StandbyTestIT >>> >>> Probably something to be looked into >>> >>> Chetan Mehrotra
Re: oak-lucene shaded
On Fri, Nov 25, 2016 at 2:33 PM, Torgeir Veimo <torgeir.ve...@gmail.com> wrote: > I wasn't suggesting oak should adopt this approach at this time, it's > merely a solution for those that need to combining oak with other code > (usually elasticsearch) in a non-osgi environment (usually spring). Okies ... I thought you wanted Oak to adopt the approach, hence the confusion! Yes, the approach used should work fine for such cases. Maybe we can later add support to produce such an oak-lucene jar. Chetan Mehrotra