Re: R: R: R: Critical questions about OAK
Hi Francesco,

> Query Engine
> 1. I didn't understand how Traverse physically retrieves the graph to traverse. Is it held in memory? Does it search the filesystem or the DB to obtain the right portion of the graph and then traverse it?
> 2. Can you point out the Traverse classes? Or unit tests?

The traversing index is a fallback that Oak's built-in query engine uses if no "real" index is able to answer a specific query (this implies that all your queries should be backed by indexes). If the traversing index is used, the query engine traverses the relevant parts of the tree (relevant == the tree specified in your query). Whether this traversal happens in memory, on disk or elsewhere is a concern of the lower-level persistence layer and is thus transparent to the query engine.

You can find the related code here:
https://github.com/apache/jackrabbit-oak/search?utf8=%E2%9C%93&q=traversingindex&type=Code
(but please note: if you see traversals in the log, this means that you should add an index)

> As for the RDBMS question, I noticed that with our simple class the node is added the first time. The second time, we get an error while loading RepositoryImpl. In detail, when MutableTree tries to execute beforeWrite, it throws an IllegalStateException ("this tree does not exist").

It is hard to give a proper answer, but you mention "MutableTree", which leads me to suspect that you have initialized/used Oak-internal classes. On the application layer you should only use the JCR API to interact with the repository.

HTH
Michael

On 07/03/16 15:15, "Ancona Francesco" <francesco.anc...@siav.it> wrote:

> ...

This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses.
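Michael's description of the traversing index (walk only the subtree named in the query and check each visited node) can be illustrated with a minimal, self-contained sketch. `TreeNode` and `TraversalSketch` are hypothetical classes over an in-memory tree; they are not Oak's traversing-index code, which works on node states supplied by the persistence layer and so does not care whether those live in memory, on disk or in a database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory node; in Oak, node data comes from the persistence layer.
class TreeNode {
    final String name;
    final Map<String, String> properties = new HashMap<>();
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    /** Adds and returns a new child node. */
    TreeNode addChild(String childName) {
        TreeNode child = new TreeNode(childName);
        children.add(child);
        return child;
    }
}

class TraversalSketch {
    /** Walks from the root to an absolute path; returns null if a segment is missing. */
    static TreeNode resolve(TreeNode root, String path) {
        TreeNode current = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skips the empty segment produced by the leading "/"
            }
            TreeNode next = null;
            for (TreeNode child : current.children) {
                if (child.name.equals(segment)) { next = child; break; }
            }
            if (next == null) {
                return null;
            }
            current = next;
        }
        return current;
    }

    /**
     * Roughly ISDESCENDANTNODE(basePath) AND property = value: only the subtree
     * below basePath is visited, and every visited node is checked individually.
     */
    static List<String> findDescendants(TreeNode root, String basePath, String property, String value) {
        List<String> hits = new ArrayList<>();
        TreeNode base = resolve(root, basePath);
        if (base == null) {
            return hits;
        }
        String prefix = basePath.endsWith("/") ? basePath : basePath + "/";
        for (TreeNode child : base.children) {
            walk(child, prefix + child.name, property, value, hits);
        }
        return hits;
    }

    // Depth-first, pre-order walk that records the paths of matching nodes.
    private static void walk(TreeNode node, String path, String property, String value, List<String> hits) {
        if (value.equals(node.properties.get(property))) {
            hits.add(path);
        }
        for (TreeNode child : node.children) {
            walk(child, path + "/" + child.name, property, value, hits);
        }
    }
}
```

The work done is proportional to the size of the subtree below the base path, which is why a narrow path restriction keeps traversal cheap and a query without one ends up visiting the whole repository.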
R: R: R: Critical questions about OAK
Hi,

sorry to keep asking about these critical questions, but we'd like to build a platform on Oak that manages over 200M documents, so we'd like to understand in depth how Oak works.

Query Engine
1. I didn't understand how Traverse physically retrieves the graph to traverse. Is it held in memory? Does it search the filesystem or the DB to obtain the right portion of the graph and then traverse it?
2. Can you point out the Traverse classes? Or unit tests?

As for the RDBMS question, I noticed that with our simple class the node is added the first time. The second time, we get an error while loading RepositoryImpl. In detail, when MutableTree tries to execute beforeWrite, it throws an IllegalStateException ("this tree does not exist").

Thanks in advance,
best regards

-----Original Message-----
From: Julian Reschke [mailto:julian.resc...@gmx.de]
Sent: Friday, 4 March 2016 08:09
To: oak-dev@jackrabbit.apache.org
Subject: Re: R: R: Critical questions about OAK

...
R: R: Critical questions about OAK
Hi,

another question, still about the query engine.

1. I didn't understand how Traverse physically retrieves the graph to traverse. Is it held in memory? Does it search the filesystem or the DB to obtain the right portion of the graph and then traverse it?
2. Can you point out the Traverse classes? Or unit tests?

Thanks in advance
Best regards

-----Original Message-----
From: Davide Giannella [mailto:dav...@apache.org]
Sent: Thursday, 3 March 2016 15:52
To: oak-dev@jackrabbit.apache.org
Subject: Re: R: Critical questions about OAK

...
Re: R: R: Critical questions about OAK
On 2016-03-03 15:48, Ancona Francesco wrote:
> Yes, but I'm asking if there is a way or a configuration to call the RDBMS using
> the JCR repository, like the Oak "getting started" examples.
>
>     final DocumentMK.Builder builder = new DocumentMK.Builder();
>     builder.setBlobStore(createFileSystemBlobStore());
>     final DocumentNodeStore ns = getRDBDocumentNodeStore(builder);
>     Oak oak = new Oak(ns);
>     Jcr jcr = new Jcr(oak);
>
>     Repository repo = jcr.createRepository();
>
> Thanks.

It looks like some RepositoryInitializer is missing (AFAIU, it would take care of creating the initial content).

Best regards, Julian
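What Julian means by an initializer "taking care of creating the initial content" can be sketched as a contract: it runs whenever the repository is created and only adds content that is missing. The sketch below is hypothetical (a `Set` of paths stands in for the repository, and the two paths are illustrative only); Oak's actual hook is the `RepositoryInitializer` interface, which operates on node builders rather than on a set.

```java
import java.util.Set;

// Hypothetical sketch of the initializer contract, not Oak's implementation.
class BootstrapSketch {
    // Illustrative bootstrap paths, loosely modelled on the system content a
    // repository needs on first start; not Oak's actual list.
    static final String[] REQUIRED_PATHS = {"/jcr:system", "/oak:index"};

    /**
     * Creates each required path only if it is missing, so the same code can
     * run on every repository start. Returns true if anything was created.
     */
    static boolean initialize(Set<String> existingPaths) {
        boolean created = false;
        for (String path : REQUIRED_PATHS) {
            // Set.add returns false when the path is already present
            created |= existingPaths.add(path);
        }
        return created;
    }
}
```

The first call creates the bootstrap content; a second call on the same store finds everything present and changes nothing, which is the behaviour needed when an existing repository is opened again on a later run.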
Re: R: Critical questions about OAK
On 03/03/2016 14:15, Ancona Francesco wrote:
> ...
> About query Engine
> - Could you explain in more depth what traverse is? If we have
> understood correctly, Traverse doesn't delegate to the index server engine (good in case of
> index server trouble) but is a built-in component of Oak: but where is the
> repository graph kept for Traverse? In memory? On the filesystem? Is data fetched from
> the DB?

Traverse will physically traverse the repository in search of the right data. It's not the most efficient index, and it's there mainly to operate when either none of the other indexes is suitable for the provided query or there are no other indexes. But be careful: it doesn't mean it's intrinsically a bad index. Let's take the following query as an example:

    SELECT * FROM [nt:unstructured] AS a
    WHERE ISDESCENDANTNODE(a, '/content/mysite/colour/red')
    AND colour = 'red'

and assume you initialised the repository with the InitialContent, which provides some indexes as I said in a previous email, and on top of that you have a PropertyIndex on `colour` and no Lucene index. Lucene is quite powerful, with a lot of configuration options.

Overall, the repository has roughly the following node distribution:

- 10k nodes nt:unstructured
- 5k nodes with colour red
- 3 nodes under /content/mysite/colour/red

For the above query, if you look at the plans you'll have the following costs (taking some freedom with the numbers):

- NodeTypeIndex 1
- PropertyIndex: 3000
- Traversing: 3

In this case the traversing index would actually be more performant than any other index, as the query engine will have to post-analyse a set of only 3 nodes.

> - we have to manage a potentially large number of documents, so we need
> more than one node: is it possible to cluster Lucene?

You can't cluster the built-in Lucene. If you're looking for such a feature, maybe a remote Solr can be a better solution, but so far I don't think I've heard of a need to cluster Lucene.

You can have a look at my slides from the talk I gave at the adaptTo conference last year. They may help shed some light on the query engine, even if the biggest part of my presentation was the 20 minutes of Q&A :)

http://adapt.to/2015/en/schedule/scaling-the-query-with-oak.html

HTH
Davide
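The core of Davide's example is that the engine compares estimated costs and runs the cheapest plan. A minimal sketch of that selection step, assuming costs are plain doubles keyed by index name and that +infinity means "this index cannot answer the query"; `PlanChooser` is hypothetical, and in Oak the estimates come from each index implementation rather than from a map.

```java
import java.util.Map;

class PlanChooser {
    /**
     * Returns the name of the index with the lowest estimated cost, or null if
     * no index can answer the query (a cost of +infinity means "cannot answer").
     */
    static String cheapest(Map<String, Double> estimatedCosts) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> entry : estimatedCosts.entrySet()) {
            if (entry.getValue() < bestCost) {
                bestCost = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

With the PropertyIndex (3000) and Traversing (3) figures from the example, `cheapest` picks `Traversing`. If the path restriction were dropped so that traversal had to visit, say, all 10000 nodes (an assumed figure), the same call would pick `PropertyIndex` instead.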
R: R: Critical questions about OAK
Yes, but I'm asking if there is a way or a configuration to call the RDBMS using the JCR repository, like the Oak "getting started" examples.

    final DocumentMK.Builder builder = new DocumentMK.Builder();
    builder.setBlobStore(createFileSystemBlobStore());
    final DocumentNodeStore ns = getRDBDocumentNodeStore(builder);
    Oak oak = new Oak(ns);
    Jcr jcr = new Jcr(oak);

    Repository repo = jcr.createRepository();

Thanks.

-----Original Message-----
From: Julian Reschke [mailto:julian.resc...@greenbytes.de]
Sent: Thursday, 3 March 2016 15:19
To: oak-dev@jackrabbit.apache.org
Subject: Re: R: Critical questions about OAK

...
Re: R: Critical questions about OAK
On 2016-03-03 15:15, Ancona Francesco wrote:
> Hi,
> we have other questions about the RDBMS and the query engine.
>
> About RDBMS
> - we tried to run the unit tests on my Postgres and they seem to work, but only if
> I use Oak methods such as update and remove. We'd like to use jcrRepository
> to start our ECM management, but it doesn't work (see the previous mail).
> Could you confirm this scenario () ? Could you give us other examples on
> RDBMS that use jcrRepository?
> ...

I already did. The test cases in oak-jcr run against an RDB persistence when invoked the way I told you yesterday...:

    mvn clean install -Prdb-postgres -Drdb.jdbc-url=jdbc:postgresql:oak -Drdb.jdbc-user=... -Drdb.jdbc-passwd=... -Dnsfixtures=DOCUMENT_RDB -PintegrationTesting -Prdb-postgres

Best regards, Julian
R: Critical questions about OAK
Hi,

we have other questions about the RDBMS and the query engine.

About RDBMS
- We tried to run the unit tests on my Postgres and they seem to work, but only if I use Oak methods such as update and remove. We'd like to use jcrRepository to start our ECM management, but it doesn't work (see the previous mail). Could you confirm this scenario () ? Could you give us other examples on RDBMS that use jcrRepository?

About query Engine
- Could you explain in more depth what traverse is? If we have understood correctly, Traverse doesn't delegate to the index server engine (good in case of index server trouble) but is a built-in component of Oak: but where is the repository graph kept for Traverse? In memory? On the filesystem? Is data fetched from the DB?
- We have to manage a potentially large number of documents, so we need more than one node: is it possible to cluster Lucene?

Thanks in advance,
best regards

-----Original Message-----
From: Davide Giannella [mailto:dav...@apache.org]
Sent: Wednesday, 2 March 2016 18:12
To: oak-dev@jackrabbit.apache.org
Subject: Re: Critical questions about OAK

...
Re: Critical questions about OAK
On 01/03/2016 15:33, Ancona Francesco wrote:
> ...2. Oak explicitly doesn't index anything, so what happens
> when I search for a document (or node) the first time? (this is not clear)
>
> a. The search is always delegated to the index server (Lucene
> embedded or Solr), which returns a result set of nodes that match the query.
>

Oak never delegates to any persistence. It relies on its own query engine. Oak provides 4 main index types: traverse, property, lucene and solr. If no index is defined, or none is suitable for the provided query, the Traverse comes into play. It's a built-in index, always there, that traverses the repository in search of the content matching the query you provided.

You define the indexes you need. Please read my previous email, where I explained the "doesn't index anything" aspect in more detail, as well as the docs around the query engine. They may not explain how the query engine works, but they provide enough detail that you don't have to read the code.

http://markmail.org/message/wvq7ggu737ex277b
http://jackrabbit.apache.org/oak/docs/query/query.html

> b. So MongoDB (or RDBMS) is used only to render the metadata or
> content binary
>
> 3. If I want better performance or I want full-text search,
> I have to create some indexes (3 types of indexes: lucene, solr and
> property of nodes) that improve the efficiency of the index server (lucene or
> solr). These indexes have no effect on the RDBMS or MongoDB in which
> these kinds of metadata are stored
>

If you need full-text capabilities, the only two indexes that provide it are Lucene and Solr. I'd go for Lucene if you don't need any Solr-specific feature. You'll need to define your own index. You can find details in the docs:

http://jackrabbit.apache.org/oak/docs/query/query.html

HTH
Davide
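Davide lists the property index among Oak's built-in index types. As a rough mental model, a property index maps each value of the indexed property to the paths of the nodes that carry it. The sketch below is a hypothetical in-memory simplification (Oak's property index stores its entries as content inside the repository, below the index definition), and it also shows why, in the `colour` query from Davide's 3 March reply, the property index is estimated as expensive: every `red` entry must be read before the path restriction can discard most of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy property index: property value -> sorted set of node paths.
class ToyPropertyIndex {
    private final Map<String, Set<String>> pathsByValue = new TreeMap<>();

    /** Records that the node at path has the indexed property set to value. */
    void add(String path, String value) {
        pathsByValue.computeIfAbsent(value, v -> new TreeSet<>()).add(path);
    }

    /** Cost estimate: the number of entries that must be read for this value. */
    int cost(String value) {
        return pathsByValue.getOrDefault(value, Set.of()).size();
    }

    /** Reads every entry for value, then post-filters by the path restriction. */
    List<String> query(String value, String basePath) {
        String prefix = basePath.endsWith("/") ? basePath : basePath + "/";
        List<String> result = new ArrayList<>();
        for (String path : pathsByValue.getOrDefault(value, Set.of())) {
            if (path.startsWith(prefix)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

The cost grows with how common the value is across the whole repository, not with the size of the subtree the query is interested in; that is the opposite trade-off from traversal.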
Re: R: Critical questions about OAK
On 2016-03-02 08:33, Ancona Francesco wrote:
> We have used "Oracle 11.2 Express Edition" and Postgres 9.4

Oracle 11 is not supported (and yes, that's missing in the Javadocs).

> I'll send again the 2 logs but, can we have a software compatibility matrix for the RDBMSs that Oak supports?

RDBDocumentStore actually INFO-logs when it doesn't support a DB (so you really should have a look at the log file). But yes, it also needs to be in the documentation.

Best regards, Julian
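Since the thread establishes that Oracle 11 is not supported, while Oracle 12 and Postgres are what the Oak developers test against, a client can print what its JDBC driver reports before creating the repository. The sketch uses only standard `java.sql.DatabaseMetaData` methods; the class name and the version rule are hypothetical, and treating every Oracle release below 12 as unsupported is an assumption. RDBDocumentStore's own INFO log output, as Julian notes, remains the authoritative check.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.SQLException;
import java.util.Locale;

class RdbVersionCheck {
    /**
     * Encodes only what this thread states (Oracle 11 is not supported);
     * extending the rule to every Oracle release below 12 is an assumption.
     */
    static boolean knownUnsupported(String productName, int majorVersion) {
        return productName.toLowerCase(Locale.ROOT).contains("oracle") && majorVersion < 12;
    }

    /** Builds a one-line description of the database behind a JDBC connection. */
    static String describe(Connection connection) throws SQLException {
        DatabaseMetaData metaData = connection.getMetaData();
        String product = metaData.getDatabaseProductName();
        int major = metaData.getDatabaseMajorVersion();
        String line = product + " " + metaData.getDatabaseProductVersion();
        // Flag the combination reported as unsupported in this thread
        return knownUnsupported(product, major) ? line + " (WARNING: Oracle before 12)" : line;
    }
}
```

Logging `describe(connection)` once at start-up makes it easy to attach the exact database version to a bug report like the one in this thread.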
R: Critical questions about OAK
We have used "Oracle 11.2 Express Edition" and Postgres 9.4.
I'll send the 2 logs again, but can we have a software compatibility matrix for the RDBMSs that Oak supports?

Thanks in advance.
Best regards

-----Original Message-----
From: Julian Reschke [mailto:julian.resc...@gmx.de]
Sent: Tuesday, 1 March 2016 18:59
To: oak-dev@jackrabbit.apache.org
Subject: Re: Critical questions about OAK

On 2016-03-01 16:33, Ancona Francesco wrote:
> Hello,
>
> I'm very sorry, but we have 2 big problems to solve if we want to
> continue our project with the Oak platform.
>
> The first is that we can't manage to save either metadata or binaries
> to the RDBMS.
>
> We tried both Postgres and Oracle with a simple class that loads a
> simple node (similar to the class in "getting started").
>
> In both Oracle and Postgres we have a problem when we create a repository:
>
> Repository repo = jcr.createRepository();
>
> I attach to this mail 2 files that describe these 2 errors in detail.
>
> This is a critical issue, because some clients want to use an RDBMS;
> besides, it should be very easy to store in an RDBMS, so we are a little perplexed.
> ...

Interestingly enough, the exceptions for the two databases are very different.

Again, please check the log files for any output of RDBDocumentStore and post it here.

We test with both Oracle (12!, and that is important...) and Postgres, and we do not see these exceptions. You can verify that yourself by running the Oak unit tests.

This means that something is likely different in the way you configure things (maybe the datasource implementation? isolation levels?).

So again, check the logs, or try to reproduce your problems inside the Oak unit test framework, so we can more easily investigate them.

Best regards, Julian
Re: questions
Hi,

If I had to do that, I would probably model the ACLs for those state changes at the application level (in your workflow engine), not in the repository.

But if you really want to do it in the repository, I see 2 possible ways:

1. Model the states as child nodes of the item in the workflow, e.g.

    |
    - item
    -- draft

Then you could probably use wildcard ACLs such that e.g. only a given group can remove nodes named "draft" and add nodes named "approved".

2. Another possible approach is to add your own SecurityProvider (Angela would know what the actual name is) that evaluates writes based on your logic.

HTH
Michael

On 13 Oct 2014, at 18:58, TALHAOUI Mohamed <m.talha...@rsd.com> wrote:

> Hi,
>
> Most probably for the states.
> What about enforcing allowed transitions and permissions?
> E.g.: the state cannot change from DRAFT to APPROVED; only users with the approve privilege can set the state to APPROVED.
> What would be your recommendation here?
>
> Thanks
>
> -----Original Message-----
> From: Michael Marth [mailto:mma...@adobe.com]
> Sent: Monday, 13 October 2014 17:43
> To: oak-dev@jackrabbit.apache.org
> Subject: Re: questions
>
> Hi,
>
>> My use case is very basic: I need to bind some LC states to a node type (something like DRAFT, PENDING, REJECTED, APPROVED) and allow a node to follow LC transitions in response to a user action or a workflow action.
>
> I would simply add a property with those values to these nodes. Would that work?
>
> Cheers
> Michael
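Michael's first suggestion, enforcing the lifecycle in the application (the workflow engine) rather than through repository ACLs, can be sketched as a small state machine. Only the rule "DRAFT cannot go straight to APPROVED, and only users with the approve privilege may set APPROVED" comes from the thread; the rest of the transition table is invented for illustration, and `hasApprovePrivilege` stands in for whatever permission check the application uses. The resulting state would then be stored as a plain property on the node, as Michael proposed.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum LifecycleState { DRAFT, PENDING, REJECTED, APPROVED }

class LifecyclePolicy {
    // Hypothetical transition table; only "no DRAFT -> APPROVED" is from the thread.
    private static final Map<LifecycleState, Set<LifecycleState>> ALLOWED = Map.of(
            LifecycleState.DRAFT, EnumSet.of(LifecycleState.PENDING),
            LifecycleState.PENDING, EnumSet.of(LifecycleState.APPROVED, LifecycleState.REJECTED),
            LifecycleState.REJECTED, EnumSet.of(LifecycleState.DRAFT),
            LifecycleState.APPROVED, EnumSet.noneOf(LifecycleState.class));

    /** Checks the transition table and, for APPROVED, the caller's approve privilege. */
    static boolean canTransition(LifecycleState from, LifecycleState to, boolean hasApprovePrivilege) {
        if (!ALLOWED.get(from).contains(to)) {
            return false;
        }
        return to != LifecycleState.APPROVED || hasApprovePrivilege;
    }
}
```

Keeping the table in one place makes the allowed transitions easy to review and test, which is harder to achieve with wildcard ACLs spread across the content tree.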
questions
Hi,

I have some questions regarding the POC I am working on:

Lifecycle management
I have seen that it is not implemented in Oak, although it is specified by JCR. Is there any plan to implement it?

Observation
How does it scale? I need to have some custom operations executed on node creation, move, deletion, ... I guess Observation is the way to go, but I wonder how this scales in case I need to handle several billion nodes?

ACL
How does it scale? If I query a large repo for nodes and only have access to a few of them, how does the filtering work?

JCR vs RDBMS
I come from the RDBMS world and I am pretty new to JCR, so I apologize if these are dumb questions:
* So far, I have used the JCR API (nodes, properties, events, ...) and was able to cover my basic use cases. But in a real application I need an OO model and, therefore, at some point, a way to map my business model to JCR nodes (something like an ORM). I found Jackrabbit OCM (http://jackrabbit.apache.org/5-with-jackrabbit-ocm.html) but nothing in Oak. Is there something in the pipeline?
* What are the strategies and tooling for data migration? I mean, if I have millions of nodes of a certain type and need to make some modification to this type definition (adding a mandatory property or node, changing a property type, ...), how should I proceed in this case?

Thanks in advance for your answers,
Mohamed