Re: R: R: R: Critical questions about OAK

2016-03-12 Thread Michael Marth
Hi Francesco,

Query Engine
1. I didn't understand how Traverse physically retrieves the graph to traverse. 
Is it held in memory? Does it search the filesystem or the DB to obtain the 
relevant portion of the graph and then traverse it?
2. Can you point out the Traverse classes? Or a unit test?

The Traversing Index is a fallback that Oak’s built-in query engine uses if no 
“real” index is able to answer a specific query (this implies that all your 
queries should be backed by indexes). If the traversing index is used, the 
query engine will traverse the relevant parts of the tree (relevant == the 
subtree specified in your query). Whether this traversal happens in memory, on 
disk or elsewhere is a concern of the lower-level persistence layer and thus 
transparent to the query engine.
You can find related code here: 
https://github.com/apache/jackrabbit-oak/search?utf8=%E2%9C%93&q=traversingindex&type=Code
(but please note: if you see traversals in the log, this means that you 
should add an index)
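
To make the traversal fallback described above more concrete, here is a toy model of it: walk the subtree named in the query and post-filter each node on a property. This is only a sketch under stated assumptions. The classes `TraversalSketch` and `Node` and the method `traverse` are hypothetical and are not Oak's API; in Oak the tree would come from the persistence layer rather than from an in-memory map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not Oak's API.
class TraversalSketch {

    // A minimal in-memory tree node with a name, properties and children.
    static final class Node {
        final String name;
        final Map<String, String> props = new LinkedHashMap<>();
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name) { this.name = name; }

        // Returns the child with the given name, creating it if needed.
        Node child(String childName) {
            return children.computeIfAbsent(childName, Node::new);
        }
    }

    // Resolves an absolute path such as "/content/red"; returns null if absent.
    static Node resolve(Node root, String path) {
        Node current = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) continue;
            current = current.children.get(segment);
            if (current == null) return null;
        }
        return current;
    }

    // Visits every descendant of 'path' and keeps those whose property matches.
    static List<String> traverse(Node root, String path, String key, String value) {
        List<String> hits = new ArrayList<>();
        Node start = resolve(root, path);
        if (start != null) {
            String base = path.equals("/") ? "" : path;
            for (Node c : start.children.values()) {
                collect(c, base + "/" + c.name, key, value, hits);
            }
        }
        return hits;
    }

    // Depth-first walk; the property check is the "post-filter" step.
    private static void collect(Node node, String path, String key, String value,
                                List<String> hits) {
        if (value.equals(node.props.get(key))) hits.add(path);
        for (Node c : node.children.values()) {
            collect(c, path + "/" + c.name, key, value, hits);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("");
        root.child("content").child("red").child("a").props.put("colour", "red");
        root.child("content").child("red").child("b").props.put("colour", "blue");
        root.child("content").child("green").child("c").props.put("colour", "red");
        // Only descendants of /content/red are visited.
        System.out.println(traverse(root, "/content/red", "colour", "red")); // [/content/red/a]
    }
}
```

The cost of this approach grows with the size of the subtree that is walked, which is why traversals showing up in the log usually point to a missing index.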

As for the RDBMS question, I noticed that with our simple class, adding the 
node works the first time. The second time we get an error loading 
RepositoryImpl. In detail, when MutableTree runs beforeWrite, it throws an 
IllegalStateException ("this tree does not exist").

It is hard to give a proper answer, but you mention “MutableTree”, which leads 
me to suspect that you have initialized or used Oak-internal classes. At the 
application layer you should only use the JCR API to interact with the 
repository.

HTH
Michael


On 07/03/16 15:15, "Ancona Francesco" 
<francesco.anc...@siav.it> wrote:

Hi,
Sorry to keep asking about these critical questions, but we'd like to build 
on OAK a platform that manages over 200M documents, so we'd like to understand 
in depth how OAK works.

Query Engine
1. I didn't understand how Traverse physically retrieves the graph to traverse. 
Is it held in memory? Does it search the filesystem or the DB to obtain the 
relevant portion of the graph and then traverse it?
2. Can you point out the Traverse classes? Or a unit test?

As for the RDBMS question, I noticed that with our simple class, adding the 
node works the first time. The second time we get an error loading 
RepositoryImpl. In detail, when MutableTree runs beforeWrite, it throws an 
IllegalStateException ("this tree does not exist").

Thanks in advance,
best regards

-Original Message-
From: Julian Reschke [mailto:julian.resc...@gmx.de]
Sent: Friday, 4 March 2016 08:09
To: oak-dev@jackrabbit.apache.org
Subject: Re: R: R: Critical questions about OAK

On 2016-03-03 15:48, Ancona Francesco wrote:
Yes, but I'm asking whether there is a way or a configuration to use an RDBMS 
through the JCR repository, as in the Oak getting-started examples.

final DocumentMK.Builder builder = new DocumentMK.Builder();
builder.setBlobStore(createFileSystemBlobStore());
final DocumentNodeStore ns = getRDBDocumentNodeStore(builder);
Oak oak = new Oak(ns);
Jcr jcr = new Jcr(oak);
Repository repo = jcr.createRepository();

Thanks.

It looks like some RepositoryInitializer is missing (AFAIU, it would take care 
of creating the initial content).

Best regards, Julian


This footnote confirms that this email message has been scanned by PineApp 
Mail-SeCure for the presence of malicious code, vandals & computer viruses.







R: R: R: Critical questions about OAK

2016-03-07 Thread Ancona Francesco
Hi,
Sorry to keep asking about these critical questions, but we'd like to build 
on OAK a platform that manages over 200M documents, so we'd like to understand 
in depth how OAK works.

Query Engine 
1.  I didn't understand how Traverse physically retrieves the graph to 
traverse. Is it held in memory? Does it search the filesystem or the DB to 
obtain the relevant portion of the graph and then traverse it?
2.  Can you point out the Traverse classes? Or a unit test?

As for the RDBMS question, I noticed that with our simple class, adding the 
node works the first time. The second time we get an error loading 
RepositoryImpl. In detail, when MutableTree runs beforeWrite, it throws an 
IllegalStateException ("this tree does not exist").

Thanks in advance,
best regards

-Original Message-
From: Julian Reschke [mailto:julian.resc...@gmx.de]
Sent: Friday, 4 March 2016 08:09
To: oak-dev@jackrabbit.apache.org
Subject: Re: R: R: Critical questions about OAK

On 2016-03-03 15:48, Ancona Francesco wrote:
> Yes, but I'm asking whether there is a way or a configuration to use an RDBMS 
> through the JCR repository, as in the Oak getting-started examples.
>
>   final DocumentMK.Builder builder = new DocumentMK.Builder();
>   builder.setBlobStore(createFileSystemBlobStore());
>   final DocumentNodeStore ns = getRDBDocumentNodeStore(builder);
>   Oak oak = new Oak(ns);
>   Jcr jcr = new Jcr(oak);
>   
>   Repository repo = jcr.createRepository();
>
> Thanks.

It looks like some RepositoryInitializer is missing (AFAIU, it would take care 
of creating the initial content).

Best regards, Julian

 
 







R: R: Critical questions about OAK

2016-03-04 Thread Ancona Francesco
Hi,
Another question, again about the query engine.

1.  I didn't understand how Traverse physically retrieves the graph to 
traverse. Is it held in memory? Does it search the filesystem or the DB to 
obtain the relevant portion of the graph and then traverse it?

2.  Can you point out the Traverse classes? Or a unit test?

Thanks in advance
Best regards

-Original Message-
From: Davide Giannella [mailto:dav...@apache.org]
Sent: Thursday, 3 March 2016 15:52
To: oak-dev@jackrabbit.apache.org
Subject: Re: R: Critical questions about OAK

On 03/03/2016 14:15, Ancona Francesco wrote:
> ...
> About the query engine
> - Could you explain in more depth what Traverse is? If we have understood 
> correctly, Traverse doesn't delegate to an index server engine (good in case 
> of index server trouble) but is a built-in component of Oak. But where does 
> it keep the repository graph to traverse? In memory? On the filesystem? Does 
> it get the data from the DB?
Traverse will physically traverse the repository in search of the right data. 
It's not the most efficient index and it's there mainly to operate in case 
either none of the other indexes is suitable for the provided query or there 
are no other indexes.

But be careful. It doesn't mean it's intrinsically a bad index. Let's take the 
following query as an example

SELECT *
FROM [nt:unstructured] AS a
WHERE
ISDESCENDANTNODE(a, '/content/mysite/colour/red') AND colour = 'red'

Assume you initialised the repository with the InitialContent, which provides 
some indexes (as I said in a previous email), that on top of it you have a 
PropertyIndex on `colour`, and that you have no Lucene index. (Lucene is quite 
powerful, with a lot of configuration options.)

Overall, the repository has roughly the following node distribution:

- 10k nodes nt:unstructured
- 5k nodes with colour red
- 3 nodes under /content/mysite/colour/red

For the above query, if you look at the plans you'll have the following costs 
(taking some freedom on numbers):

- NodeTypeIndex 1
- PropertyIndex: 3000
- Traversing: 3

In this case the traversing index would actually perform better than any 
other index, as the query engine will have to post-analyse a set of only 3 nodes.
> - We have to manage a potentially large number of documents, so we need 
> more than one node. Is it possible to cluster Lucene?
You can't cluster the built-in Lucene. If you're looking for such a feature, 
a remote Solr may be a better solution, but so far I don't think I have heard 
of a need to cluster Lucene.

You can have a look at my slides from the talk I gave at the adaptTo conference 
last year. They may help shed some light on the query engine, even if the 
biggest part of my presentation was the 20 minutes of Q&A :)

http://adapt.to/2015/en/schedule/scaling-the-query-with-oak.html

HTH
Davide

 
 







Re: R: R: Critical questions about OAK

2016-03-03 Thread Julian Reschke

On 2016-03-03 15:48, Ancona Francesco wrote:

Yes, but I'm asking whether there is a way or a configuration to use an RDBMS 
through the JCR repository, as in the Oak getting-started examples.

final DocumentMK.Builder builder = new DocumentMK.Builder();
builder.setBlobStore(createFileSystemBlobStore());
final DocumentNodeStore ns = getRDBDocumentNodeStore(builder);
Oak oak = new Oak(ns);
Jcr jcr = new Jcr(oak);

Repository repo = jcr.createRepository();

Thanks.


It looks like some RepositoryInitializer is missing (AFAIU, it would 
take care of creating the initial content).


Best regards, Julian


Re: R: Critical questions about OAK

2016-03-03 Thread Davide Giannella
On 03/03/2016 14:15, Ancona Francesco wrote:
> ...
> About the query engine
> - Could you explain in more depth what Traverse is? If we have understood 
> correctly, Traverse doesn't delegate to an index server engine (good in case 
> of index server trouble) but is a built-in component of Oak. But where does 
> it keep the repository graph to traverse? In memory? On the filesystem? Does 
> it get the data from the DB?
Traverse will physically traverse the repository in search of the right
data. It's not the most efficient index and it's there mainly to operate
in case either none of the other indexes is suitable for the provided
query or there are no other indexes.

But be careful. It doesn't mean it's intrinsically a bad index. Let's
take the following query as an example

SELECT *
FROM [nt:unstructured] AS a
WHERE
ISDESCENDANTNODE(a, '/content/mysite/colour/red')
AND colour = 'red'

Assume you initialised the repository with the InitialContent, which
provides some indexes (as I said in a previous email), that on top of it
you have a PropertyIndex on `colour`, and that you have no Lucene index.
(Lucene is quite powerful, with a lot of configuration options.)

Overall, the repository has roughly the following node distribution:

- 10k nodes nt:unstructured
- 5k nodes with colour red
- 3 nodes under /content/mysite/colour/red

For the above query, if you look at the plans you'll have the following
costs (taking some freedom on numbers):

- NodeTypeIndex 1
- PropertyIndex: 3000
- Traversing: 3

In this case the traversing index would actually perform better than
any other index, as the query engine will have to post-analyse a set of
only 3 nodes.
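
The idea of picking the cheapest plan can be sketched as follows. This is a minimal illustration, assuming each candidate index reports a single numeric cost estimate and the engine takes the lowest one. The class `PlanSelectionSketch`, the plan names, and the figures are hypothetical: the numbers are the node counts from the example above used as stand-in estimates, which is a simplification and not Oak's actual cost formula.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: names and cost figures are illustrative, not Oak's API.
class PlanSelectionSketch {

    // Returns the name of the plan with the lowest estimated cost.
    static String cheapest(Map<String, Double> costs) {
        return Collections.min(costs.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("nodeType", 10_000.0); // assumed: all nt:unstructured nodes
        costs.put("property", 5_000.0);  // assumed: all nodes with colour = 'red'
        costs.put("traverse", 3.0);      // assumed: nodes under the queried path
        System.out.println(cheapest(costs)); // traverse
    }
}
```

With a path restriction this narrow, the traversal estimate is the smallest of the three, so it is selected even though an index on `colour` exists.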
> - We have to manage a potentially large number of documents, so we need 
> more than one node. Is it possible to cluster Lucene?
You can't cluster the built-in Lucene. If you're looking for such a
feature, a remote Solr may be a better solution, but so far I don't
think I have heard of a need to cluster Lucene.

You can have a look at my slides from the talk I gave at the adaptTo
conference last year. They may help shed some light on the query
engine, even if the biggest part of my presentation was the 20 minutes
of Q&A :)

http://adapt.to/2015/en/schedule/scaling-the-query-with-oak.html

HTH
Davide


R: R: Critical questions about OAK

2016-03-03 Thread Ancona Francesco
Yes, but I'm asking whether there is a way or a configuration to use an RDBMS 
through the JCR repository, as in the Oak getting-started examples.

final DocumentMK.Builder builder = new DocumentMK.Builder();
builder.setBlobStore(createFileSystemBlobStore());
final DocumentNodeStore ns = getRDBDocumentNodeStore(builder);
Oak oak = new Oak(ns);
Jcr jcr = new Jcr(oak);

Repository repo = jcr.createRepository();

Thanks.

-Original Message-
From: Julian Reschke [mailto:julian.resc...@greenbytes.de]
Sent: Thursday, 3 March 2016 15:19
To: oak-dev@jackrabbit.apache.org
Subject: Re: R: Critical questions about OAK

On 2016-03-03 15:15, Ancona Francesco wrote:
> Hi,
> We have further questions about the RDBMS and the query engine.
>
> About RDBMS
> - We tried to run the unit tests against our Postgres and they seem to work, 
> but only if I use Oak methods such as update and remove. We'd like to use 
> jcrRepository to start our ECM management, but it doesn't work (see the 
> previous mail). Could you confirm this scenario ()? Could you give us other 
> examples on RDBMS that use jcrRepository?
> ...

I already did. The test cases in oak-jcr run against an RDB persistence when 
invoked the way I told you yesterday:

mvn clean install -Prdb-postgres -Drdb.jdbc-url=jdbc:postgresql:oak
-Drdb.jdbc-user=... -Drdb.jdbc-passwd=... -Dnsfixtures=DOCUMENT_RDB 
-PintegrationTesting -Prdb-postgres

Best regards, Julian

 
 







Re: R: Critical questions about OAK

2016-03-03 Thread Julian Reschke

On 2016-03-03 15:15, Ancona Francesco wrote:

Hi,
We have further questions about the RDBMS and the query engine.

About RDBMS
-   We tried to run the unit tests against our Postgres and they seem to work, 
but only if I use Oak methods such as update and remove. We'd like to use 
jcrRepository to start our ECM management, but it doesn't work (see the 
previous mail). Could you confirm this scenario ()? Could you give us other 
examples on RDBMS that use jcrRepository?
...


I already did. The test cases in oak-jcr run against an RDB persistence 
when invoked the way I told you yesterday:


mvn clean install -Prdb-postgres -Drdb.jdbc-url=jdbc:postgresql:oak 
-Drdb.jdbc-user=... -Drdb.jdbc-passwd=... -Dnsfixtures=DOCUMENT_RDB 
-PintegrationTesting -Prdb-postgres


Best regards, Julian


R: Critical questions about OAK

2016-03-03 Thread Ancona Francesco
Hi,
We have further questions about the RDBMS and the query engine.

About RDBMS
-   We tried to run the unit tests against our Postgres and they seem to work, 
but only if I use Oak methods such as update and remove. We'd like to use 
jcrRepository to start our ECM management, but it doesn't work (see the 
previous mail). Could you confirm this scenario ()? Could you give us other 
examples on RDBMS that use jcrRepository?

About the query engine
-   Could you explain in more depth what Traverse is? If we have understood 
correctly, Traverse doesn't delegate to an index server engine (good in case 
of index server trouble) but is a built-in component of Oak. But where does it 
keep the repository graph to traverse? In memory? On the filesystem? Does it 
get the data from the DB?
-   We have to manage a potentially large number of documents, so we need 
more than one node. Is it possible to cluster Lucene?

Thanks in advance,
best regards

-Original Message-
From: Davide Giannella [mailto:dav...@apache.org]
Sent: Wednesday, 2 March 2016 18:12
To: oak-dev@jackrabbit.apache.org
Subject: Re: Critical questions about OAK

On 01/03/2016 15:33, Ancona Francesco wrote:
> ...2.   Oak explicitly doesn't index anything, so what happens
> when I search for a document (or node) the first time? (This is not 
> clear.)
>
> a.   The search is always delegated to the index server (embedded
> Lucene or Solr), which returns a result set of nodes that match the query.
>
Oak never delegates to any persistence. It relies on its own query engine.

Oak provides 4 main index types: traverse, property, Lucene and Solr.
If no index is defined, or none is suitable for the provided query, the 
Traverse index comes into play. It's a built-in index, always there, that will 
traverse the repository in search of the content matching the query you 
provided.

You define the indexes you need. Please read my previous email, where I 
explained in more detail the "doesn't index anything" aspect, as well as the 
docs on the query engine. They may not explain how the query engine works, but 
they provide enough detail that you don't have to read the code:

http://markmail.org/message/wvq7ggu737ex277b
http://jackrabbit.apache.org/oak/docs/query/query.html

> b.  So MongoDB (or the RDBMS) is used only to serve the metadata or the
> binary content.
>
> 3.   If I want better performance or full-text search, I have to create
> some indexes (3 types of index: Lucene, Solr and node property) that 
> improve the efficiency of the index server (Lucene or Solr). These indexes 
> have no effect on the RDBMS or MongoDB in which this kind of metadata is 
> stored.
>
If you need full-text capabilities, the only two indexes that provide them are 
Lucene and Solr. I'd go for Lucene if you don't need any Solr-specific feature. 
You'll need to define your own index. You can find details in the docs:

http://jackrabbit.apache.org/oak/docs/query/query.html

HTH
Davide



 
 







Re: Critical questions about OAK

2016-03-02 Thread Davide Giannella
On 01/03/2016 15:33, Ancona Francesco wrote:
> ...2.   Oak explicitly doesn't index anything, so what happens
> when I search for a document (or node) the first time? (This is not clear.)
>
> a.   The search is always delegated to the index server (embedded
> Lucene or Solr), which returns a result set of nodes that match the query.
>
Oak never delegates to any persistence. It relies on its own query engine.

Oak provides 4 main index types: traverse, property, Lucene and Solr.
If no index is defined, or none is suitable for the provided query, 
the Traverse index comes into play. It's a built-in index, always there,
that will traverse the repository in search of the content matching
the query you provided.

You define the indexes you need. Please read my previous email, where I
explained in more detail the "doesn't index anything" aspect, as well as
the docs on the query engine. They may not explain how the query
engine works, but they provide enough detail that you don't have to read
the code:

http://markmail.org/message/wvq7ggu737ex277b
http://jackrabbit.apache.org/oak/docs/query/query.html

> b.  So MongoDB (or the RDBMS) is used only to serve the metadata or the
> binary content.
>
> 3.   If I want better performance or full-text search, I have to create
> some indexes (3 types of index: Lucene, Solr and node property) that
> improve the efficiency of the index server (Lucene or Solr). These
> indexes have no effect on the RDBMS or MongoDB in which this kind of
> metadata is stored.
>
If you need full-text capabilities, the only two indexes that provide
them are Lucene and Solr. I'd go for Lucene if you don't need any
Solr-specific feature. You'll need to define your own index. You can find
details in the docs:

http://jackrabbit.apache.org/oak/docs/query/query.html

HTH
Davide




Re: R: Critical questions about OAK

2016-03-01 Thread Julian Reschke

On 2016-03-02 08:33, Ancona Francesco wrote:

We have used "Oracle 11.2 Express Edition" and Postgres 9.4


Oracle 11 is not supported (and yes, that's missing in the Javadocs).


I'll send the 2 logs again, but can we have a compatibility matrix of the 
RDBMSs that OAK supports?


RDBDocumentStore actually INFO-logs when it doesn't support a DB (so you 
really should have a look at the log file).


But yes, it also needs to be in the documentation.

Best regards, Julian



R: Critical questions about OAK

2016-03-01 Thread Ancona Francesco
We have used "Oracle 11.2 Express Edition" and Postgres 9.4.
I'll send the 2 logs again, but can we have a compatibility matrix of the 
RDBMSs that OAK supports?

Thanks in advance.
Best regards


-Original Message-
From: Julian Reschke [mailto:julian.resc...@gmx.de]
Sent: Tuesday, 1 March 2016 18:59
To: oak-dev@jackrabbit.apache.org
Subject: Re: Critical questions about OAK

On 2016-03-01 16:33, Ancona Francesco wrote:
> Hello,
>
> I'm very sorry, but we have 2 big problems to solve if we want to 
> continue our project with the Oak platform.
>
> The first is that we can't manage to save either metadata or binaries 
> to an RDBMS.
>
> We tried both Postgres and Oracle with a simple class that loads a 
> simple node (it is similar to the class in "getting started").
>
> In both Oracle and Postgres we have a problem when we create a repository:
>
> Repository repo = jcr.createRepository();
>
> I attach to this mail 2 files that describe these 2 errors in detail.
>
> This is a critical issue, because some clients want to use an RDBMS; 
> besides, storing in an RDBMS should be very easy, so we are a little perplexed.
> ...

Interestingly enough, the exceptions for the two databases are very different.

Again, please check the log files for any output of RDBDocumentStore and post 
it here.

We test both with Oracle (12!, and that is important...) and Postgres, and we 
do not see these exceptions. You can verify that yourself by running the OAK 
unit tests. This means that something likely is different in the way you 
configure things (maybe the datasource implementation?, isolation levels?).

So again, check the logs, or try to reproduce your problems inside the Oak unit 
test framework, so we can more easily investigate them.

Best regards, Julian

 
 







Re: questions

2014-10-16 Thread Michael Marth
Hi,

If I had to do that I would probably model the ACLs for those state changes at 
the application level (in your workflow engine), not in the repository.
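
As a rough idea of what such application-level enforcement might look like, here is a small sketch covering the two rules asked about in this thread: no direct change from DRAFT to APPROVED, and approval restricted to callers holding an approve privilege. Everything in it is an assumption made for illustration: the class `LifecycleSketch`, the privilege name "approve", and the transition table (apart from the forbidden DRAFT to APPROVED step) are hypothetical and come from neither Oak nor JCR.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical application-level sketch; none of these names come from Oak or JCR.
class LifecycleSketch {

    enum State { DRAFT, PENDING, REJECTED, APPROVED }

    // Assumed transition table; DRAFT -> APPROVED is deliberately absent.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.DRAFT, EnumSet.of(State.PENDING));
        ALLOWED.put(State.PENDING, EnumSet.of(State.APPROVED, State.REJECTED));
        ALLOWED.put(State.REJECTED, EnumSet.of(State.DRAFT));
        ALLOWED.put(State.APPROVED, EnumSet.noneOf(State.class));
    }

    // True if the move is in the table and the caller holds the needed privilege.
    static boolean canTransition(State from, State to, Set<String> privileges) {
        if (!ALLOWED.get(from).contains(to)) return false;
        if (to == State.APPROVED && !privileges.contains("approve")) return false;
        return true;
    }

    public static void main(String[] args) {
        Set<String> approver = Collections.singleton("approve");
        Set<String> nobody = Collections.emptySet();
        System.out.println(canTransition(State.DRAFT, State.APPROVED, approver));   // false
        System.out.println(canTransition(State.PENDING, State.APPROVED, nobody));   // false
        System.out.println(canTransition(State.PENDING, State.APPROVED, approver)); // true
    }
}
```

The workflow engine would call a check like this before writing the new state value to the node's property, so the repository only ever sees transitions that have already been validated.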

But if you really want to do it in the repository I see 2 possible ways:
1. model the states as child nodes of the item in workflow, e.g.
|
-item
-- draft
Then, you could probably use wild card ACLs such that e.g. only a given group 
can remove nodes named “draft” and add nodes named “approved”.
2. another possible approach is to add your own SecurityProvider (Angela would 
know what the actual name is) that evaluates writes based on your logic.

HTH
Michael


On 13 Oct 2014, at 18:58, TALHAOUI Mohamed m.talha...@rsd.com wrote:

 Hi,
 
 Most probably for the states.
 What about enforcing allowed transitions and permissions?
 Ex:
 the state cannot change from DRAFT to APPROVED
 only users with the approve privilege can set the state to APPROVED
 
 What would be your recommendation here ?
 
 Thanks
 
 -Original Message-
 From: Michael Marth [mailto:mma...@adobe.com] 
 Sent: Monday, 13 October 2014 17:43
 To: oak-dev@jackrabbit.apache.org
 Subject: Re: questions
 
 Hi,
 
 My use case is very basic: I need to bind some lifecycle (LC) states to a node 
 type (something like DRAFT, PENDING, REJECTED, APPROVED) and allow a node to 
 follow an LC transition in response to a user action or a workflow action.
 
 I would simply add a property with those values to these nodes. Would that 
 work?
 
 Cheers
 Michael



questions

2014-10-13 Thread TALHAOUI Mohamed
Hi,

I have some questions regarding the POC I am working on:

Lifecycle management
I have seen that it is not implemented in Oak although it is specified by JCR.
Is there any plan to implement it?

Observation
How does it scale?
I need to have some custom operations executed on node creation, move, 
deletion, ...
I guess Observation is the way to go, but I wonder how this scales if I 
need to be able to handle several billion nodes.

ACL
How does it scale?
If I query a large repo for nodes and have access to only a few of them, how 
does the filtering work?

JCR vs RDBMS
I come from the RDBMS world and I am pretty new to JCR, so I apologize if these 
are dumb questions:

* So far, I have worked with the JCR API (nodes, properties, events, 
...) and was able to cover my basic use cases.

But in a real application I need object-oriented modelling and therefore, at 
some point, a way to map my business model to JCR nodes (something like an 
ORM).

I found Jackrabbit OCM (http://jackrabbit.apache.org/5-with-jackrabbit-ocm.html) 
but nothing in Oak.

Is there something in the pipeline?

* What are the strategies and tooling for data migration?

I mean, if I have millions of nodes of a certain type and need to modify 
this type definition (adding a mandatory property or node, changing a 
property type, ...), how should I proceed?

Thanks in advance for your answers,
Mohamed