[jira] [Updated] (OAK-4802) Basic cache consistency test on exception

2016-09-15 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-4802:

Labels: candidate_oak_1_0 candidate_oak_1_2 candidate_oak_1_4 resilience  
(was: )

> Basic cache consistency test on exception
> -
>
> Key: OAK-4802
> URL: https://issues.apache.org/jira/browse/OAK-4802
> Project: Jackrabbit Oak
>  Issue Type: Test
>  Components: core, documentmk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Minor
>  Labels: candidate_oak_1_0, candidate_oak_1_2, candidate_oak_1_4, 
> resilience
> Fix For: 1.6, 1.5.11
>
>
> OAK-4774 and OAK-4793 aim to check the cache behaviour of a DocumentStore 
> implementation when the underlying backend throws an exception even though 
> the operation succeeded, e.g. when the response cannot be sent back because 
> of a network issue.
> This issue will provide the DocumentStore-independent part of those tests.
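The intended failure mode can be sketched generically. The class and field names below are hypothetical illustrations, not the actual Oak test API: the backend applies the write, the "response" is lost, and the test asserts that the cache does not keep serving the stale pre-write value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the test scenario: the backend write succeeds, but
// the "response" is lost, so the caller sees an exception anyway.
public class CacheConsistencySketch {

    static class Store {
        final Map<String, String> backend = new HashMap<>(); // persisted state
        final Map<String, String> cache = new HashMap<>();   // in-memory cache
        boolean failResponse = false;                        // simulate a lost response

        void update(String id, String value) {
            backend.put(id, value);              // the operation itself succeeds
            if (failResponse) {
                // On an ambiguous failure the cache entry must be invalidated,
                // otherwise a later read could return the stale cached value.
                cache.remove(id);
                throw new RuntimeException("response lost");
            }
            cache.put(id, value);
        }

        String find(String id) {
            // read through the cache, falling back to the backend on a miss
            return cache.computeIfAbsent(id, backend::get);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.update("doc", "v1");               // populates the cache with v1
        store.failResponse = true;
        try {
            store.update("doc", "v2");           // applied in the backend, but "fails"
        } catch (RuntimeException expected) {
            // the test is about the cache state afterwards
        }
        if (!"v2".equals(store.find("doc"))) {
            throw new AssertionError("cache served a stale value");
        }
        System.out.println("cache consistent after exception");
    }
}
```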



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-4793) Check usage of DocumentStoreException in RDBDocumentStore

2016-09-15 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-4793:

Labels: candidate_oak_1_0 candidate_oak_1_2 candidate_oak_1_4 resilience  
(was: )

> Check usage of DocumentStoreException in RDBDocumentStore
> -
>
> Key: OAK-4793
> URL: https://issues.apache.org/jira/browse/OAK-4793
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, rdbmk
>Reporter: Marcel Reutegger
>Assignee: Julian Reschke
>Priority: Minor
>  Labels: candidate_oak_1_0, candidate_oak_1_2, candidate_oak_1_4, 
> resilience
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4793-1.diff, OAK-4793-2.diff, OAK-4793.diff, 
> OAK-4793.diff
>
>
> With OAK-4771 the usage of DocumentStoreException was clarified in the 
> DocumentStore interface. The purpose of this task is to check the usage of 
> DocumentStoreException in RDBDocumentStore and make sure JDBC driver-specific 
> exceptions are handled consistently and wrapped in a DocumentStoreException. 
> At the same time, cache consistency needs to be checked as well in case of a 
> driver exception, e.g. by invalidating entries if necessary.
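The wrapping pattern described here can be sketched as follows. This is a simplified illustration, not the actual RDBDocumentStore code; the locally defined DocumentStoreException stand-in and the Jdbc interface are assumptions made to keep the example self-contained.

```java
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Simplified illustration of the wrapping pattern: driver-specific
// SQLExceptions are wrapped in a (locally defined, stand-in)
// DocumentStoreException, and the affected cache entry is invalidated.
public class WrapSketch {

    static class DocumentStoreException extends RuntimeException {
        DocumentStoreException(String msg, Throwable cause) { super(msg, cause); }
    }

    interface Jdbc { // stand-in for a JDBC call path
        void execute(String sql) throws SQLException;
    }

    static class Store {
        final Jdbc jdbc;
        final Map<String, String> cache = new HashMap<>();
        Store(Jdbc jdbc) { this.jdbc = jdbc; }

        void update(String id, String sql) {
            try {
                jdbc.execute(sql);
                cache.put(id, sql);
            } catch (SQLException e) {
                // consistent handling: invalidate the possibly stale cache
                // entry, then wrap the driver exception
                cache.remove(id);
                throw new DocumentStoreException("update failed for " + id, e);
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store(sql -> { throw new SQLException("connection reset"); });
        try {
            store.update("doc", "UPDATE ...");
            throw new AssertionError("expected a DocumentStoreException");
        } catch (DocumentStoreException e) {
            System.out.println("wrapped: " + e.getCause().getMessage());
        }
    }
}
```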





[jira] [Updated] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-15 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4043:
-
Priority: Blocker  (was: Critical)

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, run
>Reporter: Alex Parvulescu
>Assignee: Davide Giannella
>Priority: Blocker
>  Labels: candidate_oak_1_4
> Fix For: 1.6, 1.5.11
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] is currently hardcoded to a 
> single checkpoint reference (the default one). It is now possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> This needs fixing before the big 1.4 release, so I'm marking it as a blocker.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints





[jira] [Comment Edited] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-15 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495519#comment-15495519
 ] 

Chetan Mehrotra edited comment on OAK-4043 at 9/16/16 6:23 AM:
---

I would prefer to avoid any change to existing names. We could probably follow 
the convention that such names carry an {{async}} prefix (and possibly add a 
check in Oak to enforce that), and then determine the list of checkpoints based 
on that.

Alternatively, any string property in {{:async}} could be considered a possible 
checkpoint when determining a match.

Given the impact of this issue, I am marking it as a blocker for the next dot 
release.



was (Author: chetanm):
I would prefer to avoid any change to existing names. We could probably follow 
the convention that such names carry an {{async}} prefix (and possibly add a 
check in Oak to enforce that), and then determine the list of checkpoints based 
on that.

Alternatively, any string property in {{:async}} could be considered a possible 
checkpoint when determining a match.



> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, run
>Reporter: Alex Parvulescu
>Assignee: Davide Giannella
>Priority: Blocker
>  Labels: candidate_oak_1_4
> Fix For: 1.6, 1.5.11
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] is currently hardcoded to a 
> single checkpoint reference (the default one). It is now possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> This needs fixing before the big 1.4 release, so I'm marking it as a blocker.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints





[jira] [Updated] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-15 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4043:
-
Fix Version/s: 1.5.11

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, run
>Reporter: Alex Parvulescu
>Assignee: Davide Giannella
>Priority: Critical
>  Labels: candidate_oak_1_4
> Fix For: 1.6, 1.5.11
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] is currently hardcoded to a 
> single checkpoint reference (the default one). It is now possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> This needs fixing before the big 1.4 release, so I'm marking it as a blocker.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints





[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-15 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495519#comment-15495519
 ] 

Chetan Mehrotra commented on OAK-4043:
--

I would prefer to avoid any change to existing names. We could probably follow 
the convention that such names carry an {{async}} prefix (and possibly add a 
check in Oak to enforce that), and then determine the list of checkpoints based 
on that.

Alternatively, any string property in {{:async}} could be considered a possible 
checkpoint when determining a match.
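The prefix convention could look roughly like this. This is a generic sketch; the lane names, the property layout of {{:async}}, and the method names are assumptions, not the actual oak-run code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: every property of the ":async" node whose name starts
// with "async" is treated as an index lane referencing a checkpoint; only
// checkpoints not referenced by any lane would be candidates for removal.
public class CheckpointSketch {

    static Set<String> referencedCheckpoints(Map<String, String> asyncNode) {
        Set<String> referenced = new HashSet<>();
        for (Map.Entry<String, String> property : asyncNode.entrySet()) {
            if (property.getKey().startsWith("async")) { // the naming convention
                referenced.add(property.getValue());
            }
        }
        return referenced;
    }

    public static void main(String[] args) {
        Map<String, String> asyncNode = new HashMap<>();
        asyncNode.put("async", "cp-1");           // default lane
        asyncNode.put("async-fulltext", "cp-2");  // an additional lane
        asyncNode.put("lease", "not-a-checkpoint");

        Set<String> keep = referencedCheckpoints(asyncNode);
        for (String cp : Arrays.asList("cp-1", "cp-2", "cp-3")) {
            System.out.println(cp + ": " + (keep.contains(cp) ? "keep" : "unreferenced"));
        }
    }
}
```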



> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, run
>Reporter: Alex Parvulescu
>Assignee: Davide Giannella
>Priority: Critical
>  Labels: candidate_oak_1_4
> Fix For: 1.6
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] is currently hardcoded to a 
> single checkpoint reference (the default one). It is now possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> This needs fixing before the big 1.4 release, so I'm marking it as a blocker.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints





[jira] [Updated] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-15 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4043:
-
Labels: candidate_oak_1_4  (was: )

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, run
>Reporter: Alex Parvulescu
>Assignee: Davide Giannella
>Priority: Critical
>  Labels: candidate_oak_1_4
> Fix For: 1.6
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] is currently hardcoded to a 
> single checkpoint reference (the default one). It is now possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> This needs fixing before the big 1.4 release, so I'm marking it as a blocker.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints





[jira] [Commented] (OAK-2498) Root record references provide too little context for parsing a segment

2016-09-15 Thread Michael Dürig (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493751#comment-15493751
 ] 

Michael Dürig commented on OAK-2498:


For OAK-4740 we also need to be able to identify references to external 
binaries. More generally, I would suggest we specialise the VALUE type into the 
different types of values. 

> Root record references provide too little context for parsing a segment
> ---
>
> Key: OAK-2498
> URL: https://issues.apache.org/jira/browse/OAK-2498
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>  Labels: tools
> Fix For: Segment Tar 0.0.14
>
>
> According to the [documentation | 
> http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html] the root 
> record references in a segment header provide enough context for parsing all 
> records within this segment without any external information. 
> It turns out this is not true: if a root record reference points, e.g., to a 
> list record, the items in that list are record ids of unknown type. So even 
> though those records might live in the same segment, we can't parse them, as 
> we don't know their type. 





[jira] [Commented] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread Michael Dürig (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493743#comment-15493743
 ] 

Michael Dürig commented on OAK-4740:


Ok, got it. As you mentioned before, we had best wait for OAK-2498, as it will 
simplify regenerating the binary references quite a bit. 

> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed in the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for 
> DSGC (not properly reporting binary references would result in binaries 
> being GC'ed) and other GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Resolved] (OAK-4793) Check usage of DocumentStoreException in RDBDocumentStore

2016-09-15 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke resolved OAK-4793.
-
   Resolution: Fixed
Fix Version/s: 1.5.11

trunk: [r1760946|http://svn.apache.org/r1760946]


> Check usage of DocumentStoreException in RDBDocumentStore
> -
>
> Key: OAK-4793
> URL: https://issues.apache.org/jira/browse/OAK-4793
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, rdbmk
>Reporter: Marcel Reutegger
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4793-1.diff, OAK-4793-2.diff, OAK-4793.diff, 
> OAK-4793.diff
>
>
> With OAK-4771 the usage of DocumentStoreException was clarified in the 
> DocumentStore interface. The purpose of this task is to check the usage of 
> DocumentStoreException in RDBDocumentStore and make sure JDBC driver-specific 
> exceptions are handled consistently and wrapped in a DocumentStoreException. 
> At the same time, cache consistency needs to be checked as well in case of a 
> driver exception, e.g. by invalidating entries if necessary.





[jira] [Commented] (OAK-4811) MongoToMongoFbsTest fails

2016-09-15 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493526#comment-15493526
 ] 

Marcel Reutegger commented on OAK-4811:
---

I see, then I apologize for the rant. It was just rather annoying to analyze 
what's going wrong here...

> MongoToMongoFbsTest fails
> -
>
> Key: OAK-4811
> URL: https://issues.apache.org/jira/browse/OAK-4811
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: upgrade
>Affects Versions: 1.4.7
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.4.8
>
>
> The test fails in the current 1.4 branch and also with the 1.4.7 release when 
> a local MongoDB is running.
> {noformat}
> validateMigration(org.apache.jackrabbit.oak.upgrade.cli.MongoToMongoFbsTest)  
> Time elapsed: 5.628 sec  <<< ERROR!
> java.lang.IllegalStateException: This builder does not exist: default
> {noformat}
> The test runs successfully with 1.4.6.





[jira] [Updated] (OAK-4814) Add orderby support for nodename index

2016-09-15 Thread Thomas Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-4814:

Fix Version/s: 1.6

> Add orderby support for nodename index
> --
>
> Key: OAK-4814
> URL: https://issues.apache.org/jira/browse/OAK-4814
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Affects Versions: 1.5.10
>Reporter: Ankush Malhotra
> Fix For: 1.6
>
>
> In OAK-1752 you implemented index support for :nodeName. The JCR Query 
> explain tool shows that it is used for conditions like equals, but it is not 
> used for ORDER BY name().
> Is name() supported in the ORDER BY clause? If yes, we would need to add 
> support for that in oak-lucene.





[jira] [Commented] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493450#comment-15493450
 ] 

Francesco Mari commented on OAK-4740:
-

[~mduerig], the missing part would be to hook into the recovery process, as 
invoked from {{TarReader.generateTarFile()}} and implemented by 
{{TarReader.DEFAULT_TAR_RECOVERY}}, and regenerate the graph and the binary 
references index by parsing recovered segments and extracting the necessary 
information.

> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed in the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for 
> DSGC (not properly reporting binary references would result in binaries 
> being GC'ed) and other GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Commented] (OAK-4814) Add orderby support for nodename index

2016-09-15 Thread Ankush Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493429#comment-15493429
 ] 

Ankush Malhotra commented on OAK-4814:
--

[~tmueller] [~chetanm] can you please review this one? Thanks.

> Add orderby support for nodename index
> --
>
> Key: OAK-4814
> URL: https://issues.apache.org/jira/browse/OAK-4814
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Affects Versions: 1.5.10
>Reporter: Ankush Malhotra
>
> In OAK-1752 you implemented index support for :nodeName. The JCR Query 
> explain tool shows that it is used for conditions like equals, but it is not 
> used for ORDER BY name().
> Is name() supported in the ORDER BY clause? If yes, we would need to add 
> support for that in oak-lucene.





[jira] [Created] (OAK-4814) Add orderby support for nodename index

2016-09-15 Thread Ankush Malhotra (JIRA)
Ankush Malhotra created OAK-4814:


 Summary: Add orderby support for nodename index
 Key: OAK-4814
 URL: https://issues.apache.org/jira/browse/OAK-4814
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: query
Affects Versions: 1.5.10
Reporter: Ankush Malhotra


In OAK-1752 you implemented index support for :nodeName. The JCR Query explain 
tool shows that it is used for conditions like equals, but it is not used for 
ORDER BY name().
Is name() supported in the ORDER BY clause? If yes, we would need to add 
support for that in oak-lucene.





[jira] [Resolved] (OAK-4803) Simplify the client side of the cold standby

2016-09-15 Thread Francesco Mari (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco Mari resolved OAK-4803.
-
Resolution: Fixed

Fixed at r1760934.

> Simplify the client side of the cold standby
> 
>
> Key: OAK-4803
> URL: https://issues.apache.org/jira/browse/OAK-4803
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.12
>
> Attachments: OAK-4803-01.patch
>
>
> The implementation of the cold standby client is overly and unnecessarily 
> complicated. It would be much clearer to separate the client code into two 
> major components: a simple client responsible for sending messages to and 
> receiving responses from the standby server, and the synchronization 
> algorithm used to read data from the server and to save the read data in the 
> local {{FileStore}}.
> Moreover, the simple client could be further modularised by encapsulating 
> request encoding, response decoding and message handling into their own 
> Netty handlers.





[jira] [Commented] (OAK-4811) MongoToMongoFbsTest fails

2016-09-15 Thread Dominique Jäggi (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493345#comment-15493345
 ] 

Dominique Jäggi commented on OAK-4811:
--

[~mreutegg], I didn't consciously ignore this test. It came with the merged 
commit.

> MongoToMongoFbsTest fails
> -
>
> Key: OAK-4811
> URL: https://issues.apache.org/jira/browse/OAK-4811
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: upgrade
>Affects Versions: 1.4.7
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.4.8
>
>
> The test fails in the current 1.4 branch and also with the 1.4.7 release when 
> a local MongoDB is running.
> {noformat}
> validateMigration(org.apache.jackrabbit.oak.upgrade.cli.MongoToMongoFbsTest)  
> Time elapsed: 5.628 sec  <<< ERROR!
> java.lang.IllegalStateException: This builder does not exist: default
> {noformat}
> The test runs successfully with 1.4.6.





[jira] [Commented] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread Michael Dürig (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493322#comment-15493322
 ] 

Michael Dürig commented on OAK-4740:


Yes, that would also help my understanding of what this issue was about 
initially... is recovery of the binary index completely broken, or did we just 
break it for binaries with ids > 4k? My fix applies only to the latter.



> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed in the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for 
> DSGC (not properly reporting binary references would result in binaries 
> being GC'ed) and other GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Resolved] (OAK-4811) MongoToMongoFbsTest fails

2016-09-15 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger resolved OAK-4811.
---
Resolution: Fixed

With the changes for OAK-4174, the MongoToMongoFbsTest now also succeeds.

> MongoToMongoFbsTest fails
> -
>
> Key: OAK-4811
> URL: https://issues.apache.org/jira/browse/OAK-4811
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: upgrade
>Affects Versions: 1.4.7
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.4.8
>
>
> The test fails in the current 1.4 branch and also with the 1.4.7 release when 
> a local MongoDB is running.
> {noformat}
> validateMigration(org.apache.jackrabbit.oak.upgrade.cli.MongoToMongoFbsTest)  
> Time elapsed: 5.628 sec  <<< ERROR!
> java.lang.IllegalStateException: This builder does not exist: default
> {noformat}
> The test runs successfully with 1.4.6.





[jira] [Updated] (OAK-4174) SegmentToJdbcTest failing with improvements of OAK-4119

2016-09-15 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger updated OAK-4174:
--
Fix Version/s: 1.4.8

Merged into 1.4 branch: http://svn.apache.org/r1760930

> SegmentToJdbcTest failing with improvements of  OAK-4119
> 
>
> Key: OAK-4174
> URL: https://issues.apache.org/jira/browse/OAK-4174
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: upgrade
>Reporter: angela
>Assignee: Tomek Rękawek
>Priority: Critical
> Fix For: 1.6, 1.4.8
>
>
> Despite the fact that OAK-4128 has been fixed, I get a test failure for 
> SegmentToJdbcTest.validateMigration with my pending improvements from 
> OAK-4119.
> In order not to break the build I will temporarily mark the test with 
> @Ignore, as discussed on the mailing list.





[jira] [Commented] (OAK-4811) MongoToMongoFbsTest fails

2016-09-15 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493291#comment-15493291
 ] 

Marcel Reutegger commented on OAK-4811:
---

The root cause is a missing fix when changes for OAK-4679 were backported: 
OAK-4174 is also required.

[~dominique.jaeggi], please do not simply ignore tests when you commit changes. 
At least a comment in the issue would be nice and would make it easier for 
others to clean up afterwards.

> MongoToMongoFbsTest fails
> -
>
> Key: OAK-4811
> URL: https://issues.apache.org/jira/browse/OAK-4811
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: upgrade
>Affects Versions: 1.4.7
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.4.8
>
>
> The test fails in the current 1.4 branch and also with the 1.4.7 release when 
> a local MongoDB is running.
> {noformat}
> validateMigration(org.apache.jackrabbit.oak.upgrade.cli.MongoToMongoFbsTest)  
> Time elapsed: 5.628 sec  <<< ERROR!
> java.lang.IllegalStateException: This builder does not exist: default
> {noformat}
> The test runs successfully with 1.4.6.





[jira] [Commented] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493289#comment-15493289
 ] 

Alex Parvulescu commented on OAK-4740:
--

bq. From here I'm not entirely sure what's left to do here.
Add a test for the recovery bits, maybe? :)

> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed in the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for 
> DSGC (not properly reporting binary references would result in binaries 
> being GC'ed) and other GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Commented] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread Michael Dürig (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493251#comment-15493251
 ] 

Michael Dürig commented on OAK-4740:


Ok, committed that fix at http://svn.apache.org/viewvc?rev=1760927&view=rev.

From here I'm not entirely sure what's left to do. Does tar regeneration work 
already with this fix, or is there work left to do in that area? 

> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed in the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for 
> DSGC (not properly reporting binary references would result in binaries 
> being GC'ed) and other GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Comment Edited] (OAK-4811) MongoToMongoFbsTest fails

2016-09-15 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492909#comment-15492909
 ] 

Marcel Reutegger edited comment on OAK-4811 at 9/15/16 12:49 PM:
-

The test starts to fail on the 1.4 branch with changes merged from OAK-4679 in 
revision http://svn.apache.org/r1756641

However, I don't think those changes are the root cause for the failing test.


was (Author: mreutegg):
The test starts to fail on the 1.4 branch with changes merged from OAK-4679 in 
revision svn.apache.org/r1756641

However, I don't think those changes are the root cause for the failing test.

> MongoToMongoFbsTest fails
> -
>
> Key: OAK-4811
> URL: https://issues.apache.org/jira/browse/OAK-4811
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: upgrade
>Affects Versions: 1.4.7
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.4.8
>
>
> The test fails in the current 1.4 branch and also with the 1.4.7 release when 
> a local MongoDB is running.
> {noformat}
> validateMigration(org.apache.jackrabbit.oak.upgrade.cli.MongoToMongoFbsTest)  
> Time elapsed: 5.628 sec  <<< ERROR!
> java.lang.IllegalStateException: This builder does not exist: default
> {noformat}
> The test runs successfully with 1.4.6.





[jira] [Commented] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493210#comment-15493210
 ] 

Francesco Mari commented on OAK-4740:
-

It looks good to me. It seems like a reasonable trade-off between having a 
comprehensive binary reference index and being able to recover the index in 
case of a dirty shutdown.

> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed in the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for 
> DSGC (not properly reporting binary references would result in binaries 
> being GC'ed) and other GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Comment Edited] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread Michael Dürig (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493191#comment-15493191
 ] 

Michael Dürig edited comment on OAK-4740 at 9/15/16 12:40 PM:
--

Given the realisation that the above monotonicity assumption does not hold, and 
the possible extra complexity wrt. DSGC, I started thinking about other ways to 
fix this. 

One idea would be to keep the discrimination of binary ids (smaller / bigger 
than 4k) and the way they are stored, but to change their representation in the 
binary index introduced with OAK-4201: for binary ids bigger than 4k, what if 
we just put the record id pointing to the string record containing the blob id 
into the index (instead of the blob id itself)? This would give us back 
recoverability. OTOH it would make the index a bit more expensive to use, as 
big binaries would still need an additional resolution step. However, I think 
this is a good trade-off to make, as we should discourage binary ids bigger 
than 4k anyway. 

See 
https://github.com/mduerig/jackrabbit-oak/commit/c7ce960a422fa3ae9f5cbe97d1cf05c63988b036
 for a POC of this.

[~frm], let me know what you think about this. 
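The indirection described above could be sketched roughly as follows (a hypothetical model for illustration only, not the actual Oak segment format; the class and method names are made up): blob ids up to the threshold are embedded inline in the index, while larger ones are stored as a record-id reference that needs one extra resolution step on read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the proposed representation: blob ids up to a
// threshold are embedded in the binary index directly; larger ones are
// replaced by a record id pointing at the string record holding the blob id.
public class BinaryIndexSketch {
    static final int THRESHOLD = 4096; // the 4k discrimination mentioned above

    private final Map<Long, String> stringRecords = new HashMap<>();
    private final Map<Long, Object> index = new HashMap<>();
    private long nextRecordId = 0;

    public void addBinaryReference(long key, String blobId) {
        if (blobId.length() <= THRESHOLD) {
            index.put(key, blobId);               // inline, directly recoverable
        } else {
            long recordId = nextRecordId++;
            stringRecords.put(recordId, blobId);  // indirection keeps entries small
            index.put(key, recordId);
        }
    }

    public String resolve(long key) {
        Object entry = index.get(key);
        // Big blob ids pay the additional resolution step noted above.
        return entry instanceof Long ? stringRecords.get(entry) : (String) entry;
    }
}
```

This mirrors the trade-off in the comment: the index stays recoverable and compact, at the cost of one extra lookup for large blob ids.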


was (Author: mduerig):
Given the realisation that the above monotonicity assumption does not hold, and 
the possible extra complexity wrt. DSGC, I started thinking about other ways to 
fix this. 

One idea would be to keep the discrimination of binary ids (smaller / bigger 
than 4k) and the way they are stored, but to change their representation in the 
binary index introduced with OAK-4201: for binary ids bigger than 4k, what if 
we just put the record id pointing to the string record containing the blob id 
into the index (instead of the blob id itself)? This would give us back 
recoverability. OTOH it would make the index a bit more expensive to use, as big 
binaries would still need an additional resolution step. However, I think this 
is a good trade-off to make, as we should discourage binary ids bigger than 4k 
anyway. 

> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed from the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for DSGC 
> (not properly reporting binary references would result in binaries being 
> GC'ed) and for GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Commented] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493191#comment-15493191
 ] 

Michael Dürig commented on OAK-4740:


Given the realisation that the above monotonicity assumption does not hold, and 
the possible extra complexity wrt. DSGC, I started thinking about other ways to 
fix this. 

One idea would be to keep the discrimination of binary ids (smaller / bigger 
than 4k) and the way they are stored, but to change their representation in the 
binary index introduced with OAK-4201: for binary ids bigger than 4k, what if 
we just put the record id pointing to the string record containing the blob id 
into the index (instead of the blob id itself)? This would give us back 
recoverability. OTOH it would make the index a bit more expensive to use, as big 
binaries would still need an additional resolution step. However, I think this 
is a good trade-off to make, as we should discourage binary ids bigger than 4k 
anyway. 

> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed from the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for DSGC 
> (not properly reporting binary references would result in binaries being 
> GC'ed) and for GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Commented] (OAK-4740) TarReader recovery skips generating the index and binary graphs

2016-09-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493168#comment-15493168
 ] 

Michael Dürig commented on OAK-4740:


That threshold means that our assumption that the ids of binaries grow 
monotonically with the size of the binary doesn't hold. 

> TarReader recovery skips generating the index and binary graphs
> ---
>
> Key: OAK-4740
> URL: https://issues.apache.org/jira/browse/OAK-4740
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.16
>
>
> As noticed from the tar recovery bits [0], the resulting tar file would lack 
> the binary reference graph and the index graph. This has implications for DSGC 
> (not properly reporting binary references would result in binaries being 
> GC'ed) and for GC operations.
> / cc [~frm], [~mduerig]
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L216





[jira] [Created] (OAK-4813) Simplify the server side of cold standby

2016-09-15 Thread Andrei Dulceanu (JIRA)
Andrei Dulceanu created OAK-4813:


 Summary: Simplify the server side of cold standby
 Key: OAK-4813
 URL: https://issues.apache.org/jira/browse/OAK-4813
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: segment-tar
Reporter: Andrei Dulceanu
Assignee: Andrei Dulceanu
Priority: Minor
 Fix For: Segment Tar 0.0.12


With the changes introduced in OAK-4803, it would be nice to keep the previous 
symmetry between the client and the server, and thus remove the {{FileStore}} 
reference from the latter.

Per [~frm]'s suggestion from one of the comments in OAK-4803:
bq. In the end, these are the only three lines where the FileStore is used in 
the server, which already suggests that this separation of concerns exists - at 
least at the level of the handlers.

{code:java}
p.addLast(new GetHeadRequestHandler(new DefaultStandbyHeadReader(store)));
p.addLast(new GetSegmentRequestHandler(new DefaultStandbySegmentReader(store)));
p.addLast(new GetBlobRequestHandler(new DefaultStandbyBlobReader(store)));
{code}





[jira] [Commented] (OAK-4803) Simplify the client side of the cold standby

2016-09-15 Thread Andrei Dulceanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493120#comment-15493120
 ] 

Andrei Dulceanu commented on OAK-4803:
--

[~frm], I guess you're right about the separation of concerns pointed out for 
my first suggestion.
bq. Is this maybe the scope of another improvement issue?
Maybe it makes sense to move this to another issue, since this one already 
covers quite a number of aspects.
bq. I quite don't agree here. You need a client to perform a sync.
Reading the whole explanation, I think it's a valid point of view, so I will 
adhere to it :)

> Simplify the client side of the cold standby
> 
>
> Key: OAK-4803
> URL: https://issues.apache.org/jira/browse/OAK-4803
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.12
>
> Attachments: OAK-4803-01.patch
>
>
> The implementation of the cold standby client is overly and unnecessarily 
> complicated. It would be way clearer to separate the client code into two major 
> components: a simple client responsible for sending messages to and receiving 
> responses from the standby server, and the synchronization algorithm used to 
> read data from the server and to save the read data in the local {{FileStore}}.
> Moreover, the simple client could be further modularised by 
> encapsulating request encoding, response decoding and message handling into 
> their own Netty handlers.





[jira] [Resolved] (OAK-4783) Update Oak 1.0 to Jackrabbit 2.8.3

2016-09-15 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke resolved OAK-4783.
-
Resolution: Fixed

1.0: [r1760921|http://svn.apache.org/r1760921]

> Update Oak 1.0 to  Jackrabbit 2.8.3
> ---
>
> Key: OAK-4783
> URL: https://issues.apache.org/jira/browse/OAK-4783
> Project: Jackrabbit Oak
>  Issue Type: Task
>Affects Versions: 1.0.33
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>






[jira] [Updated] (OAK-4783) Update Oak 1.0 to Jackrabbit 2.8.3

2016-09-15 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-4783:

Fix Version/s: 1.0.34

> Update Oak 1.0 to  Jackrabbit 2.8.3
> ---
>
> Key: OAK-4783
> URL: https://issues.apache.org/jira/browse/OAK-4783
> Project: Jackrabbit Oak
>  Issue Type: Task
>Affects Versions: 1.0.33
>Reporter: Julian Reschke
>Assignee: Julian Reschke
> Fix For: 1.0.34
>
>






[jira] [Commented] (OAK-4803) Simplify the client side of the cold standby

2016-09-15 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493103#comment-15493103
 ] 

Francesco Mari commented on OAK-4803:
-

bq. When looking at StandbyClient and StandbyServer I find it a little bit odd 
that the former doesn't have a FileStore, while the latter has. IMHO having a 
FileStore reference in the client would make sense and would also help reading 
the code, as it is much clearer from the beginning who owns what.

This comment makes a lot of sense, because it highlights an asymmetry between 
the client and the server. To be honest, I prefer the way the client is 
written, without an explicit reference to the {{FileStore}}. This highlights 
the fact that the client is concerned with sending requests and parsing 
responses from the server, instead of taking care of how the data is actually 
used. I could have achieved the same separation of concerns in the server as 
well by introducing a very small interface. In the end, these are the only 
three lines where the {{FileStore}} is used in the server, which already 
suggests that this separation of concerns exists - at least at the level of the 
handlers.

{noformat}
p.addLast(new GetHeadRequestHandler(new DefaultStandbyHeadReader(store)));
p.addLast(new GetSegmentRequestHandler(new DefaultStandbySegmentReader(store)));
p.addLast(new GetBlobRequestHandler(new DefaultStandbyBlobReader(store)));
{noformat}

Is this maybe the scope of another improvement issue?
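The "very small interface" mentioned above might look something like this sketch (hypothetical names and signatures, modeled loosely on the handler lines quoted above, not the actual Oak API):

```java
// Hypothetical sketch: server-side handlers depend on a narrow reader
// interface instead of the whole FileStore, making the separation of
// concerns at the handler level explicit.
public class ReaderInterfaceSketch {
    interface StandbyHeadReader {
        String readHeadRecordId();
    }

    // A handler only sees the reader, never the store itself.
    static class GetHeadRequestHandler {
        private final StandbyHeadReader reader;
        GetHeadRequestHandler(StandbyHeadReader reader) { this.reader = reader; }
        String handle() { return reader.readHeadRecordId(); }
    }

    // An in-memory reader standing in for a store-backed implementation.
    static class InMemoryHeadReader implements StandbyHeadReader {
        private final String head;
        InMemoryHeadReader(String head) { this.head = head; }
        public String readHeadRecordId() { return head; }
    }
}
```

With such interfaces, the store reference stays behind the reader implementations and the handlers become trivially testable in isolation.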

bq. It looks more natural to have a client which wants to perform a sync as 
opposed to have a sync which will create a client.

I don't quite agree here. You need a client to perform a sync. The client could 
be used for purposes other than the sync, so it makes sense to have the sync 
process depend on a client and not the other way around. Anyway, the lines you 
pointed out are leftovers from the refactoring and I recognise that they are 
confusing. I will clean them up.

bq. One minor change in {{StandbySync.run()}}, to allow the state to actually 
enter {{STATUS_STARTING}}.

Good catch. It has to be cleaned up as part of this patch.

bq. Rename {{copySegmentFromPrimary}} to {{copySegmentHierarchyFromPrimary}} or 
any other explanatory method name, since this method does a BFS starting with 
the initial segment to fetch from server.

Nice suggestion. That method should have a more appropriate name, and I like 
your proposal.

> Simplify the client side of the cold standby
> 
>
> Key: OAK-4803
> URL: https://issues.apache.org/jira/browse/OAK-4803
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.12
>
> Attachments: OAK-4803-01.patch
>
>
> The implementation of the cold standby client is overly and unnecessarily 
> complicated. It would be way clearer to separate the client code into two major 
> components: a simple client responsible for sending messages to and receiving 
> responses from the standby server, and the synchronization algorithm used to 
> read data from the server and to save the read data in the local {{FileStore}}.
> Moreover, the simple client could be further modularised by 
> encapsulating request encoding, response decoding and message handling into 
> their own Netty handlers.





[jira] [Comment Edited] (OAK-4803) Simplify the client side of the cold standby

2016-09-15 Thread Andrei Dulceanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492953#comment-15492953
 ] 

Andrei Dulceanu edited comment on OAK-4803 at 9/15/16 10:26 AM:


[~frm], here are some of my observations regarding the patch:

# When looking at {{StandbyClient}} and {{StandbyServer}} I find it a little 
bit odd that the former doesn't have a {{FileStore}}, while the latter has. 
IMHO having a {{FileStore}} reference in the client would make sense and would 
also help reading the code, as it is much clearer from the beginning who owns 
what.
# Along the same lines, IMHO it would make sense to reverse the relationship 
between {{StandbyClient}} and {{StandbySync}} since it looks more natural to 
have a client which wants to perform a sync as opposed to have a sync which 
will create a client, assign it a file store and then execute the sync. For 
example, replacing the line
{code:java}
StandbyClient cl = newStandbyClient(secondary);
{code}
with this line
{code:java}
StandbySync cl = newStandbyClient(secondary);
{code}
seems a little bit confusing to me.
# One minor change in {{StandbySync.run()}}, to allow the state to actually 
enter {{STATUS_STARTING}}:
{code:java}
state = STATUS_STARTING;
synchronized (sync) {
    if (active) {
        return;
    }
    state = STATUS_RUNNING;
    active = true;
}
{code}
# Another minor change in {{StandbySyncExecution}}: rename 
{{copySegmentFromPrimary}} to {{copySegmentHierarchyFromPrimary}} or any other 
explanatory method name, since this method does a BFS starting with the initial 
segment to fetch from the server.

/cc [~marett]


was (Author: dulceanu):
@frm, here are some of my observations regarding the patch:

# When looking at {{StandbyClient}} and {{StandbyServer}} I find it a little 
bit odd that the former doesn't have a {{FileStore}}, while the latter has. 
IMHO having a {{FileStore}} reference in the client would make sense and would 
also help reading the code, as it is much clearer from the beginning who owns 
what.
# Along the same lines, IMHO it would make sense to reverse the relationship 
between {{StandbyClient}} and {{StandbySync}} since it looks more natural to 
have a client which wants to perform a sync as opposed to have a sync which 
will create a client, assign it a file store and then execute the sync. For 
example, replacing the line
{code:java}
StandbyClient cl = newStandbyClient(secondary);
{code}
with this line
{code:java}
StandbySync cl = newStandbyClient(secondary);
{code}
seems a little bit confusing to me.
# One minor change in {{StandbySync.run()}}, to allow the state to actually 
enter {{STATUS_STARTING}}:
{code:java}
state = STATUS_STARTING;
synchronized (sync) {
    if (active) {
        return;
    }
    state = STATUS_RUNNING;
    active = true;
}
{code}
# Another minor change in {{StandbySyncExecution}}: rename 
{{copySegmentFromPrimary}} to {{copySegmentHierarchyFromPrimary}} or any other 
explanatory method name, since this method does a BFS starting with the initial 
segment to fetch from the server.

/cc [~marett]

> Simplify the client side of the cold standby
> 
>
> Key: OAK-4803
> URL: https://issues.apache.org/jira/browse/OAK-4803
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.12
>
> Attachments: OAK-4803-01.patch
>
>
> The implementation of the cold standby client is overly and unnecessarily 
> complicated. It would be way clearer to separate the client code into two major 
> components: a simple client responsible for sending messages to and receiving 
> responses from the standby server, and the synchronization algorithm used to 
> read data from the server and to save the read data in the local {{FileStore}}.
> Moreover, the simple client could be further modularised by 
> encapsulating request encoding, response decoding and message handling into 
> their own Netty handlers.





[jira] [Commented] (OAK-4803) Simplify the client side of the cold standby

2016-09-15 Thread Andrei Dulceanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492953#comment-15492953
 ] 

Andrei Dulceanu commented on OAK-4803:
--

@frm, here are some of my observations regarding the patch:

# When looking at {{StandbyClient}} and {{StandbyServer}} I find it a little 
bit odd that the former doesn't have a {{FileStore}}, while the latter has. 
IMHO having a {{FileStore}} reference in the client would make sense and would 
also help reading the code, as it is much clearer from the beginning who owns 
what.
# Along the same lines, IMHO it would make sense to reverse the relationship 
between {{StandbyClient}} and {{StandbySync}} since it looks more natural to 
have a client which wants to perform a sync as opposed to have a sync which 
will create a client, assign it a file store and then execute the sync. For 
example, replacing the line
{code:java}
StandbyClient cl = newStandbyClient(secondary);
{code}
with this line
{code:java}
StandbySync cl = newStandbyClient(secondary);
{code}
seems a little bit confusing to me.
# One minor change in {{StandbySync.run()}}, to allow the state to actually 
enter {{STATUS_STARTING}}:
{code:java}
state = STATUS_STARTING;
synchronized (sync) {
    if (active) {
        return;
    }
    state = STATUS_RUNNING;
    active = true;
}
{code}
# Another minor change in {{StandbySyncExecution}}: rename 
{{copySegmentFromPrimary}} to {{copySegmentHierarchyFromPrimary}} or any other 
explanatory method name, since this method does a BFS starting with the initial 
segment to fetch from the server.

/cc [~marett]

> Simplify the client side of the cold standby
> 
>
> Key: OAK-4803
> URL: https://issues.apache.org/jira/browse/OAK-4803
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.12
>
> Attachments: OAK-4803-01.patch
>
>
> The implementation of the cold standby client is overly and unnecessarily 
> complicated. It would be way clearer to separate the client code into two major 
> components: a simple client responsible for sending messages to and receiving 
> responses from the standby server, and the synchronization algorithm used to 
> read data from the server and to save the read data in the local {{FileStore}}.
> Moreover, the simple client could be further modularised by 
> encapsulating request encoding, response decoding and message handling into 
> their own Netty handlers.





[jira] [Resolved] (OAK-4287) Disable / remove SegmentBufferWriter#checkGCGen

2016-09-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved OAK-4287.

Resolution: Fixed

Fixed at http://svn.apache.org/viewvc?rev=1760914&view=rev

It turns out that OAK-4631 had already removed the check. I added it back, but 
disabled it by default. It can be enabled with {{-Denable-generation-check=true}}.

cc [~volteanu]
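Gating a check like this behind a system property typically looks like the following sketch (the flag name is taken from the comment above; the surrounding class is hypothetical, not the actual SegmentBufferWriter code):

```java
// Hypothetical sketch of an expensive assertion that is disabled by default
// and enabled with -Denable-generation-check=true.
public class GenerationCheckSketch {
    static final boolean ENABLED = Boolean.getBoolean("enable-generation-check");

    static void checkGCGen(Runnable expensiveCheck) {
        if (ENABLED) {
            expensiveCheck.run(); // only pay the cost when explicitly requested
        }
    }
}
```

Since {{Boolean.getBoolean}} reads the system property at class-load time, the check costs a single branch when disabled.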

> Disable / remove SegmentBufferWriter#checkGCGen
> ---
>
> Key: OAK-4287
> URL: https://issues.apache.org/jira/browse/OAK-4287
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>  Labels: assertion, compaction, gc
> Fix For: Segment Tar 0.0.12
>
>
> {{SegmentBufferWriter#checkGCGen}} is an after-the-fact check for back 
> references (see OAK-3348), logging a warning if it detects any. As this check 
> loads the segment it checks the reference for, it is somewhat expensive. We 
> should either come up with a cheaper way to do this check or remove it (at 
> least disable it by default). 





[jira] [Updated] (OAK-4287) Disable / remove SegmentBufferWriter#checkGCGen

2016-09-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-4287:
---
Fix Version/s: (was: Segment Tar 0.0.24)
   Segment Tar 0.0.12

> Disable / remove SegmentBufferWriter#checkGCGen
> ---
>
> Key: OAK-4287
> URL: https://issues.apache.org/jira/browse/OAK-4287
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>  Labels: assertion, compaction, gc
> Fix For: Segment Tar 0.0.12
>
>
> {{SegmentBufferWriter#checkGCGen}} is an after-the-fact check for back 
> references (see OAK-3348), logging a warning if it detects any. As this check 
> loads the segment it checks the reference for, it is somewhat expensive. We 
> should either come up with a cheaper way to do this check or remove it (at 
> least disable it by default). 





[jira] [Commented] (OAK-4811) MongoToMongoFbsTest fails

2016-09-15 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492909#comment-15492909
 ] 

Marcel Reutegger commented on OAK-4811:
---

The test started to fail on the 1.4 branch with the changes merged from 
OAK-4679 in revision svn.apache.org/r1756641.

However, I don't think those changes are the root cause for the failing test.

> MongoToMongoFbsTest fails
> -
>
> Key: OAK-4811
> URL: https://issues.apache.org/jira/browse/OAK-4811
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: upgrade
>Affects Versions: 1.4.7
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.4.8
>
>
> The test fails in the current 1.4 branch and also with the 1.4.7 release when 
> a local MongoDB is running.
> {noformat}
> validateMigration(org.apache.jackrabbit.oak.upgrade.cli.MongoToMongoFbsTest)  
> Time elapsed: 5.628 sec  <<< ERROR!
> java.lang.IllegalStateException: This builder does not exist: default
> {noformat}
> The test runs successfully with 1.4.6.





[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-09-15 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4412:
-
Labels: docs-impacting  (was: )

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: lucene
>Reporter: Tomek Rękawek
>Assignee: Chetan Mehrotra
>  Labels: docs-impacting
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4412-v1.diff, OAK-4412.patch, hybrid-benchmark.sh, 
> hybrid-result-v1.txt
>
>
> When running Oak in a cluster, each write operation is expensive. After 
> performing some stress-tests with a geo-distributed Mongo cluster, we've 
> found out that updating property indexes is a large part of the overall 
> traffic.
> The asynchronous index would be an answer here (as the index update won't be 
> made in the client request thread), but AEM requires the updates to be 
> visible immediately in order to work properly.
> The idea here is to enhance the existing asynchronous Lucene index with a 
> synchronous, locally-stored counterpart that will persist only the data since 
> the last Lucene background reindexing job.
> The new index can be stored in memory or (if necessary) in MMAPed local 
> files. Once the "main" Lucene index is being updated, the local index will be 
> purged.
> Queries will use a union of results from the {{lucene}} and 
> {{lucene-memory}} indexes.
> The {{lucene-memory}} index, as a local stored entity, will be updated using 
> an observer, so it'll get both local and remote changes.
> The original idea has been suggested by [~chetanm] in the discussion for the 
> OAK-4233.





[jira] [Updated] (OAK-4811) MongoToMongoFbsTest fails

2016-09-15 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger updated OAK-4811:
--
Fix Version/s: 1.4.8

> MongoToMongoFbsTest fails
> -
>
> Key: OAK-4811
> URL: https://issues.apache.org/jira/browse/OAK-4811
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: upgrade
>Affects Versions: 1.4.7
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.4.8
>
>
> The test fails in the current 1.4 branch and also with the 1.4.7 release when 
> a local MongoDB is running.
> {noformat}
> validateMigration(org.apache.jackrabbit.oak.upgrade.cli.MongoToMongoFbsTest)  
> Time elapsed: 5.628 sec  <<< ERROR!
> java.lang.IllegalStateException: This builder does not exist: default
> {noformat}
> The test runs successfully with 1.4.6.





[jira] [Commented] (OAK-4812) Reduce calls to SegmentStore#newSegmentId from the Segment class

2016-09-15 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492820#comment-15492820
 ] 

Francesco Mari commented on OAK-4812:
-

I guess the record ID cache per segment is kind of useless. It should rather be 
a segment ID cache, since that seems to be the bulk of the problem. Creating 
record IDs is cheap, but the same is not true for segment IDs.
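A segment ID cache of the kind suggested here could be sketched as follows (a hypothetical illustration, not the real {{SegmentIdTable}} code): keying on the (msb, lsb) pair means repeated references to an already-seen segment never reach the expensive creation path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a segment ID cache: repeated lookups of the same
// (msb, lsb) pair reuse the cached id instead of creating a new one, which
// stands in for the expensive SegmentStore#newSegmentId call.
public class SegmentIdCacheSketch {
    static final class SegmentId {
        final long msb, lsb;
        SegmentId(long msb, long lsb) { this.msb = msb; this.lsb = lsb; }
    }

    private final Map<String, SegmentId> cache = new HashMap<>();
    int newSegmentIdCalls = 0; // counts cache misses

    public SegmentId getSegmentId(long msb, long lsb) {
        String key = msb + ":" + lsb;
        SegmentId id = cache.get(key);
        if (id == null) {
            newSegmentIdCalls++;          // only a miss pays the creation cost
            id = new SegmentId(msb, lsb);
            cache.put(key, id);
        }
        return id;
    }
}
```

For workloads like offline compaction, where many record references point into the same few segments, such a cache keeps the hit rate high and the pressure on the id table low.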

> Reduce calls to SegmentStore#newSegmentId from the Segment class
> 
>
> Key: OAK-4812
> URL: https://issues.apache.org/jira/browse/OAK-4812
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Alex Parvulescu
>Priority: Minor
>
> OAK-4631 introduced a change in records handling in a segment that will 
> amplify the number of calls to {{SegmentStore#newSegmentId}} by the number of 
> external references [0]. It usually is the case that there are a lot of 
> record references that point to the same segment id, and the existing 
> {{recordIdCache}} would not help much in this case.
> The scenario I'm seeing for offline compaction (might be a bit biased) is a 
> full traversal of segments that increases pressure on the {{SegmentIdTable}} 
> by calling {{newSegmentId}} with a lot of already existing segments.
> I'm creating this issue as an 'Improvement' as I think it is interesting to 
> look into reducing this pressure. This might be by squeezing more out of the 
> {{SegmentIdTable}} bits (I'd like to followup on this with a benchmark) or 
> revisiting the code paths from the {{Segment}} class.
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/Segment.java#L405





[jira] [Comment Edited] (OAK-4803) Simplify the client side of the cold standby

2016-09-15 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490303#comment-15490303
 ] 

Francesco Mari edited comment on OAK-4803 at 9/15/16 9:11 AM:
--

The patch implements everything that is described in the issue and adds unit 
tests for the newly introduced components. Some unnecessary components like 
{{StandbyStore}}, {{SegmentLoaderHandler}}, {{StandbyClientHandler}}, and other 
encoders and decoders have been removed as part of this patch.


was (Author: frm):
The patch implements everything that is describing in the issue and adds unit 
tests for the newly introduced components. Some unnecessary components like 
{{StandbyStore}}, {{SegmentLoaderHandler}}, {{StandbyClientHandler}}, and other 
encoders and decoders have been removed as part of this patch.

> Simplify the client side of the cold standby
> 
>
> Key: OAK-4803
> URL: https://issues.apache.org/jira/browse/OAK-4803
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.12
>
> Attachments: OAK-4803-01.patch
>
>
> The implementation of the cold standby client is overly and unnecessarily 
> complicated. It would be way clearer to separate the client code into two major 
> components: a simple client responsible for sending messages to and receiving 
> responses from the standby server, and the synchronization algorithm used to 
> read data from the server and to save the read data in the local {{FileStore}}.
> Moreover, the simple client could be further modularised by 
> encapsulating request encoding, response decoding and message handling into 
> their own Netty handlers.





[jira] [Commented] (OAK-4631) Simplify the format of segments and serialized records

2016-09-15 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492797#comment-15492797
 ] 

Alex Parvulescu commented on OAK-4631:
--

Another side effect I noticed is OAK-4812; we can follow up there, so we don't 
overload this issue too much.

> Simplify the format of segments and serialized records
> --
>
> Key: OAK-4631
> URL: https://issues.apache.org/jira/browse/OAK-4631
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.10
>
> Attachments: OAK-4631-01.patch, OAK-4631-02.patch, OAK-4631-03.patch, 
> OAK-4631-04.patch
>
>
> As discussed in [this thread|http://markmail.org/thread/3oxp6ydboyefr4bg], it 
> might be beneficial to simplify both the format of the segments and the way 
> record IDs are serialised. A new strategy needs to be investigated to reach 
> the right compromise between performance, disk space utilization and 
> simplicity.





[jira] [Created] (OAK-4812) Reduce calls to SegmentStore#newSegmentId from the Segment class

2016-09-15 Thread Alex Parvulescu (JIRA)
Alex Parvulescu created OAK-4812:


 Summary: Reduce calls to SegmentStore#newSegmentId from the 
Segment class
 Key: OAK-4812
 URL: https://issues.apache.org/jira/browse/OAK-4812
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: segment-tar
Reporter: Alex Parvulescu
Priority: Minor


OAK-4631 introduced a change in record handling in a segment that will amplify 
the number of calls to {{SegmentStore#newSegmentId}} by the number of external 
references [0]. It is usually the case that a lot of record references point to 
the same segment id, and the existing {{recordIdCache}} would not help much in 
this case.
The scenario I'm seeing for offline compaction (might be a bit biased) is a 
full traversal of segments that increases pressure on the {{SegmentIdTable}} by 
calling {{newSegmentId}} with a lot of already existing segments.
I'm creating this issue as an 'Improvement', as I think it is interesting to 
look into reducing this pressure. This might mean squeezing more out of the 
{{SegmentIdTable}} bits (I'd like to follow up on this with a benchmark) or 
revisiting the code paths in the {{Segment}} class.







[0] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/Segment.java#L405
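A minimal sketch of the kind of memoization this issue asks about: repeated lookups of the same (msb, lsb) pair are served from a local map instead of hitting the shared table every time. The class and method names only mirror the ones mentioned above; this is not Oak's actual code.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SegmentIdCacheSketch {

    // Stand-in for the shared, contended SegmentIdTable; counts how often it is hit.
    static final AtomicInteger tableHits = new AtomicInteger();

    static UUID newSegmentId(long msb, long lsb) {
        tableHits.incrementAndGet(); // the expensive path in the real store
        return new UUID(msb, lsb);
    }

    // Local cache: most record references point to a handful of segment ids.
    static final Map<UUID, UUID> cache = new ConcurrentHashMap<>();

    static UUID cachedSegmentId(long msb, long lsb) {
        return cache.computeIfAbsent(new UUID(msb, lsb),
                k -> newSegmentId(k.getMostSignificantBits(), k.getLeastSignificantBits()));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            cachedSegmentId(1L, 2L); // 1000 references, all to the same segment
        }
        System.out.println("table hits: " + tableHits.get()); // prints "table hits: 1"
    }
}
```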





[jira] [Created] (OAK-4811) MongoToMongoFbsTest fails

2016-09-15 Thread Marcel Reutegger (JIRA)
Marcel Reutegger created OAK-4811:
-

 Summary: MongoToMongoFbsTest fails
 Key: OAK-4811
 URL: https://issues.apache.org/jira/browse/OAK-4811
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: upgrade
Affects Versions: 1.4.7
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger


The test fails in the current 1.4 branch and also with the 1.4.7 release when a 
local MongoDB is running.

{noformat}
validateMigration(org.apache.jackrabbit.oak.upgrade.cli.MongoToMongoFbsTest)  
Time elapsed: 5.628 sec  <<< ERROR!
java.lang.IllegalStateException: This builder does not exist: default
{noformat}

The test runs successfully with 1.4.6.





[jira] [Commented] (OAK-4810) FileDataStore: support SHA-2

2016-09-15 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492777#comment-15492777
 ] 

Chetan Mehrotra commented on OAK-4810:
--

bq. I think default for writing (if not configured explicitly) could still be 
SHA-1.

The change can be made at any time and should not affect other parts much, so 
the default value can simply be switched to SHA-256.

Once a binary is added with any digest method, we do not need the method 
details when reading, as reads are purely id-based. Still, it would be good to 
encode the algorithm in the id that is passed back to the NodeStore.
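A hedged sketch of the "encode the algo in the id" idea: the `sha256-` prefix and helper names are made up for illustration and are not Oak's actual identifier scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentIdSketch {

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // New identifiers carry an algorithm prefix.
    static String newId(byte[] content) {
        try {
            return "sha256-" + hex(MessageDigest.getInstance("SHA-256").digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is available on every JRE
        }
    }

    // Bare ids with no prefix keep meaning SHA-1, for backward compatibility.
    static String algorithmOf(String id) {
        return id.startsWith("sha256-") ? "SHA-256" : "SHA-1";
    }

    public static void main(String[] args) {
        String id = newId("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(algorithmOf(id));                                      // SHA-256
        System.out.println(algorithmOf("da39a3ee5e6b4b0d3255bfef95601890afd80709")); // SHA-1
    }
}
```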

> FileDataStore: support SHA-2
> 
>
> Key: OAK-4810
> URL: https://issues.apache.org/jira/browse/OAK-4810
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Thomas Mueller
>
> The FileDataStore currently uses SHA-1, but that algorithm is deprecated. We 
> should support other algorithms as well (mainly SHA-256).
> Migration should be painless (no long downtime). I think the default for 
> writing (if not configured explicitly) could still be SHA-1. But when reading, 
> SHA-256 should also be supported (depending on the identifier). That way, the 
> new Oak version can be installed "slowly" for all repositories (in a cluster + 
> shared datastore).
> After all repositories are running with the new Oak version, the 
> configuration for SHA-256 can be enabled. That way, SHA-256 is used for new 
> binaries, and both SHA-1 and SHA-256 are supported for reading.
> One potential downside is that deduplication would suffer a bit if a new Blob 
> with the same content is added again, as the digest-based match would fail. 
> That can be mitigated by computing both types of digest if the need arises. 
> The downsides are some additional file operations and CPU, and slower 
> migration to SHA-256.
> Some other open questions: 
> * While we are at it, it might make sense to additionally support SHA-3 and 
> other algorithms (make it configurable). But the length of the identifier 
> alone might then not be enough information to know which algorithm is used, so 
> maybe add a prefix.
> * The number of subdirectory levels: should we keep it as is, or should we 
> reduce it (for example, one level fewer)?





[jira] [Commented] (OAK-4805) Misconfigured lucene index definition can render the whole system unusable

2016-09-15 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492722#comment-15492722
 ] 

Vikas Saurabh commented on OAK-4805:


Ack. I'll see how quickly I can do the complete one; otherwise I'll commit 
this patch and open another issue as you suggested.
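The attached patch is not reproduced here, but the mitigation under discussion could look roughly like this: plan collection that skips an index whose definition throws, instead of failing the whole query. All names here are illustrative, not Oak's actual query-engine API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SafePlanCollectionSketch {

    static List<String> collectPlans(List<Supplier<String>> indexes) {
        List<String> plans = new ArrayList<>();
        for (Supplier<String> index : indexes) {
            try {
                plans.add(index.get());
            } catch (RuntimeException e) {
                // A mis-configured index must not take down the query engine.
                System.err.println("skipping broken index: " + e.getMessage());
            }
        }
        return plans;
    }

    static List<String> demo() {
        List<Supplier<String>> indexes = new ArrayList<>();
        indexes.add(() -> "plan:lucene:/oak:index/good");
        indexes.add(() -> { throw new IllegalStateException("bad index def"); });
        return collectPlans(indexes);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [plan:lucene:/oak:index/good]
    }
}
```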

> Misconfigured lucene index definition can render the whole system unusable
> --
>
> Key: OAK-4805
> URL: https://issues.apache.org/jira/browse/OAK-4805
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>  Labels: candidate_oak_1_0, candidate_oak_1_2, candidate_oak_1_4
> Fix For: 1.6
>
> Attachments: OAK-4805.patch
>
>
> A mis-configured index definition can throw an exception while plans are 
> collected. This causes any query (even unrelated ones) to fail, because the 
> cost calculation logic consults the badly constructed index definition. 
> Overall, a mis-configured index definition can practically grind the whole 
> system to a halt, as the whole query framework stops working.





[jira] [Commented] (OAK-4631) Simplify the format of segments and serialized records

2016-09-15 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492715#comment-15492715
 ] 

Francesco Mari commented on OAK-4631:
-

It's also interesting to take the average of the values above, because it helps 
put this information in perspective.

- source, Oak 1.0
{noformat}
135  KB   per data segment
52   byte per map
0.28 byte per list
7byte per template
5byte per node
{noformat}
- upgraded instance, pre OAK-4631
{noformat}
33 KB   per data segment
46 byte per map
12 byte per list
7  byte per template
4  byte per node
{noformat}
- upgraded instance, post OAK-4631
{noformat}
251 KB   per data segment
182 byte per map
58  byte per list
22  byte per template
35  byte per node
{noformat}

Records got bigger, that's undeniable. But as a consequence of this change 
records are more easily parseable, segments are better utilised, and 54% fewer 
segments are needed to store the same data. Fewer segments mean smaller 
book-keeping data structures throughout the Segment Store, especially when it 
comes to compaction. This change traded space for simplicity, and I think 
there is some value in that.
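The averages quoted above follow directly from the totals in the {{oak-run debug}} output, e.g. 36 GB across 150205 data segments is roughly 251 KB per segment:

```java
public class SegmentAverages {

    // GB -> KB, integer average per segment
    public static long kbPerSegment(long totalGb, long segments) {
        return totalGb * 1024 * 1024 / segments;
    }

    public static void main(String[] args) {
        System.out.println(kbPerSegment(36, 150205)); // post-OAK-4631 run: 251
        System.out.println(kbPerSegment(10, 321341)); // pre-OAK-4631 run: 32
        // (the 33 KB quoted above comes from the unrounded byte totals)
    }
}
```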

> Simplify the format of segments and serialized records
> --
>
> Key: OAK-4631
> URL: https://issues.apache.org/jira/browse/OAK-4631
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.10
>
> Attachments: OAK-4631-01.patch, OAK-4631-02.patch, OAK-4631-03.patch, 
> OAK-4631-04.patch
>
>
> As discussed in [this thread|http://markmail.org/thread/3oxp6ydboyefr4bg], it 
> might be beneficial to simplify both the format of the segments and the way 
> record IDs are serialised. A new strategy needs to be investigated to reach 
> the right compromise between performance, disk space utilization and 
> simplicity.





[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected

2016-09-15 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492693#comment-15492693
 ] 

Vikas Saurabh commented on OAK-4804:


I couldn't find anything on the web. Single/double quotes didn't work :(. 
[~teofili], would you know?
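A stand-alone illustration (plain Java, not Lucene code) of why {{/test/3}} matches in the test case below: once the multi-word synonym is reduced to individual tokens with no phrase or position constraint, any document containing all of the words qualifies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SynonymExpansionDemo {

    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }

    // "FTW, For the win" expands the query into alternatives; each alternative
    // becomes a bag of tokens, so word order and adjacency are lost.
    static final List<Set<String>> alternatives = Arrays.asList(
            tokens("FTW"),
            tokens("For the win"));

    static boolean matches(String doc) {
        Set<String> docTokens = tokens(doc);
        for (Set<String> alt : alternatives) {
            if (docTokens.containsAll(alt)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("FTW"));          // true
        System.out.println(matches("For the win"));  // true
        System.out.println(matches("For gods sake, this is not the way to win it")); // true (the bug)
    }
}
```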

> Synonym analyzer with multiple words in synonym definition can give more 
> results than expected
> --
>
> Key: OAK-4804
> URL: https://issues.apache.org/jira/browse/OAK-4804
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>Priority: Minor
>
> Setting up synonyms such as {{"FTW, For the win"}} would also return 
> documents which contain all of {{"For", "the", "win"}}.
> Test case:
> {noformat}
> @Test
> public void fulltextSearchWithPhraseSynonymAnalyzer() throws Exception {
>     Tree idx = createFulltextIndex(root.getTree("/"), "test");
>     TestUtil.useV2(idx);
>     Tree anl = idx.addChild(LuceneIndexConstants.ANALYZERS).addChild(LuceneIndexConstants.ANL_DEFAULT);
>     anl.addChild(LuceneIndexConstants.ANL_TOKENIZER).setProperty(LuceneIndexConstants.ANL_NAME, "Standard");
>     Tree synFilter = anl.addChild(LuceneIndexConstants.ANL_FILTERS).addChild("Synonym");
>     synFilter.setProperty("synonyms", "syn.txt");
>     synFilter.addChild("syn.txt").addChild(JCR_CONTENT).setProperty(JCR_DATA, "FTW, For the win");
>     Tree test = root.getTree("/").addChild("test");
>     test.addChild("1").setProperty("foo", "FTW");
>     test.addChild("2").setProperty("foo", "For the win");
>     test.addChild("3").setProperty("foo", "For gods sake, this is not the way to win it");
>     root.commit();
>     assertQuery("select * from [nt:base] where CONTAINS(*, 'FTW') AND ISDESCENDANTNODE('/test')",
>             asList("/test/1", "/test/2")); // current (failing) result is ["/test/1", "/test/2", "/test/3"]
> }
> {noformat}





[jira] [Updated] (OAK-4810) FileDataStore: support SHA-2

2016-09-15 Thread Thomas Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-4810:

Component/s: blob

> FileDataStore: support SHA-2
> 
>
> Key: OAK-4810
> URL: https://issues.apache.org/jira/browse/OAK-4810
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Thomas Mueller
>
> The FileDataStore currently uses SHA-1, but that algorithm is deprecated. We 
> should support other algorithms as well (mainly SHA-256).
> Migration should be painless (no long downtime). I think the default for 
> writing (if not configured explicitly) could still be SHA-1. But when reading, 
> SHA-256 should also be supported (depending on the identifier). That way, the 
> new Oak version can be installed "slowly" for all repositories (in a cluster + 
> shared datastore).
> After all repositories are running with the new Oak version, the 
> configuration for SHA-256 can be enabled. That way, SHA-256 is used for new 
> binaries, and both SHA-1 and SHA-256 are supported for reading.
> One potential downside is that deduplication would suffer a bit if a new Blob 
> with the same content is added again, as the digest-based match would fail. 
> That can be mitigated by computing both types of digest if the need arises. 
> The downsides are some additional file operations and CPU, and slower 
> migration to SHA-256.
> Some other open questions: 
> * While we are at it, it might make sense to additionally support SHA-3 and 
> other algorithms (make it configurable). But the length of the identifier 
> alone might then not be enough information to know which algorithm is used, so 
> maybe add a prefix.
> * The number of subdirectory levels: should we keep it as is, or should we 
> reduce it (for example, one level fewer)?





[jira] [Resolved] (OAK-4412) Lucene hybrid index

2016-09-15 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-4412.
--
   Resolution: Fixed
Fix Version/s: 1.5.11

Most of the required work is done now; tasks have been opened for the 
remaining parts (see linked issues).

Resolving the issue as completed. Specific issues can be created going forward.

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: lucene
>Reporter: Tomek Rękawek
>Assignee: Chetan Mehrotra
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4412-v1.diff, OAK-4412.patch, hybrid-benchmark.sh, 
> hybrid-result-v1.txt
>
>
> When running Oak in a cluster, each write operation is expensive. After 
> performing some stress tests with a geo-distributed Mongo cluster, we found 
> out that updating property indexes makes up a large part of the overall 
> traffic.
> The asynchronous index would be an answer here (as the index update won't be 
> made in the client request thread), but AEM requires the updates to be 
> visible immediately in order to work properly.
> The idea here is to enhance the existing asynchronous Lucene index with a 
> synchronous, locally stored counterpart that persists only the data since 
> the last Lucene background reindexing job.
> The new index can be stored in memory or (if necessary) in MMAPed local 
> files. Once the "main" Lucene index is updated, the local index will be 
> purged.
> Queries will use a union of results from the {{lucene}} and 
> {{lucene-memory}} indexes.
> The {{lucene-memory}} index, as a locally stored entity, will be updated 
> using an observer, so it'll get both local and remote changes.
> The original idea was suggested by [~chetanm] in the discussion of OAK-4233.





[jira] [Created] (OAK-4810) FileDataStore: support SHA-2

2016-09-15 Thread Thomas Mueller (JIRA)
Thomas Mueller created OAK-4810:
---

 Summary: FileDataStore: support SHA-2
 Key: OAK-4810
 URL: https://issues.apache.org/jira/browse/OAK-4810
 Project: Jackrabbit Oak
  Issue Type: New Feature
Reporter: Thomas Mueller


The FileDataStore currently uses SHA-1, but that algorithm is deprecated. We 
should support other algorithms as well (mainly SHA-256).

Migration should be painless (no long downtime). I think the default for 
writing (if not configured explicitly) could still be SHA-1. But when reading, 
SHA-256 should also be supported (depending on the identifier). That way, the 
new Oak version can be installed "slowly" for all repositories (in a cluster + 
shared datastore).

After all repositories are running with the new Oak version, the configuration 
for SHA-256 can be enabled. That way, SHA-256 is used for new binaries, and 
both SHA-1 and SHA-256 are supported for reading.

One potential downside is that deduplication would suffer a bit if a new Blob 
with the same content is added again, as the digest-based match would fail. 
That can be mitigated by computing both types of digest if the need arises. 
The downsides are some additional file operations and CPU, and slower 
migration to SHA-256.

Some other open questions: 

* While we are at it, it might make sense to additionally support SHA-3 and 
other algorithms (make it configurable). But the length of the identifier alone 
might then not be enough information to know which algorithm is used, so maybe 
add a prefix.

* The number of subdirectory levels: should we keep it as is, or should we 
reduce it (for example, one level fewer)?
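A sketch of the reading side described above, assuming hex-encoded identifiers: SHA-1 yields 40 hex characters and SHA-256 yields 64, so for these two algorithms the identifier length alone disambiguates (per the open question above, a prefix would be needed once more algorithms are added). Helper names are illustrative.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestDetectionSketch {

    // Pick the digest algorithm from the identifier itself.
    static String algorithmForId(String hexId) {
        switch (hexId.length()) {
            case 40: return "SHA-1";   // 20-byte digest, hex-encoded
            case 64: return "SHA-256"; // 32-byte digest, hex-encoded
            default: throw new IllegalArgumentException("unknown id length: " + hexId.length());
        }
    }

    static String repeat(char c, int n) {
        return new String(new char[n]).replace('\0', c);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Digest lengths in hex characters, straight from the JDK:
        System.out.println(MessageDigest.getInstance("SHA-1").getDigestLength() * 2);   // 40
        System.out.println(MessageDigest.getInstance("SHA-256").getDigestLength() * 2); // 64
        System.out.println(algorithmForId(repeat('a', 40))); // SHA-1
        System.out.println(algorithmForId(repeat('a', 64))); // SHA-256
    }
}
```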






[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected

2016-09-15 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492646#comment-15492646
 ] 

Marcel Reutegger commented on OAK-4804:
---

Is there a way to define a phrase instead of individual words for the synonym? 
E.g: {{FTW, 'For the win'}}.

> Synonym analyzer with multiple words in synonym definition can give more 
> results than expected
> --
>
> Key: OAK-4804
> URL: https://issues.apache.org/jira/browse/OAK-4804
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>Priority: Minor
>
> Setting up synonyms such as {{"FTW, For the win"}} would also return 
> documents which contain all of {{"For", "the", "win"}}.
> Test case:
> {noformat}
> @Test
> public void fulltextSearchWithPhraseSynonymAnalyzer() throws Exception {
>     Tree idx = createFulltextIndex(root.getTree("/"), "test");
>     TestUtil.useV2(idx);
>     Tree anl = idx.addChild(LuceneIndexConstants.ANALYZERS).addChild(LuceneIndexConstants.ANL_DEFAULT);
>     anl.addChild(LuceneIndexConstants.ANL_TOKENIZER).setProperty(LuceneIndexConstants.ANL_NAME, "Standard");
>     Tree synFilter = anl.addChild(LuceneIndexConstants.ANL_FILTERS).addChild("Synonym");
>     synFilter.setProperty("synonyms", "syn.txt");
>     synFilter.addChild("syn.txt").addChild(JCR_CONTENT).setProperty(JCR_DATA, "FTW, For the win");
>     Tree test = root.getTree("/").addChild("test");
>     test.addChild("1").setProperty("foo", "FTW");
>     test.addChild("2").setProperty("foo", "For the win");
>     test.addChild("3").setProperty("foo", "For gods sake, this is not the way to win it");
>     root.commit();
>     assertQuery("select * from [nt:base] where CONTAINS(*, 'FTW') AND ISDESCENDANTNODE('/test')",
>             asList("/test/1", "/test/2")); // current (failing) result is ["/test/1", "/test/2", "/test/3"]
> }
> {noformat}





[jira] [Commented] (OAK-4631) Simplify the format of segments and serialized records

2016-09-15 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492599#comment-15492599
 ] 

Alex Parvulescu commented on OAK-4631:
--

I think an important aspect of the impact of this patch was not fully tested, 
namely _disk space utilization_. I'm running some upgrade tests using the 
latest trunk now and I have some interesting results to share (I'm using 
{{oak-run debug}} to collect data):

 - source 8.7GB, oak 1.0
{noformat}
Total size:
7 GB in  54137 data segments
768 KB in  3 bulk segments
1 GB in maps (20650196 leaf and branch records)
113 MB in lists (3714097 list and bucket records)
3 GB in values (value and block records of 73489693 properties, 
3432/378779/0/1214488 small/medium/long/external blobs, 51059734/3318006/159 
small/medium/long strings)
120 MB in templates (16786491 template records)
1 GB in nodes (221232040 node records)
{noformat}

 - upgraded instance pre OAK-4631 (based on rev 1757389) 11GB
{noformat}
Total size:
10 GB in 321341 data segments
768 KB in  3 bulk segments
2 GB in maps (46451304 leaf and branch records)
619 MB in lists (55468842 list and bucket records)
3 GB in values (value and block records of 70764647 properties, 
3429/378684/0/1214419 small/medium/long/external blobs, 46258634/1862224/159 
small/medium/long strings)
113 MB in templates (16772763 template records)
1 GB in nodes (251592041 node records)
{noformat}

 - upgraded instance post OAK-4631 37GB
{noformat}
Total size:
36 GB in 150205 data segments
768 KB in  3 bulk segments
6 GB in maps (35228936 leaf and branch records)
3 GB in lists (55508867 list and bucket records)
4 GB in values (value and block records of 75853352 properties, 
3742/380719/0/1216770 small/medium/long/external blobs, 76087785/4765208/159 
small/medium/long strings)
712 MB in templates (33716018 template records)
13 GB in nodes (390207210 node records)
{noformat}

The size delta is pretty big: the upgraded repo jumps from {{11GB}} to {{37GB}}.
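For reference, the segment-count reduction implied by these totals (321341 data segments before, 150205 after) works out to roughly half:

```java
public class SegmentCountDelta {

    // Percentage reduction, rounded to the nearest integer.
    public static long percentFewer(long before, long after) {
        return Math.round(100.0 * (before - after) / before);
    }

    public static void main(String[] args) {
        System.out.println(percentFewer(321341, 150205) + "% fewer data segments"); // 53% fewer data segments
    }
}
```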


> Simplify the format of segments and serialized records
> --
>
> Key: OAK-4631
> URL: https://issues.apache.org/jira/browse/OAK-4631
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: Segment Tar 0.0.10
>
> Attachments: OAK-4631-01.patch, OAK-4631-02.patch, OAK-4631-03.patch, 
> OAK-4631-04.patch
>
>
> As discussed in [this thread|http://markmail.org/thread/3oxp6ydboyefr4bg], it 
> might be beneficial to simplify both the format of the segments and the way 
> record IDs are serialised. A new strategy needs to be investigated to reach 
> the right compromise between performance, disk space utilization and 
> simplicity.


