[jira] [Commented] (OAK-2498) Root record references provide too little context for parsing a segment

2016-04-26 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258518#comment-15258518
 ] 

Jukka Zitting commented on OAK-2498:


Another possible fix would be to break {{RecordType.LIST}} into a set of list 
types based on what the contained items are. This would avoid the size increase 
and could be done without explicit migration.
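
A sketch of what such a split could look like (the list type names below are 
hypothetical; the current enum has only a single generic {{LIST}} value):

{code}
// Hypothetical refinement of RecordType: one list type per kind of
// contained record, so the record ids inside a list record can be
// parsed without any external context.
public enum RecordType {
    // ... existing types such as LEAF, BRANCH, BUCKET, VALUE ...
    LIST_OF_VALUES,    // entries point to VALUE records
    LIST_OF_TEMPLATES, // entries point to TEMPLATE records
    LIST_OF_NODES      // entries point to NODE records
}
{code}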

> Root record references provide too little context for parsing a segment
> ---
>
> Key: OAK-2498
> URL: https://issues.apache.org/jira/browse/OAK-2498
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-next
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>  Labels: tools
> Fix For: 1.6
>
>
> According to the [documentation | 
> http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html] the root 
> record references in a segment header provide enough context for parsing all 
> records within this segment without any external information. 
> Turns out this is not true: if a root record reference points e.g. to a list 
> record, the items in that list are record ids of unknown type. So even though 
> those records might live in the same segment, we can't parse them as we don't 
> know their type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-1828) Improved SegmentWriter

2015-11-28 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15030631#comment-15030631
 ] 

Jukka Zitting commented on OAK-1828:


Sweet, looks great!

> Improved SegmentWriter
> --
>
> Key: OAK-1828
> URL: https://issues.apache.org/jira/browse/OAK-1828
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>Priority: Minor
>  Labels: technical_debt
> Fix For: 1.3.12
>
> Attachments: record-writers-v0.patch, record-writers-v1.patch, 
> record-writers-v2.patch
>
>
> At about 1kLOC and dozens of methods, the SegmentWriter class is currently a 
> bit too complex for one of the key components of the TarMK. It also uses a 
> somewhat non-obvious mix of synchronized and unsynchronized code to 
> coordinate multiple concurrent threads that may be writing content at the 
> same time. The synchronization blocks are also broader than what really would 
> be needed, which in some cases causes unnecessary lock contention in 
> concurrent write loads.
> To improve the readability and maintainability of the code, and to increase 
> performance of concurrent writes, it would be useful to split part of the 
> SegmentWriter functionality to a separate RecordWriter class that would be 
> responsible for writing individual records into a segment. The 
> SegmentWriter.prepare() method would return a new RecordWriter instance, and 
> the higher-level SegmentWriter methods would use the returned instance for 
> all the work that's currently guarded in synchronization blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-2070) Segment corruption

2014-09-03 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120136#comment-14120136
 ] 

Jukka Zitting commented on OAK-2070:


bq. The segment id looks wrong, my guess is that the segment header got corrupt 
somehow.

Sounds correct. All the UUIDs created by the SegmentMK are of the form 
{{xxxxxxxx-xxxx-4xxx-Txxx-xxxxxxxxxxxx}}, where {{T}} is either {{a}} (data 
segment) or {{b}} (bulk segment) depending on the type of the segment.
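
A minimal sketch of checking that type nibble (this mirrors what 
{{SegmentId.isDataSegmentId()}} does in oak-core; treat the helper below as 
illustrative rather than the actual implementation):

{code}
import java.util.UUID;

// The type nibble is the top nibble of the UUID's least significant
// long: 0xa marks a data segment, 0xb a bulk segment.
static boolean isDataSegmentId(UUID id) {
    return (id.getLeastSignificantBits() >>> 60) == 0xAL;
}
{code}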

bq. I think the wrong byte sequence 00049b053927000498000493 is regular record 
data. 

Yep, looks like that.

It sounds like there is a bug in the {{SegmentWriter.flush()}} logic somewhere 
around 
https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.0.5/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentWriter.java#L182.
 When filling up a segment buffer, the SegmentWriter will write records 
backwards from the end of the buffer and the segment reference lookup table 
forwards from the beginning of the buffer. The buffer gets flushed into a 
persisted segment whenever these two areas would overlap or when other size 
limits would be reached (see 
[prepare()|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.0.5/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentWriter.java#L267]).
 It might be that we're hitting some edge case that the current code doesn't 
handle correctly.
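
For illustration, the flush condition amounts to something like this (a 
simplified sketch with assumed field names, not the actual code):

{code}
// Records grow down from the end of the buffer while the 16-byte segment
// id entries of the reference table grow up from the header; flush before
// the two regions would meet.
int refTableEnd = HEADER_SIZE + segmentRefCount * 16;
int newRecordStart = recordDataStart - newRecordSize;
if (newRecordStart < refTableEnd + newSegmentRefCount * 16) {
    flush(); // persist the current buffer as a segment and start a new one
}
{code}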

This is with latest Oak trunk, right?

> Segment corruption
> --
>
> Key: OAK-2070
> URL: https://issues.apache.org/jira/browse/OAK-2070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Thomas Mueller
> Fix For: 1.1
>
> Attachments: SearchByteSeq.java, SegmentDump.java
>
>
> I got a strange case of a corrupt segment. The error message is 
> "java.lang.IllegalStateException: Segment 
> b6e0cdbc-0004-9b05-3927-000498000493 not found". The segment id looks wrong, 
> my guess is that the segment header got corrupt somehow. The checksum of the 
> entry is correct however. I wrote a simple segment header dump tool (I will 
> attach it), which printed: 
> {noformat}
> segmentId: b6fec1b4-f321-4b64-aeab-31c27a939287.4c65e101
> length: 4
> maxRef: 1
> checksum: 4c65e101
> magic: 0aK
> version: 0
> id count: 66
> root count: 24
> blob ref count: 0
> reserved: 0
> segment #0: e44fec74-3bc9-4cee-a864-5f5fb4485cb3
> segment #1: b6480852-9af2-4f21-a542-933cd554e9ed
> segment #2: bec8216a-64e3-4683-a28a-08fbd7ed91f9
> segment #3: b8067f5a-16af-429f-a1b5-c2f1ed1d3c59
> segment #4: 7c2e6167-b292-444a-a318-4f43692f40f2
> segment #5: 3b707f6e-d1e8-4bc0-ace5-67499ac4c369
> segment #6: 0c15bf0a-770d-4057-af6b-baeaeffde613
> segment #7: 97814ad5-3fc3-436e-a7c6-2aed13bcf529
> segment #8: b4babf94-588f-4b32-a8d6-08ba3351abf2
> segment #9: 2eab77d7-ee41-4519-ad14-c31f4f72d2e2
> segment #10: 57949f56-15d1-4f65-a107-d517a222cb11
> segment #11: d25bb0f8-3948-4d33-a35a-50a60a1e9f27
> segment #12: 58066744-e2d6-4f1c-a962-caae337f48db
> segment #13: b195e811-a984-4b3d-a4c8-5fb33572bfce
> segment #14: 384b360d-69ce-4df4-a419-65b80d39b40c
> segment #15: 325b6083-de56-4a4b-a6e5-2eac83e9d4aa
> segment #16: e3839e53-c516-4dce-a39d-b6241d47ef1d
> segment #17: a56f0a75-625b-4684-af43-9936d4d0c4bd
> segment #18: b7428409-8dc0-4c5b-a050-78c838db0ef1
> segment #19: 6ccd5436-37a1-4b46-a5a4-63f939f2de76
> segment #20: 4f28ca6e-a52b-4a06-acb0-19b710366c96
> segment #21: 137ec136-bcfb-405b-a5e3-59b37c54229c
> segment #22: 8a53ffe4-6941-4b9b-a517-9ab59acf9b63
> segment #23: effd3f42-dff3-4082-a0fc-493beabe8d42
> segment #24: 173ef78d-9f41-49fb-ac25-70daba943b37
> segment #25: 9d3ab04d-d540-4256-a28f-c71134d1bfd0
> segment #26: d1966946-45c7-4539-a8f2-9c7979b698c5
> segment #27: b2a72be0-ee4e-4f9d-a957-f84908f05040
> segment #28: 8eead39e-62f7-4b6c-ac12-5caee295b6e5
> segment #29: 19832493-415c-46d2-aa43-0dd95982d312
> segment #30: 462cd84b-3638-4c8c-a273-42b1be5d3df3
> segment #31: 4b4f6a2d-1281-475c-adbe-34457b6de60d
> segment #32: e78b13e2-039d-4f73-a3a8-d25da3b78a93
> segment #33: 03b652ac-623b-4d3d-a5f0-2e7dcd23fd88
> segment #34: 3ff1c4cf-cbd7-45bb-a23a-7e033c64fe87
> segment #35: ed9fb9ad-0a8c-4d37-ab95-c2211cde6fee
> segment #36: 505f766d-509d-4f1b-ad0b-dfd59ef6b6a1
> segment #37: 398f1e80-e818-41b7-a055-ed6255f6e01b
> segment #38: 74cd7875-df17-43af-a2cc-f3256b2213b9
> segment #39: ad884c29-61c9-4bd5-ad57-76ffa32ae3e8
> segment #40: 29cfb8c8-f759-41b2-ac71-d35eac0910ad
> segment #41: af5032e1-b473-47ad-a8e2-a17e770982e8
> segment #42: 37263fa3-3331-4e89-af1f-38972b65e058
> segment #43: 81baf70d-5529-416f-a8bd-9858d62fd2cc
> segment #44: aa3abdfb-4fc7-4c49-a628-fb097f8db872
> segment #45: 81e958b4-0493-4a7b-ab92-5f3cc13119e7
> segment #46: f03d0200-fe8e-4e93-a94a-97b9052f13ab
> segment #47: d9931e67-cc8c-4f45-a1b5-129a0b07e1fb
> segment #48: 

[jira] [Commented] (OAK-2049) ArrayIndexOutOfBoundsException in Segment.getRefId()

2014-08-27 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112385#comment-14112385
 ] 

Jukka Zitting commented on OAK-2049:


It would be great if we could identify the segment where this happens (with the 
logging Alex added), so it can be inspected in more detail. I recall we had 
some problem in earlier Oak 1.0.x versions where a particular corner case ended 
up producing malformed records. I'll see if I can find the issue reference.

> ArrayIndexOutOfBoundsException in Segment.getRefId()
> 
>
> Key: OAK-2049
> URL: https://issues.apache.org/jira/browse/OAK-2049
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1, 1.0.4
>Reporter: Andrew Khoury
>Assignee: Jukka Zitting
>Priority: Critical
>
> It looks like there is some SegmentMK bug that causes 
> {{Segment.getRefId()}} to throw an {{ArrayIndexOutOfBoundsException}} in some 
> fairly rare corner cases.
> The data was originally migrated into oak via the crx2oak tool mentioned 
> here: http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade.html
> That tool uses *oak-core-1.0.0* creating an oak instance.
> Similar to OAK-1566 this system was using FileDataStore with SegmentNodeStore.
> In this case the error is seen when running offline compaction using 
> oak-run-1.1-SNAPSHOT.jar (latest).
> {code:none}
> > java -Xmx4096m -jar oak-run-1.1-SNAPSHOT.jar compact 
> > /oak/crx-quickstart/repository/segmentstore
> Apache Jackrabbit Oak 1.1-SNAPSHOT
> Compacting /wcm/cq-author/crx-quickstart/repository/segmentstore
> before [data00055a.tar, data00064a.tar, data00045b.tar, data5a.tar, 
> data00018a.tar, data00022a.tar, data00047a.tar, data00037a.tar, 
> data00049a.tar, data00014a.tar, data00066a.tar, data00020a.tar, 
> data00058a.tar, data00065a.tar, data00069a.tar, data00012a.tar, 
> data9a.tar, data00060a.tar, data00041a.tar, data00016a.tar, 
> data00072a.tar, data00048a.tar, data00061a.tar, data00053a.tar, 
> data00038a.tar, data1a.tar, data00034a.tar, data3a.tar, 
> data00052a.tar, data6a.tar, data00027a.tar, data00031a.tar, 
> data00056a.tar, data00035a.tar, data00063a.tar, data00068a.tar, 
> data8v.tar, data00010a.tar, data00043b.tar, data00021a.tar, 
> data00017a.tar, data00024a.tar, data00054a.tar, data00051a.tar, 
> data00057a.tar, data00059a.tar, data00036a.tar, data00033a.tar, 
> data00019a.tar, data00046a.tar, data00067a.tar, data4a.tar, 
> data00044a.tar, data00013a.tar, data00070a.tar, data00026a.tar, 
> data2a.tar, data00011a.tar, journal.log, data00030a.tar, data00042a.tar, 
> data00025a.tar, data00062a.tar, data00023a.tar, data00071a.tar, 
> data00032b.tar, data00040a.tar, data00015a.tar, data00029a.tar, 
> data00050a.tar, data0a.tar, data7a.tar, data00028a.tar, 
> data00039a.tar]
> -> compacting
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 206
> at 
> org.apache.jackrabbit.oak.plugins.segment.Segment.getRefId(Segment.java:191)
> at 
> org.apache.jackrabbit.oak.plugins.segment.Segment.internalReadRecordId(Segment.java:299)
> at 
> org.apache.jackrabbit.oak.plugins.segment.Segment.readRecordId(Segment.java:295)
> at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getTemplateId(SegmentNodeState.java:69)
> at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getTemplate(SegmentNodeState.java:78)
> at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getProperties(SegmentNodeState.java:150)
> at 
> org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:154)
> at 
> org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:124)
> at 
> org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:160)
> at 
> org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:124)
> at 
> org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:160)
> at 
> org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:124)
> at 
> org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:160)
> at 
> org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:124)
> at 
> org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:160)
> at 
> org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:124)
> at 
> org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:160)
> at 
> org.apache.jackrabbit.oak.plugins.segment.Comp

[jira] [Commented] (OAK-2019) Compact only if needed

2014-08-21 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105505#comment-14105505
 ] 

Jukka Zitting commented on OAK-2019:


bq. cleanup will now run even if the compaction check is negative

+1 Good point!

> Compact only if needed
> --
>
> Key: OAK-2019
> URL: https://issues.apache.org/jira/browse/OAK-2019
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
> Fix For: 1.1, 1.0.5
>
> Attachments: 0001-OAK-2019-Compact-only-if-needed.patch, 
> compact-if-needed.patch
>
>
> Add a verification before the TarMK compaction runs to see if there's at 
> least one tar file that needs cleanup. Otherwise skip compaction entirely.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-2019) Compact only if needed

2014-08-20 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-2019:
---

Attachment: 0001-OAK-2019-Compact-only-if-needed.patch

The attached patch uses a rough estimate based on referenceable bulk segments 
to decide whether running online compaction makes sense. The patch sets the 
threshold at 10% of estimated space savings.
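
The idea in pseudocode (a rough sketch with assumed helper names, not the 
actual patch):

{code}
// Sketch: estimate how much space compaction could reclaim from the
// referenceable bulk segments in the tar files, and skip the expensive
// compaction run if the estimated savings stay below 10%.
long total = 0, reclaimable = 0;
for (TarFile tar : tarFiles) {                 // hypothetical accessors
    total += tar.size();
    reclaimable += tar.referenceableBulkSegmentSize();
}
if (reclaimable * 10 < total) {                // less than 10% savings
    return;                                    // compaction not worth it
}
compact();
{code}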

> Compact only if needed
> --
>
> Key: OAK-2019
> URL: https://issues.apache.org/jira/browse/OAK-2019
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
> Fix For: 1.1, 1.0.5
>
> Attachments: 0001-OAK-2019-Compact-only-if-needed.patch, 
> compact-if-needed.patch
>
>
> Add a verification before the TarMK compaction runs to see if there's at 
> least one tar file that needs cleanup. Otherwise skip compaction entirely.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-2005) Use separate Lucene index for performing property related queries

2014-08-18 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100767#comment-14100767
 ] 

Jukka Zitting commented on OAK-2005:


Is there a particular reason not to use a single Lucene index for both full 
text and property queries? There are often queries that combine both types of 
constraints, so having a single index that can evaluate both types would be 
beneficial.

> Use separate Lucene index for performing property related queries 
> --
>
> Key: OAK-2005
> URL: https://issues.apache.org/jira/browse/OAK-2005
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: oak-lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.1
>
> Attachments: OAK-2005-1.patch
>
>
> Oak Lucene has some support for working with multiple Lucene directories. 
> Currently Oak uses a single Lucene directory to store the full text index. It 
> would be worthwhile to investigate if we can use a separate Lucene index to 
> store specific properties only and use it to perform property related queries.
> * A separate Lucene directory would be used to store an explicitly 
> configured list of properties
> * The properties would be stored with their type
> ** JCR Long - long
> ** JCR Double - double
> ** JCR Date - long - The date value can be stored as a long, but with lower 
> precision, say up to seconds or even minutes
> * The values would be stored "as is", i.e. without tokenization
> Possible benefits of such an index would be (of course, these need to be 
> validated!)
> * Compact storage - Less memory would be used to store the index
> * Native support for Order By
> * Improved performance for LIKE queries - Specifically 'foo%', '%foo'



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-2019) Compact only if needed

2014-08-11 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092543#comment-14092543
 ] 

Jukka Zitting commented on OAK-2019:


The garbage identified by this patch can be reclaimed by a normal cleanup even 
without compaction, so as such the check isn't too useful (you could just run 
cleanup instead).

> Compact only if needed
> --
>
> Key: OAK-2019
> URL: https://issues.apache.org/jira/browse/OAK-2019
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
> Attachments: compact-if-needed.patch
>
>
> Add a verification before the TarMK compaction runs to see if there's at 
> least one tar file that needs cleanup. Otherwise skip compaction entirely.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-2003) Avoid <p/> in javadoc

2014-07-30 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-2003.


   Resolution: Fixed
Fix Version/s: 1.1

I replaced the <p/> tags with <p> in revision 1614582.

Additionally, in revision 1614583 I disabled the remaining Java 8 doclint 
checks to avoid other problems (like missing \@return tags!) outlined 
in http://blog.joda.org/2014/02/turning-off-doclint-in-jdk-8-javadoc.html.

> Avoid <p/> in javadoc
> -
>
> Key: OAK-2003
> URL: https://issues.apache.org/jira/browse/OAK-2003
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: doc
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Minor
> Fix For: 1.1
>
>
> As discussed in http://markmail.org/message/qpg4ufqqvb3gwwwx, using 
> XHTML-style <p/> tags in javadoc is both wrong in principle and troublesome 
> in practice (since it causes parse errors with Java 8). We should convert 
> such tags to just <p>.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-2003) Avoid <p/> in javadoc

2014-07-30 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-2003:
--

 Summary: Avoid <p/> in javadoc
 Key: OAK-2003
 URL: https://issues.apache.org/jira/browse/OAK-2003
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: doc
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Priority: Minor


As discussed in http://markmail.org/message/qpg4ufqqvb3gwwwx, using XHTML-style 
<p/> tags in javadoc is both wrong in principle and troublesome in practice 
(since it causes parse errors with Java 8). We should convert such tags to just 
<p>.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-2002) TarMK: FileStore constructor loads all entries in the journal.log

2014-07-29 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-2002.


   Resolution: Fixed
Fix Version/s: 1.0.4
   1.1

Fixed in revisions 1614384 and 1614385, and merged to the 1.0 branch in 
revision 1614396.

> TarMK: FileStore constructor loads all entries in the journal.log
> -
>
> Key: OAK-2002
> URL: https://issues.apache.org/jira/browse/OAK-2002
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.3
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1, 1.0.4
>
>
> The FileStore constructor currently creates RecordId and SegmentId instances 
> for each line in the journal.log, which together with the SegmentId tracking 
> code in SegmentTracker ends up taking quite a while. Additionally loading all 
> those SegmentIds will make it harder for the garbage collector to release 
> old, unused segments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-2002) TarMK: FileStore constructor loads all entries in the journal.log

2014-07-29 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-2002:
--

 Summary: TarMK: FileStore constructor loads all entries in the 
journal.log
 Key: OAK-2002
 URL: https://issues.apache.org/jira/browse/OAK-2002
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core, segmentmk
Affects Versions: 1.0.3
Reporter: Jukka Zitting
Assignee: Jukka Zitting


The FileStore constructor currently creates RecordId and SegmentId instances 
for each line in the journal.log, which together with the SegmentId tracking 
code in SegmentTracker ends up taking quite a while. Additionally loading all 
those SegmentIds will make it harder for the garbage collector to release old, 
unused segments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-21 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068619#comment-14068619
 ] 

Jukka Zitting commented on OAK-1965:


Agreed, OAK-1617 would solve this issue in a more generic manner (the current 
solution only works with direct equality and IN constraints). No objections to 
replacing this solution with OAK-1617 once it's available.

> Support for constraints like: foo = 'X' OR bar = 'Y'
> 
>
> Key: OAK-1965
> URL: https://issues.apache.org/jira/browse/OAK-1965
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
> Attachments: oak-core-1.0.3-OAK-1965-SNAPSHOT.jar
>
>
> Consider the following query statement:
> {noformat}
> SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
> {noformat}
> Such a query could be fairly efficiently executed against a property index 
> that indexes the values of both "foo" and "bar" properties. However, the 
> query engine doesn't pass such OR constraints down to the index 
> implementations, so we currently can't leverage such an index for this query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-18 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066308#comment-14066308
 ] 

Jukka Zitting commented on OAK-1965:


Yes, this is specific to OR constraints. The OAK-1871 issue is broader: it 
requires changes in the index structure, whereas this improvement was possible 
by better utilizing the information that's already stored by the existing 
property indexes.

> Support for constraints like: foo = 'X' OR bar = 'Y'
> 
>
> Key: OAK-1965
> URL: https://issues.apache.org/jira/browse/OAK-1965
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
> Attachments: oak-core-1.0.3-OAK-1965-SNAPSHOT.jar
>
>
> Consider the following query statement:
> {noformat}
> SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
> {noformat}
> Such a query could be fairly efficiently executed against a property index 
> that indexes the values of both "foo" and "bar" properties. However, the 
> query engine doesn't pass such OR constraints down to the index 
> implementations, so we currently can't leverage such an index for this query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1973) IndexUpdate traverses the data nodes under index nodes

2014-07-17 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065082#comment-14065082
 ] 

Jukka Zitting commented on OAK-1973:


+1 looks good

> IndexUpdate traverses the data nodes under index nodes
> --
>
> Key: OAK-1973
> URL: https://issues.apache.org/jira/browse/OAK-1973
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Chetan Mehrotra
> Attachments: OAK-1973.patch
>
>
> AsyncIndexUpdate uses IndexUpdate class as Editor and passes it to 
> EditorDiff. IndexUpdate internally wraps all the IndexEditors with 
> VisibleEditor so that they do not traverse the invisible nodes (like 
> :data). However, IndexUpdate itself is not wrapped with a VisibleEditor, so 
> it also has to traverse all the index data during the diff.
> Ideally IndexUpdate itself should be wrapped with a VisibleEditor.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-17 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1965:
---

Attachment: oak-core-1.0.3-OAK-1965-SNAPSHOT.jar

I also attached an oak-core 1.0.3-SNAPSHOT jar with a backported version of 
this improvement (see 
https://github.com/jukka/jackrabbit-oak/compare/apache:1.0...OAK-1965 for 
details). If testing with that version works well, we should probably merge 
this change to 1.0.3.

> Support for constraints like: foo = 'X' OR bar = 'Y'
> 
>
> Key: OAK-1965
> URL: https://issues.apache.org/jira/browse/OAK-1965
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
> Attachments: oak-core-1.0.3-OAK-1965-SNAPSHOT.jar
>
>
> Consider the following query statement:
> {noformat}
> SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
> {noformat}
> Such a query could be fairly efficiently executed against a property index 
> that indexes the values of both "foo" and "bar" properties. However, the 
> query engine doesn't pass such OR constraints down to the index 
> implementations, so we currently can't leverage such an index for this query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-17 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064988#comment-14064988
 ] 

Jukka Zitting commented on OAK-1965:


FTR, here's how to use this improvement:

* Assume you have queries with OR constraints containing equality or IN tests 
for more than a single property from one selector, like the {{[foo] = 'X' OR 
[bar] = 'Y'}} constraint from above. Previously it was not possible to use an 
index to speed up the evaluation of such constraints.
* Configure a property index whose {{propertyNames}} list includes all the 
properties you expect to use in such queries. For example, a "fooBarIndex" with 
{{propertyNames = ['foo', 'bar']}} for the query above (see the sketch after 
this list).
* With this improvement the property index will automatically detect such an OR 
constraint and treat it as an "extended IN" constraint like {{([foo] OR [bar]) 
IN ('X', 'Y')}} that can be evaluated against the configured index since it 
covers the values of both properties.
* To verify that this functionality is indeed in place, you can use the 
{{EXPLAIN}} feature or debug logging of the query engine to check that the 
query plan mentions "property fooBarIndex IN (foo, bar)" as the index and 
constraints being used.
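
For example, checking the chosen plan from code could look like this (a 
sketch using the standard JCR query API; the {{plan}} column name follows 
Oak's EXPLAIN output, and {{session}} is an open JCR session):

{code}
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.RowIterator;

// Print the query plan to verify that "fooBarIndex" picks up the
// OR constraint as an "extended IN".
public static void explain(Session session) throws RepositoryException {
    QueryManager qm = session.getWorkspace().getQueryManager();
    Query q = qm.createQuery(
            "EXPLAIN SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'",
            Query.JCR_SQL2);
    for (RowIterator rows = q.execute().getRows(); rows.hasNext();) {
        System.out.println(rows.nextRow().getValue("plan").getString());
    }
}
{code}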

> Support for constraints like: foo = 'X' OR bar = 'Y'
> 
>
> Key: OAK-1965
> URL: https://issues.apache.org/jira/browse/OAK-1965
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
>
> Consider the following query statement:
> {noformat}
> SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
> {noformat}
> Such a query could be fairly efficiently executed against a property index 
> that indexes the values of both "foo" and "bar" properties. However, the 
> query engine doesn't pass such OR constraints down to the index 
> implementations, so we currently can't leverage such an index for this query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-17 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1965.


   Resolution: Fixed
Fix Version/s: 1.1

Implemented in a sequence of commits leading up to revision 1611359.

> Support for constraints like: foo = 'X' OR bar = 'Y'
> 
>
> Key: OAK-1965
> URL: https://issues.apache.org/jira/browse/OAK-1965
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
>
> Consider the following query statement:
> {noformat}
> SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
> {noformat}
> Such a query could be fairly efficiently executed against a property index 
> that indexes the values of both "foo" and "bar" properties. However, the 
> query engine doesn't pass such OR constraints down to the index 
> implementations, so we currently can't leverage such an index for this query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1973) IndexUpdate traverses the data nodes under index nodes

2014-07-16 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063677#comment-14063677
 ] 

Jukka Zitting commented on OAK-1973:


We should probably add the {{VisibleEditor}} wrapping also to the other places 
where {{IndexUpdate}} is being used.

> IndexUpdate traverses the data nodes under index nodes
> --
>
> Key: OAK-1973
> URL: https://issues.apache.org/jira/browse/OAK-1973
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Chetan Mehrotra
>
> AsyncIndexUpdate uses IndexUpdate class as Editor and passes it to 
> EditorDiff. IndexUpdate internally wraps all the IndexEditors with 
> VisibleEditor so that they do not traverse the invisible nodes (like 
> :data). However, IndexUpdate itself is not wrapped with a VisibleEditor, so 
> it also has to traverse all the index data during the diff.
> Ideally IndexUpdate itself should be wrapped with a VisibleEditor.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1973) IndexUpdate traverses the data nodes under index nodes

2014-07-16 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063642#comment-14063642
 ] 

Jukka Zitting commented on OAK-1973:


Yes, good point! When applying the change, we can also drop the VisibleEditor 
wrapping of IndexEditors, as the higher-level wrapping already takes care of 
excluding hidden content from the diff.

> IndexUpdate traverses the data nodes under index nodes
> --
>
> Key: OAK-1973
> URL: https://issues.apache.org/jira/browse/OAK-1973
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Chetan Mehrotra
>
> AsyncIndexUpdate uses IndexUpdate class as Editor and passes it to 
> EditorDiff. IndexUpdate internally wraps all the IndexEditors with 
> VisibleEditor so that they do not traverse the invisible nodes (like 
> :data). However, IndexUpdate itself is not wrapped with a VisibleEditor, so 
> it also has to traverse all the index data during the diff.
> Ideally IndexUpdate itself should be wrapped with a VisibleEditor.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1896) Move JR2 specific logic from oak-run to separate module

2014-07-15 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062141#comment-14062141
 ] 

Jukka Zitting commented on OAK-1896:


My main concern with the patch is the one I raised in 
http://markmail.org/message/jegs3suzvwaqznml. The more artifacts we have, the 
more complex using and documenting them becomes.

But since I don't have any better ideas here, +1 to proceeding with the patch.

> Move JR2 specific logic from oak-run to separate module
> ---
>
> Key: OAK-1896
> URL: https://issues.apache.org/jira/browse/OAK-1896
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Affects Versions: 1.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.1
>
> Attachments: OAK-1896.patch
>
>
> Currently oak-run module packages quite a few tools (benchmark, server, 
> console, upgrade etc). Some of these, like benchmark and upgrade, require 
> JR2 binaries embedded within oak-run.
> This causes a conflict with oak-lucene, as it requires Lucene 4.x while JR2 
> requires Lucene 3.x. It would be simpler to move the JR2 specific logic to a 
> different module (oak-jr2?) and let oak-run use all modules of Oak [1].
> [1] http://markmail.org/thread/ekyvuxxvdnbtsjkt



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1896) Move JR2 specific logic from oak-run to separate module

2014-07-15 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062101#comment-14062101
 ] 

Jukka Zitting commented on OAK-1896:


The Lucene part is indeed troublesome, and the upgrade code currently does need 
access to Jackrabbit internals. See 
http://markmail.org/message/lgvbvw742zryhlmp and 
http://markmail.org/message/mkzswwwde4zr454v for more background on those 
issues.

I don't have a very good suggestion on how to best address this. For the 
specific requirement of dumping the Lucene index files, we could actually skip 
the Lucene dependency and instead directly copy the content from the index 
subtree.

> Move JR2 specific logic from oak-run to separate module
> ---
>
> Key: OAK-1896
> URL: https://issues.apache.org/jira/browse/OAK-1896
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Affects Versions: 1.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.1
>
> Attachments: OAK-1896.patch
>
>
> Currently oak-run module packages quite a few tools (benchmark, server, 
> console, upgrade etc). Some of these, like benchmark and upgrade, require 
> JR2 binaries embedded within oak-run.
> This causes a conflict with oak-lucene, as it requires Lucene 4.x while JR2 
> requires Lucene 3.x. It would be simpler to move the JR2 specific logic to a 
> different module (oak-jr2?) and let oak-run use all modules of Oak [1].
> [1] http://markmail.org/thread/ekyvuxxvdnbtsjkt



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1968) Wrong time unit for async index lease time

2014-07-14 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061145#comment-14061145
 ] 

Jukka Zitting commented on OAK-1968:


+1 Good catch!

> Wrong time unit for async index lease time
> --
>
> Key: OAK-1968
> URL: https://issues.apache.org/jira/browse/OAK-1968
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0.1, 1.0.2
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.0.3
>
> Attachments: OAK-1968.patch
>
>
> The JavaDoc for {{AsyncIndexUpdate.ASYNC_TIMEOUT}} says the time unit for the 
> timeout is minutes. Within the code, however, the timeout value is interpreted 
> as milliseconds, which looks wrong. E.g. the new lease value is calculated as 
> {{lease = now + 2 * ASYNC_TIMEOUT}}.
> It is probably best to change the constant to reflect the timeout in 
> milliseconds.
> It looks like this issue was introduced with changes for OAK-1877 and affects 
> the 1.0.1 and 1.0.2 releases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1891) Regression: non-space whitespace not allowed in item names

2014-07-14 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061042#comment-14061042
 ] 

Jukka Zitting commented on OAK-1891:


bq. this used to work in jackrabbit

At least with Jackrabbit 2.6+ a call like {{node.addNode("con\tent")}} fails 
with a RepositoryException due to the invalid name. See also JCR-3582.

> Regression: non-space whitespace not allowed in item names
> --
>
> Key: OAK-1891
> URL: https://issues.apache.org/jira/browse/OAK-1891
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0
>Reporter: Tobias Bocanegra
>
> item names with a non-space whitespace are not allowed anymore in oak.
> https://github.com/apache/jackrabbit-oak/blob/1.0/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/name/Namespaces.java#L252
> this used to work in jackrabbit and can be a problem for customers trying to 
> migrate old content. also upgrading existing content might result in 
> unexpected results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-14 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting reassigned OAK-1965:
--

Assignee: Jukka Zitting

> Support for constraints like: foo = 'X' OR bar = 'Y'
> 
>
> Key: OAK-1965
> URL: https://issues.apache.org/jira/browse/OAK-1965
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>
> Consider the following query statement:
> {noformat}
> SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
> {noformat}
> Such a query could be fairly efficiently executed against a property index 
> that indexes the values of both "foo" and "bar" properties. However, the 
> query engine doesn't pass such OR constraints down to the index 
> implementations, so we currently can't leverage such an index for this query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-10 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1965:
--

 Summary: Support for constraints like: foo = 'X' OR bar = 'Y'
 Key: OAK-1965
 URL: https://issues.apache.org/jira/browse/OAK-1965
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core, query
Reporter: Jukka Zitting


Consider the following query statement:

{noformat}
SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
{noformat}

Such a query could be fairly efficiently executed against a property index that 
indexes the values of both "foo" and "bar" properties. However, the query 
engine doesn't pass such OR constraints down to the index implementations, so 
we currently can't leverage such an index for this query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1946) Restore: "Attempt to read external blob" error

2014-07-08 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055201#comment-14055201
 ] 

Jukka Zitting commented on OAK-1946:


The way the patch uses subclassing to get access to the NodeBuilder feels a bit 
awkward, but I don't have a better idea for now, so +1.

> Restore: "Attempt to read external blob" error
> --
>
> Key: OAK-1946
> URL: https://issues.apache.org/jira/browse/OAK-1946
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
> Attachments: FileStoreRestore.java.patch, OAK-1946.patch
>
>
> Same as OAK-1921 but for the restore parts



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1934) Optimize MutableTree.orderBefore for the common case

2014-07-08 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1934:
---

Fix Version/s: 1.0.2

Merged to the 1.0 branch in revision 1608857.

> Optimize MutableTree.orderBefore for the common case
> 
>
> Key: OAK-1934
> URL: https://issues.apache.org/jira/browse/OAK-1934
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.0.2, 1.1
>
>
> After OAK-850 and OAK-1584 we settled on an {{orderBefore()}} implementation 
> that always recreates the child order list based on the names of the child 
> nodes that are present in a parent. This is a somewhat expensive operation 
> with lots of child nodes as seen in JCR-3793.
> We could optimize the implementation further for the common case where the 
> child order list is in sync with the actual list of child nodes. For example 
> we could skip recreating the child order list when the name we're looking for 
> is already included in that list. Over time this approach should still detect 
> cases where the list becomes out of sync, and automatically repair the list 
> when that happens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1932) TarMK compaction can create mixed segments

2014-07-07 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1932.


   Resolution: Fixed
Fix Version/s: 1.1
   1.0.2

Fixed in a sequence of commits to trunk, and merged to the 1.0 branch in 
revision 1608564.

> TarMK compaction can create mixed segments
> --
>
> Key: OAK-1932
> URL: https://issues.apache.org/jira/browse/OAK-1932
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.1
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.0.2, 1.1
>
> Attachments: Compactor.java.patch, CompactorTest.java.patch
>
>
> As described in http://markmail.org/message/ujkqdlthudaortxf, commits that 
> occur while the compaction operation is running can make the compacted 
> segments contain references to older data segments, which prevents old data 
> from being reclaimed during cleanup.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1945) Unclear NodeStore.merge() contract

2014-07-07 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1945.


   Resolution: Fixed
Fix Version/s: 1.1
 Assignee: Jukka Zitting

I updated the {{merge()}} javadocs in revision 1608528. The updated javadoc 
states:

{noformat}
 * Merges the changes between the
 * {@link NodeBuilder#getBaseState() base} and
 * {@link NodeBuilder#getNodeState() head} states
 * of the given builder to this store.
{noformat}


> Unclear NodeStore.merge() contract
> --
>
> Key: OAK-1945
> URL: https://issues.apache.org/jira/browse/OAK-1945
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Alex Parvulescu
>Assignee: Jukka Zitting
> Fix For: 1.1
>
> Attachments: MergeTest.java.patch
>
>
> The SegmentMK doesn't respect the #merge contract when there is a builder 
> that has no local changes passed in.
> This popped up while I was reworking the restore parts and with the new 
> compaction code, the only option is to pass in a NodeState#Builder builder 
> which is considered by the #merge method as a noop and therefore ignored.
> (update: See comments below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1945) Unclear NodeStore.merge() contract

2014-07-07 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1945:
---

Description: 
The SegmentMK doesn't respect the #merge contract when there is a builder that 
has no local changes passed in.
This popped up while I was reworking the restore parts and with the new 
compaction code, the only option is to pass in a NodeState#Builder builder 
which is considered by the #merge method as a noop and therefore ignored.

(update: See comments below)

  was:
The SegmentMK doesn't respect the #merge contract when there is a builder that 
has no local changes passed in.
This popped up while I was reworking the restore parts and with the new 
compaction code, the only option is to pass in a NodeState#Builder builder 
which is considered by the #merge method as a noop and therefore ignored.

Summary: Unclear NodeStore.merge() contract  (was: SegmentMK #merge 
inconsistency)

> Unclear NodeStore.merge() contract
> --
>
> Key: OAK-1945
> URL: https://issues.apache.org/jira/browse/OAK-1945
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Alex Parvulescu
> Attachments: MergeTest.java.patch
>
>
> The SegmentMK doesn't respect the #merge contract when there is a builder 
> that has no local changes passed in.
> This popped up while I was reworking the restore parts and with the new 
> compaction code, the only option is to pass in a NodeState#Builder builder 
> which is considered by the #merge method as a noop and therefore ignored.
> (update: See comments below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1945) SegmentMK #merge inconsistency

2014-07-07 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053732#comment-14053732
 ] 

Jukka Zitting commented on OAK-1945:


Reclassified as an improvement issue based on the above comment.

> SegmentMK #merge inconsistency
> --
>
> Key: OAK-1945
> URL: https://issues.apache.org/jira/browse/OAK-1945
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Alex Parvulescu
> Attachments: MergeTest.java.patch
>
>
> The SegmentMK doesn't respect the #merge contract when there is a builder 
> that has no local changes passed in.
> This popped up while I was reworking the restore parts and with the new 
> compaction code, the only option is to pass in a NodeState#Builder builder 
> which is considered by the #merge method as a noop and therefore ignored.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1945) SegmentMK #merge inconsistency

2014-07-07 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1945:
---

Component/s: (was: segmentmk)
 core

> SegmentMK #merge inconsistency
> --
>
> Key: OAK-1945
> URL: https://issues.apache.org/jira/browse/OAK-1945
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Alex Parvulescu
> Attachments: MergeTest.java.patch
>
>
> The SegmentMK doesn't respect the #merge contract when there is a builder 
> that has no local changes passed in.
> This popped up while I was reworking the restore parts and with the new 
> compaction code, the only option is to pass in a NodeState#Builder builder 
> which is considered by the #merge method as a noop and therefore ignored.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1945) SegmentMK #merge inconsistency

2014-07-07 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1945:
---

Issue Type: Improvement  (was: Bug)

> SegmentMK #merge inconsistency
> --
>
> Key: OAK-1945
> URL: https://issues.apache.org/jira/browse/OAK-1945
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Alex Parvulescu
> Attachments: MergeTest.java.patch
>
>
> The SegmentMK doesn't respect the #merge contract when there is a builder 
> that has no local changes passed in.
> This popped up while I was reworking the restore parts and with the new 
> compaction code, the only option is to pass in a NodeState#Builder builder 
> which is considered by the #merge method as a noop and therefore ignored.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1945) SegmentMK #merge inconsistency

2014-07-07 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053731#comment-14053731
 ] 

Jukka Zitting commented on OAK-1945:


The {{merge()}} method applies the transient changes *in the given builder* to the 
repository, so passing a builder with no changes (like in the test case) will 
result in a no-op commit. See OAK-659 for background.

I guess the best solution here is to just clarify the {{merge()}} javadocs to 
make the contract clearer.
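
A small example of the resulting contract (a fragment sketch; {{store}} is 
any {{NodeStore}}, with {{EmptyHook}} and {{CommitInfo}} from 
{{org.apache.jackrabbit.oak.spi.commit}}):

{code}
// The builder carries the transient changes: merge() applies exactly the
// diff between the builder's base state and its head state to the store.
NodeBuilder builder = store.getRoot().builder();
builder.child("foo").setProperty("bar", "Y");   // a local change
store.merge(builder, EmptyHook.INSTANCE, CommitInfo.EMPTY);

// A fresh builder has base == head, so merging it is a no-op commit by
// contract rather than a bug.
store.merge(store.getRoot().builder(), EmptyHook.INSTANCE, CommitInfo.EMPTY);
{code}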

> SegmentMK #merge inconsistency
> --
>
> Key: OAK-1945
> URL: https://issues.apache.org/jira/browse/OAK-1945
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Alex Parvulescu
> Attachments: MergeTest.java.patch
>
>
> The SegmentMK doesn't respect the #merge contract when there is a builder 
> that has no local changes passed in.
> This popped up while I was reworking the restore parts and with the new 
> compaction code, the only option is to pass in a NodeState#Builder builder 
> which is considered by the #merge method as a noop and therefore ignored.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1932) TarMK compaction can create mixed segments

2014-07-07 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053694#comment-14053694
 ] 

Jukka Zitting commented on OAK-1932:


bq. I beg to differ; it could happen on normal compact operations too.

If you unroll the loop, the sequence of calls is something like this:

{code}
Compactor compactor = new Compactor(...);
compactor.compact(EMPTY_NODE, B);
compactor.compact(A, B);
compactor.compact(B, C);
{code}

Since we use the same {{Compactor}} instance, the end result is the same 
regardless of whether a single {{NodeBuilder}} is used for all {{compact()}} 
calls (like it used to be) or a new NodeBuilder is instantiated for each call 
(which is what the patch did). The attached test case uses a different call 
pattern, which the class originally wasn't designed for.

Anyway, as mentioned above, the change does make the {{Compactor}} easier to 
reuse for other cases, so it's clearly an improvement.

> TarMK compaction can create mixed segments
> --
>
> Key: OAK-1932
> URL: https://issues.apache.org/jira/browse/OAK-1932
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.1
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Attachments: Compactor.java.patch, CompactorTest.java.patch
>
>
> As described in http://markmail.org/message/ujkqdlthudaortxf, commits that 
> occur while the compaction operation is running can make the compacted 
> segments contain references to older data segments, which prevents old data 
> from being reclaimed during cleanup.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1932) TarMK compaction can create mixed segments

2014-07-03 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051527#comment-14051527
 ] 

Jukka Zitting commented on OAK-1932:


So far I only expected the compactor to always start the compaction process 
from the empty state, so the problem above shouldn't be an issue. But +1 on the 
patch, it makes it easier to adapt the code to other use cases like backup or 
to something like incremental compaction if we want to add a feature like that 
later on.

> TarMK compaction can create mixed segments
> --
>
> Key: OAK-1932
> URL: https://issues.apache.org/jira/browse/OAK-1932
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.1
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Attachments: Compactor.java.patch, CompactorTest.java.patch
>
>
> As described in http://markmail.org/message/ujkqdlthudaortxf, commits that 
> occur while the compaction operation is running can make the compacted 
> segments contain references to older data segments, which prevents old data 
> from being reclaimed during cleanup.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1927) TarMK compaction delays journal updates

2014-07-03 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1927.


   Resolution: Fixed
Fix Version/s: 1.1
   1.0.2
 Assignee: Jukka Zitting

Fixed in revisions 1607185 and 1607196.

I'm tagging this also for 1.0.2, but will only merge the changes once the 
related OAK-1932 changes are fully tested and ready for merging.

> TarMK compaction delays journal updates
> ---
>
> Key: OAK-1927
> URL: https://issues.apache.org/jira/browse/OAK-1927
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.1
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Critical
> Fix For: 1.0.2, 1.1
>
>
> The compaction operation gets currently invoked from the TarMK flush thread, 
> which is a bit troublesome as the operation can take some while during which 
> the flush thread won't be able to persist the latest updates to the journal 
> file.
> To avoid this problem, the compaction operation should be performed in a 
> separate background thread.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1921) Backup: "Attempt to read external blob" error

2014-07-02 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051055#comment-14051055
 ] 

Jukka Zitting commented on OAK-1921:


In OAK-1932 I adjusted the compaction code so that it can use a different 
SegmentWriter than the normal one. That should make it easier to directly reuse 
the compaction code for this purpose instead of having to maintain a separate 
copy of mostly the same functionality.

> Backup: "Attempt to read external blob" error
> -
>
> Key: OAK-1921
> URL: https://issues.apache.org/jira/browse/OAK-1921
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Affects Versions: 1.0, 1.0.1, 1.0.2
>Reporter: Thomas Mueller
>Assignee: Alex Parvulescu
> Attachments: OAK-1921-generic-backup.patch, OAK-1921.patch
>
>
> I tried to back up a segmentstore (with an external BlobStore) using
> {noformat}
> java -mx8g -jar oak-run-1.0.2-SNAPSHOT.jar backup segmentstore s2
> {noformat}
> and got:
> {noformat}
> Attempt to read external blob with blobId
> [c184d2a3f1dbc709004a45ae6c5df7624c2ae653#32768] without specifying BlobStore
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentBlob.getReference(SegmentBlob.java:118)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeBlob(SegmentWriter.java:706)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeProperty(SegmentWriter.java:808)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeProperty(SegmentWriter.java:796)
> {noformat}
> There are two options:
> 1) Adjust the backup code to work like compaction does, i.e. leave
> external blobs as-is and perhaps output a message that informs the
> user about the need to use a different mechanism to back up the
> BlobStore contents
> 2) Add command line options for configuring the BlobStore to be used
> for accessing external blobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1934) Optimize MutableTree.orderBefore for the common case

2014-07-01 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1934.


   Resolution: Fixed
Fix Version/s: 1.1

I implemented the optimization in revision 1607152.

With that change the ReorderTest output from JCR-3793 goes from:

{noformat}
created in 2930 ms
reodered in 99306 ms
{noformat}

to:

{noformat}
created in 2593 ms
reodered in 1965 ms
{noformat}


> Optimize MutableTree.orderBefore for the common case
> 
>
> Key: OAK-1934
> URL: https://issues.apache.org/jira/browse/OAK-1934
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
>
> After OAK-850 and OAK-1584 we settled on an {{orderBefore()}} implementation 
> that always recreates the child order list based on the names of the child 
> nodes that are present in a parent. This is a somewhat expensive operation 
> with lots of child nodes as seen in JCR-3793.
> We could optimize the implementation further for the common case where the 
> child order list is in sync with the actual list of child nodes. For example 
> we could skip recreating the child order list when the name we're looking for 
> is already included in that list. Over time this approach should still detect 
> cases where the list becomes out of sync, and automatically repair the list 
> when that happens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1934) Optimize MutableTree.orderBefore for the common case

2014-07-01 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1934:
--

 Summary: Optimize MutableTree.orderBefore for the common case
 Key: OAK-1934
 URL: https://issues.apache.org/jira/browse/OAK-1934
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core
Reporter: Jukka Zitting
Assignee: Jukka Zitting


After OAK-850 and OAK-1584 we settled on an {{orderBefore()}} implementation 
that always recreates the child order list based on the names of the child 
nodes that are present in a parent. This is a somewhat expensive operation with 
lots of child nodes as seen in JCR-3793.

We could optimize the implementation further for the common case where the 
child order list is in sync with the actual list of child nodes. For example we 
could skip recreating the child order list when the name we're looking for is 
already included in that list. Over time this approach should still detect 
cases where the list becomes out of sync, and automatically repair the list 
when that happens.
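
A minimal sketch of what such a fast path could look like (the helper methods 
{{moveWithinList}} and {{rebuildChildOrder}} are hypothetical; this is not the 
committed code):

{noformat}
// Sketch: avoid rebuilding :childOrder when it already contains the
// names involved in the reorder; fall back to a full rebuild otherwise.
PropertyState order = parent.getProperty(":childOrder");
if (order != null
        && Iterables.contains(order.getValue(Type.NAMES), name)
        && (destName == null
            || Iterables.contains(order.getValue(Type.NAMES), destName))) {
    moveWithinList(parent, name, destName);   // fast path, no rebuild
} else {
    rebuildChildOrder(parent);                // slow path, repairs the list
    moveWithinList(parent, name, destName);
}
{noformat}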



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1858) Segment Explorer

2014-07-01 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048845#comment-14048845
 ] 

Jukka Zitting commented on OAK-1858:


Agreed. Nice work!

> Segment Explorer
> 
>
> Key: OAK-1858
> URL: https://issues.apache.org/jira/browse/OAK-1858
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>  Labels: tools
> Fix For: 1.1
>
> Attachments: segmentexplorer.patch
>
>
> I'm thinking about working on a desktop tool that would allow browsing the 
> repository and would provide tarmk specific information: segment ids, tar 
> files, sizes, checkpoints and so on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1925) Use streamed io instead of RandomAccessFile in TarWriter

2014-07-01 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048816#comment-14048816
 ] 

Jukka Zitting commented on OAK-1925:


-0 Without a more significant performance impact I'd be reluctant to add the 
extra complexity and memory overhead. New data segments are already cached in 
memory by the {{SegmentWriter}}, and in most cases access to new large binaries 
isn't too performance-critical. Older content is accessed through the more 
performance-optimized {{TarReader}} class.

> Use streamed io instead of RandomAccessFile in TarWriter
> 
>
> Key: OAK-1925
> URL: https://issues.apache.org/jira/browse/OAK-1925
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.1
>
> Attachments: OAK-1925.patch
>
>
> TarWriter currently uses RandomAccessFile to 
> * Write the tar entries
> * Read written entries
> The writes, however, are currently sequential. It might be better to use 
> streamed, buffered I/O for the writes and maintain an in-memory cache of 
> written tar entries to serve the reads.
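
For illustration, a minimal sketch of that approach (hypothetical class and 
field names, not the actual {{TarWriter}} code):

{noformat}
import java.io.*;
import java.util.*;

// Sketch: sequential writes go through a buffered stream; reads of
// entries that haven't been flushed yet are served from memory.
class BufferedTarWriter {
    private final BufferedOutputStream out;
    private final Map<UUID, byte[]> pending = new HashMap<UUID, byte[]>();

    BufferedTarWriter(File file) throws IOException {
        out = new BufferedOutputStream(new FileOutputStream(file, true));
    }

    synchronized void writeEntry(UUID id, byte[] data) throws IOException {
        out.write(data);            // sequential, buffered write
        pending.put(id, data);      // keep a copy for reads
    }

    synchronized byte[] readEntry(UUID id) {
        return pending.get(id);     // no RandomAccessFile seek needed
    }
}
{noformat}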



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1932) TarMK compaction can create mixed segments

2014-06-30 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1932:
--

 Summary: TarMK compaction can create mixed segments
 Key: OAK-1932
 URL: https://issues.apache.org/jira/browse/OAK-1932
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core, segmentmk
Affects Versions: 1.0.1
Reporter: Jukka Zitting
Assignee: Jukka Zitting


As described in http://markmail.org/message/ujkqdlthudaortxf, commits that 
occur while the compaction operation is running can make the compacted segments 
contain references to older data segments, which prevents old data from being 
reclaimed during cleanup.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1927) TarMK compaction delays journal updates

2014-06-30 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1927:
--

 Summary: TarMK compaction delays journal updates
 Key: OAK-1927
 URL: https://issues.apache.org/jira/browse/OAK-1927
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core, segmentmk
Affects Versions: 1.0.1
Reporter: Jukka Zitting
Priority: Critical


The compaction operation is currently invoked from the TarMK flush thread, 
which is a bit troublesome as the operation can take a while, during which 
the flush thread won't be able to persist the latest updates to the journal 
file.

To avoid this problem, the compaction operation should be performed in a 
separate background thread.
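
A minimal sketch of the proposed split (hypothetical field and method names):

{noformat}
// Sketch: the flush thread only persists the journal and requests
// compaction; a dedicated thread actually runs it.
private final ExecutorService compactionExecutor =
        Executors.newSingleThreadExecutor();
private final AtomicBoolean compactNeeded = new AtomicBoolean();

void flush() throws IOException {
    persistJournal();   // hypothetical; stays fast, never blocked
    if (compactNeeded.getAndSet(false)) {
        compactionExecutor.execute(new Runnable() {
            @Override
            public void run() {
                compact();
            }
        });
    }
}
{noformat}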



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1921) Backup: "Attempt to read external blob" error

2014-06-30 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047499#comment-14047499
 ] 

Jukka Zitting commented on OAK-1921:


+1 Patch looks good.

> Backup: "Attempt to read external blob" error
> -
>
> Key: OAK-1921
> URL: https://issues.apache.org/jira/browse/OAK-1921
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Affects Versions: 1.0, 1.0.1, 1.0.2
>Reporter: Thomas Mueller
> Attachments: OAK-1921-generic-backup.patch
>
>
> I tried to backup a segmentstore (with an external BlobStore) using
> {noformat}
> java -mx8g -jar oak-run-1.0.2-SNAPSHOT.jar backup segmentstore s2
> {noformat}
> and got:
> {noformat}
> Attempt to read external blob with blobId
> [c184d2a3f1dbc709004a45ae6c5df7624c2ae653#32768] without specifying BlobStore
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentBlob.getReference(SegmentBlob.java:118)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeBlob(SegmentWriter.java:706)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeProperty(SegmentWriter.java:808)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeProperty(SegmentWriter.java:796)
> {noformat}
> There are two options:
> 1) Adjust the backup code to work like compaction does, i.e. leave
> external blobs as-is and perhaps output a message that informs the
> user about the need to use a different mechanism to back up the
> BlobStore contents
> 2) Add command line options for configuring the BlobStore to be used
> for accessing external blobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-656) Large number of child nodes not working well with orderable node types

2014-06-26 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044676#comment-14044676
 ] 

Jukka Zitting commented on OAK-656:
---

bq. if Oak cannot efficiently deal with 20+ children it seems like its giving 
up on a large number of use cases

The performance degradation with many orderable child nodes is roughly similar 
to what you see in Jackrabbit 2.x, so you should be able to go up to thousands 
of child nodes before starting to see significant slowdown. Without 
orderability you can go up to millions.

> Large number of child nodes not working well with orderable node types
> --
>
> Key: OAK-656
> URL: https://issues.apache.org/jira/browse/OAK-656
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Thomas Mueller
>Priority: Minor
>
> When adding many child nodes to an orderable node, oak gets slower and slower 
> and eventually runs out of memory. The problem seems to be the property 
> ":childOrder" which gets larger and larger. The effect is the same as with 
> Jackrabbit 2.x storing the list of child nodes in a node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1916) NodeStoreKernel doesn't handle array properties correctly

2014-06-26 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044651#comment-14044651
 ] 

Jukka Zitting commented on OAK-1916:


bq. It looks like there's now a NPE happening in the PropertyIndex because 
there is a primary type registered as null. 

This behavior is triggered by a limitation in the proof-of-concept-style 
implementation of {{NodeStoreKernel}} (see comment in OAK-987). It currently 
makes no distinction between name and string values, so properties like 
{{jcr:primaryType}} end up stored as strings instead of names. This makes 
type-safe calls like {{NodeState.getName("jcr:primaryType")}} return {{null}}.
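
To illustrate the effect (a sketch against the standard {{NodeState}} API, not 
the {{NodeStoreKernel}} code itself):

{noformat}
// Sketch (uses org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState
// and org.apache.jackrabbit.oak.api.Type): a STRING-typed jcr:primaryType
// makes the typed NAME accessor return null.
NodeState state = EmptyNodeState.EMPTY_NODE.builder()
        .setProperty("jcr:primaryType", "nt:unstructured")  // stored as STRING
        .getNodeState();

String name = state.getName("jcr:primaryType");             // null: not a NAME
String str = state.getProperty("jcr:primaryType")
        .getValue(Type.STRING);                             // "nt:unstructured"
{noformat}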

> NodeStoreKernel doesn't handle array properties correctly
> -
>
> Key: OAK-1916
> URL: https://issues.apache.org/jira/browse/OAK-1916
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: mk
>Reporter: Michael Dürig
> Attachments: OAK-1916.patch
>
>
> {{NodeStoreKernel}} currently only supports array properties of type long. 
> For other types it will fail with an {{IllegalStateException}}. See also the 
> FIXME in the code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1917) FileNotFoundException during TarMK GC

2014-06-25 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1917.


   Resolution: Fixed
Fix Version/s: 1.0.2
   1.1

Fixed in revision 1605670, and merged to the 1.0 branch (for 1.0.2) in revision 
1605671.

> FileNotFoundException during TarMK GC
> -
>
> Key: OAK-1917
> URL: https://issues.apache.org/jira/browse/OAK-1917
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.1
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1, 1.0.2
>
>
> When running garbage collection on a TarMK repository, it's in certain cases 
> possible for the following {{FileNotFoundException}} to occur:
> {noformat}
> java.io.FileNotFoundException: /path/to/dataNNb.tar (No such file or 
> directory)
> at java.io.RandomAccessFile.open(Native Method) ~[na:1.7.0_55]
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241) 
> ~[na:1.7.0_55]
> at 
> org.apache.jackrabbit.oak.plugins.segment.file.TarReader.openFirstFileWithValidIndex(TarReader.java:186)
>  [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at 
> org.apache.jackrabbit.oak.plugins.segment.file.TarReader.cleanup(TarReader.java:647)
>  [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at 
> org.apache.jackrabbit.oak.plugins.segment.file.FileStore.flush(FileStore.java:375)
>  [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at 
> org.apache.jackrabbit.oak.plugins.segment.file.FileStore.close(FileStore.java:465)
>  [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at org.apache.jackrabbit.oak.run.Main.compact(Main.java:177) 
> [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at org.apache.jackrabbit.oak.run.Main.main(Main.java:108) 
> [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> {noformat}
> I originally assumed this error to be some weird platform issue, based on 
> some online reports about a new file not being available for opening during a 
> brief period after it was created. However, the explanation for this issue is 
> more deterministic:
> If the tar file in question was created with an Oak 0.x version from before 
> OAK-1780, then it wouldn't contain the pre-compiled segment graph 
> information. Due to a slight bug in the OAK-1780 implementation, this would 
> prevent a tar file that's full of garbage from being simply removed. Instead 
> a new, empty tar file would get generated, and due to the lazy writing 
> implemented in OAK-631, that file would actually never get created. Thus the 
> FileNotFoundException.
> To fix this problem, we need to make sure that a tar file that's full of 
> garbage will get cleanly removed even if it doesn't contain a pre-compiled 
> segment graph.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1917) FileNotFoundException during TarMK GC

2014-06-25 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1917:
--

 Summary: FileNotFoundException during TarMK GC
 Key: OAK-1917
 URL: https://issues.apache.org/jira/browse/OAK-1917
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core, segmentmk
Affects Versions: 1.0.1
Reporter: Jukka Zitting
Assignee: Jukka Zitting


When running garbage collection on a TarMK repository, it's in certain cases 
possible for the following {{FileNotFoundException}} to occur:

{noformat}
java.io.FileNotFoundException: /path/to/dataNNb.tar (No such file or 
directory)
at java.io.RandomAccessFile.open(Native Method) ~[na:1.7.0_55]
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241) 
~[na:1.7.0_55]
at 
org.apache.jackrabbit.oak.plugins.segment.file.TarReader.openFirstFileWithValidIndex(TarReader.java:186)
 [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at 
org.apache.jackrabbit.oak.plugins.segment.file.TarReader.cleanup(TarReader.java:647)
 [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at 
org.apache.jackrabbit.oak.plugins.segment.file.FileStore.flush(FileStore.java:375)
 [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at 
org.apache.jackrabbit.oak.plugins.segment.file.FileStore.close(FileStore.java:465)
 [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at org.apache.jackrabbit.oak.run.Main.compact(Main.java:177) 
[oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at org.apache.jackrabbit.oak.run.Main.main(Main.java:108) 
[oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
{noformat}

I originally assumed this error to be some weird platform issue, based on some 
online reports about a new file not being available for opening during a brief 
period after it was created. However, the explanation for this issue is more 
deterministic:

If the tar file in question was created with an Oak 0.x version from before 
OAK-1780, then it wouldn't contain the pre-compiled segment graph information. 
Due to a slight bug in the OAK-1780 implementation, this would prevent a tar 
file that's full of garbage from being simply removed. Instead a new, empty tar 
file would get generated, and due to the lazy writing implemented in OAK-631 
that file would actually never get created. Thus the FileNotFoundException.

To fix this problem, we need to make sure that a tar file that's full of 
garbage will get cleanly removed even if it doesn't contain a pre-compiled 
segment graph.




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1916) NodeStoreKernel doesn't handle array properties correctly

2014-06-25 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043843#comment-14043843
 ] 

Jukka Zitting commented on OAK-1916:


I fixed the SegmentNodeState issue in revision 1605526. Non-name 
{{jcr:primaryType}} or {{jcr:mixinTypes}} properties were not being handled 
correctly.

> NodeStoreKernel doesn't handle array properties correctly
> -
>
> Key: OAK-1916
> URL: https://issues.apache.org/jira/browse/OAK-1916
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: mk
>Reporter: Michael Dürig
> Attachments: OAK-1916.patch
>
>
> {{NodeStoreKernel}} currently only supports array properties of type long. 
> For other types it will fail with an {{IllegalStateException}}. See also the 
> FIXME in the code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1816) Oak#createContentRepository never unregisters some of its services

2014-06-25 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043554#comment-14043554
 ] 

Jukka Zitting commented on OAK-1816:


+0.5 on the patch.

I think fundamentally the Oak class is growing beyond its original design 
here, and instead of adding features like this we might be better served by not 
trying to extend it any further. As seen here, a complex deployment with lots 
of moving pieces will need more explicit lifecycle management than what the Oak 
class provides. For example in an OSGi environment it might make more sense to 
just *avoid* using the Oak class and instead directly instantiate and manage 
the various required components using the existing lifecycle management 
functionality already provided by OSGi.

> Oak#createContentRepository never unregisters some of its services
> --
>
> Key: OAK-1816
> URL: https://issues.apache.org/jira/browse/OAK-1816
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Michael Dürig
> Attachments: OAK-1816.patch
>
>
> {{Oak#createContentRepository}} registers a bunch of services with the 
> {{Whiteboard}} (MBeans, Executor, Observer) that are never unregistered. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1912) The UUID index temporarily not available during Oak upgrade

2014-06-25 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043436#comment-14043436
 ] 

Jukka Zitting commented on OAK-1912:


Another alternative would be to avoid using dynamic OSGi service lookups to 
access the index and instead statically wire the index implementation with the 
repository.
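
For illustration, static wiring with oak-core's builder could look like this 
sketch (which providers to wire depends on the deployment):

{noformat}
// Sketch: the property index is wired statically, so it is available
// exactly as long as the repository itself is.
ContentRepository repository = new Oak(nodeStore)
        .with(new PropertyIndexEditorProvider())  // keeps the index up to date
        .with(new PropertyIndexProvider())        // serves the index at query time
        .createContentRepository();
{noformat}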

> The UUID index temporarily not available during Oak upgrade
> ---
>
> Key: OAK-1912
> URL: https://issues.apache.org/jira/browse/OAK-1912
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Affects Versions: 1.0, 1.0.1
>Reporter: Thomas Mueller
>
> In an OSGi framework, when upgrading the Oak version, the UUID index is 
> temporarily not available, but queries with the condition "uuid=" are 
> still run, and therefore traverse the whole repository.
> This makes upgrading the Oak version very slow.
> I guess the problem is that the old index is stopped before the rest of 
> oak-core is stopped, or the new index is started after the rest of oak-core 
> is started.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1890) Concurrent System Login: slowdown for high concurrency levels

2014-06-25 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043423#comment-14043423
 ] 

Jukka Zitting commented on OAK-1890:


+1 Works for me.

> Concurrent System Login: slowdown for high concurrency levels
> -
>
> Key: OAK-1890
> URL: https://issues.apache.org/jira/browse/OAK-1890
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: jcr
>Reporter: angela
>Assignee: Michael Dürig
> Fix For: 1.1
>
>
> output of running the system login/logout test with profiling enabled:
> {quote}
> $ java -Dprofile=true -Xmx2048M org.apache.jackrabbit.oak.run.Main benchmark 
> LoginSystemTest Oak-Tar --concurrency 1,2,4,8,10,15,20,50
> Apache Jackrabbit Oak 1.1-SNAPSHOT
> # LoginSystemTest      C     min     10%     50%     90%     max       N
> Oak-Tar                1      12      13      19      24      42     266
> Oak-Tar                2      12      15      20      24      32     496
> Oak-Tar                4      20      23      30      37      60     660
> Oak-Tar                8      41      67      75      85      95     532
> Oak-Tar               10      77      90      96     113    5166     122
> Oak-Tar               15     109     127    5559    5673    5701      27
> Oak-Tar               20    5868    5874    5928    5943    5944      20
> Oak-Tar               50   22116   22133   22151   22157   22162      50
> Profiler: top 5 stack trace(s) of 70414 ms:
> 1865/21120 (8%):
> at 
> org.apache.jackrabbit.stats.RepositoryStatisticsImpl.getOrCreateRecorder(RepositoryStatisticsImpl.java:99)
> at 
> org.apache.jackrabbit.stats.RepositoryStatisticsImpl.getCounter(RepositoryStatisticsImpl.java:80)
> at 
> org.apache.jackrabbit.oak.stats.StatisticManager.getCounter(StatisticManager.java:81)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.getCounter(SessionContext.java:182)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl.<init>(SessionImpl.java:89)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.createSession(SessionContext.java:161)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.getSession(SessionContext.java:141)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:260)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:195)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:54)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:51)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAsPrivileged(Subject.java:515)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.runTest(LoginSystemTest.java:51)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:279)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.execute(LoginSystemTest.java:33)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:288)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.access$000(AbstractTest.java:42)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest$Executor.run(AbstractTest.java:215)
> 1704/21120 (8%):
> at java.lang.Throwable.fillInStackTrace(Native Method)
> at java.lang.Throwable.<init>(Throwable.java:196)
> at java.lang.Exception.<init>(Exception.java:41)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionStats.<init>(SessionStats.java:40)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.<init>(SessionDelegate.java:154)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl$1.<init>(RepositoryImpl.java:271)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.createSessionDelegate(RepositoryImpl.java:269)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:255)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:195)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:54)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:51)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAsPrivileged(Subject.java:515)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.runTest(LoginSystemTest.java:51)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:279)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.execute(LoginSystemTest.java:33)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:288)
> at 
> org.apache.jackrabbit.oak.

[jira] [Commented] (OAK-1890) Concurrent System Login: slowdown for high concurrency levels

2014-06-24 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042541#comment-14042541
 ] 

Jukka Zitting commented on OAK-1890:


Perhaps we should reconsider the alternative I [suggested 
earlier|https://issues.apache.org/jira/browse/OAK-941?focusedCommentId=13866748&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13866748]:

bq. The fact that this introduces need for extra background tasks is a bit 
troublesome as discussed in http://markmail.org/message/ougp6nrzfthqylwx. An 
alternative would be to use just a single MBean with a TabularData field that 
lists the details of all currently active sessions.
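
A minimal sketch of that alternative (hypothetical bean and attribute names):

{noformat}
import javax.management.openmbean.TabularData;

// Sketch: one repository-wide MBean instead of one MBean per session.
public interface SessionStatisticsMBean {
    // One row per active session: id, user, login time, operation counts...
    TabularData getActiveSessions();
}
{noformat}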

> Concurrent System Login: slowdown for high concurrency levels
> --
>
> Key: OAK-1890
> URL: https://issues.apache.org/jira/browse/OAK-1890
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: jcr
>Reporter: angela
>Assignee: Michael Dürig
>
> output of running the system login/logout test with profiling enabled:
> {quote}
> $ java -Dprofile=true -Xmx2048M org.apache.jackrabbit.oak.run.Main benchmark 
> LoginSystemTest Oak-Tar --concurrency 1,2,4,8,10,15,20,50
> Apache Jackrabbit Oak 1.1-SNAPSHOT
> # LoginSystemTest      C     min     10%     50%     90%     max       N
> Oak-Tar                1      12      13      19      24      42     266
> Oak-Tar                2      12      15      20      24      32     496
> Oak-Tar                4      20      23      30      37      60     660
> Oak-Tar                8      41      67      75      85      95     532
> Oak-Tar               10      77      90      96     113    5166     122
> Oak-Tar               15     109     127    5559    5673    5701      27
> Oak-Tar               20    5868    5874    5928    5943    5944      20
> Oak-Tar               50   22116   22133   22151   22157   22162      50
> Profiler: top 5 stack trace(s) of 70414 ms:
> 1865/21120 (8%):
> at 
> org.apache.jackrabbit.stats.RepositoryStatisticsImpl.getOrCreateRecorder(RepositoryStatisticsImpl.java:99)
> at 
> org.apache.jackrabbit.stats.RepositoryStatisticsImpl.getCounter(RepositoryStatisticsImpl.java:80)
> at 
> org.apache.jackrabbit.oak.stats.StatisticManager.getCounter(StatisticManager.java:81)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.getCounter(SessionContext.java:182)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl.<init>(SessionImpl.java:89)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.createSession(SessionContext.java:161)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.getSession(SessionContext.java:141)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:260)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:195)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:54)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:51)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAsPrivileged(Subject.java:515)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.runTest(LoginSystemTest.java:51)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:279)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.execute(LoginSystemTest.java:33)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:288)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.access$000(AbstractTest.java:42)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest$Executor.run(AbstractTest.java:215)
> 1704/21120 (8%):
> at java.lang.Throwable.fillInStackTrace(Native Method)
> at java.lang.Throwable.<init>(Throwable.java:196)
> at java.lang.Exception.<init>(Exception.java:41)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionStats.<init>(SessionStats.java:40)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.<init>(SessionDelegate.java:154)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl$1.<init>(RepositoryImpl.java:271)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.createSessionDelegate(RepositoryImpl.java:269)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:255)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:195)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:54)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:51)
> at java.security.AccessController.doPrivileged(Native Method)

[jira] [Created] (OAK-1905) SegmentMK: Arch segment(s)

2014-06-20 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1905:
--

 Summary: SegmentMK: Arch segment(s)
 Key: OAK-1905
 URL: https://issues.apache.org/jira/browse/OAK-1905
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core, segmentmk
Reporter: Jukka Zitting
Priority: Minor


There are a lot of constants and other commonly occurring names, values, and 
other data in a typical repository. To optimize storage space and access speed, 
it would be useful to place such data in one or more constant "arch segments" 
that are always cached in memory.
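
A rough sketch of how such segments might be consulted at read time 
(hypothetical names; no such code exists yet):

{noformat}
// Sketch: arch segments are pinned in memory and checked first.
private final Map<UUID, Segment> archSegments =
        new ConcurrentHashMap<UUID, Segment>();   // never evicted

Segment readSegment(UUID id) {
    Segment arch = archSegments.get(id);
    if (arch != null) {
        return arch;                // constants: always a memory hit
    }
    return segmentCache.get(id);    // normal, evictable cache path
}
{noformat}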



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1893) MBean to dump Lucene Index content and related stats

2014-06-17 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034843#comment-14034843
 ] 

Jukka Zitting commented on OAK-1893:


bq. So the method exposed here is on same pattern

Good point. Consider that objection overruled, despite my general dislike of 
the pattern.

> MBean to dump Lucene Index content and related stats
> 
>
> Key: OAK-1893
> URL: https://issues.apache.org/jira/browse/OAK-1893
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: oak-lucene
>Affects Versions: 1.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.1
>
>
> Currently the Lucene index is stored within the NodeStore as content. To 
> enable debugging and a better understanding of the Lucene index content, it 
> would be helpful to provide a JMX bean which can dump the index content to 
> the filesystem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1892) OrderedIndexConcurrentClusterIT takes too long

2014-06-16 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032493#comment-14032493
 ] 

Jukka Zitting commented on OAK-1892:


BTW, I'm also seeing very slow progress when indexing larger amounts of 
content. Updating the ordered index appears to be much slower than updating the 
Lucene index, including text extraction, which seems troublesome.

> OrderedIndexConcurrentClusterIT takes too long
> --
>
> Key: OAK-1892
> URL: https://issues.apache.org/jira/browse/OAK-1892
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: jcr
> Environment: trunk and 1.0 branch
>Reporter: Marcel Reutegger
>Assignee: Davide Giannella
>
> The OrderedIndexConcurrentClusterIT takes too long and times out on travis. 
> See e.g. https://travis-ci.org/apache/jackrabbit-oak/builds/27445383



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1893) MBean to dump Lucene Index content and related stats

2014-06-16 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032475#comment-14032475
 ] 

Jukka Zitting commented on OAK-1893:


-0 I'm not too excited about a remotely accessible feature that can be used to 
write to an arbitrary location in the local file system.

In general I think a low-level feature like this would be better implemented as 
a debugging tool in oak-run, for example as a new console command. Implementing 
it like that would also remove the need to backport the feature to a 
maintenance branch.

> MBean to dump Lucene Index content and related stats
> 
>
> Key: OAK-1893
> URL: https://issues.apache.org/jira/browse/OAK-1893
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: oak-lucene
>Affects Versions: 1.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.1
>
>
> Currently the Lucene index is stored within the NodeStore as content. To 
> enable debugging and a better understanding of the Lucene index content, it 
> would be helpful to provide a JMX bean which can dump the index content to 
> the filesystem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1804) TarMK compaction

2014-06-15 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032127#comment-14032127
 ] 

Jukka Zitting commented on OAK-1804:


One more problem I found:

* On a large repository where the compacted content takes >2.5GB, the default 
compaction threshold leads to an infinite compaction loop where the repository 
just keeps making copies of itself as the compaction operation itself would 
trigger the next compaction.

I fixed the problem in revision 1602800 (merged to 1.0 in revision 1602801) by 
dropping automatic compaction based on the threshold. Instead the gc() method 
needs to be explicitly called every now and then, as is done in AEM 6 
with an automatic maintenance task that triggers compaction by default at 2am 
every night.
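
For illustration, such a maintenance task could be as simple as the following 
sketch ({{initialDelayUntil2am()}} is a hypothetical helper, and this is not 
the AEM code):

{noformat}
// Sketch: trigger gc() explicitly once a night instead of compacting
// automatically on a size threshold.
ScheduledExecutorService maintenance =
        Executors.newSingleThreadScheduledExecutor();
maintenance.scheduleAtFixedRate(new Runnable() {
    @Override
    public void run() {
        fileStore.gc();   // compaction followed by cleanup
    }
}, initialDelayUntil2am(), TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
{noformat}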

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, 
> compaction-map-as-bytebuffer.patch, compaction.patch, fast-equals.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1877) Hourly async reindexing on an idle instance

2014-06-15 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032122#comment-14032122
 ] 

Jukka Zitting commented on OAK-1877:


There was a related problem in that the indexing status properties written by 
the async indexer would end up triggering the next async indexing iteration, 
thus still causing a new checkpoint to be created once every five seconds even 
if no other changes were committed between successive async indexer 
invocations. I fixed that in revision 1602796 (and 1602797), and merged the 
changes to the 1.0 branch in revision 1602798.

> Hourly async reindexing on an idle instance
> ---
>
> Key: OAK-1877
> URL: https://issues.apache.org/jira/browse/OAK-1877
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Critical
> Fix For: 1.0.1, 1.1
>
>
> OAK-1292 introduced the following interesting but not very nice behavior:
> On an idle system with no changes for an extended amount of time, the 
> OAK-1292 change blocks the async indexer from updating the reference to the 
> last indexed checkpoint. After one hour (the default checkpoint lifetime), 
> the referenced checkpoint will expire, and the indexer will fall back to full 
> reindexing.
> The result of this behavior is that once every hour, the size of an idle 
> instance will grow by dozens or hundreds of megabytes of new index data 
> generated by reindexing. Older index data becomes garbage, but the compaction 
> code from OAK-1804 is needed to make it collectable. A better solution would 
> be to prevent the reindexing from happening in the first place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1775) Avoid lock contention in IndexTracker.getIndexNode()

2014-06-12 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1775:
---

Fix Version/s: 1.0.1

Merged to the 1.0 branch in revision 1602316.

> Avoid lock contention in IndexTracker.getIndexNode()
> 
>
> Key: OAK-1775
> URL: https://issues.apache.org/jira/browse/OAK-1775
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: oak-lucene
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.0.1, 1.1
>
>
> It turns out that the approach in OAK-1722 of keeping the 
> {{IndexTracker.getIndexNode()}} method synchronized while the more expensive 
> {{update()}} method is unsynchronized suffers from lock contention in cases 
> where lots of queries are executed concurrently. Thus we should go with 
> Chetan's original suggestion of avoiding exclusive synchronization of the 
> {{getIndexNode()}} method.
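
A hedged sketch of one way to avoid exclusive synchronization on the read path 
(not necessarily the committed fix; the {{indices}} map is hypothetical):

{noformat}
// Sketch: reads take a shared lock, only update() takes the exclusive one.
private final ReadWriteLock lock = new ReentrantReadWriteLock();

IndexNode getIndexNode(String path) {
    lock.readLock().lock();          // many concurrent readers allowed
    try {
        return indices.get(path);
    } finally {
        lock.readLock().unlock();
    }
}

void update(NodeState root) {
    lock.writeLock().lock();         // exclusive only for the rare updates
    try {
        // ... rebuild the index map ...
    } finally {
        lock.writeLock().unlock();
    }
}
{noformat}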



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1804) TarMK compaction

2014-06-12 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1804.


Resolution: Fixed

Fixed the above two issues in revisions 1602256 and 1602261, and merged them to 
the 1.0 branch in revision 1602296.

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, 
> compaction-map-as-bytebuffer.patch, compaction.patch, fast-equals.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (OAK-1804) TarMK compaction

2014-06-12 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting reopened OAK-1804:



There are two more problems:

* On a really large repository with hundreds of millions of nodes, the 
uncompressed compaction map inside the Compactor class can become huge, up to a 
few gigabytes. It would be better if we could use the far more memory-efficient 
CompactionMap data structure instead, and perhaps further limit the number of 
entries we store in the map in the first place.
* The compaction checks in fastEquals() add up to some performance overhead 
since they get executed for all sorts of record comparisons, not just for nodes 
and blobs. It would be better to do the compaction checks only for those higher 
level comparisons.

I'll take a look at fixing the above issues.

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, 
> compaction-map-as-bytebuffer.patch, compaction.patch, fast-equals.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1804) TarMK compaction

2014-06-10 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026977#comment-14026977
 ] 

Jukka Zitting commented on OAK-1804:


I merged the latest revisions to the 1.0 branch.

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, 
> compaction-map-as-bytebuffer.patch, compaction.patch, fast-equals.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1804) TarMK compaction

2014-06-10 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026906#comment-14026906
 ] 

Jukka Zitting commented on OAK-1804:


I found a way to further squeeze the memory overhead of the compaction map; see 
revision 1601757. The mapping is now encapsulated in a separate 
{{CompactionMap}} class, and the amortized memory overhead of each mapping 
entry is just {{20/n + 8}} bytes, where {{n}} is the average number of 
compacted records within a source segment. See the {{CompactionMap}} javadocs 
for details on the new data structure.
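
To put that formula in perspective (my own back-of-the-envelope arithmetic, 
not numbers from the issue): with an average of n = 10 compacted records per 
source segment, each entry costs about 20/10 + 8 = 10 bytes, so even 100 
million map entries would need only roughly 1 GB, instead of the multiple 
gigabytes of the earlier uncompressed map.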

AFAICT we're pretty much done here, especially with the re-enabled RecordId 
cache flush from above. Is there anything that still needs to be done?

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, 
> compaction-map-as-bytebuffer.patch, compaction.patch, fast-equals.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1878) Backport OAK-1800 (configurable no. of rows) to branch 1.0

2014-06-09 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025465#comment-14025465
 ] 

Jukka Zitting commented on OAK-1878:


It is actually possible to update the fix version field of a closed issue. 
Closing an issue just prevents it from being reopened (which we don't need for 
backporting the change), but the issue can still be edited to add more 
information (like to add the maintenance releases to which the change has been 
backported).

See JCR-3603 for an example where the fix was originally introduced for 
Jackrabbit 2.7.1 and later backported to 2.6.5 and 2.4.5.


> Backport OAK-1800 (configurable no. of rows) to branch 1.0
> --
>
> Key: OAK-1878
> URL: https://issues.apache.org/jira/browse/OAK-1878
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: oak-solr
>Affects Versions: 1.0
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 1.0.1
>
>
> It'd be good to backport the fix for OAK-1800 to branch 1.0 in order to make 
> the no. of rows configurable (simplifying setup and improving 
> performance).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1878) Backport OAK-1800 (configurable no. of rows) to branch 1.0

2014-06-09 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025234#comment-14025234
 ] 

Jukka Zitting commented on OAK-1878:


It's better to use the original issue to track backporting, as then all the 
information about that topic is contained in that issue. Thus I'd resolve this 
as a duplicate and instead add the 1.0.1 fix version (in addition to the 
existing 1.1 version) to OAK-1800.

> Backport OAK-1800 (configurable no. of rows) to branch 1.0
> --
>
> Key: OAK-1878
> URL: https://issues.apache.org/jira/browse/OAK-1878
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: oak-solr
>Affects Versions: 1.0
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 1.0.1
>
>
> It'd be good to backport the fix for OAK-1800 to branch 1.0 in order to make 
> the no. of rows configurable (simplifying setup and improving 
> performance).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1877) Hourly async reindexing on an idle instance

2014-06-09 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1877.


   Resolution: Fixed
Fix Version/s: 1.1
   1.0.1

I ran a test with this change overnight, and the result is that an idle (AEM6) 
repository only grows by about 1MB per hour (I haven't yet figured out where 
that growth is coming from). That is two orders of magnitude less than before, 
with 
no regressions I can find, so I merged the fix to the 1.0 branch in revision 
1601396. I think we can consider this issue resolved.

> Hourly async reindexing on an idle instance
> ---
>
> Key: OAK-1877
> URL: https://issues.apache.org/jira/browse/OAK-1877
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Critical
> Fix For: 1.0.1, 1.1
>
>
> OAK-1292 introduced the following interesting but not very nice behavior:
> On an idle system with no changes for an extended amount of time, the 
> OAK-1292 change blocks the async indexer from updating the reference to the 
> last indexed checkpoint. After one hour (the default checkpoint lifetime), 
> the referenced checkpoint will expire, and the indexer will fall back to full 
> reindexing.
> The result of this behavior is that once every hour, the size of an idle 
> instance will grow by dozens or hundreds of megabytes of new index data 
> generated by reindexing. Older index data becomes garbage, but the compaction 
> code from OAK-1804 is needed to make it collectable. A better solution would 
> be to prevent the reindexing from happening in the first place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1877) Hourly async reindexing on an idle instance

2014-06-08 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021566#comment-14021566
 ] 

Jukka Zitting commented on OAK-1877:


I committed an initial fix in revision 1601309. It increases the checkpoint 
lifetime to about 3 years (i.e. practically infinite) and only creates new 
checkpoints if something has actually changed in the repository.
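
In sketch form (hypothetical field names, not the actual {{AsyncIndexUpdate}} 
code):

{noformat}
// Sketch: skip checkpoint creation entirely on an idle repository.
NodeState root = store.getRoot();
if (!root.equals(lastIndexedRoot)) {
    // ~3 years, i.e. practically infinite
    String checkpoint = store.checkpoint(TimeUnit.DAYS.toMillis(1000));
    runIndexer(checkpoint);       // hypothetical helper
    lastIndexedRoot = root;
}
{noformat}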

> Hourly async reindexing on an idle instance
> ---
>
> Key: OAK-1877
> URL: https://issues.apache.org/jira/browse/OAK-1877
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Critical
>
> OAK-1292 introduced the following interesting but not very nice behavior:
> On an idle system with no changes for an extended amount of time, the 
> OAK-1292 change blocks the async indexer from updating the reference to the 
> last indexed checkpoint. After one hour (the default checkpoint lifetime), 
> the referenced checkpoint will expire, and the indexer will fall back to full 
> reindexing.
> The result of this behavior is that once every hour, the size of an idle 
> instance will grow by dozens or hundreds of megabytes of new index data 
> generated by reindexing. Older index data becomes garbage, but the compaction 
> code from OAK-1804 is needed to make it collectable. A better solution would 
> be to prevent the reindexing from happening in the first place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1877) Hourly async reindexing on an idle instance

2014-06-08 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1877:
--

 Summary: Hourly async reindexing on an idle instance
 Key: OAK-1877
 URL: https://issues.apache.org/jira/browse/OAK-1877
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core
Affects Versions: 1.0
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Priority: Critical


OAK-1292 introduced the following interesting but not very nice behavior:

On an idle system with no changes for an extended amount of time, the OAK-1292 
change blocks the async indexer from updating the reference to the last indexed 
checkpoint. After one hour (the default checkpoint lifetime), the referenced 
checkpoint will expire, and the indexer will fall back to full reindexing.

The result of this behavior is that once every hour, the size of an idle 
instance will grow by dozens or hundreds of megabytes of new index data 
generated by reindexing. Older index data becomes garbage, but the compaction 
code from OAK-1804 is needed to make it collectable. A better solution would be 
to prevent the reindexing from happening in the first place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1876) oak-run option to do diffs between TarMK revisions

2014-06-06 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1876.


   Resolution: Fixed
Fix Version/s: 1.1
 Assignee: Jukka Zitting

Done in revision 1601054.

> oak-run option to do diffs between TarMK revisions
> --
>
> Key: OAK-1876
> URL: https://issues.apache.org/jira/browse/OAK-1876
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
>
> For improved debugging of what's going on in a repository, it would be useful 
> to have a low-level mechanism for outputting the JSOP diff between any two 
> TarMK revisions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1876) oak-run option to do diffs between TarMK revisions

2014-06-06 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1876:
--

 Summary: oak-run option to do diffs between TarMK revisions
 Key: OAK-1876
 URL: https://issues.apache.org/jira/browse/OAK-1876
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Jukka Zitting


For improved debugging of what's going on in a repository, it would be useful 
to have a low-level mechanism for outputting the JSOP diff between any two 
TarMK revisions.
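
A hedged sketch of producing a JSOP-style diff between two revisions via the 
generic {{NodeState}} comparison API (not necessarily the code that was 
committed for this issue):

{noformat}
// Sketch: recursively print +/-/^ lines for two NodeState revisions.
static void diff(NodeState before, NodeState after, final String path) {
    after.compareAgainstBaseState(before, new DefaultNodeStateDiff() {
        @Override
        public boolean propertyChanged(PropertyState b, PropertyState a) {
            System.out.println("^ " + path + "/" + a.getName());
            return true;
        }
        @Override
        public boolean childNodeAdded(String name, NodeState a) {
            System.out.println("+ " + path + "/" + name);
            return true;
        }
        @Override
        public boolean childNodeDeleted(String name, NodeState b) {
            System.out.println("- " + path + "/" + name);
            return true;
        }
        @Override
        public boolean childNodeChanged(String name, NodeState b, NodeState a) {
            diff(b, a, path + "/" + name);
            return true;
        }
    });
}
{noformat}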



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1804) TarMK compaction

2014-06-03 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016796#comment-14016796
 ] 

Jukka Zitting commented on OAK-1804:


Looks good, +1 to commit.

Some improvements we could/should make after committing the patch:

* {{compact()}} should be private. It's better if we only publicly expose the 
{{gc()}} method for this purpose.
* The log message written by {{compact()}} should probably be at INFO level; 
it's IMO significant enough information.
* The {{writeNumber % compactThreshold == 0}} condition should also check for 
{{writeNumber > 0}}.
* It would be better if the {{gc()}} method set only the {{compactNeeded}} 
flag, which would trigger the {{compact()}} method to set {{cleanupNeeded}}. 
This way we can guarantee that we won't end up with a compaction that's not 
followed by the cleanup. A sketch of the last two points follows below.
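
A minimal sketch of the last two points (hypothetical field names, not a 
committed implementation):

{noformat}
// Guard against triggering compaction on writeNumber == 0.
if (writeNumber > 0 && writeNumber % compactThreshold == 0) {
    compactNeeded.set(true);
}

public void gc() {
    compactNeeded.set(true);          // gc() only *requests* compaction
}

private void maybeCompact() {
    if (compactNeeded.getAndSet(false)) {
        compact();
        cleanupNeeded.set(true);      // cleanup always follows compaction
    }
}
{noformat}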

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, 
> compaction.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1803) Drop oak-mk-perf

2014-06-02 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1803.


   Resolution: Fixed
Fix Version/s: 1.1
 Assignee: Jukka Zitting

Done in revision 1599337.

> Drop oak-mk-perf
> 
>
> Key: OAK-1803
> URL: https://issues.apache.org/jira/browse/OAK-1803
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: mk
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
>
> As discussed in http://markmail.org/message/neiolq4gd2kjseod, the oak-mk-perf 
> component from OAK-335 is a bit obsolete and has been excluded from the 
> normal build for quite a while now.
> To clean things up, I suggest we drop the component.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1869) TarMK: Incorrect tar entry verification in recovery mode

2014-06-02 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1869:
---

Fix Version/s: 1.0.1

Merged to the 1.0 branch for the 1.0.1 release in revision 1599306 as the 
impact of this bug is pretty bad even though it only occurs in an already 
abnormal situation.

> TarMK: Incorrect tar entry verification in recovery mode
> 
>
> Key: OAK-1869
> URL: https://issues.apache.org/jira/browse/OAK-1869
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Critical
> Fix For: 1.0.1, 1.1
>
>
> When recovering segments from a forcibly closed tar file (i.e. one without a 
> proper tar index), the TarMK will scan all tar entries and verify their 
> checksums. Unfortunately that checksum verification is incorrect, which leads 
> to a lot of errors like the one below, and to the affected segments being 
> discarded.
> {noformat}
> Invalid entry checksum at offset N in tar file /path/to/dataNNx.tar, 
> skipping...
> {noformat}
> In practice this leads to the TarMK undoing a lot of recent changes until it 
> finds a segment that hasn't been discarded because of such an incorrect 
> checksum verification.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1869) TarMK: Incorrect tar entry verification in recovery mode

2014-06-02 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1869.


   Resolution: Fixed
Fix Version/s: 1.1

Fixed in revision 1599299.

> TarMK: Incorrect tar entry verification in recovery mode
> 
>
> Key: OAK-1869
> URL: https://issues.apache.org/jira/browse/OAK-1869
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Critical
> Fix For: 1.1
>
>
> When recovering segments from a forcibly closed tar file (i.e. one without a 
> proper tar index), the TarMK will scan all tar entries and verify their 
> checksums. Unfortunately that checksum verification is incorrect, which leads 
> to a lot of errors like the one below, and to the affected segments being 
> discarded.
> {noformat}
> Invalid entry checksum at offset N in tar file /path/to/dataNNx.tar, 
> skipping...
> {noformat}
> In practice this leads to the TarMK undoing a lot of recent changes until it 
> finds a segment that hasn't been discarded because of such an incorrect 
> checksum verification.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1869) TarMK: Incorrect tar entry verification in recovery mode

2014-06-02 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1869:
--

 Summary: TarMK: Incorrect tar entry verification in recovery mode
 Key: OAK-1869
 URL: https://issues.apache.org/jira/browse/OAK-1869
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core, segmentmk
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Priority: Critical


When recovering segments from a forcibly closed tar file (i.e. one without a 
proper tar index), the TarMK will scan all tar entries and verify their 
checksums. Unfortunately that checksum verification is incorrect, which leads 
to a lot of errors like the one below, and to the affected segments being 
discarded.

{noformat}
Invalid entry checksum at offset N in tar file /path/to/dataNNx.tar, 
skipping...
{noformat}

In practice this leads to the TarMK undoing a lot of recent changes until it 
finds a segment that hasn't been discarded because of such an incorrect 
checksum verification.
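For context, the checksum such verification needs to reproduce is the standard
tar rule: all 512 header bytes are summed as unsigned values, with the eight
checksum bytes themselves (offsets 148-155) counted as ASCII spaces. A generic
sketch of that rule (not the actual TarMK code):

{code}
final class TarChecksum {
    // standard tar header checksum over one 512-byte header block
    static int of(byte[] header) {
        int sum = 0;
        for (int i = 0; i < 512; i++) {
            if (i >= 148 && i < 156) {
                sum += ' '; // the checksum field counts as spaces
            } else {
                sum += header[i] & 0xff; // unsigned byte value
            }
        }
        return sum;
    }
}
{code}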



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1866) SegmentMK: Inefficient flat node comparisons

2014-06-02 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015460#comment-14015460
 ] 

Jukka Zitting commented on OAK-1866:


Merged to the 1.0 branch for 1.0.1 in revision 1599239.

> SegmentMK: Inefficient flat node comparisons
> 
>
> Key: OAK-1866
> URL: https://issues.apache.org/jira/browse/OAK-1866
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>  Labels: performance
> Fix For: 1.0.1, 1.1
>
>
> The SegmentMK has an optimization for the common case where only a single 
> child node among many has been updated. For the most part this code works 
> very well, but there's one code path where this optimization is currently not 
> applied and as a result a node comparison ends up traversing the full list of 
> child nodes.
> The troublesome code path gets triggered when a single child node is updated 
> in one commit and then another commit does some more complex changes (adds or 
> removes a node and/or modifies more than a single node). 
> Usually this isn't too big an issue since traversing even thousands of child 
> node entries is very fast with the SegmentMK, but things slow down a lot when 
> there are millions of children. Unfortunately that is exactly what happens 
> with the UUID index in a large repository with millions of referenceable 
> nodes...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1867) Optimize SegmentWriter.prepare()

2014-05-30 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1867.


   Resolution: Fixed
Fix Version/s: 1.1

Done in revision 1598799.

> Optimize SegmentWriter.prepare()
> 
>
> Key: OAK-1867
> URL: https://issues.apache.org/jira/browse/OAK-1867
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, segmentmk
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>  Labels: performance
> Fix For: 1.1
>
>
> A significant part of the time in writing new SegmentMK records is spent in 
> the {{SegmentWriter.prepare()}} method, especially in the part where the 
> exact set of segment references is computed. In most cases that computation 
> could be short-circuited to improve write performance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1866) SegmentMK: Inefficient flat node comparisons

2014-05-30 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1866.


   Resolution: Fixed
Fix Version/s: 1.1

Fixed in revision 1598797.

> SegmentMK: Inefficient flat node comparisons
> 
>
> Key: OAK-1866
> URL: https://issues.apache.org/jira/browse/OAK-1866
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>  Labels: performance
> Fix For: 1.0.1, 1.1
>
>
> The SegmentMK has an optimization for the common case where only a single 
> child node among many has been updated. For the most part this code works 
> very well, but there's one code path where this optimization is currently not 
> applied and as a result a node comparison ends up traversing the full list of 
> child nodes.
> The troublesome code path gets triggered when a single child node is updated 
> in one commit and then another commit does some more complex changes (adds or 
> removes a node and/or modifies more than a single node). 
> Usually this isn't too big an issue since traversing even thousands of child 
> node entries is very fast with the SegmentMK, but things slow down a lot when 
> there are millions of children. Unfortunately that is exactly what happens 
> with the UUID index in a large repository with millions of referenceable 
> nodes...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1867) Optimize SegmentWriter.prepare()

2014-05-30 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1867:
--

 Summary: Optimize SegmentWriter.prepare()
 Key: OAK-1867
 URL: https://issues.apache.org/jira/browse/OAK-1867
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core, segmentmk
Reporter: Jukka Zitting
Assignee: Jukka Zitting


A significant part of the time in writing new SegmentMK records is spent in the 
{{SegmentWriter.prepare()}} method, especially in the part where the exact set 
of segment references is computed. In most cases that computation could be 
short-circuited to improve write performance.
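One possible short circuit, sketched with assumed names: if every segment a
new record references is already present in the current segment's reference
table, the reference set cannot grow, so the recomputation can be skipped.

{code}
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

final class PrepareShortCircuit {
    private final Set<UUID> currentReferences = new HashSet<>();

    boolean needsFullPrepare(Iterable<UUID> recordReferences) {
        for (UUID segmentId : recordReferences) {
            if (!currentReferences.contains(segmentId)) {
                return true; // unseen segment: recompute the reference set
            }
        }
        return false; // all references already tracked: fast path
    }
}
{code}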



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1866) SegmentMK: Inefficient flat node comparisons

2014-05-30 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1866:
--

 Summary: SegmentMK: Inefficient flat node comparisons
 Key: OAK-1866
 URL: https://issues.apache.org/jira/browse/OAK-1866
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core, segmentmk
Affects Versions: 1.0
Reporter: Jukka Zitting
Assignee: Jukka Zitting
 Fix For: 1.0.1


The SegmentMK has an optimization for the common case where only a single child 
node among many has been updated. For the most part this code works very well, 
but there's one code path where this optimization is currently not applied and 
as a result a node comparison ends up traversing the full list of child nodes.

The troublesome code path gets triggered when a single child node is updated in 
one commit and then another commit does some more complex changes (adds or 
removes a node and/or modifies more than a single node). 

Usually this isn't too big an issue since traversing even thousands of child 
node entries is very fast with the SegmentMK, but things slow down a lot when 
there are millions of children. Unfortunately that is exactly what happens with 
the UUID index in a large repository with millions of referenceable nodes...
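A conceptual sketch of the fast path that should apply, using assumed types
rather than the real SegmentMK records: child entries live in a
content-addressed hash trie, so comparing record ids skips unchanged branches
and a single updated child is found without walking millions of siblings.

{code}
import java.util.Objects;

final class ChildMapCompare {
    static final class Node {
        final String recordId; // content-addressed: equal id => equal subtree
        final Node[] buckets;  // non-null for inner trie nodes
        final String name;     // child name, set on leaves
        Node(String recordId, Node[] buckets, String name) {
            this.recordId = recordId;
            this.buckets = buckets;
            this.name = name;
        }
    }

    interface Diff {
        void childChanged(String name);
    }

    static void compare(Node before, Node after, Diff diff) {
        if (Objects.equals(before.recordId, after.recordId)) {
            return; // shared subtree: nothing changed below this point
        }
        if (before.buckets == null && after.buckets == null) {
            diff.childChanged(after.name); // differing leaf: the changed child
            return;
        }
        if (before.buckets == null || after.buckets == null
                || before.buckets.length != after.buckets.length) {
            return; // structural change: omitted in this sketch
        }
        for (int i = 0; i < before.buckets.length; i++) {
            Node b = before.buckets[i];
            Node a = after.buckets[i];
            if (b != null && a != null) {
                compare(b, a, diff); // descend only where the ids differ
            }
            // additions and removals (b or a null) omitted for brevity
        }
    }
}
{code}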



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1804) TarMK compaction

2014-05-29 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012918#comment-14012918
 ] 

Jukka Zitting commented on OAK-1804:


I implemented the deadlock fix outlined above, and followed up with a few more 
improvements. The compaction code is now located in a separate {{Compactor}} 
class in the main SegmentMK package, as it actually has no TarMK-specific bits.

I also added logic that keeps track of subtrees that have already been 
compacted so that things like checkpoints that share most of their content with 
other subtrees won't end up being compacted many times over. We should probably 
reuse that logic also in the backup code.
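That bookkeeping boils down to a map from source record id to its compacted
counterpart; a hedged sketch with stand-in types (not the actual Compactor
code):

{code}
import java.util.HashMap;
import java.util.Map;

final class CompactedSubtrees {
    private final Map<String, String> compacted = new HashMap<>();

    String compact(String sourceId) {
        String targetId = compacted.get(sourceId);
        if (targetId == null) {
            targetId = deepCopy(sourceId);     // rewrite into new segments
            compacted.put(sourceId, targetId); // later sharers hit the cache
        }
        return targetId;
    }

    private String deepCopy(String sourceId) {
        // placeholder for recreating the node and all of its children
        return "compacted:" + sourceId;
    }
}
{code}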

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compaction.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1819) oak-solr-core test failures on Java 8

2014-05-29 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012392#comment-14012392
 ] 

Jukka Zitting commented on OAK-1819:


In revision 1598302 I disabled the tests when running on Java 8.

> oak-solr-core test failures on Java 8
> -
>
> Key: OAK-1819
> URL: https://issues.apache.org/jira/browse/OAK-1819
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: oak-solr
>Affects Versions: 1.0
> Environment: {noformat}
> Apache Maven 3.1.0 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 
> 22:15:32-0400)
> Maven home: c:\Program Files\apache-maven-3.1.0
> Java version: 1.8.0, vendor: Oracle Corporation
> Java home: c:\Program Files\Java\jdk1.8.0\jre
> Default locale: en_US, platform encoding: Cp1252
> OS name: "windows 7", version: "6.1", arch: "amd64", family: "dos"
> {noformat}
>Reporter: Jukka Zitting
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: java8
>
> The following {{oak-solr-core}} test failures occur when building Oak with 
> Java 8:
> {noformat}
> Failed tests:
>   
> testNativeMLTQuery(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
>  expected: but was:
>   
> testNativeMLTQueryWithStream(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
>  expected: but was:
> {noformat}
> The cause of this might well be something as simple as the test case 
> incorrectly expecting a specific ordering of search results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1804) TarMK compaction

2014-05-28 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011980#comment-14011980
 ] 

Jukka Zitting commented on OAK-1804:


I think we need to move the compaction operation from the flush method to a 
separate method (or even its own class), to be invoked either when explicitly 
requested or on schedule by the flush thread. That way the code won't interfere 
with the complex synchronization logic in flush(). I'll follow up with the 
required changes tomorrow unless you beat me to this or some other solution. 

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compaction.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1804) TarMK compaction

2014-05-27 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010188#comment-14010188
 ] 

Jukka Zitting commented on OAK-1804:


In revision 1597854 I adjusted the compaction code to avoid memory problems 
when doing a deep compaction of an existing repository.

In revision 1597860 I committed the oak-run part of your earlier patch, apart 
from the compaction level option. I think we need to refactor the code a bit to 
make this part a bit more flexible.

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compaction.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1804) TarMK compaction

2014-05-27 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010185#comment-14010185
 ] 

Jukka Zitting commented on OAK-1804:


I'd rather not do such bulk clearing of old checkpoints. There might be a valid 
use case for removing selected checkpoints (like an old backup that's no longer 
needed), but simply dropping all checkpoints older than a given timestamp seems 
like too blunt a tool to me.

> TarMK compaction
> 
>
> Key: OAK-1804
> URL: https://issues.apache.org/jira/browse/OAK-1804
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>  Labels: production, tools
> Fix For: 1.0.1, 1.1
>
> Attachments: SegmentNodeStore.java.patch, compaction.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1858) Segment Explorer

2014-05-27 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010006#comment-14010006
 ] 

Jukka Zitting commented on OAK-1858:


Note that there's no particular reason why this would need to be restricted to 
the SegmentMK. Instead I think we should rename the package from 
"segmentexplorer" to just "explorer" (similarly for class names) and only 
display the extra record/segment id information when dealing with the SegmentMK 
(much of that's already done with instanceof checks).

> Segment Explorer
> 
>
> Key: OAK-1858
> URL: https://issues.apache.org/jira/browse/OAK-1858
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>  Labels: tools
> Attachments: segmentexplorer.patch
>
>
> I'm thinking about working on a desktop tool that would allow browsing the 
> repository and would provide tarmk specific information: segment ids, tar 
> files, sizes, checkpoints and so on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1858) Segment Explorer

2014-05-27 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1401#comment-1401
 ] 

Jukka Zitting commented on OAK-1858:


I made a few tweaks to improve the content display a bit.

> Segment Explorer
> 
>
> Key: OAK-1858
> URL: https://issues.apache.org/jira/browse/OAK-1858
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>  Labels: tools
> Attachments: segmentexplorer.patch
>
>
> I'm thinking about working on a desktop tool that would allow browsing the 
> repository and would provide tarmk specific information: segment ids, tar 
> files, sizes, checkpoints and so on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1858) Segment Explorer

2014-05-27 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009786#comment-14009786
 ] 

Jukka Zitting commented on OAK-1858:


Looks great, thanks!

> Segment Explorer
> 
>
> Key: OAK-1858
> URL: https://issues.apache.org/jira/browse/OAK-1858
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>  Labels: tools
> Attachments: segmentexplorer.patch
>
>
> I'm thinking about working on a desktop tool that would allow browsing the 
> repository and would provide tarmk specific information: segment ids, tar 
> files, sizes, checkpoints and so on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1522) Provide PojoSR based RepositoryFactory implementation

2014-05-27 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009775#comment-14009775
 ] 

Jukka Zitting commented on OAK-1522:


On Java 8 I get the following compile error: {{ConfigInstaller.java:\[66,69] 
error: type DictionaryAsMap does not take parameters}}

> Provide PojoSR based RepositoryFactory implementation
> -
>
> Key: OAK-1522
> URL: https://issues.apache.org/jira/browse/OAK-1522
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.1
>
> Attachments: OAK-1522.patch
>
>
> This feature provides support for configuring a Repository using the built-in 
> OSGi support in a non-OSGi environment via PojoSR [1].
> For more details refer to [2]. Note that PojoSR is being moved to Apache 
> Felix (FELIX-4445)
> [1] https://code.google.com/p/pojosr/
> [2] http://jackrabbit-oak.markmail.org/thread/7zpux64mj6vecwzf



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1817) NPE in MarkSweepGarbageCollector.saveBatchToFile during Datastore GC with FileDataStore

2014-05-21 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1817.


   Resolution: Fixed
Fix Version/s: 1.1

Fixed in revision 1596534.

> NPE in MarkSweepGarbageCollector.saveBatchToFile during Datastore GC with 
> FileDataStore
> ---
>
> Key: OAK-1817
> URL: https://issues.apache.org/jira/browse/OAK-1817
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob
>Affects Versions: 0.20
>Reporter: Konrad Windszus
>Assignee: Chetan Mehrotra
> Fix For: 1.1
>
> Attachments: NodeAndDataStoreOsgiConfig.zip, OAK-1817.patch
>
>
> While running a datastore garbage collection on a Jackrabbit 2 FileDataStore 
> (org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore, see 
> http://jackrabbit.apache.org/oak/docs/osgi_config.html) an NPE is thrown
> {code}
> 13.05.2014 17:50:16.944 *ERROR* [qtp1416657193-147] 
> org.apache.jackrabbit.oak.management.ManagementOperation Blob garbage 
> collection failed
> java.lang.RuntimeException: Error in retrieving references
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector$1.addReference(MarkSweepGarbageCollector.java:395)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.Segment.collectBlobReferences(Segment.java:248)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentTracker.collectBlobReferences(SegmentTracker.java:178)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentBlobReferenceRetriever.collectReferences(SegmentBlobReferenceRetriever.java:38)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.iterateNodeTree(MarkSweepGarbageCollector.java:361)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.mark(MarkSweepGarbageCollector.java:201)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.markAndSweep(MarkSweepGarbageCollector.java:173)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.collectGarbage(MarkSweepGarbageCollector.java:149)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService$2.collectGarbage(SegmentNodeStoreService.java:185)
>   at org.apache.jackrabbit.oak.plugins.blob.BlobGC$1.call(BlobGC.java:68)
>   at org.apache.jackrabbit.oak.plugins.blob.BlobGC$1.call(BlobGC.java:64)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException: null
>   at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:192)
>   at com.google.common.base.Joiner.toString(Joiner.java:436)
>   at com.google.common.base.Joiner.appendTo(Joiner.java:108)
>   at com.google.common.base.Joiner.appendTo(Joiner.java:152)
>   at com.google.common.base.Joiner.join(Joiner.java:193)
>   at com.google.common.base.Joiner.join(Joiner.java:183)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.saveBatchToFile(MarkSweepGarbageCollector.java:317)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector$1.addReference(MarkSweepGarbageCollector.java:391)
>   ... 14 common frames omitted
> {code}
> Attached you find the OSGi config for both the nodestore and the datastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1817) NPE in MarkSweepGarbageCollector.saveBatchToFile during Datastore GC with FileDataStore

2014-05-21 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004556#comment-14004556
 ] 

Jukka Zitting commented on OAK-1817:


It looks like the bug here is in the way the SegmentWriter keeps recently 
flushed segments cached in memory. In some cases those segments are cached 
using the full 256kB byte buffer even if the segment size is slightly smaller 
than that. In those cases the segment still works correctly for normal access, 
but the blobref entries are not accessible, which I believe is causing the NPE 
here.
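To illustrate the suspected failure mode, assuming metadata such as the
blobref table is addressed relative to the end of the segment: a padded 256kB
cache buffer shifts those end-relative offsets, so one fix direction is to
cache a right-sized copy. A sketch under that assumption:

{code}
import java.nio.ByteBuffer;

final class SegmentCacheTrim {
    static ByteBuffer trimForCache(ByteBuffer writeBuffer, int segmentSize) {
        // assumes the segment occupies the tail of the full write buffer
        ByteBuffer copy = writeBuffer.duplicate();
        copy.position(copy.limit() - segmentSize);
        ByteBuffer trimmed = ByteBuffer.allocate(segmentSize);
        trimmed.put(copy.slice());
        trimmed.flip();
        return trimmed; // right-sized: end-relative offsets line up again
    }
}
{code}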

> NPE in MarkSweepGarbageCollector.saveBatchToFile during Datastore GC with 
> FileDataStore
> ---
>
> Key: OAK-1817
> URL: https://issues.apache.org/jira/browse/OAK-1817
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob
>Affects Versions: 0.20
>Reporter: Konrad Windszus
>Assignee: Chetan Mehrotra
> Attachments: NodeAndDataStoreOsgiConfig.zip, OAK-1817.patch
>
>
> While running a datastore garbage collection on a Jackrabbit 2 FileDataStore 
> (org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore, see 
> http://jackrabbit.apache.org/oak/docs/osgi_config.html) an NPE is thrown
> {code}
> 13.05.2014 17:50:16.944 *ERROR* [qtp1416657193-147] 
> org.apache.jackrabbit.oak.management.ManagementOperation Blob garbage 
> collection failed
> java.lang.RuntimeException: Error in retrieving references
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector$1.addReference(MarkSweepGarbageCollector.java:395)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.Segment.collectBlobReferences(Segment.java:248)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentTracker.collectBlobReferences(SegmentTracker.java:178)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentBlobReferenceRetriever.collectReferences(SegmentBlobReferenceRetriever.java:38)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.iterateNodeTree(MarkSweepGarbageCollector.java:361)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.mark(MarkSweepGarbageCollector.java:201)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.markAndSweep(MarkSweepGarbageCollector.java:173)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.collectGarbage(MarkSweepGarbageCollector.java:149)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService$2.collectGarbage(SegmentNodeStoreService.java:185)
>   at org.apache.jackrabbit.oak.plugins.blob.BlobGC$1.call(BlobGC.java:68)
>   at org.apache.jackrabbit.oak.plugins.blob.BlobGC$1.call(BlobGC.java:64)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException: null
>   at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:192)
>   at com.google.common.base.Joiner.toString(Joiner.java:436)
>   at com.google.common.base.Joiner.appendTo(Joiner.java:108)
>   at com.google.common.base.Joiner.appendTo(Joiner.java:152)
>   at com.google.common.base.Joiner.join(Joiner.java:193)
>   at com.google.common.base.Joiner.join(Joiner.java:183)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.saveBatchToFile(MarkSweepGarbageCollector.java:317)
>   at 
> org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector$1.addReference(MarkSweepGarbageCollector.java:391)
>   ... 14 common frames omitted
> {code}
> Attached you find the OSGi config for both the nodestore and the datastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1810) Incorrect TarMK graph metadata validation

2014-05-19 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1810:
---

Fix Version/s: 1.0.1

Merged to the 1.0 branch for the 1.0.1 release in revision 1595854.

> Incorrect TarMK graph metadata validation
> -
>
> Key: OAK-1810
> URL: https://issues.apache.org/jira/browse/OAK-1810
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Affects Versions: 1.0
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Minor
> Fix For: 1.0.1, 1.1
>
>
> When reading the pre-compiled graph entries from OAK-1780, the TarMK uses a 
> few sanity checks to verify that the graph hasn't been corrupted and that 
> using it for the cleanup operation is safe.
> It turns out that one of these sanity checks ({{bytes >= count * 24 + 16}}) 
> is overly strict, as the minimum size limit for the graph entry is instead 
> {{count * 16 + 16}}. The "24" factor applies only when checking the sanity of 
> the tar index entry.
> The effect of this bug is not very critical, as a graph entry that fails the 
> check will just be ignored with a warning and the cleanup code will fall back 
> to the slower algorithm of instead reading the segment graph directly from 
> the stored data segments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1828) Improved SegmentWriter

2014-05-16 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1828:
--

 Summary: Improved SegmentWriter
 Key: OAK-1828
 URL: https://issues.apache.org/jira/browse/OAK-1828
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core, segmentmk
Reporter: Jukka Zitting
Priority: Minor


At about 1kLOC and dozens of methods, the SegmentWriter class is currently a bit 
too complex for one of the key components of the TarMK. It also uses a somewhat 
non-obvious mix of synchronized and unsynchronized code to coordinate multiple 
concurrent threads that may be writing content at the same time. The 
synchronization blocks are also broader than what really would be needed, which 
in some cases causes unnecessary lock contention in concurrent write loads.

To improve the readability and maintainability of the code, and to increase 
performance of concurrent writes, it would be useful to split part of the 
SegmentWriter functionality to a separate RecordWriter class that would be 
responsible for writing individual records into a segment. The 
SegmentWriter.prepare() method would return a new RecordWriter instance, and 
the higher-level SegmentWriter methods would use the returned instance for all 
the work that's currently guarded in synchronization blocks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1810) Incorrect TarMK graph metadata validation

2014-05-16 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1810:
--

 Summary: Incorrect TarMK graph metadata validation
 Key: OAK-1810
 URL: https://issues.apache.org/jira/browse/OAK-1810
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: segmentmk
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Priority: Minor


When reading the pre-compiled graph entries from OAK-1780, the TarMK uses a few 
sanity checks to verify that the graph hasn't been corrupted and that using it 
for the cleanup operation is safe.

It turns out that one of these sanity checks ({{bytes >= count * 24 + 16}}) is 
overly strict, as the minimum size limit for the graph entry is instead {{count 
* 16 + 16}}. The "24" factor applies only when checking the sanity of the tar 
index entry.

The effect of this bug is not very critical, as a graph entry that fails the 
check will just be ignored with a warning and the cleanup code will fall back 
to the slower algorithm of instead reading the segment graph directly from the 
stored data segments.
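The corrected bound as a tiny sketch, assuming 16 header bytes plus 16 bytes
per referenced segment (a UUID stored as two longs):

{code}
final class GraphEntryCheck {
    static boolean isSane(int bytes, int count) {
        return bytes >= count * 16 + 16; // previously: count * 24 + 16
    }
}
{code}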



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OAK-1819) oak-solr-core test failures on Java 8

2014-05-15 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated OAK-1819:
---

Description: 
The following {{oak-solr-core}} test failures occur when building Oak with Java 
8:

{noformat}
Failed tests:
  
testNativeMLTQuery(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
 expected: but was:
  
testNativeMLTQueryWithStream(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
 expected: but was:
{noformat}

The cause of this might well be something as simple as the test case 
incorrectly expecting a specific ordering of search results.

  was:
The following {{oak-solr-core}} test failures occur when building Oak with Java 
8:

{noformat}
Failed tests:   
testNativeMLTQuery(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
 expected: but was:
 
testNativeMLTQueryWithStream(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
 expected: but was:
{noformat}

The cause of this might well be something as simple as the test case 
incorrectly expecting a specific ordering of search results.


> oak-solr-core test failures on Java 8
> -
>
> Key: OAK-1819
> URL: https://issues.apache.org/jira/browse/OAK-1819
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: oak-solr
>Affects Versions: 1.0
> Environment: {noformat}
> Apache Maven 3.1.0 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 
> 22:15:32-0400)
> Maven home: c:\Program Files\apache-maven-3.1.0
> Java version: 1.8.0, vendor: Oracle Corporation
> Java home: c:\Program Files\Java\jdk1.8.0\jre
> Default locale: en_US, platform encoding: Cp1252
> OS name: "windows 7", version: "6.1", arch: "amd64", family: "dos"
> {noformat}
>Reporter: Jukka Zitting
>Priority: Minor
>  Labels: java8
>
> The following {{oak-solr-core}} test failures occur when building Oak with 
> Java 8:
> {noformat}
> Failed tests:
>   
> testNativeMLTQuery(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
>  expected: but was:
>   
> testNativeMLTQueryWithStream(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
>  expected: but was:
> {noformat}
> The cause of this might well be something as simple as the test case 
> incorrectly expecting a specific ordering of search results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1819) oak-solr-core test failures on Java 8

2014-05-14 Thread Jukka Zitting (JIRA)
Jukka Zitting created OAK-1819:
--

 Summary: oak-solr-core test failures on Java 8
 Key: OAK-1819
 URL: https://issues.apache.org/jira/browse/OAK-1819
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: oak-solr
Affects Versions: 1.0
 Environment: {noformat}
Apache Maven 3.1.0 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 
22:15:32-0400)
Maven home: c:\Program Files\apache-maven-3.1.0
Java version: 1.8.0, vendor: Oracle Corporation
Java home: c:\Program Files\Java\jdk1.8.0\jre
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 7", version: "6.1", arch: "amd64", family: "dos"
{noformat}
Reporter: Jukka Zitting
Priority: Minor


The following {{oak-solr-core}} test failures occur when building Oak with Java 
8:

{noformat}
Failed tests:   
testNativeMLTQuery(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
 expected: but was:
 
testNativeMLTQueryWithStream(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
 expected: but was:
{noformat}

The cause of this might well be something as simple as the test case 
incorrectly expecting a specific ordering of search results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (OAK-1810) Incorrect TarMK graph metadata validation

2014-05-14 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved OAK-1810.


   Resolution: Fixed
Fix Version/s: 1.1

Fixed in revision 1593554.

> Incorrect TarMK graph metadata validation
> -
>
> Key: OAK-1810
> URL: https://issues.apache.org/jira/browse/OAK-1810
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Affects Versions: 1.0
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Minor
> Fix For: 1.1
>
>
> When reading the pre-compiled graph entries from OAK-1780, the TarMK uses a 
> few sanity checks to verify that the graph hasn't been corrupted and that 
> using it for the cleanup operation is safe.
> It turns out that one of these sanity checks ({{bytes >= count * 24 + 16}}) 
> is overly strict, as the minimum size limit for the graph entry is instead 
> {{count * 16 + 16}}. The "24" factor applies only when checking the sanity of 
> the tar index entry.
> The effect of this bug is not very critical, as a graph entry that fails the 
> check will just be ignored with a warning and the cleanup code will fall back 
> to the slower algorithm of instead reading the segment graph directly from 
> the stored data segments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

