Re: [VOTE] Release Apache Jackrabbit Oak 1.7.3
+1 Release this package as Apache Jackrabbit Oak 1.7.3
BUILD FAILURE: Jackrabbit Oak - Build # 503 - Still Failing
The Apache Jenkins build system has built Jackrabbit Oak (build #503) Status: Still Failing Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/503/ to view the results. Changes: [chetanm] OAK-6415 - Use dynamic service loader by default Reapplying reverted commit Added a test to check default behaviour which shows that it has not changed. Minor refactoring done in BinaryTextExtractor but no functional change done for this issue Test results: 1 tests failed. FAILED: org.apache.jackrabbit.j2ee.TomcatIT.testTomcat Error Message: org/apache/http/config/Lookup Stack Trace: java.lang.NoClassDefFoundError: org/apache/http/config/Lookup at org.apache.jackrabbit.j2ee.TomcatIT.setUp(TomcatIT.java:85) Caused by: java.lang.ClassNotFoundException: org.apache.http.config.Lookup at org.apache.jackrabbit.j2ee.TomcatIT.setUp(TomcatIT.java:85)
BUILD FAILURE: Jackrabbit Oak - Build # 502 - Still Failing
The Apache Jenkins build system has built Jackrabbit Oak (build #502) Status: Still Failing Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/502/ to view the results. Changes: [chetanm] OAK-5048 - Upgrade to Tika 1.15 version OAK-6414 - Use Tika config to determine non indexed mimeTypes -- Update Tika to 1.15 -- Use TikaParserConfig to check which all mimetypes have been configured with EmptyParser -- oak-webapp - Need to exclude the httpcomponents from tika-parser as it has a transitive dependency to an old version of http components which is in conflict with one used by htmlunit. Test results: 1 tests failed. FAILED: org.apache.jackrabbit.j2ee.TomcatIT.testTomcat Error Message: org/apache/http/config/Lookup Stack Trace: java.lang.NoClassDefFoundError: org/apache/http/config/Lookup at org.apache.jackrabbit.j2ee.TomcatIT.setUp(TomcatIT.java:85) Caused by: java.lang.ClassNotFoundException: org.apache.http.config.Lookup at org.apache.jackrabbit.j2ee.TomcatIT.setUp(TomcatIT.java:85)
Re: [DISCUSS] - highly vs rarely used data
From my experience working with customers, I can pretty much guarantee that sooner or later: (a) the implementation of an automatism is not *quite* what they need/want (b) they want to be able to manually select (or more likely override) whether a file can be archived. Thus I suggest coming up with a pluggable "strategy" interface and providing a sensible default implementation. The default will be fine for most customers/users, but advanced use-cases can be implemented by substituting the implementation. Implementations could then also respect manually set flags (=properties) if desired. A much more important and difficult question to answer IMHO is how to deal with the slow retrieval of archived content. And if needed, how to expose the slow availability (i.e. unavailable now but available later) to the end user (or application layer). To me this sounds tricky if we want to stick to the JCR API. Regards Julian On Mon, Jul 3, 2017 at 4:33 PM, Tommaso Teofili wrote: > I am sure there are both use cases for automatic vs manual/controlled > collection of unused data, however if I were a user I would personally not > want to care about this. While I'd be happy to know that my repo is faster > / smaller / cleaner / whatever it'd sound overly complex to deal with JCR > and Oak constraints and behaviours from the application layer. > IMHO if we want to have such a feature in Oak to save resources, it should > be the persistence responsibility to say "hey, this content is not being > accessed for ages, let's try to claim some resources from it" (which could > mean moving to cold storage, compress it or anything else). > > My 2 cents, > Tommaso > > > > On Mon, Jul 3, 2017 at 3:46 PM, Thomas Mueller wrote: > >> Hi, >> >> > a property on the node, e.g. "archiveState=toArchive" >> >> I wonder if we _can_ easily write to the version store? Also, some >> nodetypes don't allow such properties? 
It might need to be a hidden >> property, but then you can't use the JCR API. Or maintain this data in a >> "shadow" structure (not with the nodes), which would complicate move >> operations. >> >> If I was a customer, I wouldn't want to *manually* mark / unmark binaries >> to be moved to / from long time storage. I would probably just want to rely >> on automatic management. But I'm not a customer, so my opinion is not that >> relevant ( >> >> > Using a property directly specified for this purpose gives us more >> direct control over how it is being used I think. >> >> Sure, but it also comes with some complexities. >> >> Regards, >> Thomas
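Julian's pluggable "strategy" suggestion above could be sketched roughly as follows. All type, method, and path names here are hypothetical, invented purely for illustration; nothing like this exists in Oak, and a real design would still have to resolve the hidden-property and version-store questions Thomas raises.

```java
import java.util.concurrent.TimeUnit;

public class ArchivingStrategyDemo {

    // Hypothetical pluggable strategy interface (invented name, not Oak API).
    // Advanced users would substitute their own implementation; implementations
    // could also consult manually set flags (=properties) if desired.
    interface ArchivingStrategy {
        boolean shouldArchive(String path, long lastAccessedMillis, long sizeBytes);
    }

    // A sensible default: archive large binaries untouched for over a year.
    static class LastAccessArchivingStrategy implements ArchivingStrategy {
        private static final long MAX_IDLE_MILLIS = TimeUnit.DAYS.toMillis(365);
        private static final long MIN_SIZE_BYTES = 10L * 1024 * 1024; // 10 MB

        @Override
        public boolean shouldArchive(String path, long lastAccessedMillis, long sizeBytes) {
            long idleMillis = System.currentTimeMillis() - lastAccessedMillis;
            return idleMillis > MAX_IDLE_MILLIS && sizeBytes >= MIN_SIZE_BYTES;
        }
    }

    public static void main(String[] args) {
        ArchivingStrategy strategy = new LastAccessArchivingStrategy();
        long twoYearsAgo = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(730);
        // A large binary untouched for two years would be selected for archiving.
        System.out.println(strategy.shouldArchive("/content/video.mp4", twoYearsAgo, 500L * 1024 * 1024));
    }
}
```

Note this only covers the *selection* side; the harder problem Julian points out (exposing "unavailable now, available later" through the JCR API) is untouched by such an interface.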
Re: Percentile implementation
I'll add the dependency. Thanks, Andrei 2017-07-04 13:10 GMT+03:00 Michael Dürig: > > > On 04.07.17 11:15, Francesco Mari wrote: > >> 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu: >> >>> Now my question is this: do we have a simple percentile implementation in >>> Oak (I didn't find one)? >>> >> >> I'm not aware of a percentile implementation in Oak. >> >> If not, would you recommend writing my own or adapting/extracting an >>> existing one in a utility class? >>> >> >> In the past we copied and pasted source code from other projects in >> Oak. As long as the license allows it and proper attribution is given, >> it shouldn't be a problem. That said, I'm not a big fan of either >> rewriting an implementation from scratch or copying and pasting source >> code from other projects. Is exposing a percentile really necessary? >> If yes, how big of a problem is embedding of commons-math3? >> >> > We should avoid copy-paste as we might miss important fixes in later > releases. I only did this once for some code where we needed a fix that > wasn't yet released. It was a hassle. > I would just add a dependency to commons-math3. It's a library exposing the > functionality we require, so let's use it. > > Michael >
Re: Percentile implementation
Hi Francesco, > Is exposing a percentile really necessary? To give you some background, I'm talking about OAK-4732 [2]. I don't know if we can achieve the same result without the percentile. > If yes, how big of a problem is embedding of commons-math3? commons-math3-3.6.1.jar is 2.1M; I'd say that's too much to add as a dependency to oak-segment-tar. [2] https://issues.apache.org/jira/browse/OAK-4732 Thanks, Andrei
Re: Percentile implementation
On 04.07.17 11:15, Francesco Mari wrote: 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu: Now my question is this: do we have a simple percentile implementation in Oak (I didn't find one)? I'm not aware of a percentile implementation in Oak. If not, would you recommend writing my own or adapting/extracting an existing one in a utility class? In the past we copied and pasted source code from other projects in Oak. As long as the license allows it and proper attribution is given, it shouldn't be a problem. That said, I'm not a big fan of either rewriting an implementation from scratch or copying and pasting source code from other projects. Is exposing a percentile really necessary? If yes, how big of a problem is embedding of commons-math3? We should avoid copy-paste as we might miss important fixes in later releases. I only did this once for some code where we needed a fix that wasn't yet released. It was a hassle. I would just add a dependency to commons-math3. It's a library exposing the functionality we require, so let's use it. Michael
Re: Blobstore consistency check
Awesome - thank you! Andrei > On Jul 4, 2017, at 10:59 AM, Andrei Dulceanu wrote: > > Hi Andrei, > > >> If indexes are not part of backup/restore, when and how is the index >> recreated? >> > > Citing an old answer from Thomas: "The disadvantage is startup (after a > restore) is slightly slower, but not drastically (the index does not need > to be re-built, it just has to be extracted again)." > > Regards, > Andrei > > 2017-07-04 10:10 GMT+03:00 Andrei Kalfas: > >> Hi, >> >>> I tried something similar a while ago and I used s3 bucket versioning >> [1]. >>> It allows you to do point in time restores without forcing the "prevent >>> deletion policy". Is it suitable for your use case? >> >> I wish that this would have been on Microsoft's Azure Storage feature list, >> it's not, yet. >> >>> >>> No matter what approach you use for backup/restore, the backup of the >>> segment store should come first (before datastore) to avoid any >>> inconsistencies with binaries referenced in the segment store. It would >>> probably be good if the backup doesn't contain the index data to avoid >>> possible corruptions. >> >> If indexes are not part of backup/restore, when and how is the index >> recreated? >> >> Thanks, >> Andrei
Re: Lots of debug output in oak-remote test
Fixed that with OAK-6417. Chetan Mehrotra On Tue, Jul 4, 2017 at 3:04 PM, Chetan Mehrotra wrote: > While running below in oak-remote module I am seeing lots of debug > output on console. > > mvn clean install -PintegrationTesting > > Is anyone else also observing that? Not sure why all debug logs are > getting enabled > > Chetan Mehrotra
Lots of debug output in oak-remote test
While running the command below in the oak-remote module I am seeing lots of debug output on the console. mvn clean install -PintegrationTesting Is anyone else also observing that? Not sure why all debug logs are getting enabled. Chetan Mehrotra
BUILD FAILURE: Jackrabbit Oak - Build # 500 - Still Failing
The Apache Jenkins build system has built Jackrabbit Oak (build #500) Status: Still Failing Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/500/ to view the results. Changes: [chetanm] OAK-6414 - Use Tika config to determine non indexed mimeTypes Reverting changes in 1800726, 1800727 Test results: 1 tests failed. FAILED: org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.validateMigration[Copy references, no blobstores defined, document -> segment-tar] Error Message: Failed to copy content Stack Trace: javax.jcr.RepositoryException: Failed to copy content at org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.prepare(CopyBinariesTest.java:183) Caused by: java.lang.IllegalStateException: Branch with failed reset at org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.prepare(CopyBinariesTest.java:183) Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakOak0100: Branch reset failed at org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.prepare(CopyBinariesTest.java:183) Caused by: org.apache.jackrabbit.oak.plugins.document.DocumentStoreException: Empty branch cannot be reset at org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.prepare(CopyBinariesTest.java:183)
Re: Percentile implementation
Oak has an optional dependency on the dropwizard metrics library which has a Histogram implementation [1] which can be used. So possibly we can use that. So far it's optional and hidden behind the statistics API. Recently I also had to make use of that for rate estimation in indexing, so I used it in a way that Metrics-based logic gets used if available, otherwise it falls back to a simple mean-based implementation [2]. Maybe we can make it a required dependency? Chetan Mehrotra [1] http://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Histogram.html [2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/progress/MetricRateEstimator.java On Tue, Jul 4, 2017 at 2:22 PM, Andrei Dulceanu wrote: > Hi all, > > I was working on an issue for which I needed to use *only* a percentile for > some recorded stats. Initially I used SynchronizedDescriptiveStatistics [0] > from commons-math3 [1], but then I thought that adding this dependency > would be too much for such a simple use case. > > Now my question is this: do we have a simple percentile implementation in > Oak (I didn't find one)? If not, would you recommend writing my own or > adapting/extracting an existing one in a utility class? > > Regards, > Andrei > > [0] > http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/stat/descriptive/SynchronizedDescriptiveStatistics.html > [1] http://commons.apache.org/proper/commons-math/
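The two options discussed in this thread could look roughly like the sketch below: an optional-dependency check of the kind Chetan describes, plus a minimal hand-written nearest-rank percentile as the fallback. The class and method names are invented for illustration; with Metrics on the classpath, a Histogram-backed implementation would be used instead, per the javadoc linked above.

```java
import java.util.Arrays;

public class PercentileSketch {

    // True when the optional dropwizard metrics dependency is on the classpath.
    // If it is, a Histogram-backed implementation could be used instead, e.g.:
    //   histogram.update(value); ... histogram.getSnapshot().getValue(0.95);
    static boolean metricsAvailable() {
        try {
            Class.forName("com.codahale.metrics.Histogram");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Fallback: nearest-rank percentile over a fixed sample set.
    // p is in (0, 100]; the input array is not modified.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] samples = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(percentile(samples, 50)); // 5 (nearest-rank)
        System.out.println(percentile(samples, 95)); // 10
    }
}
```

Unlike commons-math3's SynchronizedDescriptiveStatistics, this naive fallback retains all samples and uses no interpolation, which is part of the trade-off the thread is weighing.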
Re: Percentile implementation
2017-07-04 10:52 GMT+02:00 Andrei Dulceanu: > Now my question is this: do we have a simple percentile implementation in > Oak (I didn't find one)? I'm not aware of a percentile implementation in Oak. > If not, would you recommend writing my own or adapting/extracting an > existing one in a utility class? In the past we copied and pasted source code from other projects in Oak. As long as the license allows it and proper attribution is given, it shouldn't be a problem. That said, I'm not a big fan of either rewriting an implementation from scratch or copying and pasting source code from other projects. Is exposing a percentile really necessary? If yes, how big of a problem is embedding of commons-math3?
Percentile implementation
Hi all, I was working on an issue for which I needed to use *only* a percentile for some recorded stats. Initially I used SynchronizedDescriptiveStatistics [0] from commons-math3 [1], but then I thought that adding this dependency would be too much for such a simple use case. Now my question is this: do we have a simple percentile implementation in Oak (I didn't find one)? If not, would you recommend writing my own or adapting/extracting an existing one in a utility class? Regards, Andrei [0] http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/stat/descriptive/SynchronizedDescriptiveStatistics.html [1] http://commons.apache.org/proper/commons-math/
Re: Trunk doesn't compile
Reverted the commit with 1800742. The build should now pass. Chetan Mehrotra On Tue, Jul 4, 2017 at 1:51 PM, Francesco Mari wrote: > Thanks for taking care of this. > > 2017-07-04 10:17 GMT+02:00 Chetan Mehrotra: > >> My fault. Looks like code used API from Tika 1.15 which I am yet >> testing. Would fix it now >> Chetan Mehrotra >> >> >> On Tue, Jul 4, 2017 at 1:34 PM, Francesco Mari >> wrote: >> > When compiling trunk (r1800739) I get the following error in the >> oak-lucene >> > module. >> > >> > [ERROR] Failed to execute goal >> > org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile >> > (default-compile) on project oak-lucene: Compilation failure >> > [ERROR] >> > /Users/mari/src/svn/oak/trunk/oak-lucene/src/main/java/org/ >> apache/jackrabbit/oak/plugins/index/lucene/binary/ >> TikaParserConfig.java:[94,34] >> > cannot find symbol >> > [ERROR] symbol: method getDocumentBuilder() >> > [ERROR] location: class org.apache.tika.parser.ParseContext >> > >> > Can someone have a look at it? >>
Re: Trunk doesn't compile
Thanks for taking care of this. 2017-07-04 10:17 GMT+02:00 Chetan Mehrotra: > My fault. Looks like code used API from Tika 1.15 which I am yet > testing. Would fix it now > Chetan Mehrotra > > > On Tue, Jul 4, 2017 at 1:34 PM, Francesco Mari > wrote: > > When compiling trunk (r1800739) I get the following error in the > oak-lucene > > module. > > > > [ERROR] Failed to execute goal > > org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile > > (default-compile) on project oak-lucene: Compilation failure > > [ERROR] > > /Users/mari/src/svn/oak/trunk/oak-lucene/src/main/java/org/ > apache/jackrabbit/oak/plugins/index/lucene/binary/ > TikaParserConfig.java:[94,34] > > cannot find symbol > > [ERROR] symbol: method getDocumentBuilder() > > [ERROR] location: class org.apache.tika.parser.ParseContext > > > > Can someone have a look at it? >
Re: Trunk doesn't compile
My fault. Looks like the code used an API from Tika 1.15 which I am still testing. Will fix it now. Chetan Mehrotra On Tue, Jul 4, 2017 at 1:34 PM, Francesco Mari wrote: > When compiling trunk (r1800739) I get the following error in the oak-lucene > module. > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile > (default-compile) on project oak-lucene: Compilation failure > [ERROR] > /Users/mari/src/svn/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/binary/TikaParserConfig.java:[94,34] > cannot find symbol > [ERROR] symbol: method getDocumentBuilder() > [ERROR] location: class org.apache.tika.parser.ParseContext > > Can someone have a look at it?
Trunk doesn't compile
When compiling trunk (r1800739) I get the following error in the oak-lucene module. [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile (default-compile) on project oak-lucene: Compilation failure [ERROR] /Users/mari/src/svn/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/binary/TikaParserConfig.java:[94,34] cannot find symbol [ERROR] symbol: method getDocumentBuilder() [ERROR] location: class org.apache.tika.parser.ParseContext Can someone have a look at it?
Re: Blobstore consistency check
Hi Andrei, > If indexes are not part of backup/restore, when and how is the index > recreated? > Citing an old answer from Thomas: "The disadvantage is startup (after a restore) is slightly slower, but not drastically (the index does not need to be re-built, it just has to be extracted again)." Regards, Andrei 2017-07-04 10:10 GMT+03:00 Andrei Kalfas: > Hi, > > > I tried something similar a while ago and I used s3 bucket versioning > [1]. > > It allows you to do point in time restores without forcing the "prevent > > deletion policy". Is it suitable for your use case? > > I wish that this would have been on Microsoft's Azure Storage feature list, > it's not, yet. > > > > > No matter what approach you use for backup/restore, the backup of the > > segment store should come first (before datastore) to avoid any > > inconsistencies with binaries referenced in the segment store. It would > > probably be good if the backup doesn't contain the index data to avoid > > possible corruptions. > > If indexes are not part of backup/restore, when and how is the index > recreated? > > Thanks, > Andrei > >
Re: [VOTE] Release Apache Jackrabbit Oak 1.7.3
[X] +1 Release this package as Apache Jackrabbit Oak 1.7.3 2017-07-03 18:48 GMT+03:00 Julian Reschke: > On 2017-07-03 15:52, Davide Giannella wrote: > >> ... >> > > [X] +1 Release this package as Apache Jackrabbit Oak 1.7.3 > > > Best regards, Julian >
Re: Blobstore consistency check
Hi, > I tried something similar a while ago and I used s3 bucket versioning [1]. > It allows you to do point in time restores without forcing the "prevent > deletion policy". Is it suitable for your use case? I wish that this would have been on Microsoft's Azure Storage feature list, it's not, yet. > > No matter what approach you use for backup/restore, the backup of the > segment store should come first (before datastore) to avoid any > inconsistencies with binaries referenced in the segment store. It would > probably be good if the backup doesn't contain the index data to avoid > possible corruptions. If indexes are not part of backup/restore, when and how is the index recreated? Thanks, Andrei
Re: Blobstore consistency check
Hi Andrei, Now, with large repos that spill over in aws s3 or azure storage it's not > practical to backup terabytes of data each day for various reasons (costs > and time to backup/restore among them), and I was fiddling with the idea > of setting up the datastore in such a way that deletion is prevented - I mean > deletion from the s3 bucket/azure storage container. I tried something similar a while ago and I used s3 bucket versioning [1]. It allows you to do point in time restores without forcing the "prevent deletion policy". Is it suitable for your use case? > This way backup would be a matter of backing up the segment store - that's > a matter of a couple of gigs of data, and restore would be pretty much the > same. No matter what approach you use for backup/restore, the backup of the segment store should come first (before datastore) to avoid any inconsistencies with binaries referenced in the segment store. It would probably be good if the backup doesn't contain the index data to avoid possible corruptions. HTH, Andrei [1] http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html 2017-07-03 17:47 GMT+03:00 Andrei Kalfas: > Hi Andrei, > > Ok, here is the context, apologies for not starting with that first. > > I'm working on a PTR (point in time restore) proof of concept involving > large asset repositories. The easy way for small-ish repos would be to > back up everything every day and when the customer pops by and says "I want > everything back to day YYY" just restore the file systems from backups and > that's it. Now, with large repos that spill over in aws s3 or azure storage > it's not practical to backup terabytes of data each day for various reasons > (costs and time to backup/restore among them), and I was fiddling with the > idea of setting up the datastore in such a way that deletion is prevented - I > mean deletion from the s3 bucket/azure storage container. 
This way backup > would be a matter of backing up the segment store - that's a matter of a > couple of gigs of data, and restore would be pretty much the same. The > problem with not allowing deletion from the s3 bucket/azure storage > container is that one will pack a lot of garbage over time, so I'll > need a way to figure out what's garbage so that I can move that data into > cheaper storages and eventually delete it completely. This is why I was > fishing for an easy way to get the inverted list that I mentioned below. > > Thanks, > Andrei > > > > On Jul 3, 2017, at 4:47 PM, Andrei Dulceanu wrote: > > > > Hi Andrei, > > > > AFAIK, there isn't currently such an option for the consistency check. What > > scenario do you have in mind for using it? > > > > Regards, > > Andrei > > > > 2017-07-03 16:32 GMT+03:00 Andrei Kalfas: > > > >> Hi, > >> > >> I'm reading about the consistency check tool that's available via oak-run > >> and if I got it right it's going to report missing blobs that are referenced. > >> Is there a way to get the inverted list, i.e. things that are in the > >> datastore but not referenced from the segment store? > >> > >> Thank you, > >> Andrei
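The "inverted list" asked for in this thread - blob IDs present in the datastore but not referenced from the segment store - reduces to a set difference once both ID sets have been collected. How to collect them (e.g. from oak-run listings) is left out here, and the class name is invented for illustration:

```java
import java.util.HashSet;
import java.util.Set;

public class UnreferencedBlobs {

    // Given the full set of blob IDs in the datastore and the set of IDs
    // referenced from the segment store, return the garbage candidates
    // (present in the datastore but unreferenced). The inputs are not modified.
    static Set<String> unreferenced(Set<String> dataStoreIds, Set<String> referencedIds) {
        Set<String> result = new HashSet<>(dataStoreIds);
        result.removeAll(referencedIds);
        return result;
    }

    public static void main(String[] args) {
        Set<String> dataStore = new HashSet<>(java.util.Arrays.asList("blob-a", "blob-b", "blob-c"));
        Set<String> referenced = new HashSet<>(java.util.Arrays.asList("blob-a", "blob-c"));
        System.out.println(unreferenced(dataStore, referenced)); // [blob-b]
    }
}
```

Note the caveat implicit in the thread: an ID unreferenced at the moment of the scan is only a garbage *candidate*; with point-in-time restore in mind, it may still be referenced by an older segment store state one wants to restore to.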
BUILD FAILURE: Jackrabbit Oak - Build # 499 - Failure
The Apache Jenkins build system has built Jackrabbit Oak (build #499) Status: Failure Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/499/ to view the results. Changes: [chetanm] OAK-6415 - Use dynamic service loader by default Added a test to check default behaviour which shows that it has not changed. Minor refactoring done in BinaryTextExtractor but no functional change done for this issue [chetanm] OAK-6414 - Use Tika config to determine non indexed mimeTypes Test results: 1 tests failed. FAILED: org.apache.jackrabbit.oak.segment.MapRecordTest.testOak1104 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.jackrabbit.oak.segment.MapRecordTest.testOak1104(MapRecordTest.java:93)