Re: [VOTE] Release Apache Jackrabbit Oak 1.7.3

2017-07-04 Thread Thomas Mueller
+1 Release this package as Apache Jackrabbit Oak 1.7.3
 



BUILD FAILURE: Jackrabbit Oak - Build # 503 - Still Failing

2017-07-04 Thread Apache Jenkins Server
The Apache Jenkins build system has built Jackrabbit Oak (build #503)

Status: Still Failing

Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/503/ to 
view the results.

Changes:
[chetanm] OAK-6415 - Use dynamic service loader by default

Reapplying reverted commit

Added a test to check default behaviour which shows that it has not
changed. Minor refactoring done in BinaryTextExtractor but no
functional change done for this issue

 

Test results:
1 tests failed.
FAILED:  org.apache.jackrabbit.j2ee.TomcatIT.testTomcat

Error Message:
org/apache/http/config/Lookup

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/http/config/Lookup
at org.apache.jackrabbit.j2ee.TomcatIT.setUp(TomcatIT.java:85)
Caused by: java.lang.ClassNotFoundException: org.apache.http.config.Lookup
at org.apache.jackrabbit.j2ee.TomcatIT.setUp(TomcatIT.java:85)

BUILD FAILURE: Jackrabbit Oak - Build # 502 - Still Failing

2017-07-04 Thread Apache Jenkins Server
The Apache Jenkins build system has built Jackrabbit Oak (build #502)

Status: Still Failing

Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/502/ to 
view the results.

Changes:
[chetanm] OAK-5048 - Upgrade to Tika 1.15 version
OAK-6414 - Use Tika config to determine non indexed mimeTypes

-- Update Tika to 1.15
-- Use TikaParserConfig to check which all mimetypes have been configured
   with EmptyParser

-- oak-webapp - Need to exclude the httpcomponents from tika-parser
   as it has a transitive dependency to an old version of http components
   which is in conflict with one used by htmlunit.

 

Test results:
1 tests failed.
FAILED:  org.apache.jackrabbit.j2ee.TomcatIT.testTomcat

Error Message:
org/apache/http/config/Lookup

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/http/config/Lookup
at org.apache.jackrabbit.j2ee.TomcatIT.setUp(TomcatIT.java:85)
Caused by: java.lang.ClassNotFoundException: org.apache.http.config.Lookup
at org.apache.jackrabbit.j2ee.TomcatIT.setUp(TomcatIT.java:85)

Re: [DISCUSS] - highly vs rarely used data

2017-07-04 Thread Julian Sedding
From my experience working with customers, I can pretty much guarantee
that sooner or later:

(a) the implementation of an automatism is not *quite* what they need/want
(b) they want to be able to manually select (or more likely override)
whether a file can be archived

Thus I suggest coming up with a pluggable "strategy" interface and
provide a sensible default implementation. The default will be fine
for most customers/users, but advanced use-cases can be implemented by
substituting the implementation. Implementations could then also
respect manually set flags (=properties) if desired.
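
A rough sketch of what such a pluggable strategy could look like (all type and method names below are invented for illustration; none of this is existing Oak API):

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Illustrative only: neither ArchiveStrategy nor AgeBasedArchiveStrategy
// exists in Oak; this just sketches the pluggable-default idea from above.
interface ArchiveStrategy {
    boolean isArchivable(String path, long lastAccessMillis, long nowMillis);
}

// A sensible default: archive content untouched for a configurable number of
// days, while still honouring a manually set per-path flag.
class AgeBasedArchiveStrategy implements ArchiveStrategy {
    private final long maxIdleMillis;
    private final Map<String, Boolean> manualFlags; // path -> archive yes/no

    AgeBasedArchiveStrategy(long maxIdleDays, Map<String, Boolean> manualFlags) {
        this.maxIdleMillis = TimeUnit.DAYS.toMillis(maxIdleDays);
        this.manualFlags = manualFlags;
    }

    @Override
    public boolean isArchivable(String path, long lastAccessMillis, long nowMillis) {
        Boolean flag = manualFlags.get(path);
        if (flag != null) {
            return flag; // a manual override always wins over the automatism
        }
        return nowMillis - lastAccessMillis > maxIdleMillis;
    }
}
```

Advanced use cases would substitute their own ArchiveStrategy implementation, while the default stays good enough for most users.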

A much more important and difficult question to answer IMHO is how to
deal with the slow retrieval of archived content. And if needed, how
to expose the slow availability (i.e. unavailable now but available
later) to the end user (or application layer). To me this sounds
tricky if we want to stick to the JCR API.

Regards
Julian



On Mon, Jul 3, 2017 at 4:33 PM, Tommaso Teofili
 wrote:
> I am sure there are use cases for both automatic and manual/controlled
> collection of unused data; however, if I were a user I would personally not
> want to care about this. While I'd be happy to know that my repo is faster
> / smaller / cleaner / whatever, it'd sound overly complex to deal with JCR
> and Oak constraints and behaviours from the application layer.
> IMHO if we want to have such a feature in Oak to save resources, it should
> be the persistence layer's responsibility to say "hey, this content has not
> been accessed for ages, let's try to reclaim some resources from it" (which
> could mean moving it to cold storage, compressing it or anything else).
>
> My 2 cents,
> Tommaso
>
>
>
> Il giorno lun 3 lug 2017 alle ore 15:46 Thomas Mueller
>  ha scritto:
>
>> Hi,
>>
>> > a property on the node, e.g. "archiveState=toArchive"
>>
>> I wonder if we _can_ easily write to the version store? Also, some
>> nodetypes don't allow such properties? It might need to be a hidden
>> property, but then you can't use the JCR API. Or maintain this data in a
>> "shadow" structure (not with the nodes), which would complicate move
>> operations.
>>
>> If I were a customer, I wouldn't want to *manually* mark / unmark binaries
>> to be moved to / from long time storage. I would probably just want to rely
>> on automatic management. But I'm not a customer, so my opinion is not that
>> relevant.
>>
>> > Using a property directly specified for this purpose gives us more
>> direct control over how it is being used I think.
>>
>> Sure, but it also comes with some complexities.
>>
>> Regards,
>> Thomas
>>
>>
>>
>>


Re: Percentile implementation

2017-07-04 Thread Andrei Dulceanu
I'll add the dependency.

Thanks,
Andrei

2017-07-04 13:10 GMT+03:00 Michael Dürig :

>
>
> On 04.07.17 11:15, Francesco Mari wrote:
>
>> 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu :
>>
>>> Now my question is this: do we have a simple percentile implementation in
>>> Oak (I didn't find one)?
>>>
>>
>> I'm not aware of a percentile implementation in Oak.
>>
>> If not, would you recommend writing my own or adapting/extracting an
>>> existing one in a utility class?
>>>
>>
>> In the past we copied and pasted source code from other projects in
>> Oak. As long as the license allows it and proper attribution is given,
>> it shouldn't be a problem. That said, I'm not a big fan of either
>> rewriting an implementation from scratch or copying and pasting source
>> code from other projects. Is exposing a percentile really necessary?
>> If yes, how big of a problem is embedding of commons-math3?
>>
>>
> We should avoid copy-paste as we might miss important fixes in later
> releases. I only did this once for some code where we needed a fix that
> wasn't yet released. It was a hassle.
> I would just add a dependency to commons-math3. It's a library exposing the
> functionality we require, so let's use it.
>
> Michael
>


Re: Percentile implementation

2017-07-04 Thread Andrei Dulceanu
Hi Francesco,

> Is exposing a percentile really necessary?

To give you some background, I'm talking about OAK-4732 [2]. I don't know
if we can achieve the same result without the percentile.


> If yes, how big of a problem is embedding of commons-math3?

2.1M commons-math3-3.6.1.jar

I'd say it's too much to add it as a dependency to oak-segment-tar.

[2] https://issues.apache.org/jira/browse/OAK-4732

Thanks,
Andrei


Re: Percentile implementation

2017-07-04 Thread Michael Dürig



On 04.07.17 11:15, Francesco Mari wrote:

2017-07-04 10:52 GMT+02:00 Andrei Dulceanu :

Now my question is this: do we have a simple percentile implementation in
Oak (I didn't find one)?


I'm not aware of a percentile implementation in Oak.


If not, would you recommend writing my own or adapting/extracting an
existing one in a utility class?


In the past we copied and pasted source code from other projects in
Oak. As long as the license allows it and proper attribution is given,
it shouldn't be a problem. That said, I'm not a big fan of either
rewriting an implementation from scratch or copying and pasting source
code from other projects. Is exposing a percentile really necessary?
If yes, how big of a problem is embedding of commons-math3?



We should avoid copy-paste as we might miss important fixes in later
releases. I only did this once for some code where we needed a fix that 
wasn't yet released. It was a hassle.
I would just add a dependency to commons-math3. It's a library exposing
the functionality we require, so let's use it.


Michael


Re: Blobstore consistency check

2017-07-04 Thread Andrei Kalfas
Awesome - thank you !
Andrei

> On Jul 4, 2017, at 10:59 AM, Andrei Dulceanu  
> wrote:
> 
> Hi Andrei,
> 
> 
>> If indexes are not part of backup/restore, when and how is the index
>> recreated?
>> 
> 
> Citing an old answer from Thomas: "The disadvantage is startup (after a
> restore) is slightly slower, but not drastically (the index does not need
> to be re-built, it just has to be extracted again)."
> 
> Regards,
> Andrei
> 
> 2017-07-04 10:10 GMT+03:00 Andrei Kalfas :
> 
>> Hi,
>> 
>>> I tried something similar a while ago and I used S3 bucket versioning
>>> [1].
>>> It allows you to do point-in-time restores without forcing the "prevent
>>> deletion" policy. Is it suitable for your use case?
>> 
>> I wish this were on Microsoft's Azure Storage feature list;
>> it's not, yet.
>> 
>>> 
>>> No matter what approach you use for backup/restore, the backup of the
>>> segment store should come first (before datastore) to avoid any
>>> inconsistencies with binaries referenced in the segment store. It would
>>> probably be good if the backup doesn't contain the index data to avoid
>>> possible corruptions.
>> 
>> If indexes are not part of backup/restore, when and how is the index
>> recreated?
>> 
>> Thanks,
>> Andrei
>> 
>> 





Re: Lots of debug output in oak-remote test

2017-07-04 Thread Chetan Mehrotra
Fixed that with OAK-6417
Chetan Mehrotra


On Tue, Jul 4, 2017 at 3:04 PM, Chetan Mehrotra
 wrote:
> While running the command below in the oak-remote module, I am seeing lots
> of debug output on the console.
>
> mvn clean install -PintegrationTesting
>
> Is anyone else observing that too? I'm not sure why all the debug logs are
> getting enabled.
>
> Chetan Mehrotra


Lots of debug output in oak-remote test

2017-07-04 Thread Chetan Mehrotra
While running the command below in the oak-remote module, I am seeing lots
of debug output on the console.

mvn clean install -PintegrationTesting

Is anyone else observing that too? I'm not sure why all the debug logs are
getting enabled.

Chetan Mehrotra


BUILD FAILURE: Jackrabbit Oak - Build # 500 - Still Failing

2017-07-04 Thread Apache Jenkins Server
The Apache Jenkins build system has built Jackrabbit Oak (build #500)

Status: Still Failing

Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/500/ to 
view the results.

Changes:
[chetanm] OAK-6414 - Use Tika config to determine non indexed mimeTypes

Reverting changes in 1800726, 1800727

 

Test results:
1 tests failed.
FAILED:  org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.validateMigration[Copy references, no blobstores defined, document -> segment-tar]

Error Message:
Failed to copy content

Stack Trace:
javax.jcr.RepositoryException: Failed to copy content
at 
org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.prepare(CopyBinariesTest.java:183)
Caused by: java.lang.IllegalStateException: Branch with failed reset
at 
org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.prepare(CopyBinariesTest.java:183)
Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakOak0100: 
Branch reset failed
at 
org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.prepare(CopyBinariesTest.java:183)
Caused by: org.apache.jackrabbit.oak.plugins.document.DocumentStoreException: 
Empty branch cannot be reset
at 
org.apache.jackrabbit.oak.upgrade.cli.blob.CopyBinariesTest.prepare(CopyBinariesTest.java:183)

Re: Percentile implementation

2017-07-04 Thread Chetan Mehrotra
Oak has an optional dependency on the Dropwizard Metrics library, which has
a Histogram implementation [1] that can be used.

So possibly we can use that. So far it's optional and hidden behind the
statistics API. Recently I also had to make use of it for rate
estimation in indexing, so I used it in a way that the Metrics-based logic
gets used if available, falling back to a simple mean-based
implementation otherwise [2].

Maybe we can make it a required dependency?

Chetan Mehrotra
[1] 
http://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Histogram.html
[2] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/progress/MetricRateEstimator.java
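
A hedged sketch of the "use Metrics if available, otherwise fall back" pattern described above. Only `com.codahale.metrics.Histogram` is a real Metrics class name; every other name here is invented for illustration, and the Metrics-backed branch is deliberately left unwired so the sketch compiles without the library:

```java
// Prefer a Metrics-backed implementation when the Dropwizard library is on
// the classpath; fall back to a simple mean-based estimate otherwise.
final class StatsProvider {

    interface SimpleStats {
        void record(long value);
        double estimate();
    }

    // Detect the optional dependency without a hard compile-time reference.
    static boolean metricsAvailable() {
        try {
            Class.forName("com.codahale.metrics.Histogram");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    static SimpleStats create() {
        if (metricsAvailable()) {
            // Real code would wire up a com.codahale.metrics.Histogram here
            // (omitted so this sketch has no compile-time Metrics dependency).
        }
        return new MeanStats();
    }

    // Fallback: a running mean instead of a true histogram/percentile.
    static final class MeanStats implements SimpleStats {
        private long sum;
        private long count;

        public void record(long value) {
            sum += value;
            count++;
        }

        public double estimate() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }
}
```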

On Tue, Jul 4, 2017 at 2:22 PM, Andrei Dulceanu
 wrote:
> Hi all,
>
> I was working on an issue for which I needed to use *only* a percentile for
> some recorded stats. Initially I used SynchronizedDescriptiveStatistics [0]
> from commons-math3 [1], but then I thought that adding this dependency
> would be too much for such a simple use case.
>
> Now my question is this: do we have a simple percentile implementation in
> Oak (I didn't find one)? If not, would you recommend writing my own or
> adapting/extracting an existing one in a utility class?
>
> Regards,
> Andrei
>
> [0]
> http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/stat/descriptive/SynchronizedDescriptiveStatistics.html
> [1] http://commons.apache.org/proper/commons-math/


Re: Percentile implementation

2017-07-04 Thread Francesco Mari
2017-07-04 10:52 GMT+02:00 Andrei Dulceanu :
> Now my question is this: do we have a simple percentile implementation in
> Oak (I didn't find one)?

I'm not aware of a percentile implementation in Oak.

> If not, would you recommend writing my own or adapting/extracting an
> existing one in a utility class?

In the past we copied and pasted source code from other projects in
Oak. As long as the license allows it and proper attribution is given,
it shouldn't be a problem. That said, I'm not a big fan of either
rewriting an implementation from scratch or copying and pasting source
code from other projects. Is exposing a percentile really necessary?
If yes, how big of a problem is embedding of commons-math3?


Percentile implementation

2017-07-04 Thread Andrei Dulceanu
Hi all,

I was working on an issue for which I needed to use *only* a percentile for
some recorded stats. Initially I used SynchronizedDescriptiveStatistics [0]
from commons-math3 [1], but then I thought that adding this dependency
would be too much for such a simple use case.

Now my question is this: do we have a simple percentile implementation in
Oak (I didn't find one)? If not, would you recommend writing my own or
adapting/extracting an existing one in a utility class?

Regards,
Andrei

[0]
http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/stat/descriptive/SynchronizedDescriptiveStatistics.html
[1] http://commons.apache.org/proper/commons-math/
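
For reference, the "writing my own" option discussed in this thread is small: a nearest-rank percentile over a sorted copy needs only a few lines and no external dependency. This is a hedged sketch, not Oak code; the class and method names are made up for illustration:

```java
import java.util.Arrays;

// Minimal nearest-rank percentile; all names here are illustrative only.
public final class Percentiles {

    private Percentiles() {
    }

    // Returns the p-th percentile (0 < p <= 100) of the given values using
    // the nearest-rank method on a sorted copy of the input.
    public static double percentile(double[] values, double p) {
        if (values.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("empty input or p out of (0, 100]");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

Note that commons-math3's Percentile uses an interpolating estimation type by default, so for small inputs its results differ slightly from this nearest-rank variant.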


Re: Trunk doesn't compile

2017-07-04 Thread Chetan Mehrotra
Reverted the commit with r1800742. The build should now pass.
Chetan Mehrotra


On Tue, Jul 4, 2017 at 1:51 PM, Francesco Mari  wrote:
> Thanks for taking care of this.
>
> 2017-07-04 10:17 GMT+02:00 Chetan Mehrotra :
>
>> My fault. Looks like the code used an API from Tika 1.15 which I am still
>> testing. Will fix it now.
>> Chetan Mehrotra
>>
>>
>> On Tue, Jul 4, 2017 at 1:34 PM, Francesco Mari 
>> wrote:
>> > When compiling trunk (r1800739) I get the following error in the
>> oak-lucene
>> > module.
>> >
>> > [ERROR] Failed to execute goal
>> > org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile
>> > (default-compile) on project oak-lucene: Compilation failure
>> > [ERROR]
>> > /Users/mari/src/svn/oak/trunk/oak-lucene/src/main/java/org/
>> apache/jackrabbit/oak/plugins/index/lucene/binary/
>> TikaParserConfig.java:[94,34]
>> > cannot find symbol
>> > [ERROR]   symbol:   method getDocumentBuilder()
>> > [ERROR]   location: class org.apache.tika.parser.ParseContext
>> >
>> > Can someone have a look at it?
>>


Re: Trunk doesn't compile

2017-07-04 Thread Francesco Mari
Thanks for taking care of this.

2017-07-04 10:17 GMT+02:00 Chetan Mehrotra :

> My fault. Looks like the code used an API from Tika 1.15 which I am still
> testing. Will fix it now.
> Chetan Mehrotra
>
>
> On Tue, Jul 4, 2017 at 1:34 PM, Francesco Mari 
> wrote:
> > When compiling trunk (r1800739) I get the following error in the
> oak-lucene
> > module.
> >
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile
> > (default-compile) on project oak-lucene: Compilation failure
> > [ERROR]
> > /Users/mari/src/svn/oak/trunk/oak-lucene/src/main/java/org/
> apache/jackrabbit/oak/plugins/index/lucene/binary/
> TikaParserConfig.java:[94,34]
> > cannot find symbol
> > [ERROR]   symbol:   method getDocumentBuilder()
> > [ERROR]   location: class org.apache.tika.parser.ParseContext
> >
> > Can someone have a look at it?
>


Re: Trunk doesn't compile

2017-07-04 Thread Chetan Mehrotra
My fault. Looks like the code used an API from Tika 1.15 which I am still
testing. Will fix it now.
Chetan Mehrotra


On Tue, Jul 4, 2017 at 1:34 PM, Francesco Mari  wrote:
> When compiling trunk (r1800739) I get the following error in the oak-lucene
> module.
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile
> (default-compile) on project oak-lucene: Compilation failure
> [ERROR]
> /Users/mari/src/svn/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/binary/TikaParserConfig.java:[94,34]
> cannot find symbol
> [ERROR]   symbol:   method getDocumentBuilder()
> [ERROR]   location: class org.apache.tika.parser.ParseContext
>
> Can someone have a look at it?


Trunk doesn't compile

2017-07-04 Thread Francesco Mari
When compiling trunk (r1800739) I get the following error in the oak-lucene
module.

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile
(default-compile) on project oak-lucene: Compilation failure
[ERROR]
/Users/mari/src/svn/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/binary/TikaParserConfig.java:[94,34]
cannot find symbol
[ERROR]   symbol:   method getDocumentBuilder()
[ERROR]   location: class org.apache.tika.parser.ParseContext

Can someone have a look at it?


Re: Blobstore consistency check

2017-07-04 Thread Andrei Dulceanu
Hi Andrei,


> If indexes are not part of backup/restore, when and how is the index
> recreated?
>

Citing an old answer from Thomas: "The disadvantage is startup (after a
restore) is slightly slower, but not drastically (the index does not need
to be re-built, it just has to be extracted again)."

Regards,
Andrei

2017-07-04 10:10 GMT+03:00 Andrei Kalfas :

> Hi,
>
> > I tried something similar a while ago and I used S3 bucket versioning
> > [1].
> > It allows you to do point-in-time restores without forcing the "prevent
> > deletion" policy. Is it suitable for your use case?
>
> I wish this were on Microsoft's Azure Storage feature list;
> it's not, yet.
>
> >
> > No matter what approach you use for backup/restore, the backup of the
> > segment store should come first (before datastore) to avoid any
> > inconsistencies with binaries referenced in the segment store. It would
> > probably be good if the backup doesn't contain the index data to avoid
> > possible corruptions.
>
> If indexes are not part of backup/restore, when and how is the index
> recreated?
>
> Thanks,
> Andrei
>
>


Re: [VOTE] Release Apache Jackrabbit Oak 1.7.3

2017-07-04 Thread Andrei Dulceanu
[X] +1 Release this package as Apache Jackrabbit Oak 1.7.3

2017-07-03 18:48 GMT+03:00 Julian Reschke :

> On 2017-07-03 15:52, Davide Giannella wrote:
>
>> ...
>>
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.7.3
>
>
> Best regards, Julian
>


Re: Blobstore consistency check

2017-07-04 Thread Andrei Kalfas
Hi,

> I tried something similar a while ago and I used S3 bucket versioning [1].
> It allows you to do point-in-time restores without forcing the "prevent
> deletion" policy. Is it suitable for your use case?

I wish this were on Microsoft's Azure Storage feature list;
it's not, yet.

> 
> No matter what approach you use for backup/restore, the backup of the
> segment store should come first (before datastore) to avoid any
> inconsistencies with binaries referenced in the segment store. It would
> probably be good if the backup doesn't contain the index data to avoid
> possible corruptions.

If indexes are not part of backup/restore, when and how is the index recreated?

Thanks,
Andrei





Re: Blobstore consistency check

2017-07-04 Thread Andrei Dulceanu
Hi Andrei,

Now, with large repos that spill over into AWS S3 or Azure Storage it's not
> practical to back up terabytes of data each day for various reasons (costs
> and time to backup/restore among them), and I was fiddling with the idea
> of setting up the datastore in such a way that deletion is prevented - I
> mean deletion from the S3 bucket/Azure Storage container.


I tried something similar a while ago and I used S3 bucket versioning [1].
It allows you to do point-in-time restores without forcing the "prevent
deletion" policy. Is it suitable for your use case?


> This way backup would be a matter of backing up the segment store - that's
> a matter of a couple of gigs of data, and restore would be pretty much the
> same.


No matter what approach you use for backup/restore, the backup of the
segment store should come first (before datastore) to avoid any
inconsistencies with binaries referenced in the segment store. It would
probably be good if the backup doesn't contain the index data to avoid
possible corruptions.

HTH,
Andrei

[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html

2017-07-03 17:47 GMT+03:00 Andrei Kalfas :

> Hi Andrei,
>
> Ok, here is the context, apologize for not starting with that first.
>
> I’m working on a PTR (point in time restore) proof of concept involving
> large asset repositories. The easy way for small-ish repos would be to
> back up everything every day, and when the customer pops by and says “I want
> everything back to day YYY” just restore the file systems from backups and
> that's it. Now, with large repos that spill over into AWS S3 or Azure Storage
> it's not practical to back up terabytes of data each day for various reasons
> (costs and time to backup/restore among them), and I was fiddling with the
> idea of setting up the datastore in such a way that deletion is prevented - I
> mean deletion from the S3 bucket/Azure Storage container. This way backup
> would be a matter of backing up the segment store - that's a matter of a
> couple of gigs of data, and restore would be pretty much the same. The
> problem with not allowing deletion from the S3 bucket/Azure Storage
> container is that one will accumulate a lot of garbage over time, so I’ll
> need a way to figure out what's garbage so that I can move that data into
> cheaper storage and eventually delete it completely. This is why I was
> fishing for an easy way to get the inverted list that I mentioned below.
>
> Thanks,
> Andrei
>
>
> > On Jul 3, 2017, at 4:47 PM, Andrei Dulceanu 
> wrote:
> >
> > Hi Andrei,
> >
> > AFAIK, there isn't currently such an option for the consistency check.
> What
> > scenario do you have in mind for using it?
> >
> > Regards,
> > Andrei
> >
> > 2017-07-03 16:32 GMT+03:00 Andrei Kalfas :
> >
> >> Hi,
> >>
> >> I’m reading about the consistency check tool that's available via oak-run,
> >> and if I got it right it's gonna report missing blobs that are
> >> referenced.
> >> Is there a way to get the inverted list, i.e. things that are in the
> >> datastore but not referenced from the segment store?
> >>
> >> Thank you,
> >> Andrei
> >>
> >>
>
>
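
Conceptually, the "inverted list" asked about in this thread is just a set difference: blob ids present in the datastore minus blob ids referenced from the segment store. How the two id sets are obtained (e.g. via oak-run) is out of scope here; this is an illustrative sketch, not Oak API:

```java
import java.util.Set;
import java.util.TreeSet;

// Garbage candidates = datastore ids with no reference in the segment store.
final class UnreferencedBlobs {

    static Set<String> unreferenced(Set<String> datastoreIds, Set<String> referencedIds) {
        Set<String> result = new TreeSet<>(datastoreIds); // sorted for stable output
        result.removeAll(referencedIds);                  // drop everything still referenced
        return result;
    }
}
```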


BUILD FAILURE: Jackrabbit Oak - Build # 499 - Failure

2017-07-04 Thread Apache Jenkins Server
The Apache Jenkins build system has built Jackrabbit Oak (build #499)

Status: Failure

Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/499/ to 
view the results.

Changes:
[chetanm] OAK-6415 - Use dynamic service loader by default

Added a test to check default behaviour which shows that it has not
changed. Minor refactoring done in BinaryTextExtractor but no
functional change done for this issue

[chetanm] OAK-6414 - Use Tika config to determine non indexed mimeTypes

 

Test results:
1 tests failed.
FAILED:  org.apache.jackrabbit.oak.segment.MapRecordTest.testOak1104

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.jackrabbit.oak.segment.MapRecordTest.testOak1104(MapRecordTest.java:93)