Re: minimize the impact when creating a new index (or re-indexing)

2017-06-08 Thread Alvaro Cabrerizo
Thanks Chetan,

Sorry, but that part is out of my reach. There is an IT team in charge of
managing the infrastructure and making optimizations, so it is difficult to
get that information. Basically, what I was looking for is a way
to parallelize the indexing process. On the other hand, reducing the
indexing time would be fine (it was previously reduced from 7 to 2 days),
but I think that traversing more than 1 nodes is a pretty tough
operation and I'm not sure if there is much we can do. Anyway, any pointer
related to indexing optimization or any advice on how to design the repo
(e.g. use different paths to isolate different groups of assets, use
different nodetypes to differentiate content type, create different
repositories [is that possible?] for different groups of uses...) is
welcome.

Regards.

On Thu, Jun 8, 2017 at 12:44 PM, Chetan Mehrotra wrote:

> On Thu, Jun 8, 2017 at 4:04 PM, Alvaro Cabrerizo wrote:
> > It is a DocumentNodeStore based instance. We don't extract data from binary
> > files, just indexing metadata stored on nodes.
>
> In that case 48 hrs is a long time. Can you share some details around
> how many nodes are being indexed as part of that index and the repo
> size in terms of Mongo stats if possible?
>
> Chetan Mehrotra
>


Oak 1.7.2 release plan

2017-06-08 Thread Davide Giannella
Hello team,

As we need to test the changes in
https://issues.apache.org/jira/browse/OAK-6321, I'm planning to cut Oak
1.7.2 tomorrow, 9th of June, after making 1.7.1 public (depending on the
outcome of the vote).

If there are any objections please let me know. Otherwise I will
re-schedule any non-resolved issue for the next iteration.

Thanks
Davide




Re: backporting OAK-6317 until 1.2 branch

2017-06-08 Thread Thomas Mueller
+1

On 08.06.17, 11:29, "Tommaso Teofili" wrote:

Hi all,

I'd like to backport the fix for a bug in LMSEstimator [1] (LMSEstimator is
used by oak-solr-core to estimate the number of entries in the index without
issuing a query to Solr) down to branch 1.2 (as it was observed on a 1.2.x
Oak instance).

Regards,
Tommaso

[1] : https://issues.apache.org/jira/browse/OAK-6317




[ANNOUNCE] Apache Jackrabbit Oak 1.4.16 released

2017-06-08 Thread Davide Giannella
The Apache Jackrabbit community is pleased to announce the release of
Apache Jackrabbit Oak. The release is available for download at:

http://jackrabbit.apache.org/downloads.html

See the full release notes below for details about this release:

Release Notes -- Apache Jackrabbit Oak -- Version 1.4.16

Introduction
------------

Jackrabbit Oak is a scalable, high-performance hierarchical content
repository designed for use as the foundation of modern world-class
web sites and other demanding content applications.

Jackrabbit Oak 1.4.16 is a patch release that contains fixes and
improvements over Oak 1.4. Jackrabbit Oak 1.4.x releases are
considered stable and targeted for production use.

The Oak effort is a part of the Apache Jackrabbit project.
Apache Jackrabbit is a project of the Apache Software Foundation.

Changes in Oak 1.4.16
---------------------

Technical task

[OAK-5652] - RDB*Store: update Oracle JDBC driver reference to
12.1.0.2.0
[OAK-5667] - RDBDocumentStore: remove support for DBs without
support for CASE statements in SELECT
[OAK-6134] - RDB*Store: update PostgreSQL JDBC
[OAK-6143] - RDB*store fixtures: shorten table name prefixes for
Oracle
[OAK-6226] - RDBDocumentStoreDB: missing @Override statements
[OAK-6244] - RDB*Store: update postgresql JDBC driver reference to
42.1.1
[OAK-6247] - RDB*Store: update Tomcat JDBC pool dependency to
7.0.78

Bug

[OAK-4390] - DocumentStoreStatsIT.update fails when RDB's append
mode is disabled
[OAK-5612] - Test failure:
org.apache.jackrabbit.oak.run.osgi.DocumentNodeStoreConfigTest.testRDBDocumentStoreRestart
[OAK-5651] - java.lang.IllegalStateException logged when migrating
Segment to Document
[OAK-5920] - Checkpoint migration will fail if the
MissingBlobStore is used
[OAK-5993] - Utils.isIdFromLongPath() may throw
StringIndexOutOfBoundsException
[OAK-6057] - incorrect system property check in blob/upgrade tests
[OAK-6086] - Incorrect usage of RDBDocumentStore.unwrap()
[OAK-6229] - NPE when running datastorecheck command with S3
[OAK-6233] - Typed properties not handled properly in the
initialization of DataStore in oak-run
[OAK-6266] - SolrQueryIndexProviderService should always have
NodeAggregator

Improvement

[OAK-4771] - Clarify exceptions in DocumentStore
[OAK-4863] - Reduce query batch size for deleted documents
[OAK-5666] - oak-upgrade should validate the paths
[OAK-5886] - Confusing log message from lease update
[OAK-6003] - Allow to migrate checkpoints for all type of
sidegrades
[OAK-6131] - No need to rebuild the counter/uuid index anymore
[OAK-6223] - Expose socket keep-alive option

New Feature

[OAK-5741] - DocumentStore UpdateOp: support removal of properties

Task

[OAK-5945] - update h2db dependency
[OAK-5997] - Update Oak 1.2 and 1.4 to Jackrabbit 2.12.7
[OAK-6159] - BlobReferenceIterator: improve test coverage for RDB

In addition to the above-mentioned changes, this release contains
all changes included up to the Apache Jackrabbit Oak 1.4.x release.

For more detailed information about all the changes in this and other
Oak releases, please see the Oak issue tracker at

  https://issues.apache.org/jira/browse/OAK

Release Contents
----------------

This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.md file for instructions on how to build this release.

The source archive is accompanied by SHA1 and MD5 checksums and a PGP
signature that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
http://www.apache.org/dist/jackrabbit/KEYS.
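
As a rough illustration of the checksum step described above, the following
Python sketch computes the SHA1 (or MD5) digest of a downloaded archive and
compares it with the published value. The function names and the idea of
passing in the contents of the checksum file are assumptions made for this
example; they are not part of the release. A matching checksum only shows
that the download is intact; checking the PGP signature against the KEYS
file is a separate step that this sketch does not cover.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, published, algorithm="sha1"):
    """Compare the computed digest with the published checksum text.

    `published` may be a bare hex digest or a "<digest>  <filename>" line;
    only the first whitespace-separated token is used (an assumption about
    the checksum file layout).
    """
    tokens = published.split()
    if not tokens:
        return False
    return file_digest(path, algorithm) == tokens[0].lower()
```

For example, matches_checksum(archive_path, open(sha1_path).read()) returns
True when the archive matches its SHA1 file, where archive_path and
sha1_path stand for wherever the two files were saved.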

About Apache Jackrabbit Oak
---------------------------

Jackrabbit Oak is a scalable, high-performance hierarchical content
repository designed for use as the foundation of modern world-class
web sites and other demanding content applications.

The Oak effort is a part of the Apache Jackrabbit project. 
Apache Jackrabbit is a project of the Apache Software Foundation.

For more information, visit http://jackrabbit.apache.org/oak

About The Apache Software Foundation
------------------------------------

Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 140 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 3,800+ contributors.

For more information, visit http://www.apache.org/



Re: minimize the impact when creating a new index (or re-indexing)

2017-06-08 Thread Chetan Mehrotra
On Thu, Jun 8, 2017 at 4:04 PM, Alvaro Cabrerizo wrote:
> It is a DocumentNodeStore based instance. We don't extract data from binary
> files, just indexing metadata stored on nodes.

In that case 48 hrs is a long time. Can you share some details around
how many nodes are being indexed as part of that index and the repo
size in terms of Mongo stats if possible?

Chetan Mehrotra


[RESULT][VOTE] Release Apache Jackrabbit Oak 1.4.16

2017-06-08 Thread Davide Giannella
Hello Team,

the vote passes as follows:

+1 Julian Reschke
+1 Amit Jain
+1 Davide Giannella

Thanks for voting. I'll push the release out.

-- Davide



Re: [VOTE] Release Apache Jackrabbit Oak 1.7.1

2017-06-08 Thread Davide Giannella
[X] +1 Release this package as Apache Jackrabbit Oak 1.7.1

D.


Re: minimize the impact when creating a new index (or re-indexing)

2017-06-08 Thread Alvaro Cabrerizo
Hello,

It is a DocumentNodeStore based instance. We don't extract data from binary
files, just indexing metadata stored on nodes.

Regards.

On Wed, Jun 7, 2017 at 7:04 AM, Chetan Mehrotra wrote:

> > I'm not sure how to minimize the impact of performing a re-index (or new
> > index creation), that will take 48 hours (using oak 1.4). I mean, I don't
> > want to block other indexes update.
>
> Is this a SegmentNodeStore based setup or DocumentNodeStore based?
>
> The reindexing log would have some stats around time spent in indexing
> and time spent in text extraction. Can you check which part takes
> the most time? If it's text extraction, then you can reduce the time
> spent on it by using Pre-Extraction support [1]. This allows
> extracting text beforehand and then using it at the time of actual
> indexing.
>
> Changing the "indexing lane" should help, but it is tricky to get right
> and is something we are currently improving in OAK-6246 and OAK-5553.
>
> > indexes won't be updated. On the other hand, it seems that using the
> > *reindex-async* flag (see OAK-1456) could do the trick. I
>
> This mode is useful for property indexes, as in the end it removes the
> async flag and makes the index synchronous, which would cause issues
> for Lucene-based indexes.
>
> Chetan Mehrotra
> [1] https://jackrabbit.apache.org/oak/docs/query/lucene.html#text-extraction
>
>
> On Tue, Jun 6, 2017 at 9:02 PM, Alvaro Cabrerizo wrote:
> > Hello,
> >
> > I'm not sure how to minimize the impact of performing a re-index (or new
> > index creation), that will take 48 hours (using oak 1.4). I mean, I don't
> > want to block other indexes update.
> >
> > First, we have set the value of async as *fulltext-async* for the new
> > index. I guess, that at least, all the indexes managed by the *async* lane
> > will not be affected (please, confirm if I'm right). Then we try to
> > minimize the impact on the fulltext-async lane. According to OAK-5553
> > there isn't much we can do
> > while the indexing process is active for the new index, as the rest of
> > indexes won't be updated. On the other hand, it seems that using the
> > *reindex-async* flag (see OAK-1456) could do the trick. I
> > mean, setting reindex-async=true to the new index will allow other indexes
> > (in the same lane) being updated while it is being populated? If that is
> > true, we could create the index with that flag and then remove it.
> >
> > Regards.
>
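
To make the lane behaviour discussed in this thread concrete, here is a
small Python sketch that models index definitions as plain dictionaries and
reports which other indexes share a lane with one that is being reindexed.
It is only a model of what the messages describe (indexes in the same async
lane wait for each other, while indexes in other lanes are not affected); it
does not call any Oak API. The index names are hypothetical, and treating
the async property as a single value is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical index definitions; only the "async" and "reindex" properties
# matter for this sketch. A definition without "async" is synchronous.
INDEX_DEFINITIONS = {
    "assetMetadata": {"type": "lucene", "async": "fulltext-async", "reindex": True},
    "otherFulltext": {"type": "lucene", "async": "fulltext-async", "reindex": False},
    "lucene": {"type": "lucene", "async": "async", "reindex": False},
    "uuid": {"type": "property", "reindex": False},
}


def group_by_lane(definitions):
    """Map each lane name to the indexes it updates ("sync" if no lane is set)."""
    lanes = defaultdict(list)
    for name, props in definitions.items():
        lanes[props.get("async", "sync")].append(name)
    return dict(lanes)


def delayed_by_reindex(definitions):
    """Return the async indexes that share a lane with a reindexing index."""
    lanes = group_by_lane(definitions)
    delayed = set()
    for name, props in definitions.items():
        lane = props.get("async")
        if props.get("reindex") and lane is not None:
            delayed.update(other for other in lanes[lane] if other != name)
    return sorted(delayed)
```

With the sample definitions, delayed_by_reindex(INDEX_DEFINITIONS) lists only
the other index on the fulltext-async lane, which matches the expectation in
the message that indexes on the async lane keep being updated.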


backporting OAK-6317 until 1.2 branch

2017-06-08 Thread Tommaso Teofili
Hi all,

I'd like to backport the fix for a bug in LMSEstimator [1] (LMSEstimator is
used by oak-solr-core to estimate the number of entries in the index without
issuing a query to Solr) down to branch 1.2 (as it was observed on a 1.2.x
Oak instance).

Regards,
Tommaso

[1] : https://issues.apache.org/jira/browse/OAK-6317


Re: [VOTE] Release Apache Jackrabbit Oak 1.7.1

2017-06-08 Thread Amit Jain
On Tue, Jun 6, 2017 at 5:57 PM, Davide Giannella wrote:

>
> Please vote on releasing this package as Apache Jackrabbit Oak 1.7.1.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.


+1 Release this package as Apache Jackrabbit Oak 1.7.1

Thanks
Amit