Re: [DISCUSS] Upgrading to Lucene 7.1.0

2021-10-08 Thread Mario Kevo
Hi,

In our tests, while the servers are in a mixed-version state, we block queries 
until all servers are on versions that use the same Lucene version.
If we allow queries while the servers are in a mixed state, we get errors like 
IndexFormatTooNewException when the query is executed on a server with an older 
version of Lucene (it cannot read the new Lucene index format).
In the previous mail thread, I got the following:
"The pr against your branch just prevents the repo from being constructed until 
all old members are upgraded. This requires test changes to not try to validate 
using queries (since we prevent draining and repo creation, the query will just 
wait)"

So we changed the tests to not run queries until all members are on the same 
version.
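
As a rough illustration of the failure mode described above, the following self-contained Java sketch models a reader that rejects an index whose format is newer than it supports. The names FormatVersionSketch, IndexReaderSketch and FormatTooNewException, and the integer format numbers, are assumptions made for this example only. They are not Lucene or Geode APIs, and the real check inside Lucene is more involved.

```java
// Hypothetical illustration only: these are NOT Lucene or Geode classes.
final class FormatVersionSketch {

  /** Stand-in for the kind of error an older reader raises on a newer index. */
  static final class FormatTooNewException extends RuntimeException {
    FormatTooNewException(int found, int maxSupported) {
      super("index format " + found + " is newer than supported maximum " + maxSupported);
    }
  }

  /** A reader that understands index formats up to maxSupportedFormat. */
  static final class IndexReaderSketch {
    private final int maxSupportedFormat;

    IndexReaderSketch(int maxSupportedFormat) {
      this.maxSupportedFormat = maxSupportedFormat;
    }

    /** Returns a short description of the opened index, or throws if it is too new. */
    String open(int indexFormat) {
      if (indexFormat > maxSupportedFormat) {
        throw new FormatTooNewException(indexFormat, maxSupportedFormat);
      }
      return "opened format " + indexFormat;
    }
  }

  public static void main(String[] args) {
    IndexReaderSketch oldMember = new IndexReaderSketch(6); // member not yet upgraded
    IndexReaderSketch newMember = new IndexReaderSketch(7); // upgraded member

    // The upgraded member can read the index it wrote.
    System.out.println(newMember.open(7));

    // The old member fails on the same index, which is why queries are held back.
    try {
      oldMember.open(7);
    } catch (FormatTooNewException e) {
      System.out.println("old member failed: " + e.getMessage());
    }
  }
}
```

In this model, a query routed to the old member fails until that member is upgraded, which matches the reason the tests wait for a uniform version before querying.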

BR,
Mario



From: Dan Smith 
Sent: September 28, 2021, 19:41
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

My understanding from our previous discussion about upgrading Lucene was that 
we talked about pausing the asynchronous indexing process during the rolling 
upgrade. I don't remember a discussion that it was ok to not allow queries 
during the upgrade. But this is what we added to the docs:

"All cluster members must be running the same major Lucene version in order to 
execute Lucene queries."

What happens if a user runs a query during the rolling upgrade and why do we 
need to have this restriction? It seems to me like at a minimum we need to 
allow queries during the upgrade.

We also should consider what will happen to users with server-side query or 
indexing code - will they be able to upgrade or are they likely to hit breaking 
changes in the Lucene API?

-Dan

From: Nabarun Nag 
Sent: Tuesday, September 28, 2021 7:13 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

But Mario, just for my clarification: if we re-enable the queries in the tests 
in mixed-version server mode, it runs into a stack overflow. That is 
what I saw when I set hasLuceneVersionMismatch(host) to false in the test so 
that it runs the query.

Regards
Naba


From: Mario Kevo 
Sent: Tuesday, September 28, 2021 4:49 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

Hi all,

Just a small clarification of the reverted PR.

There were a lot of changes between Lucene versions 6.x and 7.x. There is an 
article about them: 
Upgrading+to+Lucene+7.1.0 <https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0>.

The first larger change was in the scoring mechanism. We adapted it to one that 
is correct for us (verified by DistributedScoringJUnitTest).

The main change was in the Lucene index format. There we ran into a problem 
with our tests: Lucene 6.x cannot read the index format of Lucene 7.x.
Through the PRs we decided to include the Lucene uplift in Geode 1.15.0 and to 
add a check that all members are on version 1.15.0 or higher (this check should 
be changed after uplifting Lucene to a newer version with index format 
changes). If the check passes, Lucene queries are allowed; if not, a log 
message is printed saying that not all members are on version 1.15.0 or higher.

Also, you can find a discussion on the dev list from 2 years ago about the 
Lucene upgrade: Lucene 
Upgrade <https://markmail.org/message/qwooctuz7ekaezor?q=list:org.apache.geode.dev+order:date-backward+Lucene+upgrade&page=4#query:list%3Aorg.apache.geode.dev%20order%3Adate-backward%20Lucene%20upgrade+page:4+mid:ygjhsuikdrbuihap+state:results>

BR,
Mario





Re: [DISCUSS] Upgrading to Lucene 7.1.0

2021-09-28 Thread Mario Kevo
Hi all,

Just a small clarification of the reverted PR.

There were a lot of changes between Lucene versions 6.x and 7.x. There is an 
article about them: 
Upgrading+to+Lucene+7.1.0.

The first larger change was in the scoring mechanism. We adapted it to one that 
is correct for us (verified by DistributedScoringJUnitTest).

The main change was in the Lucene index format. There we ran into a problem 
with our tests: Lucene 6.x cannot read the index format of Lucene 7.x.
Through the PRs we decided to include the Lucene uplift in Geode 1.15.0 and to 
add a check that all members are on version 1.15.0 or higher (this check should 
be changed after uplifting Lucene to a newer version with index format 
changes). If the check passes, Lucene queries are allowed; if not, a log 
message is printed saying that not all members are on version 1.15.0 or higher.
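
The member-version check described in the paragraph above can be modelled in a few lines. This is a minimal, self-contained sketch under stated assumptions: LuceneQueryGateSketch, Version and queriesAllowed are hypothetical names invented for the example, and member versions are plain strings rather than Geode's own version type. It only mirrors the idea of the check, not the actual Geode implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the version gate; not Geode code.
final class LuceneQueryGateSketch {

  /** Minimal major.minor.patch version with the usual ordering. */
  static final class Version implements Comparable<Version> {
    final int major;
    final int minor;
    final int patch;

    Version(int major, int minor, int patch) {
      this.major = major;
      this.minor = minor;
      this.patch = patch;
    }

    /** Parses text of the form "1.15.0". */
    static Version parse(String text) {
      String[] parts = text.split("\\.");
      return new Version(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
          Integer.parseInt(parts[2]));
    }

    @Override
    public int compareTo(Version other) {
      if (major != other.major) {
        return Integer.compare(major, other.major);
      }
      if (minor != other.minor) {
        return Integer.compare(minor, other.minor);
      }
      return Integer.compare(patch, other.patch);
    }
  }

  /** First release that ships the newer Lucene, per the thread. */
  static final Version REQUIRED = Version.parse("1.15.0");

  /** Queries are allowed only if no member is older than REQUIRED. */
  static boolean queriesAllowed(List<Version> members) {
    for (Version member : members) {
      if (member.compareTo(REQUIRED) < 0) {
        System.out.println("Some members are older than 1.15.0");
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Version> mixed = Arrays.asList(Version.parse("1.14.0"), Version.parse("1.15.0"));
    List<Version> upgraded = Arrays.asList(Version.parse("1.15.0"), Version.parse("1.16.0"));
    System.out.println("mixed cluster allows queries: " + queriesAllowed(mixed));
    System.out.println("upgraded cluster allows queries: " + queriesAllowed(upgraded));
  }
}
```

As the parenthetical above notes, the required version would have to be revisited whenever a later Lucene uplift changes the index format again.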

Also, you can find a discussion on the dev list from 2 years ago about the 
Lucene upgrade: Lucene 
Upgrade

BR,
Mario




From: Udo Kohlmeyer 
Sent: September 28, 2021, 1:44
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

Might I propose something here?

There is currently a significant amount of work going into completing 
GEODE-8705, which is the Classloader isolation. We are currently targeting 
Geode 1.16 for its release.

My proposal is that we use the capability that Patrick demo’d at the Community 
meeting (on this topic) where one can unload and load extensions at runtime 
(like our integration with Lucene). This means that one could possibly do a 
rolling upgrade on the existing system, and keep the versions of the Lucene 
integration stable.

Once the whole system has been upgraded, the existing Lucene extension 
component is then unloaded, and the newer version of the extension component is 
then loaded. This means that, at runtime, there will be a period of 
time during which Lucene queries are not available, and that the “load” 
lifecycle of the extension needs an initialization step that 
initializes the extension component safely.

Once initialized, Lucene queries can then become available again, etc.
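
The unload / load / initialize sequence proposed above can be pictured as a small state machine. The sketch below is only an illustration under assumptions: ExtensionLifecycleSketch, its State values and its methods are hypothetical names, not part of Geode or of the classloader isolation work, and a real implementation would also have to deal with in-flight queries and index data.

```java
// Hypothetical lifecycle model; the names below are not Geode APIs.
final class ExtensionLifecycleSketch {

  enum State { AVAILABLE, UNLOADED, INITIALIZING }

  private State state = State.AVAILABLE;
  private String version;

  ExtensionLifecycleSketch(String initialVersion) {
    this.version = initialVersion;
  }

  /** Unloads the current extension; queries are rejected from here on. */
  synchronized void unload() {
    state = State.UNLOADED;
  }

  /** Loads a new extension version; it still needs to be initialized. */
  synchronized void load(String newVersion) {
    if (state != State.UNLOADED) {
      throw new IllegalStateException("unload the old extension first");
    }
    version = newVersion;
    state = State.INITIALIZING;
  }

  /** Completes the initialization step and re-enables queries. */
  synchronized void initialize() {
    if (state != State.INITIALIZING) {
      throw new IllegalStateException("nothing to initialize");
    }
    state = State.AVAILABLE;
  }

  /** Runs a (fake) query only while the extension is available. */
  synchronized String query(String text) {
    if (state != State.AVAILABLE) {
      throw new IllegalStateException("Lucene queries unavailable: " + state);
    }
    return "[" + version + "] " + text;
  }

  public static void main(String[] args) {
    ExtensionLifecycleSketch extension = new ExtensionLifecycleSketch("old");
    System.out.println(extension.query("before upgrade"));

    extension.unload();
    try {
      extension.query("during swap");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }

    extension.load("new");
    extension.initialize();
    System.out.println(extension.query("after upgrade"));
  }
}
```

The window between unload() and initialize() corresponds to the period in which Lucene queries are not available.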

This of course requires some work around the lifecycles of extension components 
and making sure that one can add the extension at runtime and safely 
initialize it.

I think this approach allows for more seamless (lower-downtime) upgrades of 
the system and extension components.

Thoughts?

--Udo

From: Nabarun Nag 
Date: Tuesday, September 28, 2021 at 7:33 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0
The solution for preventing query execution in mixed-version mode also caused 
some problems: the query function executions get executed repeatedly, and that 
results in a stack overflow.
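
One generic way to avoid this kind of overflow, assuming the repeated execution comes from a retry that calls itself, is to wait in a bounded loop instead. The sketch below is not taken from the Geode code base: BoundedRetrySketch and waitUntilReady are hypothetical names, and the "ready" condition stands in for "all members are upgraded".

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; illustrates iterative, bounded waiting instead of recursive retries.
final class BoundedRetrySketch {

  /**
   * Checks the condition up to maxAttempts times, sleeping between checks.
   * Returns true as soon as the condition holds, false if it never does
   * or if the waiting thread is interrupted.
   */
  static boolean waitUntilReady(BooleanSupplier ready, int maxAttempts, long sleepMillis) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (ready.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    int[] checks = {0};
    // Pretend the cluster becomes uniform on the third check.
    boolean ready = waitUntilReady(() -> ++checks[0] >= 3, 10, 1L);
    System.out.println("ready=" + ready + " after " + checks[0] + " checks");
  }
}
```

Because the loop is iterative, the stack depth stays constant however long the mixed-version state lasts, and the caller gets an explicit false once the attempts are used up.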



From: Nabarun Nag 
Sent: Monday, September 27, 2021 2:30 PM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

In simple words, if Lucene indexes were created by a new version (7.1.0) and 
then replicated to members that are still on the older version, those members 
won't understand the index, and the event processors start throwing exceptions.

This can be simply seen by just re-enabling the query execution in the DUnit 
tests and commenting out the check blocks: [develop SHA: 
68629356f561a932f5dfbace70b01d9971a42473]

In LuceneEventListener:
if (cache.hasMemberOlderThan(KnownVersion.GEODE_1_15_0)) {
  logger.info("Some members are older than " + KnownVersion.GEODE_1_15_0.getName());
  return false;
}

In IndexRepositoryFactory:
if (userRegion.getCache() != null
    && userRegion.getCache().hasMemberOlderThan(KnownVersion.GEODE_1_15_0)) {
  logger.info("Some members are older than " + KnownVersion.GEODE_1_15_0.getName());
  return null;
}


This is the exception that will be encountered:

[Exception]

[vm2_v1.2.0] [warn 2021/09/27 14:24:42.251 PDT  tid=102] An Exception occurred. The dispatcher will continue.
[vm2_v1.2.0] org.apache.geode.InternalGemFireError: Unable to create index repository
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:118)
[vm2_v1.2.0] at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.computeRepository(AbstractPartitionedRepositoryManager.java:108)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:137)