Odg: Odg: Odg: Odg: Odg: Lucene upgrade

2020-01-07 Thread Mario Kevo

Hi all,

Could someone please review PR #4395 (https://github.com/apache/geode/pull/4395)?

BR,
Mario

From: Mario Kevo
Sent: 17 December 2019, 14:30
To: Jason Huynh
Cc: geode
Subject: Odg: Odg: Odg: Odg: Odg: Lucene upgrade

Hi Jason,

Nice catch! I tried a larger number of retries (with your changes) and it 
passed.
I will try to make it time-based.

Thanks for the help!

BR,
Mario

From: Jason Huynh
Sent: 13 December 2019, 23:10
To: Mario Kevo
Cc: geode
Subject: Re: Odg: Odg: Odg: Odg: Lucene upgrade

Hi Mario,

I think I see what is going on here.  The logic for the "reindex" code was a bit 
off (it expected reindex features to be complete by a certain release).  I 
have a PR on develop to adjust that calculation 
(https://github.com/apache/geode/pull/4466).

The expectation is that when Lucene reindex (indexing a region with data 
already in it) is enabled, any query will now throw the 
LuceneIndexingInProgressException instead of possibly waiting a very long time 
to receive a query result.  The tests themselves are coded to retry 10 times, 
knowing it will take a while to reindex.  If you bump this number up or, better 
yet, make it time-based (Awaitility, etc.), it should get you past this problem 
(once the pull request gets checked in and pulled into your branch).

Thanks!
-Jason


On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo  wrote:
Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed.
As I said, when I change the tests to put a lower number of entries, or set 
LuceneQueryFunction.java to wait for the repo, they pass every time.

waitUntilFlushed is called by verifyLuceneQueryResults before executing 
queries. I also tried waiting until isIndexingInProgress returns false, but 
it reached the timeout and failed.
The tests try to execute a query after all members are rolled.

BR,
Mario


From: Jason Huynh <jhu...@pivotal.io>
Sent: 11 December 2019, 23:08
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Odg: Odg: Odg: Lucene upgrade

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us which 
one?
If it's a rolling upgrade test, then we might have to mark this as expected 
behavior and modify the tests to waitForFlush (wait until the queue is 
drained).  That should work as long as the test is able to roll all the servers 
and does not get stuck waiting for a queue to flush (which will now only happen 
once all the servers are rolled).

If the test hasn't rolled all the servers and is trying to execute a query, 
then we'd probably have to modify the test to either not run the query in the 
middle or expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo  wrote:
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we have

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not 
available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set to not wait for the repo [execute(context, 
false)]. If we have a bigger queue (like in the test), it will fail, as it will 
not wait until the indexes are created. I also tried putting just a few objects, 
and it passed, as it had enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of 
entries in the tests?

BR,
Mario




From: Jason Huynh <jhu...@pivotal.io>
Sent: 6 December 2019, 20:53
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Odg: Odg: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past 
the IndexFormatTooNewException.  Summary: repo creation, even if no writes occur, 
appears to create some metadata that the old node attempts to read and blows up 
on.

The PR against your branch just prevents the repo from being constructed until 
all old members are upgraded.
This requires test changes to not try to validate using queries (since we 
prevent draining and repo creation, the query will just wait).

The reason you were probably seeing unsuccessful dispatches is that we 
kind of intended that with the oldMember check.  In between the server 
rolls, the test was trying to verify, but because not all servers had upgraded, 
the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not; maybe if this ends 
up working then we can discuss on the dev list.

There will probably be other "gotchas" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wro

Odg: Odg: Odg: Odg: Odg: Lucene upgrade

2019-12-17 Thread Mario Kevo

Hi Jason,

Nice catch! I tried a larger number of retries (with your changes) and it 
passed.
I will try to make it time-based.

Thanks for the help!

BR,
Mario

From: Jason Huynh
Sent: 13 December 2019, 23:10
To: Mario Kevo
Cc: geode
Subject: Re: Odg: Odg: Odg: Odg: Lucene upgrade

Hi Mario,

I think I see what is going on here.  The logic for the "reindex" code was a bit 
off (it expected reindex features to be complete by a certain release).  I 
have a PR on develop to adjust that calculation 
(https://github.com/apache/geode/pull/4466).

The expectation is that when Lucene reindex (indexing a region with data 
already in it) is enabled, any query will now throw the 
LuceneIndexingInProgressException instead of possibly waiting a very long time 
to receive a query result.  The tests themselves are coded to retry 10 times, 
knowing it will take a while to reindex.  If you bump this number up or, better 
yet, make it time-based (Awaitility, etc.), it should get you past this problem 
(once the pull request gets checked in and pulled into your branch).

Thanks!
-Jason


On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo  wrote:
Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed.
As I said, when I change the tests to put a lower number of entries, or set 
LuceneQueryFunction.java to wait for the repo, they pass every time.

waitUntilFlushed is called by verifyLuceneQueryResults before executing 
queries. I also tried waiting until isIndexingInProgress returns false, but 
it reached the timeout and failed.
The tests try to execute a query after all members are rolled.

BR,
Mario


From: Jason Huynh <jhu...@pivotal.io>
Sent: 11 December 2019, 23:08
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Odg: Odg: Odg: Lucene upgrade

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us which 
one?
If it's a rolling upgrade test, then we might have to mark this as expected 
behavior and modify the tests to waitForFlush (wait until the queue is 
drained).  That should work as long as the test is able to roll all the servers 
and does not get stuck waiting for a queue to flush (which will now only happen 
once all the servers are rolled).

If the test hasn't rolled all the servers and is trying to execute a query, 
then we'd probably have to modify the test to either not run the query in the 
middle or expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo  wrote:
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we have

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not 
available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set to not wait for the repo [execute(context, 
false)]. If we have a bigger queue (like in the test), it will fail, as it will 
not wait until the indexes are created. I also tried putting just a few objects, 
and it passed, as it had enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of 
entries in the tests?

BR,
Mario




From: Jason Huynh <jhu...@pivotal.io>
Sent: 6 December 2019, 20:53
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Odg: Odg: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past 
the IndexFormatTooNewException.  Summary: repo creation, even if no writes occur, 
appears to create some metadata that the old node attempts to read and blows up 
on.

The PR against your branch just prevents the repo from being constructed until 
all old members are upgraded.
This requires test changes to not try to validate using queries (since we 
prevent draining and repo creation, the query will just wait).

The reason you were probably seeing unsuccessful dispatches is that we 
kind of intended that with the oldMember check.  In between the server 
rolls, the test was trying to verify, but because not all servers had upgraded, 
the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not; maybe if this ends 
up working then we can discuss on the dev list.

There will probably be other "gotchas" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (need

Re: Odg: Odg: Odg: Odg: Lucene upgrade

2019-12-13 Thread Jason Huynh

Hi Mario,

I think I see what is going on here.  The logic for the "reindex" code was a
bit off (it expected reindex features to be complete by a certain
release).  I have a PR on develop to adjust that calculation
(https://github.com/apache/geode/pull/4466).

The expectation is that when Lucene reindex (indexing a region with data
already in it) is enabled, any query will now throw the
LuceneIndexingInProgressException instead of possibly waiting a very long
time to receive a query result.  The tests themselves are coded to retry 10
times, knowing it will take a while to reindex.  If you bump this number up
or, better yet, make it time-based (Awaitility, etc.), it should get you
past this problem (once the pull request gets checked in and pulled into
your branch).
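
A time-based retry could look roughly like the sketch below. It is written in plain Java as a stand-in for Awaitility so that it has no dependencies. The class TimedRetry, the method retryUntilIndexed, and the nested IndexingInProgressException are hypothetical names used only for illustration; the real tests would catch Geode's LuceneIndexingInProgressException instead.

```java
import java.time.Duration;
import java.util.function.Supplier;

final class TimedRetry {
  /** Hypothetical stand-in for Geode's LuceneIndexingInProgressException. */
  static class IndexingInProgressException extends RuntimeException {
    IndexingInProgressException(String message) {
      super(message);
    }
  }

  /**
   * Runs the query repeatedly until it stops throwing
   * IndexingInProgressException or the timeout elapses. On timeout the
   * last exception is rethrown so the test fails with the real cause.
   */
  static <T> T retryUntilIndexed(Supplier<T> query, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      try {
        return query.get();
      } catch (IndexingInProgressException e) {
        // Overflow-safe comparison of nanoTime values.
        if (System.nanoTime() - deadline >= 0) {
          throw e;
        }
        try {
          Thread.sleep(pollInterval.toMillis());
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting for reindex", ie);
        }
      }
    }
  }
}
```

Compared with a fixed count of 10 retries, the deadline makes the wait independent of how fast each attempt fails, which is the point of making it time-based.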

Thanks!
-Jason


On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo  wrote:

> Hi Jason,
>
> Yes, the same tests failed:
>
> RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled
>
> RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion
>
> Sometimes these tests passed, but more often they failed.
> As I said, when I change the tests to put a lower number of entries, or
> set LuceneQueryFunction.java to wait for the repo, they pass every time.
>
> *waitUntilFlushed* is called by *verifyLuceneQueryResults* before
> executing queries. I also tried waiting until *isIndexingInProgress* returns
> false, but it reached the timeout and failed.
> The tests try to execute a query after all members are rolled.
>
> BR,
> Mario
>
> --
> *From:* Jason Huynh
> *Sent:* 11 December 2019, 23:08
> *To:* Mario Kevo
> *Cc:* geode
> *Subject:* Re: Odg: Odg: Odg: Lucene upgrade
>
> Hi Mario,
>
> Is the same test failing?  If it's a different test, could you tell us
> which one?
> If it's a rolling upgrade test, then we might have to mark this as
> expected behavior and modify the tests to waitForFlush (wait until the
> queue is drained).  That should work as long as the test is able to roll all
> the servers and does not get stuck waiting for a queue to flush (which will
> now only happen once all the servers are rolled).
>
> If the test hasn't rolled all the servers and is trying to execute a
> query, then we'd probably have to modify the test to either not run the
> query in the middle or expect that exception to occur.
>
> Thanks,
> -Jason
>
> On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo  wrote:
>
> Hi Jason,
>
> This change fixes the IndexFormatTooNewException, but now we have
>
>  org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not 
> available, currently indexing
>
>
> So this means that the query doesn't wait until all indexes are created.
> In *LuceneQueryFunction.java* it is set to not wait for the repo
> [*execute(context, false)*]. If we have a bigger queue (like in the test),
> it will fail, as it will not wait until the indexes are created. I also
> tried putting just a few objects, and it passed, as it had enough time to
> create the indexes.
> Do we need to change this part to wait for the repo, or put a lower number
> of entries in the tests?
>
> BR,
> Mario
>
>
>
> --
> *From:* Jason Huynh
> *Sent:* 6 December 2019, 20:53
> *To:* Mario Kevo
> *Cc:* geode
> *Subject:* Re: Odg: Odg: Lucene upgrade
>
> Hi Mario,
>
> I made a PR against your branch for some of the changes I had to do to get
> past the IndexFormatTooNewException.  Summary: repo creation, even if no
> writes occur, appears to create some metadata that the old node attempts to
> read and blows up on.
>
> The PR against your branch just prevents the repo from being constructed
> until all old members are upgraded.
> This requires test changes to not try to validate using queries (since we
> prevent draining and repo creation, the query will just wait).
>
> The reason you were probably seeing unsuccessful dispatches is that we
> kind of intended that with the oldMember check.  In between
> the server rolls, the test was trying to verify, but because not all
> servers had upgraded, the LuceneEventListener wasn't allowing the queue to
> drain on the new member.
>
> I am not sure if the changes I added are acceptable or not; maybe if this
> ends up working then we can discuss on the dev list.
>
> There will probably be other "gotchas" along the way...
>
>
> On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
>
> Hi Jason,
>
> I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:
>
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
> bet

Odg: Odg: Odg: Odg: Lucene upgrade

2019-12-12 Thread Mario Kevo

Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed.
As I said, when I change the tests to put a lower number of entries, or set 
LuceneQueryFunction.java to wait for the repo, they pass every time.

waitUntilFlushed is called by verifyLuceneQueryResults before executing 
queries. I also tried waiting until isIndexingInProgress returns false, but 
it reached the timeout and failed.
The tests try to execute a query after all members are rolled.

BR,
Mario


From: Jason Huynh
Sent: 11 December 2019, 23:08
To: Mario Kevo
Cc: geode
Subject: Re: Odg: Odg: Odg: Lucene upgrade

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us which 
one?
If it's a rolling upgrade test, then we might have to mark this as expected 
behavior and modify the tests to waitForFlush (wait until the queue is 
drained).  That should work as long as the test is able to roll all the servers 
and does not get stuck waiting for a queue to flush (which will now only happen 
once all the servers are rolled).

If the test hasn't rolled all the servers and is trying to execute a query, 
then we'd probably have to modify the test to either not run the query in the 
middle or expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo  wrote:
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we have

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not 
available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set to not wait for the repo [execute(context, 
false)]. If we have a bigger queue (like in the test), it will fail, as it will 
not wait until the indexes are created. I also tried putting just a few objects, 
and it passed, as it had enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of 
entries in the tests?

BR,
Mario




From: Jason Huynh <jhu...@pivotal.io>
Sent: 6 December 2019, 20:53
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Odg: Odg: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past 
the IndexFormatTooNewException.  Summary: repo creation, even if no writes occur, 
appears to create some metadata that the old node attempts to read and blows up 
on.

The PR against your branch just prevents the repo from being constructed until 
all old members are upgraded.
This requires test changes to not try to validate using queries (since we 
prevent draining and repo creation, the query will just wait).

The reason you were probably seeing unsuccessful dispatches is that we 
kind of intended that with the oldMember check.  In between the server 
rolls, the test was trying to verify, but because not all servers had upgraded, 
the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not; maybe if this ends 
up working then we can discuss on the dev list.

There will probably be other "gotchas" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
between 4 and 6)

It looks like the fix is not good.

What I see (from 
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java)
is that when the locator is upgraded, it is shut down and started on the 
newer version. The problem is that server2 becomes the lead and cannot read the 
Lucene index from the newer version (the Lucene index format changed between 
versions 6 and 7).

Another problem appears after the rolling upgrade of the locator and server1, 
when verifying the region size on the VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName,
    expectedRegionSize, 5, 15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that 
passes (it has 15 entries). The problem is that while executing 
verifyLuceneQueryResults, VM1 (server2) has 13 entries and the assertion fails.
From the logs it can be seen that two batches were unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET  tid=0x42] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET  tid=0x46] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

VM0 (server1) and VM2 (server3) each have 14 entries; one event was 
unsuccessfully dispatched.

I don't know why s

Re: Odg: Odg: Odg: Lucene upgrade

2019-12-11 Thread Jason Huynh

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us
which one?
If it's a rolling upgrade test, then we might have to mark this as expected
behavior and modify the tests to waitForFlush (wait until the queue is
drained).  That should work as long as the test is able to roll all the
servers and does not get stuck waiting for a queue to flush (which will now
only happen once all the servers are rolled).

If the test hasn't rolled all the servers and is trying to execute a query,
then we'd probably have to modify the test to either not run the query in
the middle or expect that exception to occur.
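
The "expect that exception" option could be sketched as a small test helper like the one below. This is only an illustration under assumptions: MidRollQuery, queryDuringUpgrade, and IndexNotAvailableException are hypothetical names, with the exception standing in for the LuceneQueryException ("Lucene Index is not available, currently indexing") seen in the tests.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class MidRollQuery {
  /** Hypothetical stand-in for the "index not available" query exception. */
  static class IndexNotAvailableException extends RuntimeException {
    IndexNotAvailableException(String message) {
      super(message);
    }
  }

  /**
   * Runs the query and returns its result. While servers are still being
   * rolled, an IndexNotAvailableException is treated as expected and an
   * empty Optional is returned. Once all servers are rolled, the exception
   * propagates so the test fails.
   */
  static <T> Optional<T> queryDuringUpgrade(Supplier<T> query, boolean allServersRolled) {
    try {
      return Optional.of(query.get());
    } catch (IndexNotAvailableException e) {
      if (allServersRolled) {
        throw e;
      }
      return Optional.empty();
    }
  }
}
```

A test would then assert on the result only when the Optional is present, and run the strict verification once after the last server has been rolled.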

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo  wrote:

> Hi Jason,
>
> This change fixes the IndexFormatTooNewException, but now we have
>
>  org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not 
> available, currently indexing
>
>
> So this means that the query doesn't wait until all indexes are created.
> In *LuceneQueryFunction.java* it is set to not wait for the repo
> [*execute(context, false)*]. If we have a bigger queue (like in the test),
> it will fail, as it will not wait until the indexes are created. I also
> tried putting just a few objects, and it passed, as it had enough time to
> create the indexes.
> Do we need to change this part to wait for the repo, or put a lower number
> of entries in the tests?
>
> BR,
> Mario
>
>
>
> --
> *From:* Jason Huynh
> *Sent:* 6 December 2019, 20:53
> *To:* Mario Kevo
> *Cc:* geode
> *Subject:* Re: Odg: Odg: Lucene upgrade
>
> Hi Mario,
>
> I made a PR against your branch for some of the changes I had to do to get
> past the IndexFormatTooNewException.  Summary: repo creation, even if no
> writes occur, appears to create some metadata that the old node attempts to
> read and blows up on.
>
> The PR against your branch just prevents the repo from being constructed
> until all old members are upgraded.
> This requires test changes to not try to validate using queries (since we
> prevent draining and repo creation, the query will just wait).
>
> The reason you were probably seeing unsuccessful dispatches is that we
> kind of intended that with the oldMember check.  In between
> the server rolls, the test was trying to verify, but because not all
> servers had upgraded, the LuceneEventListener wasn't allowing the queue to
> drain on the new member.
>
> I am not sure if the changes I added are acceptable or not; maybe if this
> ends up working then we can discuss on the dev list.
>
> There will probably be other "gotchas" along the way...
>
>
> On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
>
> Hi Jason,
>
> I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:
>
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
> between 4 and 6)
>
> It looks like the fix is not good.
>
> What I see (from
> *RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java*)
> is that when the *locator* is upgraded, it is shut down and started on the
> newer version. The problem is that *server2* becomes the lead and cannot
> read the Lucene index from the newer version (the Lucene index format
> changed between versions 6 and 7).
>
> Another problem appears after the rolling upgrade of the *locator* and
> *server1*, when verifying the region size on the VMs. For example,
>
> expectedRegionSize += 5;
> putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName,
>     expectedRegionSize, 5, 15, server2, server3);
>
> First it checks whether the region has the expected size on the VMs, and
> that passes (it has 15 entries). The problem is that while executing
> verifyLuceneQueryResults, VM1 (server2) has 13 entries and the assertion
> fails.
> From the logs it can be seen that two batches were unsuccessfully dispatched:
>
>
> *[vm0] [warn 2019/12/06 08:31:39.956 CET  GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal 
> processing, unsuccessfully dispatched 1 events (batch #0)*
>
>
> *[vm0] [warn 2019/12/06 08:31:40.103 CET  GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal 
> processing, unsuccessfully dispatched 1 events (batch #0)*
>
> VM0 (server1) and VM2 (server3) each have 14 entries; one event was
> unsuccessfully dispatched.
>
> I don't know why some events are successfully dispatched and some are not.
> Do you have any idea?
>
> BR,
> Mario
>
>
> --
> *From:* Jason Huynh
> *Sent:* 2 December 2019, 18:32
> *To:* geode
> *Subject:* Re: Odg: Lu

Odg: Odg: Odg: Lucene upgrade

2019-12-11 Thread Mario Kevo

Hi Jason,

This change fixes the IndexFormatTooNewException, but now we have

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not 
available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set to not wait for the repo [execute(context, 
false)]. If we have a bigger queue (like in the test), it will fail, as it will 
not wait until the indexes are created. I also tried putting just a few objects, 
and it passed, as it had enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of 
entries in the tests?

BR,
Mario

From: Jason Huynh
Sent: 6 December 2019, 20:53
To: Mario Kevo
Cc: geode
Subject: Re: Odg: Odg: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past 
the IndexFormatTooNewException.  Summary: repo creation, even if no writes occur, 
appears to create some metadata that the old node attempts to read and blows up 
on.

The PR against your branch just prevents the repo from being constructed until 
all old members are upgraded.
This requires test changes to not try to validate using queries (since we 
prevent draining and repo creation, the query will just wait).

The reason you were probably seeing unsuccessful dispatches is that we 
kind of intended that with the oldMember check.  In between the server 
rolls, the test was trying to verify, but because not all servers had upgraded, 
the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not; maybe if this ends 
up working then we can discuss on the dev list.

There will probably be other "gotchas" along the way...

On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
between 4 and 6)

It looks like the fix is not good.

What I see (from 
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java)
is that when the locator is upgraded, it is shut down and started on the 
newer version. The problem is that server2 becomes the lead and cannot read the 
Lucene index from the newer version (the Lucene index format changed between 
versions 6 and 7).

Another problem appears after the rolling upgrade of the locator and server1, 
when verifying the region size on the VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName,
    expectedRegionSize, 5, 15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that 
passes (it has 15 entries). The problem is that while executing 
verifyLuceneQueryResults, VM1 (server2) has 13 entries and the assertion fails.
From the logs it can be seen that two batches were unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET  tid=0x42] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET  tid=0x46] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

VM0 (server1) and VM2 (server3) each have 14 entries; one event was 
unsuccessfully dispatched.

I don't know why some events are successfully dispatched and some are not.
Do you have any idea?

BR,
Mario

From: Jason Huynh <jhu...@pivotal.io>
Sent: 2 December 2019, 18:32
To: geode <dev@geode.apache.org>
Subject: Re: Odg: Lucene upgrade

Hi Mario,

Sorry, I reread the original email and see that the exception points to a
different problem. I think your fix addresses an old version seeing an
unknown new Lucene format, which looks good.  The following exception looks
like it's the new Lucene library not being able to read the older files
(just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.
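
The two exception messages in this thread both describe a reader that accepts only a range of index format numbers. The sketch below models just that range check, using the numbers quoted in the messages (4 to 6 for the older reader, 7 to 9 for the newer one). FormatRange and Compatibility are hypothetical names for illustration and are not Lucene classes; how the format numbers map to Lucene releases is not settled in this thread and is not modeled here.

```java
final class FormatRange {
  enum Compatibility { READABLE, TOO_OLD, TOO_NEW }

  private final int min;
  private final int max;

  /** Inclusive range of index format numbers a reader accepts. */
  FormatRange(int min, int max) {
    if (min > max) {
      throw new IllegalArgumentException("min must not exceed max");
    }
    this.min = min;
    this.max = max;
  }

  /** Classifies an on-disk index format number against this reader's range. */
  Compatibility check(int indexFormat) {
    if (indexFormat < min) {
      return Compatibility.TOO_OLD;
    }
    if (indexFormat > max) {
      return Compatibility.TOO_NEW;
    }
    return Compatibility.READABLE;
  }
}
```

With these ranges, new FormatRange(4, 6).check(7) corresponds to the IndexFormatTooNewException case and new FormatRange(7, 9).check(6) to the IndexFormatTooOldException case; the two ranges share no format number, which is consistent with neither side being able to read the other's files.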

On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo  wrote:

>
> I started with the implementation of Option-1.
> As I understood it, the idea is to block all puts (put them in the queue) until
> all members are upgraded. After that it will process all queued events.
>
> I tried Dan's proposal to check at the start of
> LuceneEventListener.process() whether all members are upgraded, and also changed the
> test to verify lucene indexes only after all members are upgr

Re: Odg: Odg: Lucene upgrade

2019-12-06 Thread Jason Huynh

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get
past the IndexFormatTooNewException.  Summary: repo creation, even if no
writes occur, appears to create some metadata that the old node attempts to
read and blows up on.

The PR against your branch just prevents the repo from being constructed
until all old members are upgraded.
This requires test changes to not try to validate using queries (since we
prevent draining and repo creation, the query will just wait).

The reason you were probably seeing unsuccessful dispatches is that
we kind of intended that with the oldMember check.  In between the
server rolls, the test was trying to verify, but because not all servers
had upgraded, the LuceneEventListener wasn't allowing the queue to drain on
the new member.
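
The gating idea can be sketched as a listener that refuses batches while old members are present, so the events stay queued. This is a simplified illustration under assumptions, not the code from the PR: UpgradeGatedListener, oldMembersPresent, and the String events are hypothetical, and the boolean return value mirrors the convention where returning false leaves the batch to be redelivered later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;

final class UpgradeGatedListener {
  // Hypothetical check for members still running the old version.
  private final BooleanSupplier oldMembersPresent;
  // Stands in for the Lucene index repo in this sketch.
  private final List<String> indexed = new ArrayList<>();

  UpgradeGatedListener(BooleanSupplier oldMembersPresent) {
    this.oldMembersPresent = oldMembersPresent;
  }

  /**
   * Returns false without touching the index while old members remain,
   * which shows up as an unsuccessful dispatch and keeps the batch queued.
   * Once every member is upgraded, the batch is indexed and true is returned.
   */
  boolean processEvents(List<String> batch) {
    if (oldMembersPresent.getAsBoolean()) {
      return false;
    }
    indexed.addAll(batch);
    return true;
  }

  List<String> indexed() {
    return Collections.unmodifiableList(indexed);
  }
}
```

This also shows why verifying with queries between server rolls cannot succeed: nothing is indexed until the gate opens.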

I am not sure if the changes I added are acceptable or not; maybe if this
ends up working then we can discuss on the dev list.

There will probably be other "gotchas" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo  wrote:

> Hi Jason,
>
> I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:
>
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
> between 4 and 6)
>
> It looks like the fix is not good.
>
> What I see (from
> *RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java*)
> is that when the *locator* is upgraded, it is shut down and started on the
> newer version. The problem is that *server2* becomes the lead and cannot
> read the Lucene index from the newer version (the Lucene index format
> changed between versions 6 and 7).
>
> Another problem appears after the rolling upgrade of the *locator* and
> *server1*, when verifying the region size on the VMs. For example,
>
> expectedRegionSize += 5;
> putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName,
>     expectedRegionSize, 5, 15, server2, server3);
>
> First it checks whether the region has the expected size on the VMs, and
> that passes (it has 15 entries). The problem is that while executing
> verifyLuceneQueryResults, VM1 (server2) has 13 entries and the assertion
> fails.
> From the logs it can be seen that two batches were unsuccessfully dispatched:
>
>
> *[vm0] [warn 2019/12/06 08:31:39.956 CET  GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal 
> processing, unsuccessfully dispatched 1 events (batch #0)*
>
>
> *[vm0] [warn 2019/12/06 08:31:40.103 CET  GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal 
> processing, unsuccessfully dispatched 1 events (batch #0)*
>
> VM0 (server1) and VM2 (server3) each have 14 entries; one event was
> unsuccessfully dispatched.
>
> I don't know why some events are successfully dispatched and some are not.
> Do you have any idea?
>
> BR,
> Mario
>
>
> --
> *From:* Jason Huynh
> *Sent:* 2 December 2019, 18:32
> *To:* geode
> *Subject:* Re: Odg: Lucene upgrade
>
> Hi Mario,
>
> Sorry, I reread the original email and see that the exception points to a
> different problem. I think your fix addresses an old version seeing an
> unknown new Lucene format, which looks good.  The following exception looks
> like it's the new Lucene library not being able to read the older files
> (just a guess from the message)...
>
> Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
> version is not supported (resource
> BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
> 9). This version of Lucene only supports indexes created with release
> 6.0 and later.
>
> The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
> incorrect (stating needs to be release 6.0 and later) or if it requires an
> intermediate upgrade between 6.6.2 -> 7.x -> 8.
>
>
>
>
>
> On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo  wrote:
>
> >
> > I started on the implementation of Option-1.
> > As I understand it, the idea is to block all puts (hold them in the queue)
> > until all members are upgraded. After that, all queued events are
> > processed.
> >
> > I tried Dan's proposal to check at the start of
> > LuceneEventListener.process() whether all members are upgraded, and also
> > changed the test to verify Lucene indexes only after all members are
> > upgraded, but got the same error about incompatibility between Lucene
> > versions.
> > The changes are visible at https://github.com/apache/geode/pull/4198.
> >
> > Please add comments and suggestions.
> >
> > BR,
> > Mario
> >
> >
> > 
> > From: Xiaojian Zhou 
> > Sent: November 7, 2019, 18:27
> > To: geode 
> > Subject: Re: Lucene upgrade
> >
> > Oh, I misunderstood option-1 and option-2. What I vote for is Jason's
> > option-1.
> >
> > On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh  wrote:
> >
> > > Gester, I don't think we need to write in the old format, we just need
> > > the new format not to be written while old members can potentially read the

Odg: Odg: Lucene upgrade

2019-12-06 Thread Mario Kevo

Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
between 4 and 6)

It looks like the fix is not good.

What I see (from 
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java)
is that when the locator is upgraded, it is shut down and restarted on the 
newer version. The problem is that server2 then becomes the lead and cannot 
read the Lucene index in the newer format (the Lucene index format changed 
between versions 6 and 7).

Another problem appears after the rolling upgrade of the locator and server1, 
when verifying the region size on the VMs. For example:

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName,
    expectedRegionSize, 5, 15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that 
passes (15 entries). The problem comes while executing 
verifyLuceneQueryResults: VM1 (server2) has only 13 entries, and the assertion 
fails.
From the logs it can be seen that two batches were not dispatched successfully:

[vm0] [warn 2019/12/06 08:31:39.956 CET  tid=0x42] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET  tid=0x46] During normal 
processing, unsuccessfully dispatched 1 events (batch #0)

VM0 (server1) and VM2 (server3) have 14 entries; one event was not 
dispatched successfully.

I don't know why some events are dispatched successfully and others are not.
Do you have any idea?

BR,
Mario

From: Jason Huynh 
Sent: December 2, 2019, 18:32
To: geode 
Subject: Re: Odg: Lucene upgrade

Hi Mario,

Sorry, I reread the original email and see that the exception points to a
different problem. I think your fix addresses an old version seeing an
unknown new Lucene format, which looks good.  The following exception looks
like it's the new Lucene library not being able to read the older files
(just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.
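
The two exceptions in this thread report the same kind of check from opposite sides: the format version found on disk is compared with an inclusive window that the reading library supports. A minimal sketch of that comparison follows, using only the numbers quoted in the messages above. The class FormatWindow and the method classify are hypothetical names for illustration and are not part of Lucene or Geode.

```java
public class FormatWindow {
    /** Outcome of comparing an on-disk format version with a reader's window. */
    public enum Verdict { READABLE, TOO_OLD, TOO_NEW }

    // Hypothetical helper: classifies an on-disk format version against the
    // inclusive [min, max] window that a reader reports in its error message.
    public static Verdict classify(int onDisk, int min, int max) {
        if (onDisk < min) {
            return Verdict.TOO_OLD;
        }
        if (onDisk > max) {
            return Verdict.TOO_NEW;
        }
        return Verdict.READABLE;
    }

    public static void main(String[] args) {
        // Old member (window 4..6) opening segments written with format 7,
        // as in "7 (needs to be between 4 and 6)".
        System.out.println(classify(7, 4, 6));
        // New member (window 7..9) opening segments written with format 6,
        // as in "6 (needs to be between 7 and 9)".
        System.out.println(classify(6, 7, 9));
    }
}
```

Under this reading, neither side can open the other's files during the rolling upgrade, which is consistent with both exceptions appearing in the tests.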

On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo  wrote:

>
> I started on the implementation of Option-1.
> As I understand it, the idea is to block all puts (hold them in the queue)
> until all members are upgraded. After that, all queued events are processed.
>
> I tried Dan's proposal to check at the start of
> LuceneEventListener.process() whether all members are upgraded, and also
> changed the test to verify Lucene indexes only after all members are
> upgraded, but got the same error about incompatibility between Lucene
> versions.
> The changes are visible at https://github.com/apache/geode/pull/4198.
>
> Please add comments and suggestions.
>
> BR,
> Mario
>
>
> 
> From: Xiaojian Zhou 
> Sent: November 7, 2019, 18:27
> To: geode 
> Subject: Re: Lucene upgrade
>
> Oh, I misunderstood option-1 and option-2. What I vote for is Jason's option-1.
>
> On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh  wrote:
>
> > Gester, I don't think we need to write in the old format, we just need
> > the new format not to be written while old members can potentially read
> > the lucene files.  Option 1 can be very similar to Dan's snippet of code.
> >
> > I think Option 2 is going to leave a lot of people unhappy when they get
> > stuck with what Mario is experiencing right now and all we can say is
> > "you should have read the doc". Not to say Option 2 isn't valid and it's
> > definitely the least amount of work to do, I still vote option 1.
> >
> > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou  wrote:
> >
> > > Usually re-creating the region and index is expensive and customers are
> > > reluctant to do it, according to my memory.
> > >
> > > We do have offline reindex scripts or steps (written by Barry?). If
> > > that could be an option, they can try that offline tool.
> > >
> > > I saw from Mario's email, he said: "I didn't found a way to write
> > > lucene in older format. They only support reading old format indexes
> > > with newer version by using lucene-backward-codec."
> > >
> > > That's why I think option-1 is not feasible.
> > >
> > > Option-2 will cause the queue to fill up. But customers usually pause,
> > > quiesce, or reduce their business throughput when doing a rolling
> > > upgrade. I wonder if that's a reasonable assumption.
> > >
> > > Overall, after comparing all 3 options, I still think option-2 is
> > > the best bet.
> > >
> > > Regards
> > > Gester
> > >
