Re: Re: Re: Re: Re: Lucene upgrade
Hi all,

Please could someone review #4395 <https://github.com/apache/geode/pull/4395>?

BR,
Mario

From: Mario Kevo
Sent: December 17, 2019, 14:30
To: Jason Huynh
Cc: geode
Subject: Re: Re: Re: Re: Re: Lucene upgrade

Hi Jason,

Nice catch! I tried a larger number of retries (with your changes) and it passed. I will try to make it time based.
Thanks for the help!

BR,
Mario

From: Jason Huynh
Sent: December 13, 2019, 23:10
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

I think I see what is going on here. The logic for the "reindex" code was a bit off (it expected reindex features to be complete by a certain release). I have a PR on develop to adjust that calculation (https://github.com/apache/geode/pull/4466).

The expectation is that when Lucene reindex (indexing a region with data already in it) is enabled, any query will now throw the LuceneIndexingInProgressException instead of possibly waiting a very long time to receive a query result. The tests themselves are coded to retry 10 times, knowing it will take a while to reindex. If you bump this number up or, better yet, make it time based (Awaitility, etc.), it should get you past this problem (once the pull request gets checked in and pulled into your branch).

Thanks!
-Jason

On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo wrote:

Hi Jason,

Yes, the same tests failed:
RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed. As I said, when I change the tests to put a lower number of entries, they pass every time; they also pass when LuceneQueryFunction.java is set to wait for the repo.
waitUntilFlushed is called by verifyLuceneQueryResults before executing queries. I also tried to wait until isIndexingInProgress returns false, but it reached the timeout and failed. In the tests, the query is executed after all members are rolled.
BR,
Mario

From: Jason Huynh <jhu...@pivotal.io>
Sent: December 11, 2019, 23:08
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

Is the same test failing? If it's a different test, could you tell us which one? If it's a rolling upgrade test, then we might have to mark this as expected behavior and modify the tests to waitForFlush (wait until the queue is drained). That is fine as long as the test is able to roll all the servers and does not get stuck waiting for a queue to flush (which will now only happen once all the servers are rolled).

If the test hasn't rolled all the servers and is trying to execute a query, then we'd probably have to modify the test to not do the query in the middle, or to expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo wrote:

Hi Jason,

This change fixes the IndexFormatTooNewException, but now we have

org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not available, currently indexing

So this means that the query doesn't wait until all indexes are created. In LuceneQueryFunction.java it is set to not wait for the repo [execute(context, false)]. If we have a bigger queue (like in the test), it will fail because it will not wait until the indexes are created. I also tried to put just a few objects and it passed, as it had enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of entries in the tests?

BR,
Mario

From: Jason Huynh <jhu...@pivotal.io>
Sent: December 6, 2019, 20:53
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Re: Re: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past the "index too new" exception. Summary: repo creation, even if no writes occur, appears to create some metadata that the old node attempts to read and blows up on.
The PR against your branch just prevents the repo from being constructed until all old members are upgraded. This requires test changes to not try to validate using queries (since we prevent draining and repo creation, the query will just wait).

The reason why you were probably seeing unsuccessful dispatches is that we kind of intended for that with the oldMember check. In between the server rolls, the test was trying to verify, but because not all servers had upgraded, the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not; maybe if this ends up working then we can discuss on the dev list. There will probably be other "gotchas" along the way...

On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo wrote:

Hi Jason,

I tried to upgrade f
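For readers following the retry discussion in this thread, here is a minimal sketch of what "time based" retrying could look like. It uses only the JDK rather than Awaitility, and every name in it (TimedRetry, retryUntil, simulate, and the nested IndexingInProgressException, which stands in for Geode's LuceneIndexingInProgressException) is hypothetical and not taken from the Geode tests. The idea is to keep polling until a deadline passes instead of giving up after a fixed number of attempts.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class TimedRetry {

  /** Hypothetical stand-in for the exception thrown while an index is still being built. */
  static final class IndexingInProgressException extends RuntimeException {
    IndexingInProgressException(String message) {
      super(message);
    }
  }

  /**
   * Runs the query until it returns without IndexingInProgressException,
   * sleeping pollInterval between attempts. Gives up with
   * IllegalStateException once the timeout has elapsed.
   */
  static <T> T retryUntil(Supplier<T> query, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      try {
        return query.get();
      } catch (IndexingInProgressException e) {
        if (System.nanoTime() - deadline >= 0) {
          throw new IllegalStateException("query still failing after " + timeout, e);
        }
        try {
          Thread.sleep(pollInterval.toMillis());
        } catch (InterruptedException interrupted) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting", interrupted);
        }
      }
    }
  }

  /**
   * Simulated query: throws for the first failuresBeforeSuccess calls,
   * then returns the number of the attempt that succeeded.
   */
  static int simulate(int failuresBeforeSuccess, Duration timeout) {
    AtomicInteger calls = new AtomicInteger();
    return retryUntil(() -> {
      if (calls.incrementAndGet() <= failuresBeforeSuccess) {
        throw new IndexingInProgressException("currently indexing");
      }
      return calls.get();
    }, timeout, Duration.ofMillis(10));
  }

  public static void main(String[] args) {
    // 12 failures would exhaust a fixed budget of 10 retries, but fits easily in 5 seconds.
    System.out.println("succeeded on attempt " + simulate(12, Duration.ofSeconds(5)));
  }
}
```

With a deadline, the test tolerates however many attempts the reindex happens to need, and the timeout becomes the only number to tune. Awaitility, which Jason mentions, offers the same pattern with a richer API.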
Re: Re: Re: Re: Re: Lucene upgrade
Hi Jason,

Nice catch! I tried a larger number of retries (with your changes) and it passed. I will try to make it time based.
Thanks for the help!

BR,
Mario

From: Jason Huynh
Sent: December 13, 2019, 23:10
To: Mario Kevo
Cc: geode
Subject: Re: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

I think I see what is going on here. The logic for the "reindex" code was a bit off (it expected reindex features to be complete by a certain release). I have a PR on develop to adjust that calculation (https://github.com/apache/geode/pull/4466).

The expectation is that when Lucene reindex (indexing a region with data already in it) is enabled, any query will now throw the LuceneIndexingInProgressException instead of possibly waiting a very long time to receive a query result. The tests themselves are coded to retry 10 times, knowing it will take a while to reindex. If you bump this number up or, better yet, make it time based (Awaitility, etc.), it should get you past this problem (once the pull request gets checked in and pulled into your branch).

Thanks!
-Jason

On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo wrote:

Hi Jason,

Yes, the same tests failed:
RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled
RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed. As I said, when I change the tests to put a lower number of entries, they pass every time; they also pass when LuceneQueryFunction.java is set to wait for the repo.
waitUntilFlushed is called by verifyLuceneQueryResults before executing queries. I also tried to wait until isIndexingInProgress returns false, but it reached the timeout and failed. In the tests, the query is executed after all members are rolled.

BR,
Mario

From: Jason Huynh <jhu...@pivotal.io>
Sent: December 11, 2019, 23:08
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

Is the same test failing?
If it's a different test, could you tell us which one? If it's a rolling upgrade test, then we might have to mark this as expected behavior and modify the tests to waitForFlush (wait until the queue is drained). That is fine as long as the test is able to roll all the servers and does not get stuck waiting for a queue to flush (which will now only happen once all the servers are rolled).

If the test hasn't rolled all the servers and is trying to execute a query, then we'd probably have to modify the test to not do the query in the middle, or to expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo wrote:

Hi Jason,

This change fixes the IndexFormatTooNewException, but now we have

org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not available, currently indexing

So this means that the query doesn't wait until all indexes are created. In LuceneQueryFunction.java it is set to not wait for the repo [execute(context, false)]. If we have a bigger queue (like in the test), it will fail because it will not wait until the indexes are created. I also tried to put just a few objects and it passed, as it had enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of entries in the tests?

BR,
Mario

From: Jason Huynh <jhu...@pivotal.io>
Sent: December 6, 2019, 20:53
To: Mario Kevo
Cc: geode <dev@geode.apache.org>
Subject: Re: Re: Re: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past the "index too new" exception. Summary: repo creation, even if no writes occur, appears to create some metadata that the old node attempts to read and blows up on. The PR against your branch just prevents the repo from being constructed until all old members are upgraded.
This requires test changes to not try to validate using queries (since we prevent draining and repo creation, the query will just wait).

The reason why you were probably seeing unsuccessful dispatches is that we kind of intended for that with the oldMember check. In between the server rolls, the test was trying to verify, but because not all servers had upgraded, the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not; maybe if this ends up working then we can discuss on the dev list. There will probably be other "gotchas" along the way...

On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo wrote:

Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)

It looks like the fix is not good. What I see (from