Re: debugging query execution plan

2021-06-09 Thread Adrien Grand
FYI, this just got checked in:
https://issues.apache.org/jira/browse/LUCENE-9965.

I'd be curious to know if it helps with your problem, Mike.

On Wed, May 12, 2021 at 1:54 PM Adrien Grand  wrote:

> Indeed, this code is ASL2 pre-7.10, but I wouldn't have expected any
> concerns regardless. Jack volunteered to bring this code to Lucene by
> removing the Elasticsearch-specific bits.
>
> On Mon, May 10, 2021 at 4:55 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> +1 to start from the Elasticsearch implementation for low-level query
>> execution tracing, which I think is from (pre-7.10) ASL2 licensed code?
>>
>> That sounds helpful, even with the Heisenberg caveats.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, May 6, 2021 at 4:24 PM Adrien Grand  wrote:
>>
>>> We have something like that in Elasticsearch that wraps queries in order
>>> to be able to report cost, matchCost and the number of calls to
>>> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in
>>> the query tree.
>>>
>>> It's not perfect, as it needs to disable some optimizations in order to
>>> work properly. For instance, bulk scorers are disabled and conjunctions are
>>> not inlined, which means that clauses may run in a different order. So
>>> results need to be interpreted carefully, as the way the query gets executed
>>> when observed may differ a bit from how it gets executed normally. That
>>> said it has still been useful in a number of cases. I don't think our
>>> implementation works when IndexSearcher is configured with an executor but
>>> we could maybe put it in sandbox and iterate from there?
>>>
>>> For your case, do you think it could be attributed to deleted docs?
>>> Deleted docs are checked before two-phase confirmation and collectors but
>>> after disjunctions/conjunctions of postings.
>>>
>>> On Thu, May 6, 2021 at 8:20 PM, Michael Sokolov  wrote:
>>>
 Do we have a way to understand how BooleanQuery (and other composite
 queries) are advancing their child queries? For example, a simple
 conjunction of two queries advances the more restrictive (lower
 cost()) query first, enabling the more costly query to skip over more
 documents. But we may not be making the best choice in every case, and
 I would like to know, for some query, how we are doing. For example,
 we could execute in a debugging mode, interposing something that wraps
 or observes the Scorers in some way, gathering statistics about how
 many documents are visited by each Scorer, which can be aggregated for
 later analysis.
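
 As a rough sketch of the idea (not the actual Elasticsearch/LUCENE-9965
 code), a delegating DocIdSetIterator can count how often each child is
 pulled; the class and counter names below are made up for illustration:

 import java.io.IOException;
 import org.apache.lucene.search.DocIdSetIterator;

 // Hypothetical counting wrapper. In practice it would be installed by
 // wrapping Weight/Scorer, which is where the bulk-scorer and
 // conjunction-inlining caveats above come from.
 final class CountingIterator extends DocIdSetIterator {
   private final DocIdSetIterator in;
   long nextDocCalls, advanceCalls;

   CountingIterator(DocIdSetIterator in) {
     this.in = in;
   }

   @Override
   public int docID() {
     return in.docID();
   }

   @Override
   public int nextDoc() throws IOException {
     nextDocCalls++;
     return in.nextDoc();
   }

   @Override
   public int advance(int target) throws IOException {
     advanceCalls++;
     return in.advance(target);
   }

   @Override
   public long cost() {
     // report the child's cost unchanged so lead-iterator selection
     // is not perturbed by the instrumentation itself
     return in.cost();
   }
 }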

 This is motivated by a use case we have in which we currently
 post-filter our query results in a custom collector using some filters
 that we know to be expensive (they must be evaluated on every
 document), but we would rather express these post-filters as Queries
 and have them advanced during the main Query execution. However, when
 we tried to do that, we saw some slowdowns (in spite of marking these
 Queries as high-cost) and I suspect it is due to the iteration order,
 but I'm not sure how to debug.
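
 For reference, the existing hook for expressing this is
 TwoPhaseIterator.matchCost(). A minimal sketch, with expensivePredicate()
 as a hypothetical stand-in for the real filter:

 import java.io.IOException;
 import org.apache.lucene.search.DocIdSetIterator;
 import org.apache.lucene.search.TwoPhaseIterator;

 // Sketch: an expensive per-document check exposed as a two-phase
 // iterator with a high matchCost, so conjunctions confirm it only
 // after the cheaper clauses have narrowed the candidate documents.
 final class ExpensiveFilter extends TwoPhaseIterator {
   ExpensiveFilter(DocIdSetIterator approximation) {
     super(approximation);
   }

   @Override
   public boolean matches() throws IOException {
     // the candidate doc is wherever the approximation is positioned
     return expensivePredicate(approximation.docID());
   }

   @Override
   public float matchCost() {
     return 10_000f; // high cost: verify as late as possible
   }

   private boolean expensivePredicate(int docID) {
     return true; // placeholder for the real expensive check
   }
 }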

 Suggestions welcome!

 -Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


>
> --
> Adrien
>


-- 
Adrien


RE: [lucene] branch main updated: LUCENE-9995: JDK17 generates wbr tags which make javadocs checker angry.

2021-06-09 Thread Uwe Schindler
Oh my god. <wbr> is an invention going back to Netscape 4. I have no idea how
it came into HTML5; it has nothing to do with structuring documents in the HTML
sense, it's from the time before there was Unicode.

The correct replacement is a zero-width space (Unicode U+200B).

AMEN!
Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: dwe...@apache.org 
> Sent: Wednesday, June 9, 2021 10:45 AM
> To: comm...@lucene.apache.org
> Subject: [lucene] branch main updated: LUCENE-9995: JDK17 generates wbr
> tags which make javadocs checker angry.
> 
> This is an automated email from the ASF dual-hosted git repository.
> 
> dweiss pushed a commit to branch main
> in repository https://gitbox.apache.org/repos/asf/lucene.git
> 
> 
> The following commit(s) were added to refs/heads/main by this push:
>  new 332405e  LUCENE-9995: JDK17 generates wbr tags which make
> javadocs checker angry.
> 332405e is described below
> 
> commit 332405e7ada458a4df1a1226fa97c4193d7975b1
> Author: Dawid Weiss 
> AuthorDate: Wed Jun 9 10:45:01 2021 +0200
> 
> LUCENE-9995: JDK17 generates wbr tags which make javadocs checker
> angry.
> ---
>  gradle/documentation/check-broken-links/checkJavadocLinks.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gradle/documentation/check-broken-links/checkJavadocLinks.py
> b/gradle/documentation/check-broken-links/checkJavadocLinks.py
> index b7efd3d..768a741 100644
> --- a/gradle/documentation/check-broken-links/checkJavadocLinks.py
> +++ b/gradle/documentation/check-broken-links/checkJavadocLinks.py
> @@ -41,7 +41,7 @@ class FindHyperlinks(HTMLParser):
>    def handle_starttag(self, tag, attrs):
>      # NOTE: I don't think 'a' should be in here. But try debugging
>      # NumericRangeQuery.html. (Could be javadocs bug, it's a generic type...)
> -    if tag not in ('link', 'meta', 'frame', 'br', 'hr', 'p', 'li', 'img', 'col', 'a', 'dt', 'dd'):
> +    if tag not in ('link', 'meta', 'frame', 'br', 'wbr', 'hr', 'p', 'li', 'img', 'col', 'a', 'dt', 'dd'):
>        self.stack.append(tag)
>      if tag == 'a':
>        id = None


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene » Lucene-Solr-SmokeRelease-8.x - Build # 220 - Still Failing!

2021-06-09 Thread Jason Gerlowski
Hey all,

Quick update: I've removed the javax.annotation dependency declaration
that the build didn't like.  Precommit and tests pass.  I've pushed
the fix to branch_8_9.  I'm currently running buildAndPushRelease.py +
smoke-tester and will give the all-clear once I see that pass.

Sorry to hold things up on the release!

Jason

On Tue, Jun 8, 2021 at 4:17 AM Christine Poerschke (BLOOMBERG/ LONDON)
 wrote:
>
> I've opened https://github.com/apache/lucene-solr/pull/2509 for the v8.10.0 > 
> v8.9.0 comparison issue. Hope that helps.
>
> From: dev@lucene.apache.org At: 06/08/21 01:28:25 UTC+1:00
> To: dev@lucene.apache.org
> Subject: Re: [JENKINS] Lucene » Lucene-Solr-SmokeRelease-8.x - Build # 220 - 
> Still Failing!
>
> Thanks Robert for the explanation.
>
> Jason, thank you for your reply and investigations.
>
> > Do you happen to have the command that reproduced this for you?
> You can see the build failures in Jenkins.  The last 3 failures were after 
> June 2nd, and all contained similar failures about sheisty classes in solr.
>
> About a specific RC for the smoke tester: I think you can first build a release
> locally with the buildAndPushRelease.py script, and then point the smoke tester
> at this local release.
>
> I've also looked at the failure on the 8.x branch: "Future release 8.9.0 is
> greater than 8.10.0". It looks like smokeTestRelease.py fails to correctly
> compare v8.10.0 > v8.9.0 because it compares the version components as string
> tuples, and lexicographically (('8', '10', '0') < ('8', '9', '0')).
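>
> (A generic illustration of the failure mode, not the smokeTestRelease.py code
> itself: compared as strings the components sort "10" before "9", while parsed
> as integers they compare correctly.)
>
> import java.util.Arrays;
>
> public class VersionCompareDemo {
>   public static void main(String[] args) {
>     String[] a = "8.10.0".split("\\.");
>     String[] b = "8.9.0".split("\\.");
>     // lexicographic comparison: 8.10.0 sorts before 8.9.0 (wrong)
>     System.out.println(Arrays.compare(a, b) < 0);   // true
>     int[] na = Arrays.stream(a).mapToInt(Integer::parseInt).toArray();
>     int[] nb = Arrays.stream(b).mapToInt(Integer::parseInt).toArray();
>     // numeric comparison: 8.10.0 > 8.9.0 (right)
>     System.out.println(Arrays.compare(na, nb) > 0); // true
>   }
> }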
>
>
>
>
>
> On Mon, Jun 7, 2021 at 3:24 PM Robert Muir  wrote:
>>
>> Sheisty means classes that collide with jdk packages. E.g. this
>> javax.annotation looks like a problem, as it collides with existing
>> jdk package in an xml module:
>> https://docs.oracle.com/javase/8/docs/api/javax/annotation/package-summary.html
>>
>> Looks like the whole jar being used here is archived/deprecated [1] in
>> favor of "jakarta annotations" [2] which uses a new
>> non-colliding/jar-helling "jakarta.annotation" package/module:
>>
>> 1. https://github.com/javaee/javax.annotation/
>> 2. https://github.com/eclipse-ee4j/common-annotations-api
>>
>>
>>
>>
>> M
>>
>> On Mon, Jun 7, 2021 at 2:59 PM Jason Gerlowski  wrote:
>> >
>> > Hey Mayya,
>> >
>> > My "fix" on branch_8x is already present in 8.9: see
>> > c461c506ffc02d4f3d16c7a0b0ec4250ba79fb7d from June 2nd.  Evidently
>> > there are still other problems, though.
>> >
>> > I'll look into this on my end for sure.  Do you happen to have the
>> > command that reproduced this for you?  Or understand what "sheisty"
>> > means here haha?
>> >
>> > Jason
>> >
>> > On Mon, Jun 7, 2021 at 1:12 PM Mayya Sharipova
>> >  wrote:
>> > >
>> > > Hello Jason,
>> > > thanks for the update.
>> > >
>> > > While trying to build a release candidate and doing the smoke test, I 
>> > > am getting the following error:
>> > > RuntimeError: JAR file 
>> > > "../.lucene-releases/8.9.0/RC1/smoketest/unpack/solr-8.9.0/contrib/gcs-repository/lib/javax.annotation-api-1.3.2.jar"
>> > >  contains sheisty class 
>> > > "javax/annotation/sql/DataSourceDefinitions.class"
>> > >
>> > > I guess we would need to backport your changes to branch_8_9 as well. 
>> > > Would it be possible for you to do this?
>> > >
>> > > I will look into how to fix another error about "Future release 8.9.0 is 
>> > > greater than 8.10.0".
>> > >
>> > > On Thu, Jun 3, 2021 at 7:54 AM Jason Gerlowski  
>> > > wrote:
>> > >>
>> > >> I pushed a commit to branch_8x yesterday that I expect has fixed this.
>> > >> The 'generate-maven-artifacts' task now succeeds for me locally at
>> > >> least.
>> > >>
>> > >> The Jenkins job failed in the overnight run, but it looks to be a
>> > >> consequence of the release process that Mayya has in flight:
>> > >>
>> > >> "RuntimeError: Future release 8.9.0 is greater than 8.10.0 in
>> > >> file:///home/jenkins/jenkins-slave/workspace/Lucene/Lucene-Solr-SmokeRelease-8.x/lucene/build/smokeTestRelease/dist/lucene/changes/Changes.html"
>> > >>
>> > >> If anyone sees the "gcs-repository" related error message crop up
>> > >> anywhere else, please let me know.
>> > >>
>> > >> Jason
>> > >>
>> > >> On Tue, Jun 1, 2021 at 7:55 AM Jason Gerlowski  
>> > >> wrote:
>> > >> >
>> > >> > Hey all,
>> > >> >
>> > >> > This is my fault.  Will look at fixing it this morning.  Sorry for the
>> > >> > disruption!
>> > >> >
>> > >> > If this is blocking anyone, let me know and I'll revert the offending
>> > >> > commit while I investigate the cause.  Otherwise I'll just leave it
>> > >> > as-is and push to have a fix as soon as I can.
>> > >> >
>> > >> > Jason
>> > >> >
>> > >> > On Sat, May 29, 2021 at 8:35 PM Robert Muir  wrote:
>> > >> > >
>> > >> > > The latest 8.x failure seems to be related to POM files from the
>> > >> > > solr gcs repository. I don't currently know what is needed to move
>> > >> > > this along.
>> > >> > >
>> > >> > > On Sat, May 29, 2021 at 8:28 PM Apache Jenkins Server
>> > >> > >  wrote:
>> > >> > > >
>> > >> > > > Build: 
>> > >> > > > 

Re: Analyzer lifecycles

2021-06-09 Thread Robert Muir
Alan, I'd also like to comment on this:

The reason we have TokenStreamComponents and ReuseStrategies (as I
understand it) is not because they may have to load large resource
files or dictionaries or whatever, but it’s because building a
TokenStream is itself quite a heavy operation due to
AttributeFactories and reflection.

That's actually not my major concern. There are plenty of
TokenStream.<init>'s doing a fair amount of work, creating objects,
setting up buffers, anything to make the actual processing fast. This
makes sense today because they are reused. But if we *sometimes* reuse
tokenstreams (indexwriter) and *other times don't* (query time), it
just adds more pain to keeping the analyzers efficient. Now they have
to optimize for 2 cases.

I also don't want all this stuff adding up to increased garbage for
users that actually are search-heavy. Some of these users don't care
about index speed at all and are more concerned with QPS, latencies,
etc. This is very different from e.g. logging use-cases where people
are just indexing all day and maybe rarely searching. Current analyzer
design is efficient for both use-cases.

On Wed, Jun 9, 2021 at 9:03 AM Alan Woodward  wrote:
>
> Hey Robert,
>
> Analyzers themselves can be heavy and load large data files, etc, I agree, 
> but I’m really talking about token stream construction.  The way things are 
> set up, we expect the heavy lifting to be done when the Analyzer is 
> constructed, but these heavy resources should then be shared between token 
> streams (I fixed a bug in the Ukrainian analyzer a while back that was 
> getting this wrong, see LUCENE-9930).  So you’d build your Analyzers once and 
> use the same instances at query and at index time, and there’s no worry about 
> reloading large dictionaries on every use.
>
> But re-using token streams is different. The reason we have 
> TokenStreamComponents and ReuseStrategies (as I understand it) is not because 
> they may have to load large resource files or dictionaries or whatever, but 
> it’s because building a TokenStream is itself quite a heavy operation due to 
> AttributeFactories and reflection.  My argument is that this is only heavy 
> relative to the cost of indexing a single field, and that this only really 
> matters when you have documents with lots of small fields in them.  For query 
> building or highlighting or MoreLikeThis, the cost of building a small 
> number of token streams is tiny compared to all the other heavy lifting and 
> IO going on.  And so if we pushed this *TokenStream* reuse into IndexWriter 
> we wouldn’t have to have a close() method on Analyzer (because the thread 
> locals go away, and we expect file resources etc to be closed once the 
> analyzer has finished building itself), and delegating or wrapping analyzers 
> becomes much simpler.
>
> Does that make more sense?
>
> (I agree on the thread pool stuff, but we need to be careful about not 
> blowing up users' systems even if they are implementing anti-patterns!)
>
> > On 8 Jun 2021, at 16:12, Robert Muir  wrote:
> >
> > Alan: a couple thoughts:
> >
> > Analyzers are not just used for formulating queries, but also may be
> > used by highlighters and other things on document results at query
> > time.
> > Some analyzers may do too-expensive/garbage-creating stuff on
> > construction, that you wouldn't want to do at query-time.
> > Separately, I think Analyzer being closable makes sense.
> > Users still need to carefully consider the lifecycle of this thing for
> > performance, and may want to return their own resources for some
> > reason (close() is a non-final method today)
> > Analyzers might require large amounts of resources (such as parsing
> > files/lists, ml models, who knows what).
> > For the built-in minimal resources that we ship, we try to make
> > construction cheap and use static holder classes, and so on. I'm
> > concerned some of these are costly.
> > But I'm definitely worried about longer files and stuff that many
> > users might use.
> >
> > I feel like some of this "large threadpool" stuff is just a java
> > antipattern for search. I configure servers with fixed threadpools
> > matching the number of CPU cores, and tell my load balancer about that
> > number (e.g. haproxy maxconn), so that it can effectively queue and
> > not overload search servers.
> >
> > On Tue, Jun 8, 2021 at 10:23 AM Alan Woodward  wrote:
> >>
> >> Hi all,
> >>
> >> I’ve been on holiday and away from a keyboard for a week, so that means I 
> >> of course spent my time thinking about lucene Analyzers and specifically 
> >> their ReuseStrategies…
> >>
> >> Building a TokenStream can be quite a heavy operation, and so we try and 
> >> reuse already-constructed token streams as much as possible.  This is 
> >> particularly important at index time, as having to create lots and lots of 
> >> very short-lived token streams for documents with many short text fields 
> >> could mean that we spend longer building these objects than we 

Re: Analyzer lifecycles

2021-06-09 Thread Robert Muir
Yes, I'm using the term "Analyzer" in a generic sense, also concerned
about TokenStream init costs, garbage, etc.

There are a ton of uses here other than indexwriter:
AnalyzingSuggesters building FSTs, etc etc.

I don't think we need to try to add even more complexity because of
users implementing these anti-patterns on their end. This problem of
using hundreds/thousands of threads is a uniquely "java-developer"
problem. I don't see these issues with applications written in other
programming languages. We really can't shield them from it. If we stop
reusing, they will just get different _symptoms_ (slow performance, GC
issues, etc), but the underlying problem is using all those
unnecessary threads. That's what should get fixed.

On Wed, Jun 9, 2021 at 9:03 AM Alan Woodward  wrote:
>
> Hey Robert,
>
> Analyzers themselves can be heavy and load large data files, etc, I agree, 
> but I’m really talking about token stream construction.  The way things are 
> set up, we expect the heavy lifting to be done when the Analyzer is 
> constructed, but these heavy resources should then be shared between token 
> streams (I fixed a bug in the Ukrainian analyzer a while back that was 
> getting this wrong, see LUCENE-9930).  So you’d build your Analyzers once and 
> use the same instances at query and at index time, and there’s no worry about 
> reloading large dictionaries on every use.
>
> But re-using token streams is different. The reason we have 
> TokenStreamComponents and ReuseStrategies (as I understand it) is not because 
> they may have to load large resource files or dictionaries or whatever, but 
> it’s because building a TokenStream is itself quite a heavy operation due to 
> AttributeFactories and reflection.  My argument is that this is only heavy 
> relative to the cost of indexing a single field, and that this only really 
> matters when you have documents with lots of small fields in them.  For query 
> building or highlighting or MoreLikeThis, the cost of building a small 
> number of token streams is tiny compared to all the other heavy lifting and 
> IO going on.  And so if we pushed this *TokenStream* reuse into IndexWriter 
> we wouldn’t have to have a close() method on Analyzer (because the thread 
> locals go away, and we expect file resources etc to be closed once the 
> analyzer has finished building itself), and delegating or wrapping analyzers 
> becomes much simpler.
>
> Does that make more sense?
>
> (I agree on the thread pool stuff, but we need to be careful about not 
> blowing up users' systems even if they are implementing anti-patterns!)
>
> > On 8 Jun 2021, at 16:12, Robert Muir  wrote:
> >
> > Alan: a couple thoughts:
> >
> > Analyzers are not just used for formulating queries, but also may be
> > used by highlighters and other things on document results at query
> > time.
> > Some analyzers may do too-expensive/garbage-creating stuff on
> > construction, that you wouldn't want to do at query-time.
> > Separately, I think Analyzer being closable makes sense.
> > Users still need to carefully consider the lifecycle of this thing for
> > performance, and may want to return their own resources for some
> > reason (close() is a non-final method today)
> > Analyzers might require large amounts of resources (such as parsing
> > files/lists, ml models, who knows what).
> > For the built-in minimal resources that we ship, we try to make
> > construction cheap and use static holder classes, and so on. I'm
> > concerned some of these are costly.
> > But I'm definitely worried about longer files and stuff that many
> > users might use.
> >
> > I feel like some of this "large threadpool" stuff is just a java
> > antipattern for search. I configure servers with fixed threadpools
> > matching the number of CPU cores, and tell my load balancer about that
> > number (e.g. haproxy maxconn), so that it can effectively queue and
> > not overload search servers.
> >
> > On Tue, Jun 8, 2021 at 10:23 AM Alan Woodward  wrote:
> >>
> >> Hi all,
> >>
> >> I’ve been on holiday and away from a keyboard for a week, so that means I 
> >> of course spent my time thinking about lucene Analyzers and specifically 
> >> their ReuseStrategies…
> >>
> >> Building a TokenStream can be quite a heavy operation, and so we try and 
> >> reuse already-constructed token streams as much as possible.  This is 
> >> particularly important at index time, as having to create lots and lots of 
> >> very short-lived token streams for documents with many short text fields 
> >> could mean that we spend longer building these objects than we do pulling 
> >> data from them.  To help support this, lucene Analyzers have a 
> >> ReuseStrategy, which defaults to storing a map of fields to token streams 
> >> in a ThreadLocal object.  Because ThreadLocals can behave badly when it 
> >> comes to containers that have large thread pools, we use a special 
> >> CloseableThreadLocal class that can null out its contents once the 
> >> Analyzer 

Re: Analyzer lifecycles

2021-06-09 Thread Alan Woodward
Hey Robert,

Analyzers themselves can be heavy and load large data files, etc, I agree, but 
I’m really talking about token stream construction.  The way things are set up, 
we expect the heavy lifting to be done when the Analyzer is constructed, but 
these heavy resources should then be shared between token streams (I fixed a 
bug in the Ukrainian analyzer a while back that was getting this wrong, see 
LUCENE-9930).  So you’d build your Analyzers once and use the same instances at 
query and at index time, and there’s no worry about reloading large 
dictionaries on every use.

But re-using token streams is different. The reason we have 
TokenStreamComponents and ReuseStrategies (as I understand it) is not because 
they may have to load large resource files or dictionaries or whatever, but 
it’s because building a TokenStream is itself quite a heavy operation due to 
AttributeFactories and reflection.  My argument is that this is only heavy 
relative to the cost of indexing a single field, and that this only really 
matters when you have documents with lots of small fields in them.  For query 
building or highlighting or MoreLikeThis, the cost of building a small number 
of token streams is tiny compared to all the other heavy lifting and IO going 
on.  And so if we pushed this *TokenStream* reuse into IndexWriter we wouldn’t 
have to have a close() method on Analyzer (because the thread locals go away, 
and we expect file resources etc to be closed once the analyzer has finished 
building itself), and delegating or wrapping analyzers becomes much simpler.

Does that make more sense?

(I agree on the thread pool stuff, but we need to be careful about not blowing 
up users' systems even if they are implementing anti-patterns!)
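
For concreteness, here is a minimal sketch of the reuse lifecycle as it stands
today (ReuseDemo and the field name are illustrative; StandardAnalyzer is just
a stand-in for any Analyzer):

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class ReuseDemo {
  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new StandardAnalyzer();
    // first call builds TokenStreamComponents and caches them in the
    // analyzer's per-thread ReuseStrategy storage
    try (TokenStream ts = analyzer.tokenStream("f", "hello world")) {
      ts.reset();
      while (ts.incrementToken()) {
        // consume tokens
      }
      ts.end();
    }
    // second call on the same thread reuses the cached components
    // instead of rebuilding the attributes via reflection
    try (TokenStream ts = analyzer.tokenStream("f", "hello again")) {
      ts.reset();
      while (ts.incrementToken()) {
        // consume tokens
      }
      ts.end();
    }
    // close() exists to release the CloseableThreadLocal behind that
    // cache; moving reuse into IndexWriter would let it go away
    analyzer.close();
  }
}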

> On 8 Jun 2021, at 16:12, Robert Muir  wrote:
> 
> Alan: a couple thoughts:
> 
> Analyzers are not just used for formulating queries, but also may be
> used by highlighters and other things on document results at query
> time.
> Some analyzers may do too-expensive/garbage-creating stuff on
> construction, that you wouldn't want to do at query-time.
> Separately, I think Analyzer being closable makes sense.
> Users still need to carefully consider the lifecycle of this thing for
> performance, and may want to return their own resources for some
> reason (close() is a non-final method today)
> Analyzers might require large amounts of resources (such as parsing
> files/lists, ml models, who knows what).
> For the built-in minimal resources that we ship, we try to make
> construction cheap and use static holder classes, and so on. I'm
> concerned some of these are costly.
> But I'm definitely worried about longer files and stuff that many
> users might use.
> 
> I feel like some of this "large threadpool" stuff is just a java
> antipattern for search. I configure servers with fixed threadpools
> matching the number of CPU cores, and tell my load balancer about that
> number (e.g. haproxy maxconn), so that it can effectively queue and
> not overload search servers.
> 
> On Tue, Jun 8, 2021 at 10:23 AM Alan Woodward  wrote:
>> 
>> Hi all,
>> 
>> I’ve been on holiday and away from a keyboard for a week, so that means I of 
>> course spent my time thinking about lucene Analyzers and specifically their 
>> ReuseStrategies…
>> 
>> Building a TokenStream can be quite a heavy operation, and so we try and 
>> reuse already-constructed token streams as much as possible.  This is 
>> particularly important at index time, as having to create lots and lots of 
>> very short-lived token streams for documents with many short text fields 
>> could mean that we spend longer building these objects than we do pulling 
>> data from them.  To help support this, lucene Analyzers have a 
>> ReuseStrategy, which defaults to storing a map of fields to token streams in 
>> a ThreadLocal object.  Because ThreadLocals can behave badly when it comes 
>> to containers that have large thread pools, we use a special 
>> CloseableThreadLocal class that can null out its contents once the Analyzer 
>> is done with, and this leads to Analyzer itself being Closeable.  This makes 
>> extending analyzers more complicated, as delegating wrappers need to ensure 
>> that they don’t end up sharing token streams with their delegates.
>> 
>> It’s common to use the same analyzer for indexing and for parsing user 
>> queries.  At query time, reusing token streams is a lot less important - the 
>> amount of time spent building the query is typically much lower than the 
>> amount of time spent rewriting and executing it.  The fact that this re-use 
>> is only really useful for index time and that the lifecycle of the analyzer 
>> is therefore very closely tied to the lifecycle of its associated 
>> IndexWriter makes me think that we should think about moving the re-use 
>> strategies into IndexWriter itself.  One option would be to have token 
>> streams be constructed once per DocumentsWriterPerThread, which would lose 
>> some 

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 297 - Still Unstable!

2021-06-09 Thread Michael Sokolov
Just pushed a fix; we should stop seeing this failure now.

On Wed, Jun 9, 2021 at 3:51 AM Apache Jenkins Server
 wrote:
>
> Build: 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/297/
>
> 1 tests failed.
> FAILED:  
> org.apache.lucene.codecs.lucene90.TestLucene90HnswVectorFormat.testDeleteAllVectorDocs
>
> Error Message:
> org.apache.lucene.index.CorruptIndexException: Problem reading index from 
> MockDirectoryWrapper(ByteBuffersDirectory@41c6c9be 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@3771de2) 
> (resource=MockDirectoryWrapper(ByteBuffersDirectory@41c6c9be 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@3771de2))
>
> Stack Trace:
> org.apache.lucene.index.CorruptIndexException: Problem reading index from 
> MockDirectoryWrapper(ByteBuffersDirectory@41c6c9be 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@3771de2) 
> (resource=MockDirectoryWrapper(ByteBuffersDirectory@41c6c9be 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@3771de2))
> at 
> __randomizedtesting.SeedInfo.seed([75D25E7017151CD7:2DA5A4A80B897D2C]:0)
> at 
> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:160)
> at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
> at 
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179)
> at 
> org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:221)
> at 
> org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:534)
> at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:137)
> at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:596)
> at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
> at 
> org.apache.lucene.index.BaseVectorFormatTestCase.testDeleteAllVectorDocs(BaseVectorFormatTestCase.java:542)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
>