Re: [VOTE] Release Lucene 9.11.0 RC1

2024-06-05 Thread Michael McCandless
+1 SUCCESS! [0:24:55.332837]

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jun 5, 2024 at 11:21 AM Adrien Grand  wrote:

> +1 SUCCESS! [1:09:30.262027]
>
> On Wed, Jun 5, 2024 at 4:15 PM Tomás Fernández Löbbe <
> tomasflo...@gmail.com> wrote:
>
>> +1
>>
>> SUCCESS! [1:12:30.029470]
>>
>> On Wed, Jun 5, 2024 at 9:22 AM Bruno Roustant 
>> wrote:
>>
>>> +1
>>>
>>> SUCCESS! [0:41:14.593265]
>>>
>>> Bruno
>>>

>
> --
> Adrien
>


Re: Lucene 9.11

2024-05-29 Thread Michael McCandless
Thanks Ben!

Mike McCandless

http://blog.mikemccandless.com


On Wed, May 29, 2024 at 12:45 AM Stefan Vodita 
wrote:

> Ben, I just merged #13414, so it's not a blocker for the release.
> Thanks again for volunteering to be release manager!
>
> Stefan
>
> On Tue, 28 May 2024 at 14:58, Benjamin Trent 
> wrote:
>
>> Hey y'all,
>>
>> I am planning on starting the release process tomorrow (May 29).
>>
>> I am in the Eastern USA time zone, so I will start the process around
>> noon UTC.
>>
>> I noticed one PR from Stefan. I can wait for that one if I need to.
>>
>> Did we figure out the hppc concerns? I saw some PR activity, wanted to
>> make sure we are all still good with starting the release process this week.
>>
>> Anything else I should be aware of or wait for?
>>
>> Thanks!
>>
>> Ben Trent
>>
>> On Wed, May 15, 2024, 3:58 AM Chris Hegarty
>>  wrote:
>>
>>> +1
>>>
>>> -Chris.
>>>
>>> > On 14 May 2024, at 16:10, Adrien Grand  wrote:
>>> >
>>> > +1 the 9.11 changelog looks great!
>>> >
>>> > On Tue, May 14, 2024 at 4:50 PM Benjamin Trent 
>>> wrote:
>>> > Hey y'all,
>>> >
>>> > Looking at changes for 9.11, we are building a significant list. I
>>> propose we do a release in the next couple of weeks.
>>> >
>>> > While this email is a little early (I am about to go on vacation for a
>>> bit), I volunteer myself as release manager.
>>> >
>>> > Unless there are objections, I plan on kicking off the release process
>>> May 28th.
>>> >
>>> > Thanks!
>>> >
>>> > Ben
>>> >
>>> >
>>> > --
>>> > Adrien
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>


Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1315 - Still Unstable!

2024-04-02 Thread Michael McCandless
Hmm this failure looks not great.

I tried the "Reproduce with:" for one of the failures (see below) but it
fails to run any tests at all?  Maybe because of the cool parameterized
testing we now have for our back compat tests?  If I remove the "{...}"
pattern then the failures do repro.

./gradlew :lucene:backward-codecs:test --tests
"org.apache.lucene.backward_index.TestBinaryBackwardsCompatibility.testSearchOldIndex
{Lucene-Version:9.10.1; Pattern: unsupported.%1$s-cfs.zip}" -Ptests.jvms=4
-Ptests.jvmargs= -Ptests.seed=AED171B21972F50D -Ptests.multiplier=2
-Ptests.nightly=true -Ptests.gui=true
-Ptests.file.encoding=ISO-8859-1
-Ptests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/test-data/enwiki.random.lines.txt
-Ptests.vectorsize=256
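
The workaround of dropping the "{...}" part can be sketched in a few lines
of Python.  This is only an illustration: strip_params is a hypothetical
helper, not part of the Lucene build, and it assumes the parameters always
show up as a single brace-delimited suffix like in the report below.

```python
import re


def strip_params(test_name: str) -> str:
    """Drop a trailing "{...}" parameter description from a test name.

    Hypothetical helper: assumes the parameters appear as one
    brace-delimited suffix at the end of the name.
    """
    return re.sub(r"\s*\{.*\}\s*$", "", test_name, flags=re.DOTALL)


name = (
    "org.apache.lucene.backward_index.TestBinaryBackwardsCompatibility"
    ".testSearchOldIndex {Lucene-Version:9.10.1; Pattern: unsupported.%1$s-cfs.zip}"
)

# The stripped name is what would be passed to --tests instead.
print(strip_params(name))
```

Names without a "{...}" suffix pass through unchanged, so it should be safe
to apply to every failure in a report.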

Mike McCandless

http://blog.mikemccandless.com


On Tue, Apr 2, 2024 at 4:52 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1315/
>
> 6 tests failed.
> FAILED:
> org.apache.lucene.backward_index.TestBinaryBackwardsCompatibility.testSearchOldIndex
> {Lucene-Version:9.10.1; Pattern: unsupported.%1$s-cfs.zip}
>
> Error Message:
> java.lang.AssertionError: Index name 9.10.1 not found:
> unsupported.9.10.1-cfs.zip
>
> Stack Trace:
> java.lang.AssertionError: Index name 9.10.1 not found:
> unsupported.9.10.1-cfs.zip
> at
> __randomizedtesting.SeedInfo.seed([AED171B21972F50D:E4679B8937FD59F]:0)
> at junit@4.13.1/org.junit.Assert.fail(Assert.java:89)
> at junit@4.13.1/org.junit.Assert.assertTrue(Assert.java:42)
> at junit@4.13.1/org.junit.Assert.assertNotNull(Assert.java:713)
> at
> org.apache.lucene.backward_index.BackwardsCompatibilityTestBase.setUp(BackwardsCompatibilityTestBase.java:145)
> at
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
> at java.base/java.lang.reflect.Method.invoke(Method.java:580)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at junit@4.13.1
> /org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at randomizedtesting.runner@2.8.1
> 

Re: Query about the GitHub statistics for Lucene

2024-03-06 Thread Michael McCandless
On Wed, Mar 6, 2024 at 4:41 AM Chris Hegarty 
wrote:

> Seems that I’ve fallen into the newbie PMC Chair rabbit hole! ;-) - the
> reporting tool has long standing issues. Maybe they’re fixable, maybe not,
> but it’s possible we don’t necessarily need it now.
>

Sorry :)  Seems to be a rite-of-passage at this point!  It should be
mentioned in the handover instructions... or, we should simply merge Daniel
Gruno's one-line fix to the regexp that Kibble/Whimsy/reporter tool uses:
https://issues.apache.org/jira/browse/COMDEV-425?focusedCommentId=17823767&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17823767

> @Mike is it possible to add “created since” filter?
>

Ahh good idea, done!
https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=created%3APast+3+months=issue_or_pr%3APR
(this is PRs created in the Past 3 months ... it shows 36 open and 162
closed right now, close to the GitHub counts you found).

Here's the luceneserver commit that adds it:
https://github.com/mikemccand/luceneserver/commit/397942573bed3e2c4fd00ab0a324a19fd014bfd4

Mike McCandless

http://blog.mikemccandless.com


Re: Query about the GitHub statistics for Lucene

2024-03-05 Thread Michael McCandless
Found the prior discussion/issue:
https://lists.apache.org/thread/fhzw0y7kpnf48cxfml8t0313sdswdv6b

And a prior prior discussion:
https://lists.apache.org/thread/6rsr8v982fjqgyopprqzw057cpzfnz3z

Issue: https://issues.apache.org/jira/browse/COMDEV-425.  Jan seemed to get
close to fixing the (regexp?) bug!

Mike McCandless

http://blog.mikemccandless.com


On Tue, Mar 5, 2024 at 1:03 PM Michael McCandless 
wrote:

>
> On Tue, Mar 5, 2024 at 4:49 AM Chris Hegarty <
> christopher.hega...@elastic.co> wrote:
>
>> In preparation for the project’s upcoming ASF board report, I came across
>> and reported [1] an issue with the GH statistics, available at:
>> https://reporter.apache.org/wizard/statistics?lucene
>>
>> It appears that there is no GH activity for 2024! Clearly this is
>> incorrect. I’ve yet to track down what’s going on with this. Familiar to
>> anyone here?
>
>
> There is a long-standing INFRA issue about this.  Lemme try to locate it
> ...
>
>> @Mike. Would it be possible to add a “Past 3 months” to
>> https://githubsearch.mikemccandless.com/search.py ? Which would be
>> helpful when reporting.
>>
>
> Good idea!  Done!
> https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=status%3AOpen=updated%3APast+3+months
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>


Re: Query about the GitHub statistics for Lucene

2024-03-05 Thread Michael McCandless
On Tue, Mar 5, 2024 at 4:49 AM Chris Hegarty 
wrote:

> In preparation for the project’s upcoming ASF board report, I came across
> and reported [1] an issue with the GH statistics, available at:
> https://reporter.apache.org/wizard/statistics?lucene
>
> It appears that there is no GH activity for 2024! Clearly this is
> incorrect. I’ve yet to track down what’s going on with this. Familiar to
> anyone here?


There is a long-standing INFRA issue about this.  Lemme try to locate it
...

> @Mike. Would it be possible to add a “Past 3 months” to
> https://githubsearch.mikemccandless.com/search.py ? Which would be
> helpful when reporting.
>

Good idea!  Done!
https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=status%3AOpen=updated%3APast+3+months

Mike McCandless

http://blog.mikemccandless.com


Re: [VOTE] Release PyLucene 9.10.0-rc1

2024-03-04 Thread Michael McCandless
+1 to release.

I successfully ran my standard PyLucene smoke test of indexing the first
100K enwiki documents, running a couple queries, force merging to one
segment, and running again.

This was on Python 3.11, OpenJDK 21, Arch Linux kernel 6.4.1.

I am sad that this may be the last official PyLucene release!!  Sorry for
the long delay on completing my vote.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Feb 21, 2024 at 4:50 PM Andi Vajda  wrote:

>
> The PyLucene 9.10.0 (rc1) release tracking the recent release of
> Apache Lucene 9.10.0 is ready.
>
> A release candidate is available from:
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/9.10.0-rc1/
>
> PyLucene 9.10.0 is built with JCC 3.14, included in these release
> artifacts.
>
> Apart from the catch-up to Lucene 9.10.0, the other major new feature in
> this release candidate is that JCC can now generate a setup.py file
> instead
> of calling Setup() directly. This makes it possible to use modern Python
> packaging without falling afoul of "python setup.py install" being
> deprecated. Setup.py itself is not deprecated, only some of its associated
> commands are; see [1] for more information about this.
>
> In PyLucene's Makefile, there now is a new MODERN_PACKAGING variable,
> which
> can be set to true so that "python -m build" and "python -m pip install"
> are
> used for building and installing PyLucene.
>
> JCC 3.14 supports Python 3.3 up to Python 3.12.
> PyLucene may also be built with Python 2 but this configuration is no
> longer
> tested.
>
> Please vote to release these artifacts as PyLucene 9.10.0.
> Anyone interested in this release can and should vote !
>
> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>
> pps: here is my +1
>
> [1]
> https://packaging.python.org/en/latest/discussions/setup-py-deprecated/
>


Re: Announcing githubsearch!

2024-02-26 Thread Michael McCandless
Done!  Deployed!  Thank you Mike S.

Though on my "dark mode" Chrome on a Macbook, it's super dark.  I can make
it out but I gotta stare for a bit ... do they make light and dark mode
.ico files in one!?

Mike McCandless

http://blog.mikemccandless.com


On Sun, Feb 25, 2024 at 6:05 PM Michael Sokolov  wrote:

> here is a favicon you might want to try: I cropped the "VL" from the
> Apache Lucene logo (ok I guess it's an AL) -- if you save it as
> favicon.ico in the root of your website (ie as url /favicon.ico) it
> should show up in bookmarks, browser toolbars, etc as a handy memory
> aid. Of course you might have other ideas for a picture - it's
> actually pretty easy to make the favicon once you have a picture you
> like; I followed the instructions here
>
> https://www.logikfabrik.se/blog/how-to-create-a-multisize-favicon-using-gimp/
>
> On Thu, Feb 22, 2024 at 10:48 AM Zhang Chao <80152...@qq.com.invalid>
> wrote:
> >
> > Great job! Thanks Mike!
> >
> > On Feb 22, 2024, at 22:31, Alessandro Benedetti  wrote:
> >
> > That's cool Mike! Well done!
> >
> > On Wed, 21 Feb 2024, 22:02 Anshum Gupta,  wrote:
> >>
> >> This is great! Like always, thank you Mike!
> >>
> >> On Mon, Feb 19, 2024 at 8:40 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
> >>>
> >>> Hi Team,
> >>>
> >>> ~1.5 years ago (August 2022) we migrated our Lucene issue tracking
> from Jira to GitHub. Thank you Tomoko for all the hard work doing such a
> complex, multi-phased, high-fidelity migration!
> >>>
> >>> I finally finished also migrating jirasearch to GitHub:
> githubsearch.mikemccandless.com. It was tricky because GitHub issues/PRs
> are fundamentally more complex than Jira's data model, and the GitHub REST
> API is also quite rich / heavily normalized. All of the source code for
> githubsearch lives here. The UI remains its barebones self ;)
> >>>
> >>> Githubsearch is dog food for us: it showcases Lucene (currently
> 9.8.0), and many of its fun features like infix autosuggest, block join
> queries (each comment is a sub-document on the issue/PR), DrillSideways
> faceting, near-real-time indexing/searching, synonyms (try “oome”),
> expressions, non-relevance and blended-relevance sort, etc.  (This old blog
> post goes into detail.)  Plus, it’s meta-fun to use Lucene to search its
> own issues, to help us be more productive in improving Lucene!  Nicely
> recursive.
> >>>
> >>> In addition to good ol’ searching by text, githubsearch has some
> new/fun features:
> >>>
> >>> Drill down to just PRs or issues
> >>> Filter by “review requested” for a given user: poor Adrien has 8
> (open) now (sorry)! Or see your mentions (Robert is mentioned in 27 open
> issues/PRs). Or PRs that you reviewed (Uwe has reviewed 9 still-open PRs).
> Or issues and PRs where a user has had any involvement at all (Dawid has
> interacted on 197 issues/PRs).
> >>> Find still-open PRs that were created by a New Contributor (an author
> who has no changes merged into our repository) or Contributor
> (non-committer who has had some changes merged into our repository) or
> Member
> >>> Here are the uber-stale (last touched more than a month ago) open PRs
> by outside contributors. We should ideally keep this at 0, but it’s 83 now!
> >>> “Link to this search” to get a short-er, more permanent URL (it is NOT
> a URL shortener, though!)
> >>> Save named searches you frequently run (they just save to local cookie
> state on that one browser)
> >>>
> >>> I’m sure there are exciting bugs, feedback/patches welcome!  If you
> see problems, please reply to this email or file an issue here.
> >>>
> >>> Note that jirasearch remains running, to search Solr, Tika and Infra
> issues.
> >>>
> >>> Happy Searching,
> >>>
> >>> Mike McCandless
> >>>
> >>> http://blog.mikemccandless.com
> >>
> >>
> >>
> >> --
> >> Anshum Gupta
> >
> >
>


Re: [Vote] Bump the Lucene main branch to Java 21

2024-02-26 Thread Michael McCandless
+1, exciting!

Mike McCandless

http://blog.mikemccandless.com


On Fri, Feb 23, 2024 at 6:24 AM Chris Hegarty
 wrote:

> Hi,
>
> Since the discussion on bumping the Lucene main branch to Java 21 is
> winding down, let's hold a vote on this important change.
>
> Once bumped, the next major release of Lucene (whenever that will be) will
> require a version of Java greater than or equal to Java 21.
>
> The vote will be open for at least 72 hours (and allow some additional
> time for the weekend) i.e. until 2024-02-28 12:00 UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1
>
> -Chris.
>
>


Re: Bump the Lucene main branch to Java 21

2024-02-21 Thread Michael McCandless
On Wed, Feb 21, 2024 at 7:41 AM Chris Hegarty
 wrote:

> So I think this means we are now free to use all the newfangled language
> features since Java 11 (min required for Lucene 9.x) -> Java 21?
>
> For the _main_ branch, yes.
>
> The _branch_9x_ remains unchanged - it stays on Java 11.
>
> So, if you’re planning to backport a change from main to 9x, then you may
> want to consider what Java language feature and/or JDK API you use - to
> make the backport more straightforward. But this is nothing new, _main_ is
> already on Java 17, while 9x is on Java 11, so the scenario already exists,
> just that the range is changing with this proposal. Hope this helps.
>

Thanks Chris, this makes sense!  So what's new with this change is on main
branch we can now use new language features from Java 17 -> Java 21.  But
on backport to 9.x we must still use only Java 11.

Thanks!

Mike McCandless

http://blog.mikemccandless.com


Re: Bump the Lucene main branch to Java 21

2024-02-21 Thread Michael McCandless
Thank you for the heads up Chris.

So I think this means we are now free to use all the newfangled language
features since Java 11 (min required for Lucene 9.x) -> Java 21?

Mike McCandless

http://blog.mikemccandless.com


On Wed, Feb 21, 2024 at 3:58 AM Chris Hegarty
 wrote:

> Hi,
>
> A number of us have been iterating on a PR to bump the Lucene main branch
> to a minimum of Java 21 [1]. The work is in a good state and is almost
> ready to commit.
>
> While the changes themselves are not large, the impact is arguably larger.
> So I’m raising awareness here with the wider group.
>
> Clearly one could conflate the bump to Java 21 with the question of when
> will Lucene have a next major release, but those issues, while somewhat
> related, are orthogonal. My position is that the next Lucene major should
> be on Java 21, regardless of when that will happen.
>
> Comments, feedback, suggestions welcome.
>
> Thanks,
> -Chris.
>
> [1] https://github.com/apache/lucene/pull/12753
>
>
>
>


Re: Welcome Zhang Chao as Lucene committer

2024-02-21 Thread Michael McCandless
Welcome Chao!

Mike McCandless

http://blog.mikemccandless.com


On Wed, Feb 21, 2024 at 5:02 AM Stefan Vodita 
wrote:

> Congratulations, Chao!
>
> On Tue, 20 Feb 2024 at 17:28, Adrien Grand  wrote:
>
>> I'm pleased to announce that Zhang Chao has accepted the PMC's
>> invitation to become a committer.
>>
>> Chao, the tradition is that new committers introduce themselves with a
>> brief bio.
>>
>> Congratulations and welcome!
>>
>> --
>> Adrien
>>
>


Re: Announcing githubsearch!

2024-02-21 Thread Michael McCandless
On Tue, Feb 20, 2024 at 10:06 AM Stefan Vodita 
wrote:

> Thank you Mike, I really like all the facets!
>

Me too lol.  It was one of the big motivators for me to build this out.
GitHub's search didn't have all the facet drill-downs/up/sideways I
wanted.  Some of them are super useful like "which PRs have review
requested for me" or "where am I mentioned".
Also, GitHub's filter choices do not seem to be dynamically generated for
this query -- so you can pick a filter value and it brings you to 0 hits,
violating the "no dead end" promise of Lucene's facets.
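
To make the "no dead end" point concrete, here is a toy sketch in Python.
It is plain counting over in-memory dicts, not Lucene's facet module, and
the field names are made up.  The idea is only that filter values are
counted over the current hits, so any value offered has at least one hit.

```python
from collections import Counter


def facet_counts(hits, field):
    """Count field values over the current hits only, so every value
    offered as a filter matches at least one of those hits."""
    counts = Counter()
    for doc in hits:
        for value in doc.get(field, []):
            counts[value] += 1
    return counts


# Toy data; the field names are invented for illustration.
docs = [
    {"text": "fix facets", "status": ["Open"], "type": ["PR"]},
    {"text": "facets bug", "status": ["Closed"], "type": ["Issue"]},
    {"text": "merge policy", "status": ["Open"], "type": ["Issue"]},
]

# A stand-in for running the text query.
hits = [d for d in docs if "facets" in d["text"]]

print(facet_counts(hits, "type"))
print(facet_counts(hits, "status"))
```

A static filter list would instead offer every value seen anywhere in the
index, including ones that match nothing for this query.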

I was also disappointed with GitHub search's lack of hit highlighting, to
solve the "final inch" problem (show me specifically where, in this
massive massive list of comments on a PR/issue, my search terms appear),
and also not showing me the individual comment or code review comment
(multiple ones of those on a PR) where my search terms appear, lack of
linking directly to that comment, etc.  Githubsearch uses Lucene's block
joins to achieve this.

GitHub's search doesn't offer a blended relevance+recency sort, which I
think makes a great default.  It looks like it does support phrase search
(with double-quotes), curious how that works with ngrams.
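
For anyone wondering what "blended relevance+recency" could look like, here
is a minimal Python sketch.  The formula (an exponential decay with a
30-day half life) is an assumption picked for illustration; it is not the
expression githubsearch uses, and the scores and ages are made up.

```python
import math


def blended_score(relevance, age_days, half_life_days=30.0):
    """Scale a relevance score by an exponential recency decay.

    Assumed formula for illustration only: the weight halves every
    half_life_days.
    """
    return relevance * math.pow(0.5, age_days / half_life_days)


# (title, relevance score, days since last update), all invented.
hits = [
    ("old but very relevant", 10.0, 365),
    ("recent and fairly relevant", 6.0, 3),
]

ranked = sorted(hits, key=lambda h: blended_score(h[1], h[2]), reverse=True)
print([title for title, _, _ in ranked])
```

With these numbers the recent hit sorts first even though its raw
relevance is lower, which is the behavior a blended default is after.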

I do like that the text query language includes all of the sort/filter
criteria -- the "is:open" and "sort:comments-desc".  Githubsearch doesn't
support that through the text query language, just the facets UI / REST
query URL.

Anyway, I don't want to complain (too much) about GitHub's search efforts.
Search is clearly hard, and we all (Lucene experts) have a fairly
biased/opinionated take on it all, heh.  I've never met a search engine
that I'm fully happy with ;)

> One thing that bothered me about GitHub's own search was that it would
> return
> different results if I wasn't signed in. Maybe it does early stopping for
> non-authenticated users? In any case, this won't be a problem with
> githubsearch.
>

Oh, that is very interesting -- I didn't know that.

Wow, I just tested -- indeed, you cannot even search the source code (for
Lucene's repo anyways) if you are not signed in.  That's weird.

For issues/PRs searching, the three queries I tried seem to produce the
same results signed in or out.  But it is scary/dangerous if this can
differ!!


> Have you considered indexing the Lucene source code too?
>

Oh my, I have not (until now lol).  That's a great idea.  Source code
tokenization would be such a fun problem ... I wonder if GitHub
open-sources how they tokenize the many different languages' source code.
GitHub's code search is in Rust (not using Lucene nor Rucene), a custom
search engine they recently built / switched to:
https://github.blog/2023-02-06-the-technology-behind-githubs-new-code-search,
away from Elasticsearch previously I think.  It looks like they use ngrams,
maybe instead of language-specific tokenization (?), to do the initial
matching/retrieval.  I would try normal lexical tokenization to see if
highlighting could work well.

I opened this luceneserver/GitHubSearch issue to think about this
... it'd sure be fun to build and use :)  Thank you for the suggestion
Stefan!

Mike McCandless

http://blog.mikemccandless.com

>


Re: Announcing githubsearch!

2024-02-20 Thread Michael McCandless
Thank you for all the warm feedback everyone, and all the exciting issues
already uncovered / ideas for improvements.  Now I have some more fun work
to do!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Feb 19, 2024 at 12:58 PM Julie Tibshirani 
wrote:

> This is so cool! Thank you Mike for developing and hosting these services!
>
> Julie
>
> On Mon, Feb 19, 2024 at 9:40 AM Michael Wechner 
> wrote:
>
>> thank you very much!
>>
>> On 19.02.24 at 17:39, Michael McCandless wrote:
>>
>> Hi Team,
>>
>> ~1.5 years ago (August 2022) we migrated our Lucene issue tracking from
>> Jira to GitHub. Thank you Tomoko for all the hard work doing such a
>> complex, multi-phased, high-fidelity migration!
>>
>> I finally finished also migrating jirasearch to GitHub:
>> githubsearch.mikemccandless.com. It was tricky because GitHub issues/PRs
>> are fundamentally more complex than Jira's data model, and the GitHub REST
>> API is also quite rich / heavily normalized. All of the source code for
>> githubsearch lives here
>> <https://github.com/mikemccand/luceneserver/tree/master/examples/githubsearch>.
>> The UI remains its barebones self ;)
>>
>> Githubsearch
>> <https://github.com/mikemccand/luceneserver/tree/master/examples/githubsearch>
>> is dog food for us: it showcases Lucene (currently 9.8.0), and many of its
>> fun features like infix autosuggest, block join queries (each comment is a
>> sub-document on the issue/PR), DrillSideways faceting, near-real-time
>> indexing/searching, synonyms (try “oome
>> <https://githubsearch.mikemccandless.com/search.py?text=oome=status%3AOpen>”),
>> expressions, non-relevance and blended-relevance sort, etc.  (This old
>> blog post
>> <https://blog.mikemccandless.com/2016/10/jiraseseach-20-dog-food-using-lucene-to.html>
>>  goes
>> into detail.)  Plus, it’s meta-fun to use Lucene to search its own issues,
>> to help us be more productive in improving Lucene!  Nicely recursive.
>>
>> In addition to good ol’ searching by text, githubsearch
>> <https://githubsearch.mikemccandless.com/> has some new/fun features:
>>
>>- Drill down to just PRs or issues
>>- Filter by “review requested” for a given user: poor Adrien has 8
>>(open) now
>>
>> <https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=status%3AOpen=requested_reviewers%3Ajpountz>
>>(sorry)! Or see your mentions (Robert is mentioned in 27 open
>>issues/PRs
>>
>> <https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=status%3AOpen=mentioned_users%3Armuir>).
>>Or PRs that you reviewed (Uwe has reviewed 9 still-open PRs
>>
>> <https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=status%3AOpen=reviewed_users%3Auschindler>).
>>Or issues and PRs where a user has had any involvement at all (Dawid
>>has interacted on 197 issues/PRs
>>
>> <https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=reviewed_users%3Adweiss>
>>).
>>- Find still-open PRs that were created by a New Contributor
>>
>> <https://githubsearch.mikemccandless.com/search.py?chg=dds==author_association=New+contributor=0=25792=recentlyUpdated=list=cjhfx60attlt=status%3AOpen=>
>>(an author who has no changes merged into our repository) or
>>Contributor
>>
>> <https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=status%3AOpen=author_association%3AContributor>
>>(non-committer who has had some changes merged into our repository) or
>>Member
>>
>> <https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=status%3AOpen=author_association%3AMember>
>>- Here are the uber-stale (last touched more than a month ago) open
>>PRs by outside contributors
>>
>> <https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated=status%3AOpen=author_association%3ANew+contributor%2CContributor%2CNone=updated_ago%3A%3E+1+month+ago=issue_or_pr%3APR>.
>>We should ideally keep this at 0, but it’s 83 now!
>>- “Link to this search” to get a short-er, more permanent URL (it is
>>NOT a URL shortener, though!)
>>- Save named searches you frequently run (they just save to local
>>cookie state on that one browser)
>>
>> I’m sure there are exciting bugs, feedback/patches welcome!  If you see
>> problems, please reply to this email or file an issue here
>> <https://github.com/mikemccand/luceneserver/issues>.
>>
>> Note that jirasearch <https://jirasearch.mikemccandless.com/search.py>
>> remains running, to search Solr, Tika and Infra issues.
>>
>> Happy Searching,
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>>


Re: Announcing githubsearch!

2024-02-20 Thread Michael McCandless
On Mon, Feb 19, 2024 at 1:00 PM Walter Underwood 
wrote:

> It appears to always search prefixes, so there is no way to search for
> “wunder” without getting “wundermap” and “wunderground”. Putting the term
> in quotes doesn’t turn that off.
>

Hmm that shouldn't be the case?  It does split on camel case though (thank
you WordDelimiterFilter!).  E.g. try searching on infix and you should see
it highlighted inside terms like AnalyzingInfixSuggester.
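
Roughly, the camel case splitting works like the Python sketch below.
This is a regex stand-in written for illustration, not the real
WordDelimiterFilter, which has many more options and rules.

```python
import re


def split_camel_case(token):
    """Split a token into parts at case-change boundaries.

    A rough stand-in for one thing WordDelimiterFilter does: runs of
    capitals, capitalized words, lowercase words and digit runs each
    become their own part.
    """
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", token)


parts = split_camel_case("AnalyzingInfixSuggester")
print(parts)

# A lowercased query term can now match one of the parts.
print("infix" in [p.lower() for p in parts])
```

A token with no case change, like "wundermap", stays as a single part, so
this kind of splitting alone would not explain a match on "wunder".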

In fact when I search for wunder I get a horrible exception, I think I know
why (it happens for any query that gets no hits!).  I opened this issue.
I'll try to fix that soon.

Walter, I'm not sure how you were able to even search on "wunder" -- did
you get actual results?  From githubsearch?

Mike McCandless

http://blog.mikemccandless.com


Re: Announcing githubsearch!

2024-02-20 Thread Michael McCandless
On Tue, Feb 20, 2024 at 6:01 AM Michael Sokolov  wrote:

> I love the gray all text UI. Don't change it! But I wonder if it's time for
> a favicon?
>

LOL favicon!  You do NOT want to have to confront my artistic skills!

Mike McCandless

http://blog.mikemccandless.com

>


Announcing githubsearch!

2024-02-19 Thread Michael McCandless
Hi Team,

~1.5 years ago (August 2022) we migrated our Lucene issue tracking from
Jira to GitHub. Thank you Tomoko for all the hard work doing such a
complex, multi-phased, high-fidelity migration!

I finally finished also migrating jirasearch to GitHub:
githubsearch.mikemccandless.com. It was tricky because GitHub issues/PRs
are fundamentally more complex than Jira's data model, and the GitHub REST
API is also quite rich / heavily normalized. All of the source code for
githubsearch lives here
<https://github.com/mikemccand/luceneserver/tree/master/examples/githubsearch>.
The UI remains its barebones self ;)

Githubsearch is dog food for us: it showcases Lucene (currently 9.8.0), and
many of its fun features like infix autosuggest, block join queries (each
comment is a sub-document on the issue/PR), DrillSideways faceting,
near-real-time indexing/searching, synonyms (try “oome”), expressions,
non-relevance and blended-relevance sort, etc.  (This old blog post
<https://blog.mikemccandless.com/2016/10/jiraseseach-20-dog-food-using-lucene-to.html>
goes into detail.)  Plus, it’s meta-fun to use Lucene to search its own
issues, to help us be more productive in improving Lucene!  Nicely recursive.

In addition to good ol’ searching by text, githubsearch
<https://githubsearch.mikemccandless.com/> has some new/fun features:

   - Drill down to just PRs or issues
   - Filter by “review requested” for a given user: poor Adrien has 8
   (open) now (sorry)! Or see your mentions (Robert is mentioned in 27 open
   issues/PRs). Or PRs that you reviewed (Uwe has reviewed 9 still-open PRs).
   Or issues and PRs where a user has had any involvement at all (Dawid has
   interacted on 197 issues/PRs).
   - Find still-open PRs that were created by a New Contributor (an author
   who has no changes merged into our repository) or Contributor
   (non-committer who has had some changes merged into our repository) or
   Member
   - Here are the uber-stale (last touched more than a month ago) open PRs
   by outside contributors. We should ideally keep this at 0, but it’s 83 now!
   - “Link to this search” to get a short-er, more permanent URL (it is NOT
   a URL shortener, though!)
   - Save named searches you frequently run (they just save to local cookie
   state on that one browser)

I’m sure there are exciting bugs, feedback/patches welcome!  If you see
problems, please reply to this email or file an issue here
<https://github.com/mikemccand/luceneserver/issues>.

Note that jirasearch <https://jirasearch.mikemccandless.com/search.py>
remains running, to search Solr, Tika and Infra issues.

Happy Searching,

Mike McCandless

http://blog.mikemccandless.com


Re: [VOTE] Release Lucene 9.10.0 RC1

2024-02-19 Thread Michael McCandless
+1

SUCCESS! [0:19:57.370204]

Mike McCandless

http://blog.mikemccandless.com


On Mon, Feb 19, 2024 at 6:26 AM Chris Hegarty
 wrote:

>
> +1   SUCCESS! [1:14:49.683559]
>
> -Chris.
>
> > On 15 Feb 2024, at 21:08, Uwe Schindler  wrote:
> >
> > Hi,
> > I used Stefan Vodita's Hack to make the Smoketester run on a large list
> of JDKs: https://github.com/apache/lucene/pull/13108
> > See the console of running Java 11, Java 17, Java 19, Java 20, Java 21.
> Due to limitations of Gradle I wasn't able to do the smoker checks on Java
> 22 release candidate, but as there are no changes to 9.x branch I assume
> that everything also works in Java 22. If anybody else has time to run a
> test project with Java 22 using mmap and vectors it would be great!
> > Log file:
> https://jenkins.thetaphi.de/job/Lucene-Release-Tester-v2/3/console
> > Result was:
> > SUCCESS! [2:42:55.968473]
> >
> > Here is my +1 (binding).
> > Uwe
> >
> > Am 15.02.2024 um 12:50 schrieb Uwe Schindler:
> >> Hi,
> >> I ran the default smoke tester with Java 11 and Java 17 on Policeman
> Jenkins; all looks fine:
> https://jenkins.thetaphi.de/job/Lucene-Release-Tester/32/console
> >> SUCCESS! [1:04:45.740708]
> >> I only have one problem. Now that Java 21 LTS is out and more and more
> people use it, it would be good to also run the smoke tester with Java 21.
> I tried that locally by just passing the home dir of Java 21 instead of
> Java 17, but that failed due to some check in smoker.
> >> I will work this evening on patching Smoke tester to also allow it to
> pass Java 21. Maybe the best would be to pass multiple Java versions as
> comma-separated list, just the default one must be Java 11 (the baseline).
> This would allow me to spin Policeman Jenkins with Java 11, Java 17, Java
> 19, Java 20, Java 21 and Java 22-rc1. Takes a while but would make sure all
> works in the officially MR-JAR supported releases + LTS.
> >> What do you think?
> >> I will give my +1 later when I checked the options and also looked into
> the downloaded artifacts.
> >> Uwe
> >> Am 14.02.2024 um 20:28 schrieb Adrien Grand:
> >>> Please vote for release candidate 1 for Lucene 9.10.0
> >>>
> >>> The artifacts can be downloaded from:
> >>>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.10.0-RC1-rev-695c0ac84508438302cd346a812cfa2fdc5a10df
> >>>
> >>> You can run the smoke tester directly with this command:
> >>>
> >>> python3 -u dev-tools/scripts/smokeTestRelease.py \
> >>>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.10.0-RC1-rev-695c0ac84508438302cd346a812cfa2fdc5a10df
> >>>
> >>> The vote will be open for at least 72 hours i.e. until 2024-02-17
> 20:00 UTC.
> >>>
> >>> [ ] +1  approve
> >>> [ ] +0  no opinion
> >>> [ ] -1  disapprove (and reason why)
> >>>
> >>> Here is my +1
> >>>
> >>> --
> >>> Adrien
> >> --
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> https://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> > --
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: u...@thetaphi.de
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Lucene 9.10

2024-02-08 Thread Michael McCandless
+1 to release 9.10.  Thank you for volunteering Adrien!

Mike McCandless

http://blog.mikemccandless.com


On Wed, Feb 7, 2024 at 9:57 AM Adrien Grand  wrote:

> Hello all,
>
> It's been 2 months since we released 9.9 and we accumulated a good number
> of changes, so I'd like to propose that we release 9.10.0.
>
> If there are no objections, I volunteer to be the release manager and
> suggest cutting the branch next Monday (February 12th) and starting the
> release process on Wednesday, one week from now (February 14th).
>
> +Uwe Schindler  I remember that there are JDK22-related
> changes that you'd like to get into 9.10, feel free to let me know if this
> timeline doesn't work for you.
>
> --
> Adrien
>


Re: Needs help reviewing on Lucene PostingsFormat memory improvement

2024-02-07 Thread Michael McCandless
Hi Anh Dũng Bùi,

Thank you for tackling these and being so gently patient/persisting!  Sorry
for the delay.  I will try to review them soon.  The off-heap (streaming?)
building of FSTs is really a massive improvement to Lucene, inspired by
Tantivy's FST implementation: https://blog.burntsushi.net/transducers/

Read-time for Lucene90BlockTreePostingsFormat was already off-heap?  And
your PR changes write-time to do so as well?  This will reduce RAM pressure
during indexing which is great.  And some Lucene usages generate incredibly
large FSTs (I'm looking at you HathiTrust!). I don't think we need to
explicitly measure any performance impact before merging, but let's watch
the nightly benchy to see if there is any measurable impact.

And, yes, Lucene90BlockTreePostingsFormat is the default.  You find the
default codec from Codec.getDefault() and then trace downwards to all its
sources.

Maybe building the synonyms FST (SynonymMap.Builder) would be a good place
for off-heap writing too?

And this exciting PR (still a work in progress) would likely strongly
benefit from streaming FST building, since its FSTs will be much larger
than the Lucene90BlockTree ones: it stores all terms (not just the sampled
prefix/index) in a single FST for the segment.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 1, 2024 at 10:40 PM Anh Dũng Bùi  wrote:

> Hi Lucene devs!
>
> I have 2 PRs to optimize Lucene PostingsFormat
> (Lucene90BlockTreePostingsFormat and FSTPostingsFormat) by utilizing a new
> feature to stream the FST to IndexOutput directly, bypassing the on-heap
> writing:
> - https://github.com/apache/lucene/pull/12980
> - https://github.com/apache/lucene/pull/12985
>
> It would be great if someone can help reviewing. I also have some general
> questions:
> - How do I measure the memory improvement impact in Lucene?
> - Is Lucene90BlockTreePostingsFormat the main index format used in Lucene?
> If not, what is the main format?
> - Are there other places worth using the new streaming FST feature?
>
> Thank you!
> Anh Dung Bui
>


Re: [VOTE] Release Lucene 9.9.2 RC1

2024-01-25 Thread Michael McCandless
+1

SUCCESS! [0:18:29.298410]

Thank you Chris!

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 25, 2024 at 6:57 AM Chris Hegarty
 wrote:

> Please vote for release candidate 1 for Lucene 9.9.2
>
> The artifacts can be downloaded from:
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.2-RC1-rev-a2939784c4ca60bc28bf488b5479c02fc2e5e22c
>
> You can run the smoke tester directly with this command:
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.2-RC1-rev-a2939784c4ca60bc28bf488b5479c02fc2e5e22c
>
> The vote will be open for 96 hours ( allowing some additional time for
> weekend span) i.e. until 2024-01-29 12:00 UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1
>
> Draft release notes can be found at
> https://cwiki.apache.org/confluence/display/LUCENE/ReleaseNote9_9_2
>
> -Chris.
>


Welcome Stefan Vodita as Lucene committer

2024-01-18 Thread Michael McCandless
Hi Team,

I'm pleased to announce that Stefan Vodita has accepted the Lucene PMC's
invitation to become a committer!

Stefan, the tradition is that new committers introduce themselves with a
brief bio.

Congratulations, welcome, and thank you for all your improvements to Lucene
and our community,

Mike McCandless

http://blog.mikemccandless.com


Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-21.0.1) - Build # 46049 - Unstable!

2024-01-04 Thread Michael McCandless
Hmm this is an interesting failure ... one of the hits' scores is off by a
wee bit (one ULP?).
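
One quick way to check how far apart the two scores really are is to compare their raw float bits. This is only a minimal sketch using the JDK: the class and method names are made up for illustration (they are not part of Lucene's test framework), and it assumes both values are finite and have the same sign, which holds for the scores in this failure:

```java
public class UlpDistance {
  /**
   * Number of float steps (ULPs) between a and b.
   * Assumes both values are finite and share the same sign.
   */
  static int ulpsBetween(float a, float b) {
    return Math.abs(Float.floatToIntBits(a) - Float.floatToIntBits(b));
  }

  public static void main(String[] args) {
    float expected = 7.7155924f; // "expected" value from the assertion message
    float actual = 7.7155914f;   // "but was" value from the assertion message
    System.out.println("size of one ULP at this magnitude: " + Math.ulp(expected));
    System.out.println("distance in ULPs: " + ulpsBetween(expected, actual));
  }
}
```

A distance of only a few ULPs would point at float rounding (for example clause scores being summed in a different order) rather than at a real scoring bug.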

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 2, 2024 at 7:33 AM Policeman Jenkins Server 
wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/46049/
> Java: 64bit/hotspot/jdk-21.0.1 -XX:+UseCompressedOops -XX:+UseG1GC
>
> 1 tests failed.
> FAILED:
> org.apache.lucene.search.TestBooleanMinShouldMatch.testRandomQueries
>
> Error Message:
> java.lang.AssertionError: Doc 6 scores don't match
> TopDocs totalHits=3 hits top=3
> 0) doc=0 score=8.962799
> 1) doc=6 score=7.7155914
> 2) doc=4 score=7.1769814
> TopDocs totalHits=3 hits top=3
> 0) doc=0 score=8.962799
> 1) doc=6 score=7.7155924
> 2) doc=4 score=7.1769814
> for query:(data:X +data:3 data:3 data:3 data:6 -data:B data:4 data:3
> data:1)~4 expected:<7.7155924> but was:<7.7155914>
>
> Stack Trace:
> java.lang.AssertionError: Doc 6 scores don't match
> TopDocs totalHits=3 hits top=3
> 0) doc=0 score=8.962799
> 1) doc=6 score=7.7155914
> 2) doc=4 score=7.1769814
> TopDocs totalHits=3 hits top=3
> 0) doc=0 score=8.962799
> 1) doc=6 score=7.7155924
> 2) doc=4 score=7.1769814
> for query:(data:X +data:3 data:3 data:3 data:6 -data:B data:4 data:3
> data:1)~4 expected:<7.7155924> but was:<7.7155914>
> at
> __randomizedtesting.SeedInfo.seed([DB978BEEAF5DCEF5:85BC3B029787E36B]:0)
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:577)
> at
> org.apache.lucene.search.TestBooleanMinShouldMatch.assertSubsetOfSameScores(TestBooleanMinShouldMatch.java:384)
> at
> org.apache.lucene.search.TestBooleanMinShouldMatch.testRandomQueries(TestBooleanMinShouldMatch.java:357)
> at
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
> at java.base/java.lang.reflect.Method.invoke(Method.java:580)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> 

Heads up: upcoming GitHub action to mark stale Lucene PRs

2024-01-04 Thread Michael McCandless
Hi Team,

Stefan Vodita made an awesome simple PR adding a GitHub action to remind /
nag us about stale PRs: https://github.com/apache/lucene/pull/12813

This happened after an in-person discussion at the last Community Over Code
NA in Halifax where Stefan learned about the nice automation Apache Beam
uses to nudge PRs forward.  This change is just a baby step to try to get
our stale PRs into a healthier state / workflow.

In the ultimate irony, that PR itself had become stale recently (2 weeks of
no activity) -- a "meta-stale PR"!

I would like to merge this PR soon, but:
* It will generate a bunch of one-time noise because we have ~163 open
PRs many of which are stale:
https://githubsearch.mikemccandless.com/search.py?dd=status%3AOpen=issue_or_pr%3APR
* I know nothing about GitHub actions YAML format, but worst comes to
worst we push it, it fails in some exotic way, and we revert.

I assume lazy consensus soon ;)

Mike McCandless

http://blog.mikemccandless.com


Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-19) - Build # 45856 - Unstable!

2023-12-20 Thread Michael McCandless
I'll try to get to the bottom of this Adrien, thanks for digging!

I wonder if we are violating this (subtle) requirement in the
Terms.intersect API:

    Note that the provided startTerm must be accepted by the automaton.

The failures involving DirectPostingsFormat seem angry that we are indeed
violating this.
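
To make that contract concrete, here is a toy model of it. This is not Lucene's API: the "automaton" is just a java.util.function.Predicate, the term dictionary is a TreeSet, and the class and method names are hypothetical. It only illustrates the rule that a non-null startTerm must itself be accepted, and that only terms strictly after it are returned:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

public class IntersectContractSketch {
  /**
   * Returns the terms accepted by the "automaton" that sort strictly after startTerm.
   * startTerm may be null, meaning "start from the first term".
   */
  static List<String> intersect(
      TreeSet<String> terms, Predicate<String> automaton, String startTerm) {
    if (startTerm != null && !automaton.test(startTerm)) {
      // Make the precondition explicit instead of tripping an internal assert later.
      throw new IllegalArgumentException(
          "startTerm must be accepted by the automaton: " + startTerm);
    }
    Iterable<String> candidates =
        startTerm == null ? terms : terms.tailSet(startTerm, false);
    List<String> result = new ArrayList<>();
    for (String term : candidates) {
      if (automaton.test(term)) {
        result.add(term);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    TreeSet<String> terms =
        new TreeSet<>(List.of("apple", "banana", "bandana", "cherry"));
    Predicate<String> prefixBan = t -> t.startsWith("ban");
    System.out.println(intersect(terms, prefixBan, null));     // every accepted term
    System.out.println(intersect(terms, prefixBan, "banana")); // accepted terms after "banana"
    try {
      intersect(terms, prefixBan, "zebra"); // "zebra" is not accepted by the automaton
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

If CheckIndex is passing a start term that the automaton rejects, that would correspond to the last call above, and the codecs would be within their rights to complain.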

Mike McCandless

http://blog.mikemccandless.com


On Wed, Dec 20, 2023 at 6:09 AM Adrien Grand  wrote:

> I don't fully understand it yet. I opened an issue:
> https://github.com/apache/lucene/issues/12957.
>
> On Tue, Dec 19, 2023 at 6:02 PM Adrien Grand  wrote:
>
>> This looks like a real bug with the default codec when the prefix
>> compares greater than every indexed term. I'll look into it tomorrow if
>> nobody beats me to it.
>>
>> On Tue, Dec 19, 2023 at 12:35 PM Policeman Jenkins Server <
>> jenk...@thetaphi.de> wrote:
>>
>>> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45856/
>>> Java: 64bit/hotspot/jdk-19 -XX:+UseCompressedOops -XX:+UseSerialGC
>>>
>>> 1 tests failed.
>>> FAILED:  org.apache.lucene.index.TestTerms.testTermMinMaxRandom
>>>
>>> Error Message:
>>> java.lang.AssertionError
>>>
>>> Stack Trace:
>>> java.lang.AssertionError
>>> at
>>> __randomizedtesting.SeedInfo.seed([CBF65306049672F4:8785DC72680AA991]:0)
>>> at
>>> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.getState(IntersectTermsEnum.java:245)
>>> at
>>> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.seekToStartTerm(IntersectTermsEnum.java:288)
>>> at
>>> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:126)
>>> at
>>> org.apache.lucene.codecs.lucene90.blocktree.FieldReader.intersect(FieldReader.java:223)
>>> at
>>> org.apache.lucene.index.CheckIndex.checkTermsIntersect(CheckIndex.java:2374)
>>> at
>>> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:2327)
>>> at
>>> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:2529)
>>> at
>>> org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1067)
>>> at
>>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:783)
>>> at
>>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:550)
>>> at
>>> org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:340)
>>> at
>>> org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
>>> at
>>> org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:85)
>>> at
>>> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>>> at java.base/java.lang.reflect.Method.invoke(Method.java:578)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>>> at
>>> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>>> at
>>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>>> at
>>> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>>> at
>>> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>>> at
>>> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>>> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>> at
>>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>> at
>>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>>> at
>>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>>> at
>>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
>>> at
>>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>>> at
>>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>>   

Re: [JENKINS] Lucene » Lucene-Check-9.x - Build # 7187 - Still Unstable!

2023-12-20 Thread Michael McCandless
I'll try to bottom this one out.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Dec 20, 2023 at 6:25 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/7187/
>
> 1 tests failed.
> FAILED:  org.apache.lucene.index.TestTerms.testTermMinMaxRandom
>
> Error Message:
> java.lang.AssertionError
>
> Stack Trace:
> java.lang.AssertionError
> at
> __randomizedtesting.SeedInfo.seed([C8D1EBB5035DA9F:40FE91CF3CA901FA]:0)
> at
> org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectField$DirectIntersectTermsEnum.<init>(DirectPostingsFormat.java:1055)
> at
> org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectField.intersect(DirectPostingsFormat.java:655)
> at
> org.apache.lucene.index.CheckIndex.checkTermsIntersect(CheckIndex.java:2391)
> at
> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:2344)
> at
> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:2550)
> at
> org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1083)
> at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:797)
> at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:564)
> at
> org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:345)
> at
> org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
> at
> org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:85)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> 

Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-17.0.9) - Build # 45858 - Still Unstable!

2023-12-19 Thread Michael McCandless
Oh this is the new (awesome) check Adrien recently added to CheckIndex, so
maybe this check is catching some pre-existing bugs in one of our
(hopefully experimental, not default!) codecs?

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 19, 2023 at 3:43 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hmm anyone know why this test suddenly started failing...?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Dec 19, 2023 at 3:35 PM Policeman Jenkins Server <
> jenk...@thetaphi.de> wrote:
>
>> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45858/
>> Java: 64bit/hotspot/jdk-17.0.9 -XX:-UseCompressedOops -XX:+UseG1GC
>>
>> 1 tests failed.
>> FAILED:  org.apache.lucene.index.TestTerms.testTermMinMaxRandom
>>
>> Error Message:
>> java.lang.AssertionError
>>
>> Stack Trace:
>> java.lang.AssertionError
>> at
>> __randomizedtesting.SeedInfo.seed([3D3B000BC0883C14:71488F7FAC14E771]:0)
>> at
>> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.getState(IntersectTermsEnum.java:245)
>> at
>> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.seekToStartTerm(IntersectTermsEnum.java:288)
>> at
>> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:126)
>> at
>> org.apache.lucene.codecs.lucene90.blocktree.FieldReader.intersect(FieldReader.java:223)
>> at
>> org.apache.lucene.tests.index.AssertingLeafReader$AssertingTerms.intersect(AssertingLeafReader.java:191)
>> at
>> org.apache.lucene.index.CheckIndex.checkTermsIntersect(CheckIndex.java:2374)
>> at
>> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:2327)
>> at
>> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:2529)
>> at
>> org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1067)
>> at
>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:783)
>> at
>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:550)
>> at
>> org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:340)
>> at
>> org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
>> at
>> org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:85)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>> at
>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>> at
>> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>> at
>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>> at
>> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>> at
>> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>> at
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>> at
>> com.

Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-17.0.9) - Build # 45858 - Still Unstable!

2023-12-19 Thread Michael McCandless
Hmm anyone know why this test suddenly started failing...?

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 19, 2023 at 3:35 PM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45858/
> Java: 64bit/hotspot/jdk-17.0.9 -XX:-UseCompressedOops -XX:+UseG1GC
>
> 1 tests failed.
> FAILED:  org.apache.lucene.index.TestTerms.testTermMinMaxRandom
>
> Error Message:
> java.lang.AssertionError
>
> Stack Trace:
> java.lang.AssertionError
> at
> __randomizedtesting.SeedInfo.seed([3D3B000BC0883C14:71488F7FAC14E771]:0)
> at
> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.getState(IntersectTermsEnum.java:245)
> at
> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.seekToStartTerm(IntersectTermsEnum.java:288)
> at
> org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:126)
> at
> org.apache.lucene.codecs.lucene90.blocktree.FieldReader.intersect(FieldReader.java:223)
> at
> org.apache.lucene.tests.index.AssertingLeafReader$AssertingTerms.intersect(AssertingLeafReader.java:191)
> at
> org.apache.lucene.index.CheckIndex.checkTermsIntersect(CheckIndex.java:2374)
> at
> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:2327)
> at
> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:2529)
> at
> org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1067)
> at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:783)
> at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:550)
> at
> org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:340)
> at
> org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
> at
> org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:85)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> 

Re: [VOTE] Release Lucene 9.9.1 RC1

2023-12-14 Thread Michael McCandless
On Thu, Dec 14, 2023 at 11:33 AM Adrien Grand  wrote:

> Thanks Chris for taking care of this release.
>

+1!

And sorry about the respin...

Mike McCandless

http://blog.mikemccandless.com


Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1209 - Unstable!

2023-12-14 Thread Michael McCandless
Ha, no worries, it was a good suggestion / idea!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Dec 11, 2023 at 12:47 PM Adrien Grand  wrote:

> Woops, sorry for suggesting this change in the first place! I didn't know
> we had this validation for points, but not for postings.
>
> On Fri, Dec 8, 2023 at 2:16 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> OK I reverted the "optimization" to not pull FieldInfo for a field when
>> getting Points values from SlowCompositeCodecReaderWrapper!  Clearly it was
>> not safe ;)
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Dec 8, 2023 at 8:06 AM Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> Uh oh -- I'll dig.  We may need to put back the FieldInfo check before
>>> pulling points.  Tricky!
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Fri, Dec 8, 2023 at 3:55 AM Apache Jenkins Server <
>>> jenk...@builds.apache.org> wrote:
>>>
>>>> Build:
>>>> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1209/
>>>>
>>>> 3 tests failed.
>>>> FAILED:  org.apache.lucene.index.TestPointValues.testSparsePoints
>>>>
>>>> Error Message:
>>>> java.lang.IllegalStateException: this writer hit an unrecoverable
>>>> error; cannot merge
>>>>
>>>> Stack Trace:
>>>> java.lang.IllegalStateException: this writer hit an unrecoverable
>>>> error; cannot merge
>>>> at
>>>> __randomizedtesting.SeedInfo.seed([ADA30A2081CE6DA4:A05414293C35A568]:0)
>>>> at
>>>> org.apache.lucene.index.IndexWriter.hasPendingMerges(IndexWriter.java:2425)
>>>> at
>>>> org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.hasPendingMerges(IndexWriter.java:6527)
>>>> at
>>>> org.apache.lucene.index.ConcurrentMergeScheduler.maybeStall(ConcurrentMergeScheduler.java:589)
>>>> at
>>>> org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:540)
>>>> at
>>>> org.apache.lucene.index.IndexWriter.executeMerge(IndexWriter.java:2315)
>>>> at
>>>> org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2310)
>>>> at
>>>> org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5985)
>>>> at
>>>> org.apache.lucene.index.IndexWriter.flushNextBuffer(IndexWriter.java:3606)
>>>> at
>>>> org.apache.lucene.tests.index.RandomIndexWriter.flushAllBuffersSequentially(RandomIndexWriter.java:263)
>>>> at
>>>> org.apache.lucene.tests.index.RandomIndexWriter.maybeFlushOrCommit(RandomIndexWriter.java:235)
>>>> at
>>>> org.apache.lucene.tests.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:226)
>>>> at
>>>> org.apache.lucene.index.TestPointValues.testSparsePoints(TestPointValues.java:697)
>>>> at
>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> Method)
>>>> at
>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>>>> at
>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>>>> at
>>>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>>>> at
>>>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>>>> at
>>>> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>>>> at
>>>> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>>>> at
>>>> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>>>> at
>>>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>>>> at
>>>> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>>>> at

Re: [JENKINS] Lucene-MMAPv2-Linux (64bit/hotspot/jdk-17.0.9) - Build # 1537 - Failure!

2023-12-14 Thread Michael McCandless
Timeout.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Dec 14, 2023 at 9:09 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/1537/
> Java: 64bit/hotspot/jdk-17.0.9 -XX:-UseCompressedOops -XX:+UseShenandoahGC
>
> All tests passed
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: Running tests with streaming console output but NOT verbose?

2023-12-14 Thread Michael McCandless
the test
> options and their defaults are displayed with:
>
> gradlew -p lucene\core testOpts
>
> Dawid
>
> On Tue, Dec 12, 2023 at 8:45 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Ahh thanks for the quick explanation and temporary solution Dawid!
>> Naming is the hardest part :)
>>
>> I think long ago we used to call it "-Dtests.stdout=true" or so?  Not the
>> greatest name tho.  Maybe "tests.liveConsoleOut"?  "tests.liveConsole"?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Dec 12, 2023 at 2:31 PM Dawid Weiss 
>> wrote:
>>
>>>
>>> This is actually an accidental (?) clash between Lucene's system
>>> property and what's in defaults-tests.gradle.
>>> You can manually prepend true || ... to the following in
>>> defaults-tests.gradle.
>>>
>>> def verboseMode = resolvedTestOption("tests.verbose").toBoolean()
>>>
>>> I can't remember why it aligns with Lucene's logger. Maybe it
>>> should/could be a
>>> separate property? I find it difficult to come up with a reasonable name
>>> though.
>>>
>>> D.
>>>
>>> On Tue, Dec 12, 2023 at 8:03 PM Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> This is prolly a Dawid question...
>>>>
>>>> Sometimes I want to run a test (like a slow Monster test), seeing its
>>>> ongoing musings popping out on the console in "real time" (not buffered).
>>>>
>>>> I can do this today by adding "-Dtests.verbose=true" to the ./gradlew
>>>> invocation that's running the test.
>>>>
>>>> But that also turns on LuceneTestCase.VERBOSE which sometimes produces
>>>> insane amounts of mostly not helpful content.
>>>>
>>>> Is there any way to do the first (stream console output) without the
>>>> second (mega verbosity enabled)?
>>>>
>>>> Thanks,
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>


Re: [VOTE] Release Lucene 9.9.1 RC1

2023-12-14 Thread Michael McCandless
+1

SUCCESS! [0:14:52.296147]


I also cracked a bit of rust off our Monster tests and all but one passed:
https://github.com/apache/lucene/pull/12942

Mike McCandless

http://blog.mikemccandless.com


On Wed, Dec 13, 2023 at 4:24 PM Benjamin Trent 
wrote:

> SUCCESS! [1:06:02.232333]
>
> +1!
>
> On Wed, Dec 13, 2023 at 3:26 PM Greg Miller  wrote:
>
>> SUCCESS! [2:27:01.875939]
>>
>> +1
>>
>> Thanks!
>> -Greg
>>
>> On Wed, Dec 13, 2023 at 3:58 AM Chris Hegarty
>>  wrote:
>>
>>> And (short) release note:
>>>
>>>   https://cwiki.apache.org/confluence/display/LUCENE/ReleaseNote9_9_1
>>>
>>> -Chris.
>>>
>>> > On 13 Dec 2023, at 11:55, Chris Hegarty <
>>> christopher.hega...@elastic.co> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Please vote for release candidate 1 for Lucene 9.9.1
>>> >
>>> > The artifacts can be downloaded from:
>>> >
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.1-RC1-rev-eee32cbf5e072a8c9d459c349549094230038308
>>> >
>>> > You can run the smoke tester directly with this command:
>>> >
>>> > python3 -u dev-tools/scripts/smokeTestRelease.py \
>>> >
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.1-RC1-rev-eee32cbf5e072a8c9d459c349549094230038308
>>> >
>>> > The vote will be open for at least 72 hours i.e. until 2023-12-16
>>> 12:00 UTC.
>>> >
>>> > [ ] +1  approve
>>> > [ ] +0  no opinion
>>> > [ ] -1  disapprove (and reason why)
>>> >
>>> > Here is my +1
>>> >
>>> > -Chris.
>>> >
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>


Re: [JENKINS] Lucene-9.x-MacOSX (64bit/hotspot/jdk-17.0.9) - Build # 3226 - Failure!

2023-12-14 Thread Michael McCandless
Build timed out.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Dec 13, 2023 at 7:50 PM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-MacOSX/3226/
> Java: 64bit/hotspot/jdk-17.0.9 -XX:+UseCompressedOops -XX:+UseG1GC
>
> All tests passed


Re: Running tests with streaming console output but NOT verbose?

2023-12-12 Thread Michael McCandless
Ahh thanks for the quick explanation and temporary solution Dawid!  Naming
is the hardest part :)

I think long ago we used to call it "-Dtests.stdout=true" or so?  Not the
greatest name tho.  Maybe "tests.liveConsoleOut"?  "tests.liveConsole"?

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 12, 2023 at 2:31 PM Dawid Weiss  wrote:

>
> This is actually an accidental (?) clash between Lucene's system property
> and what's in defaults-tests.gradle.
> You can manually prepend true || ... to the following in
> defaults-tests.gradle.
>
> def verboseMode = resolvedTestOption("tests.verbose").toBoolean()
>
> I can't remember why it aligns with Lucene's logger. Maybe it should/could
> be a
> separate property? I find it difficult to come up with a reasonable name
> though.
>
> D.
>
> On Tue, Dec 12, 2023 at 8:03 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Hi Team,
>>
>> This is prolly a Dawid question...
>>
>> Sometimes I want to run a test (like a slow Monster test), seeing its
>> ongoing musings popping out on the console in "real time" (not buffered).
>>
>> I can do this today by adding "-Dtests.verbose=true" to the ./gradlew
>> invocation that's running the test.
>>
>> But that also turns on LuceneTestCase.VERBOSE which sometimes produces
>> insane amounts of mostly not helpful content.
>>
>> Is there any way to do the first (stream console output) without the
>> second (mega verbosity enabled)?
>>
>> Thanks,
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>


Running tests with streaming console output but NOT verbose?

2023-12-12 Thread Michael McCandless
Hi Team,

This is prolly a Dawid question...

Sometimes I want to run a test (like a slow Monster test), seeing its
ongoing musings popping out on the console in "real time" (not buffered).

I can do this today by adding "-Dtests.verbose=true" to the ./gradlew
invocation that's running the test.

But that also turns on LuceneTestCase.VERBOSE which sometimes produces
insane amounts of mostly not helpful content.

Is there any way to do the first (stream console output) without the second
(mega verbosity enabled)?

Thanks,

Mike McCandless

http://blog.mikemccandless.com


Re: The need for a Lucene 9.9.1 release

2023-12-12 Thread Michael McCandless
OK this is merged now.  Are there any other 9.9.1 blockers?  I am trying to
pass all Monster tests but that can probably just run concurrently with
voting (optimistic concurrency!)?

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 12, 2023 at 9:18 AM Chris Hegarty
 wrote:

> Hi Mike,
>
> > On 12 Dec 2023, at 12:56, Michael McCandless 
> wrote:
> >
> > Hi Chris,
> >
> > I think we should also regenerate the FSTs for 9.9.1?
>
> Seems reasonable.
>
> > https://github.com/apache/lucene/pull/12924
>
> I added my comments and review on the PR.
>
> -Chris.
>
> > Thanks,
> >
> > Mike


Re: The need for a Lucene 9.9.1 release

2023-12-12 Thread Michael McCandless
Hi Chris,

I think we should also regenerate the FSTs for 9.9.1?

https://github.com/apache/lucene/pull/12924

Thanks,

Mike

On Tue, Dec 12, 2023 at 7:54 AM Guo Feng  wrote:

> Heads up:
>
> The bug fix PR (https://github.com/apache/lucene/pull/12900) has been
> merged to main, and backported to lucene_9x & lucene_9_9.
>
> On 2023/12/11 20:27:48 Chris Hegarty wrote:
> > Just a quick update on this...
> >
> > > On 9 Dec 2023, at 09:09, Chris Hegarty 
> wrote:
> > >
> > > Hi,
> > >
> > > We’ve encountered two very serious issues with the recent Lucene 9.9.0
> release, both of which (even if taken by themselves) would warrant a 9.9.1.
> The issues are:
> > >
> > > 1. https://github.com/apache/lucene/issues/12895 - Corruption read on
> term dictionaries in Lucene 9.9
> >
> > Great work has been done re-adding tests, creating a new test to
> reproduce, and also working on an underlying fix. It feels like we’re
> getting close! :-)
> >
> > > 2. https://github.com/apache/lucene/issues/12898 - JVM SIGSEGV crash
> when compiling computeCommonPrefixLengthAndBuildHistogram Lucene 9.9.0
> >
> > Merged to branch_9_9.
> >
> > Once no.1 is merged, I’ll build a 9.9.1 RC1 and start a vote.
> >
> > -Chris


Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1209 - Unstable!

2023-12-08 Thread Michael McCandless
OK I reverted the "optimization" to not pull FieldInfo for a field when
getting Points values from SlowCompositeCodecReaderWrapper!  Clearly it was
not safe ;)

Mike McCandless

http://blog.mikemccandless.com


On Fri, Dec 8, 2023 at 8:06 AM Michael McCandless 
wrote:

> Uh oh -- I'll dig.  We may need to put back the FieldInfo check before
> pulling points.  Tricky!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Dec 8, 2023 at 3:55 AM Apache Jenkins Server <
> jenk...@builds.apache.org> wrote:
>
>> Build:
>> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1209/
>>
>> 3 tests failed.
>> FAILED:  org.apache.lucene.index.TestPointValues.testSparsePoints
>>
>> Error Message:
>> java.lang.IllegalStateException: this writer hit an unrecoverable error;
>> cannot merge
>>
>> Stack Trace:
>> java.lang.IllegalStateException: this writer hit an unrecoverable error;
>> cannot merge
>> at
>> __randomizedtesting.SeedInfo.seed([ADA30A2081CE6DA4:A05414293C35A568]:0)
>> at
>> org.apache.lucene.index.IndexWriter.hasPendingMerges(IndexWriter.java:2425)
>> at
>> org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.hasPendingMerges(IndexWriter.java:6527)
>> at
>> org.apache.lucene.index.ConcurrentMergeScheduler.maybeStall(ConcurrentMergeScheduler.java:589)
>> at
>> org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:540)
>> at
>> org.apache.lucene.index.IndexWriter.executeMerge(IndexWriter.java:2315)
>> at
>> org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2310)
>> at
>> org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5985)
>> at
>> org.apache.lucene.index.IndexWriter.flushNextBuffer(IndexWriter.java:3606)
>> at
>> org.apache.lucene.tests.index.RandomIndexWriter.flushAllBuffersSequentially(RandomIndexWriter.java:263)
>> at
>> org.apache.lucene.tests.index.RandomIndexWriter.maybeFlushOrCommit(RandomIndexWriter.java:235)
>> at
>> org.apache.lucene.tests.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:226)
>> at
>> org.apache.lucene.index.TestPointValues.testSparsePoints(TestPointValues.java:697)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>> at
>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>> at
>> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>> at
>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>> at
>> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>> at
>> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>> at
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>> at
>> co

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1209 - Unstable!

2023-12-08 Thread Michael McCandless
Uh oh -- I'll dig.  We may need to put back the FieldInfo check before
pulling points.  Tricky!

Mike McCandless

http://blog.mikemccandless.com


On Fri, Dec 8, 2023 at 3:55 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1209/
>
> 3 tests failed.
> FAILED:  org.apache.lucene.index.TestPointValues.testSparsePoints
>
> Error Message:
> java.lang.IllegalStateException: this writer hit an unrecoverable error;
> cannot merge
>
> Stack Trace:
> java.lang.IllegalStateException: this writer hit an unrecoverable error;
> cannot merge
> at
> __randomizedtesting.SeedInfo.seed([ADA30A2081CE6DA4:A05414293C35A568]:0)
> at
> org.apache.lucene.index.IndexWriter.hasPendingMerges(IndexWriter.java:2425)
> at
> org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.hasPendingMerges(IndexWriter.java:6527)
> at
> org.apache.lucene.index.ConcurrentMergeScheduler.maybeStall(ConcurrentMergeScheduler.java:589)
> at
> org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:540)
> at
> org.apache.lucene.index.IndexWriter.executeMerge(IndexWriter.java:2315)
> at
> org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2310)
> at
> org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5985)
> at
> org.apache.lucene.index.IndexWriter.flushNextBuffer(IndexWriter.java:3606)
> at
> org.apache.lucene.tests.index.RandomIndexWriter.flushAllBuffersSequentially(RandomIndexWriter.java:263)
> at
> org.apache.lucene.tests.index.RandomIndexWriter.maybeFlushOrCommit(RandomIndexWriter.java:235)
> at
> org.apache.lucene.tests.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:226)
> at
> org.apache.lucene.index.TestPointValues.testSparsePoints(TestPointValues.java:697)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> 

Re: [JENKINS] Lucene-MMAPv2-Linux (64bit/openj9/jdk-17.0.8) - Build # 1500 - Unstable!

2023-12-07 Thread Michael McCandless
Oh, nevermind -- we have seen it before, and added a comment on the
upstream (Open J9) issue:
https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1795093834

Mike McCandless

http://blog.mikemccandless.com


On Thu, Dec 7, 2023 at 8:32 AM Michael McCandless 
wrote:

> Hmm -- this looks like maybe another Open J9 specific failure?  I have not
> seen this one before I think...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Dec 1, 2023 at 10:20 PM Policeman Jenkins Server <
> jenk...@thetaphi.de> wrote:
>
>> Build: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/1500/
>> Java: 64bit/openj9/jdk-17.0.8 -XX:-UseCompressedOops -Xgcpolicy:gencon
>>
>> 1 tests failed.
>> FAILED:
>> org.apache.lucene.index.TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom
>>
>> Error Message:
>> java.lang.AssertionError
>>
>> Stack Trace:
>> java.lang.AssertionError
>> at __randomizedtesting.SeedInfo.seed([701E8537ED3618D8]:0)
>> at app//org.junit.Assert.fail(Assert.java:87)
>> at app//org.junit.Assert.assertTrue(Assert.java:42)
>> at app//org.junit.Assert.assertTrue(Assert.java:53)
>> at
>> app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
>> at
>> java.base@17.0.8.1/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
>> at
>> java.base@17.0.8.1/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
>> at
>> app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)


Re: [JENKINS] Lucene-MMAPv2-Linux (64bit/openj9/jdk-17.0.8) - Build # 1500 - Unstable!

2023-12-07 Thread Michael McCandless
Hmm -- this looks like maybe another Open J9 specific failure?  I have not
seen this one before I think...

Mike McCandless

http://blog.mikemccandless.com


On Fri, Dec 1, 2023 at 10:20 PM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/1500/
> Java: 64bit/openj9/jdk-17.0.8 -XX:-UseCompressedOops -Xgcpolicy:gencon
>
> 1 tests failed.
> FAILED:
> org.apache.lucene.index.TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom
>
> Error Message:
> java.lang.AssertionError
>
> Stack Trace:
> java.lang.AssertionError
> at __randomizedtesting.SeedInfo.seed([701E8537ED3618D8]:0)
> at app//org.junit.Assert.fail(Assert.java:87)
> at app//org.junit.Assert.assertTrue(Assert.java:42)
> at app//org.junit.Assert.assertTrue(Assert.java:53)
> at
> app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
> at
> java.base@17.0.8.1/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
> at
> java.base@17.0.8.1/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
> at
> app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)


Re: [JENKINS] Lucene-9.x-Linux (64bit/hotspot/jdk-11.0.21) - Build # 14204 - Unstable!

2023-12-01 Thread Michael McCandless
Hmm this reproduces for me, and looks new/unique.  Could it be related to
recent 9.9.0 changes / release blocker?

Mike

On Fri, Dec 1, 2023 at 3:33 PM Policeman Jenkins Server 
wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/14204/
> Java: 64bit/hotspot/jdk-11.0.21 -XX:+UseCompressedOops -XX:+UseParallelGC
>
> 1 tests failed.
> FAILED:  org.apache.lucene.index.TestParallelLeafReader.testQueries
>
> Error Message:
> org.junit.ComparisonFailure: expected: but was:
>
> Stack Trace:
> org.junit.ComparisonFailure: expected: but was:
> at
> __randomizedtesting.SeedInfo.seed([6CA57EA3FB50CA0D:302BB278E1397FA3]:0)
> at org.junit.Assert.assertEquals(Assert.java:117)
> at org.junit.Assert.assertEquals(Assert.java:146)
> at
> org.apache.lucene.index.TestParallelLeafReader.queryTest(TestParallelLeafReader.java:263)
> at
> org.apache.lucene.index.TestParallelLeafReader.testQueries(TestParallelLeafReader.java:48)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> 

Re: [VOTE] Release Lucene 9.9.0 RC2

2023-12-01 Thread Michael McCandless
+1


SUCCESS! [0:20:12.297376]


Mike McCandless

http://blog.mikemccandless.com


On Fri, Dec 1, 2023 at 9:21 AM Uwe Schindler  wrote:

> Hi,
>
> I let Policeman Jenkins run the smoke tester with Java 11 and Java 17
> (unfortunately we have no support for 21 yet, so new MMap and Vectors were
> not tested). But this was tested long enough, so I trust everything. I just
> did some cross-checking and validated the MR-JAR to contain all classes and
> that Javadocs are up to date. Looks fine after the manual review.
>
> Here is Policeman's work and opinion: SUCCESS! [1:02:37.749085] (
> https://jenkins.thetaphi.de/job/Lucene-Release-Tester/30/console)
>
> Here is my personal opinion:
> +1 to release
>
> Uwe
> Am 30.11.2023 um 18:31 schrieb Chris Hegarty:
>
> Please vote for release candidate 2 for Lucene 9.9.0
>
>
> The artifacts can be downloaded from:
>
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.0-RC2-rev-06070c0dceba07f0d33104192d9ac98ca16fc500
>
>
> You can run the smoke tester directly with this command:
>
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
>
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.0-RC2-rev-06070c0dceba07f0d33104192d9ac98ca16fc500
>
>
> The vote will be open for at least 72 hours, and given the weekend in
> between, let’s keep it open until 2023-12-04 12:00 UTC.
>
> [ ] +1  approve
>
> [ ] +0  no opinion
>
> [ ] -1  disapprove (and reason why)
>
>
> Here is my +1
>
>
> -Chris.
>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>


Re: [VOTE] Release Lucene 9.9.0 RC1

2023-11-30 Thread Michael McCandless
On Thu, Nov 30, 2023 at 9:56 AM Chris Hegarty
 wrote:

> P.S. I’m less sure about this, but the RC 2 starts a 72hr voting time
> again? (Just so I know what TTL to put on that)
>

Yeah a new 72 hour clock starts with each new RC :)
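
For anyone setting a reminder, the rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the only input taken from this thread is the 72 hour minimum.

```python
from datetime import datetime, timedelta, timezone

# Minimum voting window mentioned in the vote emails.
MIN_VOTE_WINDOW = timedelta(hours=72)


def earliest_close(rc_announced_at):
    """Earliest moment a vote on a single RC may close."""
    return rc_announced_at + MIN_VOTE_WINDOW


def vote_close(rc_announcements):
    """Each new RC restarts the clock, so only the latest announcement counts."""
    return earliest_close(max(rc_announcements))
```

The release manager can still keep the vote open longer than this, e.g. across a weekend.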

Mike McCandless

http://blog.mikemccandless.com


Re: GitHub issues vs PRs vs Lucene's CHANGES.txt

2023-11-30 Thread Michael McCandless
Well, I created a starting tool to at least help us keep the
what-should-be-identical-yet-is-nearly-impossible-for-us-to-achieve
sections in CHANGES.txt in sync: https://github.com/apache/lucene/pull/12860

Right now it finds a number of mostly minor differences in the 9.9.0
sections in main vs branch_9_9:

NOTE: resolving branch_9_9 -->
https://raw.githubusercontent.com/apache/lucene/branch_9_9/lucene/CHANGES.txt
NOTE: resolving main -->
https://raw.githubusercontent.com/apache/lucene/main/lucene/CHANGES.txt
15a16,18
> * GITHUB#12646, GITHUB#12690: Move FST#addNode to FSTCompiler to avoid a circular dependency
>   between FST and FSTCompiler (Anh Dung Bui)
>
27,30c30
< * GITHUB#12646, GITHUB#12690: Move FST#addNode to FSTCompiler to avoid a circular dependency
<   between FST and FSTCompiler (Anh Dung Bui)
<
< * GITHUB#12709 Consolidate FSTStore and BytesStore in FST. Created FSTReader which contains the common methods
---
> * GITHUB#12709: Consolidate FSTStore and BytesStore in FST. Created FSTReader which contains the common methods
33,34d32
< * GITHUB#12735: Remove FSTCompiler#getTermCount() and FSTCompiler.UnCompiledNode#inputCount (Anh Dung Bui)
<
37a36,37
> * GITHUB#12735: Remove FSTCompiler#getTermCount() and FSTCompiler.UnCompiledNode#inputCount (Anh Dung Bui)
>
166a167,168
> * GITHUB#12748: Specialize arc store for continuous label in FST. (Guo Feng, Zhang Chao)
>
173,177d174
< * GITHUB#12748: Specialize arc store for continuous label in FST. (Guo Feng, Chao Zhang)
<
< * GITHUB#12825, GITHUB#12834: Hunspell: improved dictionary loading performance, allowed in-memory entry sorting.
<   (Peter Gromov)
<
185,186d181
<
< * GITHUB#12552: Make FSTPostingsFormat load FSTs off-heap. (Tony X)
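
For readers who want the gist of the check, here is a minimal sketch in Python. It is not the tool from the PR above; the section-header pattern and the function names are assumptions made for illustration. It pulls one release section out of two copies of CHANGES.txt and diffs them:

```python
import difflib
import re

# Assumed header shape: a line such as "===== Lucene 9.9.0 =====".
HEADER = re.compile(r"^=+\s*Lucene\s+(\d+\.\d+\.\d+)\s*=+\s*$")


def extract_section(changes_text, version):
    """Return the lines that belong to one release section."""
    lines = []
    inside = False
    for line in changes_text.splitlines():
        match = HEADER.match(line)
        if match:
            # A header either opens the wanted section or closes it.
            inside = match.group(1) == version
            continue
        if inside:
            lines.append(line.rstrip())
    return lines


def diff_sections(text_a, text_b, version, name_a="branch_9_9", name_b="main"):
    """Unified diff of the same release section taken from two branches."""
    a = extract_section(text_a, version)
    b = extract_section(text_b, version)
    return list(difflib.unified_diff(a, b, name_a, name_b, lineterm=""))
```

An empty result means the two sections are identical; anything else lists the entries that were moved, reworded, or dropped on one side.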


Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 29, 2023 at 6:01 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Oh, and that the CHANGES.txt entries in e.g. 9.9.0 section match on 9.x
> and main branches... I think that one we have some automation to catch?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Nov 29, 2023 at 5:58 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Hi Team,
>>
>> I see Chris is tagging issues that were left open after their linked PRs
>> were merged (thanks!).
>>
>> Is there something in our release tooling that cross-checks all the
>> weakly linked metadata today: Milestone marked (or more often: not) on an
>> issue vs commits to the respective branches vs location in Lucene's
>> CHANGES.txt vs open/closed issue matching the linked PRs?
>>
>> It seems like some simple automation here could catch mistakes.  E.g. I'm
>> uncertain I properly moved all the FST related CHANGES.txt entries to the
>> right places.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>


Re: [VOTE] Release Lucene 9.9.0 RC1

2023-11-30 Thread Michael McCandless
+1 to release.

I hit a corner-case test failure and opened a PR to fix it:
https://github.com/apache/lucene/pull/12859

I don't think this should block the release? -- it looks exotic.

Thanks Chris!

Mike McCandless

http://blog.mikemccandless.com


On Thu, Nov 30, 2023 at 1:16 AM Patrick Zhai  wrote:

> SUCCESS! [1:03:54.880200]
>
> +1. Thank you Chris!
>
> On Wed, Nov 29, 2023 at 8:45 PM Nhat Nguyen 
> wrote:
>
>> SUCCESS! [1:11:30.037919]
>>
>> +1. Thanks, Chris!
>>
>> On Wed, Nov 29, 2023 at 8:53 AM Chris Hegarty
>>  wrote:
>>
>>> Hi,
>>>
>>>
>>> Please vote for release candidate 1 for Lucene 9.9.0
>>>
>>>
>>> The artifacts can be downloaded from:
>>>
>>>
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.0-RC1-rev-92a5e5b02e0e083126c4122f2b7a02426c21a037
>>>
>>>
>>> You can run the smoke tester directly with this command:
>>>
>>>
>>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>>
>>>
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.0-RC1-rev-92a5e5b02e0e083126c4122f2b7a02426c21a037
>>>
>>>
>>> The vote will be open for at least 72 hours, and given the weekend in
>>> between, let’s keep it open until 2023-12-04 12:00 UTC.
>>>
>>>
>>> [ ] +1  approve
>>>
>>> [ ] +0  no opinion
>>>
>>> [ ] -1  disapprove (and reason why)
>>>
>>>
>>> Here is my +1
>>>
>>>
>>> Draft release highlights can be viewed here (comments and feedback
>>> welcome):
>>> https://cwiki.apache.org/confluence/display/LUCENE/ReleaseNote9_9_0
>>>
>>> -Chris.
>>>
>>


Re: [JENKINS] Lucene » Lucene-Check-main - Build # 10750 - Unstable!

2023-11-30 Thread Michael McCandless
I hit this one running the smoke tester on 9.9.0 RC 0, and it repros.  I'll
open an issue ... I think it's just a missing null check in the
SlowCompositeCodecReaderWrapper.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Nov 28, 2023 at 6:37 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/10750/
>
> 2 tests failed.
> FAILED:
> org.apache.lucene.search.TestPointQueries.testAllPointDocsWereDeletedAndThenMergedAgain
>
> Error Message:
> java.io.IOException: background merge hit exception:
> _3(10.0.0):C2:[diagnostics={lucene.version=10.0.0, source=merge,
> os.arch=amd64, java.runtime.version=17.0.7+7, mergeFactor=1, os=Linux,
> java.vendor=Eclipse Adoptium, os.version=5.4.0-167-generic,
> timestamp=1701213730838,
> mergeMaxNumSegments=1}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
> :id=4i1dci11qy6ymdhw6cirss6iz into _4 [maxNumSegments=1]
>
> Stack Trace:
> java.io.IOException: background merge hit exception:
> _3(10.0.0):C2:[diagnostics={lucene.version=10.0.0, source=merge,
> os.arch=amd64, java.runtime.version=17.0.7+7, mergeFactor=1, os=Linux,
> java.vendor=Eclipse Adoptium, os.version=5.4.0-167-generic,
> timestamp=1701213730838,
> mergeMaxNumSegments=1}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
> :id=4i1dci11qy6ymdhw6cirss6iz into _4 [maxNumSegments=1]
> at
> __randomizedtesting.SeedInfo.seed([F9345050587589D4:B625E461757388B1]:0)
> at
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:2170)
> at
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:2099)
> at
> org.apache.lucene.search.TestPointQueries.testAllPointDocsWereDeletedAndThenMergedAgain(TestPointQueries.java:1212)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> 

Re: GitHub issues vs PRs vs Lucene's CHANGES.txt

2023-11-29 Thread Michael McCandless
Oh, and that the CHANGES.txt entries in e.g. 9.9.0 section match on 9.x and
main branches... I think that one we have some automation to catch?

Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 29, 2023 at 5:58 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hi Team,
>
> I see Chris is tagging issues that were left open after their linked PRs
> were merged (thanks!).
>
> Is there something in our release tooling that cross-checks all the weakly
> linked metadata today: Milestone marked (or more often: not) on an issue vs
> commits to the respective branches vs location in Lucene's CHANGES.txt vs
> open/closed issue matching the linked PRs?
>
> It seems like some simple automation here could catch mistakes.  E.g. I'm
> uncertain I properly moved all the FST related CHANGES.txt entries to the
> right places.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>


GitHub issues vs PRs vs Lucene's CHANGES.txt

2023-11-29 Thread Michael McCandless
Hi Team,

I see Chris is tagging issues that were left open after their linked PRs
were merged (thanks!).

Is there something in our release tooling that cross-checks all the weakly
linked metadata today: Milestone marked (or more often: not) on an issue vs
commits to the respective branches vs location in Lucene's CHANGES.txt vs
open/closed issue matching the linked PRs?

It seems like some simple automation here could catch mistakes.  E.g. I'm
uncertain I properly moved all the FST related CHANGES.txt entries to the
right places.

Mike McCandless

http://blog.mikemccandless.com


Re: [JENKINS] Lucene-9.x-Linux (64bit/hotspot/jdk-19) - Build # 14180 - Failure!

2023-11-29 Thread Michael McCandless
JVM crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f9fa8545493, pid=2982126, tid=2990096
#
# JRE version: OpenJDK Runtime Environment (19.0+36) (build 19+36-2238)
# Java VM: OpenJDK 64-Bit Server VM (19+36-2238, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xb45493]  PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0xf3
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/jenkins/workspace/Lucene-9.x-Linux/lucene/core/build/tmp/tests-cwd/hs_err_pid2982126.log
#
# Compiler replay data is saved as:
# /home/jenkins/workspace/Lucene-9.x-Linux/lucene/core/build/tmp/tests-cwd/replay_pid2982126.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#


Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 29, 2023 at 3:49 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/14180/
> Java: 64bit/hotspot/jdk-19 -XX:+UseCompressedOops -XX:+UseSerialGC
>
> All tests passed
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1199 - Unstable!

2023-11-28 Thread Michael McCandless
OK I pushed a fix.

Mike

On Tue, Nov 28, 2023 at 7:32 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> I think maybe LuceneTestCase.newSearcher is turning on concurrency
> (setting the executor randomly).  Since this test explicitly passes a "no
> concurrency" collector manager I think we should switch to "new
> IndexSearcher(...)".
>
> Mike
>
> On Tue, Nov 28, 2023 at 7:29 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> This reproduces for me.
>>
>> Maybe related to LUCENE-10002 / #240?
>>
>> Mike
>>
>> On Tue, Nov 28, 2023 at 1:58 AM Apache Jenkins Server <
>> jenk...@builds.apache.org> wrote:
>>
>>> Build:
>>> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1199/
>>>
>>> 1 tests failed.
>>> FAILED:  org.apache.lucene.search.TestTopFieldCollector.testSort
>>>
>>> Error Message:
>>> java.lang.IllegalStateException: This TopFieldCollectorManager was
>>> created without concurrency (supportsConcurrency=false), but multiple
>>> collectors are being created
>>>
>>> Stack Trace:
>>> java.lang.IllegalStateException: This TopFieldCollectorManager was
>>> created without concurrency (supportsConcurrency=false), but multiple
>>> collectors are being created
>>> at
>>> __randomizedtesting.SeedInfo.seed([4B0B913D92123C6D:1AEEB914595F267D]:0)
>>> at
>>> org.apache.lucene.search.TopFieldCollectorManager.newCollector(TopFieldCollectorManager.java:142)
>>> at
>>> org.apache.lucene.search.TopFieldCollectorManager.newCollector(TopFieldCollectorManager.java:31)
>>> at
>>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:623)
>>> at
>>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:607)
>>> at
>>> org.apache.lucene.search.TestTopFieldCollector.testSort(TestTopFieldCollector.java:124)
>>> at
>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>>> Method)
>>> at
>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>>> at
>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>>> at
>>> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>>> at
>>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>>> at
>>> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>>> at
>>> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>>> at
>>> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>>> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>> at
>>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>> at
>>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>>> at
>>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>>> at
>>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
>>> at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(Random

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1199 - Unstable!

2023-11-28 Thread Michael McCandless
I think maybe LuceneTestCase.newSearcher is turning on concurrency (setting
the executor randomly).  Since this test explicitly passes a "no
concurrency" collector manager I think we should switch to "new
IndexSearcher(...)".
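
To make the suspected failure mode concrete, here is a toy model in Python. It is not Lucene code: the class and function names are invented, and it assumes that a searcher with an executor asks the manager for one collector per slice.

```python
class ToyCollectorManager:
    """Toy stand-in for a manager created with supportsConcurrency=false."""

    def __init__(self, supports_concurrency):
        self.supports_concurrency = supports_concurrency
        self.collectors_created = 0

    def new_collector(self):
        # A non-concurrent manager may hand out only one collector.
        if not self.supports_concurrency and self.collectors_created >= 1:
            raise RuntimeError(
                "created without concurrency, but multiple collectors are being created"
            )
        self.collectors_created += 1
        return []  # in this model a collector is just a list of hits


def search(manager, slices):
    """Request one collector per slice, then merge the hits."""
    collectors = []
    for docs in slices:
        collector = manager.new_collector()
        collector.extend(docs)
        collectors.append(collector)
    return sorted(hit for collector in collectors for hit in collector)
```

With a single slice the search succeeds; with two or more slices the second new_collector() call raises, which is the shape of the exception in the report below.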

Mike

On Tue, Nov 28, 2023 at 7:29 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> This reproduces for me.
>
> Maybe related to LUCENE-10002 / #240?
>
> Mike
>
> On Tue, Nov 28, 2023 at 1:58 AM Apache Jenkins Server <
> jenk...@builds.apache.org> wrote:
>
>> Build:
>> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1199/
>>
>> 1 tests failed.
>> FAILED:  org.apache.lucene.search.TestTopFieldCollector.testSort
>>
>> Error Message:
>> java.lang.IllegalStateException: This TopFieldCollectorManager was
>> created without concurrency (supportsConcurrency=false), but multiple
>> collectors are being created
>>
>> Stack Trace:
>> java.lang.IllegalStateException: This TopFieldCollectorManager was
>> created without concurrency (supportsConcurrency=false), but multiple
>> collectors are being created
>> at
>> __randomizedtesting.SeedInfo.seed([4B0B913D92123C6D:1AEEB914595F267D]:0)
>> at
>> org.apache.lucene.search.TopFieldCollectorManager.newCollector(TopFieldCollectorManager.java:142)
>> at
>> org.apache.lucene.search.TopFieldCollectorManager.newCollector(TopFieldCollectorManager.java:31)
>> at
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:623)
>> at
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:607)
>> at
>> org.apache.lucene.search.TestTopFieldCollector.testSort(TestTopFieldCollector.java:124)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>> at
>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>> at
>> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>> at
>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>> at
>> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>> at
>> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>> at
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>> at
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
>> at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
>> at
>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>> at
>> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverr

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1199 - Unstable!

2023-11-28 Thread Michael McCandless
This reproduces for me.

Maybe related to LUCENE-10002 / #240?

Mike

On Tue, Nov 28, 2023 at 1:58 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1199/
>
> 1 tests failed.
> FAILED:  org.apache.lucene.search.TestTopFieldCollector.testSort
>
> Error Message:
> java.lang.IllegalStateException: This TopFieldCollectorManager was created
> without concurrency (supportsConcurrency=false), but multiple collectors
> are being created
>
> Stack Trace:
> java.lang.IllegalStateException: This TopFieldCollectorManager was created
> without concurrency (supportsConcurrency=false), but multiple collectors
> are being created
> at
> __randomizedtesting.SeedInfo.seed([4B0B913D92123C6D:1AEEB914595F267D]:0)
> at
> org.apache.lucene.search.TopFieldCollectorManager.newCollector(TopFieldCollectorManager.java:142)
> at
> org.apache.lucene.search.TopFieldCollectorManager.newCollector(TopFieldCollectorManager.java:31)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:623)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:607)
> at
> org.apache.lucene.search.TestTopFieldCollector.testSort(TestTopFieldCollector.java:124)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> 

Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-19) - Build # 45643 - Failure!

2023-11-25 Thread Michael McCandless
Hmm JVM crashed (there's an hs_err file there):

> Process 'Gradle Test Executor 33' finished with non-zero exit value 134

Mike McCandless

http://blog.mikemccandless.com


On Sat, Nov 25, 2023 at 6:34 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45643/
> Java: 64bit/hotspot/jdk-19 -XX:+UseCompressedOops -XX:+UseParallelGC
>
> All tests passed
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: Lucene 9.9.0 Release

2023-11-21 Thread Michael McCandless
+1

Thank you for volunteering as RM, Chris!

Mike McCandless

http://blog.mikemccandless.com


On Tue, Nov 21, 2023 at 4:52 AM Chris Hegarty
 wrote:

> Hi,
>
> It's been a while since the 9.8.0 release and we’ve accumulated quite a
> few changes. I’d like to propose that we release 9.9.0.
>
> If there's no objections, I volunteer to be the release manager and will
> cut the feature branch a week from now, 12:00 28th Nov UTC.
>
> -Chris.
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [JENKINS] Lucene-9.x-MacOSX (64bit/hotspot/jdk-11.0.21) - Build # 3165 - Still Failing!

2023-11-20 Thread Michael McCandless
Build timed out (after 169 minutes). Marking the build as aborted.
Build timed out (after 169 minutes). Marking the build as failed.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 20, 2023 at 5:01 PM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-MacOSX/3165/
> Java: 64bit/hotspot/jdk-11.0.21 -XX:-UseCompressedOops -XX:+UseSerialGC
>
> All tests passed
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: [JENKINS] Lucene-MMAPv2-Linux (64bit/hotspot/jdk-17.0.9) - Build # 1442 - Failure!

2023-11-14 Thread Michael McCandless
Hmm again timeout.  Something seems amiss.  Do our super slow tests still
print out HEARTBEAT periodically?  Or did we lose that in the gradle
migration maybe?

Build timed out (after 126 minutes). Marking the build as aborted.
Build timed out (after 126 minutes). Marking the build as failed.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Nov 14, 2023 at 7:59 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/1442/
> Java: 64bit/hotspot/jdk-17.0.9 -XX:+UseCompressedOops -XX:+UseShenandoahGC
>
> All tests passed
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-17.0.9) - Build # 45536 - Failure!

2023-11-14 Thread Michael McCandless
Thanks Uwe.

OK so this might just be a high-sigma outlier-ish event due to unluckily
slow seed selection?

I wonder whether the distribution of total run time of each full "./gradlew
test" on each JVM configuration is roughly Gaussian-ish?
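
A rough way to check this, sketched in Python under the assumption that we can export per-build durations from Jenkins (the sample history in the usage note is hypothetical; only the 137 minutes comes from this build):

```python
import statistics


def z_score(history_minutes, run_minutes):
    """How many sample standard deviations a run sits from the historical mean."""
    mean = statistics.mean(history_minutes)
    stdev = statistics.stdev(history_minutes)  # needs >= 2 runs that are not all equal
    return (run_minutes - mean) / stdev


def is_outlier(history_minutes, run_minutes, threshold=3.0):
    """Flag runs that are more than `threshold` sigmas slower than usual."""
    return z_score(history_minutes, run_minutes) > threshold
```

For example, is_outlier([60, 62, 58, 61, 59], 137) is true, while a 61 minute run is not flagged. If the real durations are far from Gaussian (say, bimodal across nightly vs. non-nightly settings), a sigma-based cutoff like this would be misleading.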

Mike McCandless

http://blog.mikemccandless.com


On Tue, Nov 14, 2023 at 5:40 AM Uwe Schindler  wrote:

> Hi,
>
> Actually this is the default JVM, so it's not OpenJ9 or another EA
> release. It could be one of the tests hanging, but we can't figure that out.
>
> P.S.: Jenkins kills a job if it takes much longer than usual (it has no
> hard limit; it takes the average time of previous runs and kills a run
> that takes much longer).
>
> Uwe
>
> Am 14.11.2023 um 11:06 schrieb Michael McCandless:
>
> Hmm build timed out -- not sure why it's taking so long to run tests:
>
> Build timed out (after 137 minutes). Marking the build as aborted.
> Build timed out (after 137 minutes). Marking the build as failed.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Nov 14, 2023 at 1:39 AM Policeman Jenkins Server <
> jenk...@thetaphi.de> wrote:
>
>> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45536/
>> Java: 64bit/hotspot/jdk-17.0.9 -XX:+UseCompressedOops -XX:+UseShenandoahGC
>>
>> All tests passed
>>
>> -
>> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: builds-h...@lucene.apache.org
>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>


Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-17.0.9) - Build # 45536 - Failure!

2023-11-14 Thread Michael McCandless
Hmm build timed out -- not sure why it's taking so long to run tests:

Build timed out (after 137 minutes). Marking the build as aborted.
Build timed out (after 137 minutes). Marking the build as failed.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Nov 14, 2023 at 1:39 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45536/
> Java: 64bit/hotspot/jdk-17.0.9 -XX:+UseCompressedOops -XX:+UseShenandoahGC
>
> All tests passed
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: Boolean field type

2023-11-13 Thread Michael McCandless
Hi Michael/Mikhails, yet another Mike here:

If you create a NumericDocValuesField, and it only ever has one value per
doc (0, 1), I think the default Codec will compress it well, though maybe
not as well as your idea.  It's a neat idea to notice a "very common
default value" and not store it and just store the sparse non-default
values.  I don't think Codec does that today.
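
Roughly what I mean by that idea, as a Python sketch (this is not how any Lucene Codec works today, and the function names are invented): pick the most common value as the default and record only the doc ids that differ.

```python
import bisect
from collections import Counter


def encode_sparse(values):
    """values: one bool per doc (at least one doc). Returns (default, exceptions)."""
    default = Counter(values).most_common(1)[0][0]
    # Doc ids are visited in increasing order, so the list is already sorted.
    exceptions = [doc for doc, value in enumerate(values) if value != default]
    return default, exceptions


def decode(default, exceptions, doc):
    """Look up one doc's value via binary search over the exception list."""
    i = bisect.bisect_left(exceptions, doc)
    is_exception = i < len(exceptions) and exceptions[i] == doc
    return (not default) if is_exception else default
```

When 90% of docs share one value, only the remaining 10% of doc ids need to be stored.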

For the search-time optimization, I thought somewhere in Lucene we do something
like your idea, converting to a negation if it has lower estimated
cardinality than the positive form.  It might only be for points?  If the
field were stored as postings, you could consult the metadata in the terms
dictionary to know the cardinality of each case, and perhaps that the field
is single valued and fully set (no missing values), at which point your
optimization logic might be able to apply during rewrite?
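
A minimal sketch of the "store only the less common value, rewrite the
other as a negation" idea discussed above.  It assumes every doc in the
segment has exactly one value (single valued, no missing values).  The
class and method names are hypothetical and are not Lucene APIs; a plain
java.util.BitSet stands in for real postings:

```java
import java.util.BitSet;

/** Hypothetical per-segment Boolean field that stores only the rarer value. */
public final class SparseBooleanSegment {
    private final BitSet sparseDocs;   // doc IDs that hold the rarer value
    private final boolean sparseValue; // which value the bit set represents
    private final int maxDoc;          // number of docs in the segment

    private SparseBooleanSegment(BitSet sparseDocs, boolean sparseValue, int maxDoc) {
        this.sparseDocs = sparseDocs;
        this.sparseValue = sparseValue;
        this.maxDoc = maxDoc;
    }

    /** The "flush" step: count both values, then keep only the rarer one. */
    public static SparseBooleanSegment flush(boolean[] values) {
        int trueCount = 0;
        for (boolean v : values) {
            if (v) {
                trueCount++;
            }
        }
        // Ties go to "true"; either choice would be correct.
        boolean sparseValue = trueCount <= values.length - trueCount;
        BitSet docs = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] == sparseValue) {
                docs.set(doc);
            }
        }
        return new SparseBooleanSegment(docs, sparseValue, values.length);
    }

    /** Doc IDs matching field:value within this segment. */
    public BitSet matching(boolean value) {
        BitSet result = (BitSet) sparseDocs.clone();
        if (value != sparseValue) {
            // Query for the common value is rewritten as NOT(rarer value).
            result.flip(0, maxDoc);
        }
        return result;
    }

    public int storedCount() {
        return sparseDocs.cardinality();
    }

    public boolean storedValue() {
        return sparseValue;
    }
}
```

For a segment with values {true, true, true, false, true}, only doc 3 is
stored, and a query for "true" is answered by flipping that one-bit set.
The flip is only valid because of the "exactly one value per doc"
assumption; with missing or multi-valued docs the negation would match
docs it should not.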

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 10, 2023 at 6:05 PM Michael Froh  wrote:

> Thanks Mikhail and Mike!
>
> Mikhail, since you replied, I remembered your work on block joins in Solr
> (thank you for that, by the way!), which reminded me that it's not unusual
> for docs in a Lucene index to "mix" their schemata, like in parent/child
> blocks. If 90% of parent docs are "true" on a Boolean field, but the field
> doesn't exist for the child docs, my suggested approach would probably see
> "true" as the sparse value (assuming there are at least as many children as
> parents). Ideally, I would want to only track the "false" parents (and
> leave the field off of the children).
>
> Indeed the idea of a "required" field in Lucene is tricky (though Mike's
> suggestion of missing defaults could help). Even worse, I think we'd also
> need a way to enforce "exactly one value", since a "Boolean" term field can
> really have four states -- true, false, neither, or both.
>
> It feels like there's not a workable short-term solution to implement
> something like this as a regular IndexableField in Lucene (or at least I'm
> not seeing it).
>
> I don't think I see enough value to pitch the idea of adding a new
> field-like "thing" (that would have exactly one value for every doc, and
> maybe could be counted relative to docs in a block). For now, I think it's
> probably only practical to implement something like this as part of a
> schema definition in one of the higher-level search servers.
>
> Thanks for the discussion!
> Froh
>
> On Thu, Nov 9, 2023 at 5:01 AM Michael Sokolov  wrote:
>
>> Can you require the user to specify missing: true or missing: false
>> semantics? With that you can decide what to do with the missing values.
>>
>> On Thu, Nov 9, 2023, 7:55 AM Mikhail Khludnev  wrote:
>>
>>> Hello Michael.
>>> This optimization "NOT the less common value" assumes that the boolean field
>>> is required, but how do we enforce this mandatory-field constraint in Lucene?
>>> I'm not aware of anything in Lucene like a Solr schema or mapping.
>>> If foo:true is common, the posting list is a dense, sequentially
>>> increasing run of doc IDs: 1,2,3,4,5... Might it already be
>>> compressed by codecs like
>>> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/util/packed/MonotonicBlockPackedWriter.html
>>> ?
>>>
>>> On Thu, Nov 9, 2023 at 3:31 AM Michael Froh  wrote:
>>>
>>>> Hey,
>>>>
>>>> I've been musing about ideas for a "clever" Boolean field type on
>>>> Lucene for a while, and I think I might have an idea that could work. That
>>>> said, this popped into my head this afternoon and has not been fully-baked.
>>>> It may not be very clever at all.
>>>>
>>>> My experience is that Boolean fields tend to be overwhelmingly true or
>>>> overwhelmingly false. I've had pretty good luck with using a keyword-style
>>>> field, where the only term represents the more sparse value. (For example,
>>>> I did a thing years ago with explicit tombstones, where versioned deletes
>>>> would have the field "deleted" with a value of "true", and live
>>>> documents didn't have the deleted field at all. Every query would add a
>>>> filter on "NOT deleted:true".)
>>>>
>>>> That's great when you know up-front what the sparse value is going to
>>>> be. Working on OpenSearch, I just created an issue suggesting that we take
>>>> a hint from users for which value they think is going to be more common so
>>>> we only index the less common one:
>>>> https://github.com/opensearch-project/OpenSearch/issues/11143
>>>>
>>>> At the Lucene level, though, we could index a Boolean field type as the
>>>> less common term when we flush (by counting the values and figuring out
>>>> which is less common). Then, per segment, we can rewrite any query for the
>>>> more common value as NOT the less common value.
>>>>
>>>> You can compute upper/lower bounds on the value frequencies cheaply
>>>> during a merge, so I think you could usually write the doc IDs for the less
>>>> common value directly (without needing to count them first), even when
>>>> input segments disagree on

Re: Healthy PR Approaches from Apache Beam

2023-11-13 Thread Michael McCandless
Thanks Stefan!

Mike McCandless

http://blog.mikemccandless.com


On Sat, Nov 11, 2023 at 5:22 AM Stefan Vodita 
wrote:

> Thank you for going through all those PRs Mike!
> I opened an issue for porting some of the bot functionality:
> https://github.com/apache/lucene/issues/12796
>
> Stefan
>
>
> On Thu, 2 Nov 2023 at 15:30, Michael McCandless 
> wrote:
>
>> Thanks for raising this Stefan.  This is an impressive approach to more
>> rigorously responding on PRs and taking them through their lifecycle,
>> giving a better community experience especially for newcomers.  I love
>> their docs too.
>>
>> Those graphs are awesome!  Much better than the simple PR open/closed
>> count chart we have in our nightlies:
>> https://home.apache.org/~mikemccand/lucenebench/github_pr_counts.html
>>
>> I just made a pass through some of our PRs (sorted oldest to newest, and
>> sorry for all the dev list noise!) and it's sad how many PRs we (Lucene dev
>> community) really should have responded to, but failed to, in a
>> timely manner.  I think something like the Apache Beam bot could help this,
>> though we don't really document attaching labels to newly opened PRs.
>>
>> I wonder what baby step we could adopt from Beam's approach to PRs?
>> Maybe open an issue on GitHub so we can discuss?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Oct 31, 2023 at 5:39 AM Stefan Vodita 
>> wrote:
>>
>>> Hi all,
>>>
>>> I recently learned a few interesting things that the Beam
>>> <https://github.com/apache/beam> project does to
>>> promote and maintain good interactions on PRs.
>>>
>>> 1. Community metrics dashboard
>>> <http://35.193.202.176/d/code_velocity/code-velocity?orgId=1>. The
>>> graphs are pretty and insightful. You can
>>>    see things like the number of open PRs across time or the mean time to
>>>    first interaction on a new PR.
>>>
>>> 2. Life cycle management for PRs.
>>> a. A bot labels the PR and assigns reviewers based on the labels
>>> (example
>>> <https://github.com/apache/beam/pull/26424#issuecomment-1522788593>).
>>> b. Authors can run and re-run the pre-commit checks (doc
>>> <https://github.com/apache/beam/blob/master/CONTRIBUTING.md#create-a-pull-request>
>>> ).
>>> c. If the PR is not reviewed within 3 business days, the author is
>>> encouraged to notify the mailing list (doc
>>> <https://github.com/apache/beam/blob/master/CONTRIBUTING.md#get-reviewed>
>>> ).
>>> d. If the PR doesn't have activity, the bot comments on it, warning
>>> that it
>>> will be closed (example
>>> <https://github.com/apache/beam/pull/26424#issuecomment-1671254755>).
>>>
>>> It's hard for me to tell which of these ideas would translate well to the
>>> Lucene community, but we can try out something small, like an automated
>>> comment
>>> on stale PRs.
>>>
>>>
>>> Stefan
>>>
>>>
>>>
>>>


Re: [JENKINS] Lucene-9.x-Linux (64bit/openj9/jdk-11.0.20) - Build # 14015 - Still Failing!

2023-11-13 Thread Michael McCandless
I linked to this thread on the upstream (OpenJ9) issue about recent Lucene
CI build failures: https://github.com/eclipse-openj9/openj9/issues/18400

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 13, 2023 at 3:09 AM Dawid Weiss  wrote:

>
> Sure, thanks. What's strange is that we don't use add-opens anywhere, I
> think (there is a mention of it I left in one of the
> comments, but nothing else across the codebase uses this directive).
>
> > Task :lucene:distribution.tests:compileTestJava
> warning: [options] --add-opens has no effect at compile time
>
>
>
> On Sun, Nov 12, 2023 at 10:56 PM Uwe Schindler  wrote:
>
>> Will check tomorrow, it's too late now.
>>
>> On Jenkins there were no Windows builds with IBM and Java 11 yet:
>> https://jenkins.thetaphi.de/job/Lucene-9.x-Windows/
>>
>> On 12.11.2023 at 22:00, Dawid Weiss wrote:
>>
>>
>> Hi Uwe,
>>
>> Can you reproduce this on Windows with the same JVM versions though?
>> Seems like I have exactly the same setup and yet this works for me just
>> fine. Strange.
>>
>> Dawid
>>
>> On Sun, Nov 12, 2023 at 9:52 PM Uwe Schindler  wrote:
>>
>>> This one was my first idea, too.
>>>
>>> It fails only with IBM Semeru in combination with Gradle using Temurin.
>>>
>>> I will dig tomorrow on Jenkins server and print all debug info.
>>>
>>> Uwe
>>>
>>>
>>> On 12 November 2023 at 21:48:54 CET, Dawid Weiss <
>>> dawid.we...@gmail.com> wrote:
>>>

 I can't reproduce this though - used exactly the same JVMs (on Windows):

 > gradlew :lucene:distribution.tests:compileTestJava --rerun-tasks
 --console=plain
 Generating gradle.properties
 ...
 > Task :altJvmWarning
 NOTE: Alternative java toolchain will be used for compilation and tests:
   Project will use 11 (IBM JDK 11.0.20.1+1, home at:
 c:\_tmp\jdk-11.0.20.1+1)
   Gradle runs with 11 (Eclipse Temurin JDK 11.0.21+9, home at:
 C:\_tmp\jdk-11.0.21+9)
 ...
 > Task :lucene:distribution.tests:compileJava NO-SOURCE
 > Task :lucene:distribution.tests:classes UP-TO-DATE
 > Task :lucene:distribution.tests:compileTestJava

 BUILD SUCCESSFUL in 23s
 5 actionable tasks: 5 executed

 On main branch it works, no idea why:
>

 I thought it's because of this:

 https://github.com/apache/lucene/commit/2e12a35c876a

 but I don't think so... seems to work for me on Windows on branch_9x
 just fine?

 D.

>>> --
>>> Uwe Schindler
>>> Achterdiek 19, 28357 Bremen
>>> https://www.thetaphi.de
>>>
>> --
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>


Welcome Patrick Zhai to the Lucene PMC

2023-11-10 Thread Michael McCandless
I'm happy to announce that Patrick Zhai has accepted an invitation to join
the Lucene Project Management Committee (PMC)!

Congratulations Patrick, thank you for all your hard work improving
Lucene's community and source code, and welcome aboard!

Mike McCandless

http://blog.mikemccandless.com


Re: [JENKINS] Lucene » Lucene-Check-9.x - Build # 6861 - Failure!

2023-11-06 Thread Michael McCandless
I pushed a fix.

Curious -- the build indeed fails if I use Java 11 on 9.x, but passes if I
use Java 17+.

I'm really confused.  Did the javadoc checking get weaker with newer JDKs?
Anyway, I'll port this fix to main.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 6, 2023 at 10:27 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/6861/
>
> No tests ran.
>
> Build Log:
> [...truncated 496 lines...]
> BUILD FAILED in 2m 8s
> 300 actionable tasks: 300 executed
>
> Publishing build scan...
> https://ge.apache.org/s/embosyxjwma5i
>
> Build step 'Invoke Gradle script' changed build result to FAILURE
> Build step 'Invoke Gradle script' marked build as failure
> Archiving artifacts
> Recording test results
> ERROR: Step ‘Publish JUnit test result report’ failed: No test report
> files were found. Configuration error?
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: [JENKINS] Lucene » Lucene-Check-9.x - Build # 6861 - Failure!

2023-11-06 Thread Michael McCandless
Woops -- I'll fix.  renderJavadoc failure!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 6, 2023 at 10:27 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/6861/
>
> No tests ran.
>
> Build Log:
> [...truncated 496 lines...]
> BUILD FAILED in 2m 8s
> 300 actionable tasks: 300 executed
>
> Publishing build scan...
> https://ge.apache.org/s/embosyxjwma5i
>
> Build step 'Invoke Gradle script' changed build result to FAILURE
> Build step 'Invoke Gradle script' marked build as failure
> Archiving artifacts
> Recording test results
> ERROR: Step ‘Publish JUnit test result report’ failed: No test report
> files were found. Configuration error?
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: [JENKINS] Lucene-main-Linux (64bit/openj9/jdk-17.0.5) - Build # 45394 - Unstable!

2023-11-06 Thread Michael McCandless
On Sun, Nov 5, 2023 at 5:01 AM Uwe Schindler  wrote:

> I will update the J9 runtime later today. But this was a real bug, so
> it's good it caught this :-) So - no - I won't remove OpenJ9 support at
> all.
>

I see, it's great that the J9 build is indeed catching real Lucene bugs!  +1
to keep running it in CI builds.


> The errors that sometimes happen are bugs; they might get better with the
> latest versions. I see there's now also a Java 20 version. I will give it a
> try, too - especially regarding Panama (+ Vector). Want to see how it behaves.
>
+1

Thanks Uwe.

Mike McCandless

http://blog.mikemccandless.com
>
>


Re: [JENKINS] Lucene-main-Linux (64bit/openj9/jdk-17.0.5) - Build # 45394 - Unstable!

2023-11-04 Thread Michael McCandless
OK I opened https://github.com/eclipse-openj9/openj9/issues/18400 -- let's
see where that goes.

Uwe, should we upgrade to the latest OpenJ9 again maybe?

Mike McCandless

http://blog.mikemccandless.com


On Sat, Nov 4, 2023 at 12:25 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Should we maybe stop testing J9?  Reduce its frequency?  So much noise ...
>
> I know I can filter these out from my gmail box.
>
> I will try opening an issue in the OpenJ9 GitHub repo:
> https://github.com/eclipse-openj9/openj9/issues
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 3, 2023 at 7:43 PM Policeman Jenkins Server <
> jenk...@thetaphi.de> wrote:
>
>> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45394/
>> Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:metronome
>>
>> 2 tests failed.
>> FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testReferenceSize
>>
>> Error Message:
>> java.lang.AssertionError: For 64 bit JVMs, reference size must be 8,
>> unless compressed references are enabled expected:<8> but was:<4>
>>
>> Stack Trace:
>> java.lang.AssertionError: For 64 bit JVMs, reference size must be 8,
>> unless compressed references are enabled expected:<8> but was:<4>
>> at
>> __randomizedtesting.SeedInfo.seed([91923EC152043BB:15B168BF99C02E62]:0)
>> at app//org.junit.Assert.fail(Assert.java:89)
>> at app//org.junit.Assert.failNotEquals(Assert.java:835)
>> at app//org.junit.Assert.assertEquals(Assert.java:647)
>> at
>> app//org.apache.lucene.util.TestRamUsageEstimator.testReferenceSize(TestRamUsageEstimator.java:195)
>> at 
>> java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at java.base@17.0.5
>> /jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>> at java.base@17.0.5
>> /jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base@17.0.5
>> /java.lang.reflect.Method.invoke(Method.java:568)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>> at
>> app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>> at
>> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>> at
>> app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>> at
>> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
>> at
>> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>> at
>> app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>> at
>> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
>> at
>> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> app//org.apache.lucene.tests.util.TestRuleSt

Re: [JENKINS] Lucene-main-Linux (64bit/openj9/jdk-17.0.5) - Build # 45394 - Unstable!

2023-11-04 Thread Michael McCandless
Should we maybe stop testing J9?  Reduce its frequency?  So much noise ...

I know I can filter these out from my gmail box.

I will try opening an issue in the OpenJ9 GitHub repo:
https://github.com/eclipse-openj9/openj9/issues

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 3, 2023 at 7:43 PM Policeman Jenkins Server 
wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45394/
> Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:metronome
>
> 2 tests failed.
> FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testReferenceSize
>
> Error Message:
> java.lang.AssertionError: For 64 bit JVMs, reference size must be 8,
> unless compressed references are enabled expected:<8> but was:<4>
>
> Stack Trace:
> java.lang.AssertionError: For 64 bit JVMs, reference size must be 8,
> unless compressed references are enabled expected:<8> but was:<4>
> at
> __randomizedtesting.SeedInfo.seed([91923EC152043BB:15B168BF99C02E62]:0)
> at app//org.junit.Assert.fail(Assert.java:89)
> at app//org.junit.Assert.failNotEquals(Assert.java:835)
> at app//org.junit.Assert.assertEquals(Assert.java:647)
> at
> app//org.apache.lucene.util.TestRamUsageEstimator.testReferenceSize(TestRamUsageEstimator.java:195)
> at 
> java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at java.base@17.0.5
> /jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base@17.0.5
> /jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base@17.0.5
> /java.lang.reflect.Method.invoke(Method.java:568)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at
> 

Re: [JENKINS] Lucene-main-Linux (64bit/openj9/jdk-17.0.5) - Build # 45409 - Unstable!

2023-11-04 Thread Michael McCandless
Likely J9 specific?

Mike McCandless

http://blog.mikemccandless.com


On Sat, Nov 4, 2023 at 11:34 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45409/
> Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:gencon
>
> 2 tests failed.
> FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testQuery
>
> Error Message:
> java.lang.AssertionError: expected:<1160.0> but was:<1760.0>
>
> Stack Trace:
> java.lang.AssertionError: expected:<1160.0> but was:<1760.0>
> at
> __randomizedtesting.SeedInfo.seed([7FE6F0C1668A9364:F49784B29700A9B1]:0)
> at app//org.junit.Assert.fail(Assert.java:89)
> at app//org.junit.Assert.failNotEquals(Assert.java:835)
> at app//org.junit.Assert.assertEquals(Assert.java:555)
> at app//org.junit.Assert.assertEquals(Assert.java:685)
> at
> app//org.apache.lucene.util.TestRamUsageEstimator.testQuery(TestRamUsageEstimator.java:189)
> at 
> java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at java.base@17.0.5
> /jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base@17.0.5
> /jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base@17.0.5
> /java.lang.reflect.Method.invoke(Method.java:568)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at
> app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   

Re: [JENKINS] Lucene-main-Windows (64bit/openj9/jdk-17.0.5) - Build # 13400 - Unstable!

2023-11-04 Thread Michael McCandless
Maybe J9 specific?

Mike McCandless

http://blog.mikemccandless.com


On Sat, Nov 4, 2023 at 11:01 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Windows/13400/
> Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:gencon
>
> 2 tests failed.
> FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testReferenceSize
>
> Error Message:
> java.lang.AssertionError: For 64 bit JVMs, reference size must be 8,
> unless compressed references are enabled expected:<8> but was:<4>
>
> Stack Trace:
> java.lang.AssertionError: For 64 bit JVMs, reference size must be 8,
> unless compressed references are enabled expected:<8> but was:<4>
> at
> __randomizedtesting.SeedInfo.seed([41AB595A28A8656B:5D031209A44808B2]:0)
> at app//org.junit.Assert.fail(Assert.java:89)
> at app//org.junit.Assert.failNotEquals(Assert.java:835)
> at app//org.junit.Assert.assertEquals(Assert.java:647)
> at
> app//org.apache.lucene.util.TestRamUsageEstimator.testReferenceSize(TestRamUsageEstimator.java:195)
> at 
> java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at java.base@17.0.5
> /jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base@17.0.5
> /jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base@17.0.5
> /java.lang.reflect.Method.invoke(Method.java:568)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at
> app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> 

Re: Squash vs merge of PRs

2023-11-04 Thread Michael McCandless
I didn't realize the community had decided squashing (rewriting history)
was our standard.

> Comparing histories between branches with git-bisect to find bugs is just
> one example.

But if the bug was introduced in one of the N local commits the developer
had done, wouldn't that be helpful?  You could see that one commit instead
of all N squashed, and get better context on how/why the bug was introduced?

I would prefer history-preserving commits.  It can reveal/preserve
important information -- like we tried one approach, and discovered some
issue, tweaked it to a better approach.  This can be useful in the future
if someone is working on that part of the code and is trying to understand
why it was done a certain way.  It preserves the natural and healthy
iterations we all experience when working closely together.  Why discard
such possibly helpful history?

Also, one can always wear hazy glasses in the future to "summarize" the
full history down to a view that's more palatable to them personally, if
you don't like seeing merge commit branching.  But we cannot do the
reverse.  Discarding the actual development history is a one-way door.

http://blog.mikemccandless.com


On Sat, Nov 4, 2023 at 11:03 AM Gus Heck  wrote:

> Also, since (as noted) this is a previously decided issue, not sure why
> this is a list email instead of a simple direct query to Robert seeking to
> understand the specific case? No need to make a public discussion unless
> it's a long term pattern, actually breaking something, or we want to change
> something?
>
> On Sat, Nov 4, 2023 at 9:37 AM Benjamin Trent 
> wrote:
>
>> TL;DR, forcing non-committers to squash things is a good idea. Enforcing
>> through some measure for committers is a bad idea.
>>
>> Since this thread is now in Robert's spam, I am guessing it won't have
>> any impact :). I do not think Robert is actively trying to hurt the project in
>> any way. It seems to me that he doesn't think a clean git history is worth
>> the effort.
>>
>> Having a clean git history makes things easier for everyone. Comparing
>> histories between branches with git-bisect to find bugs is just one
>> example. Another is simply reading commits to see when
>> features/bug fixes/etc. were added.
>>
>> I do NOT think we should add procedures or branch protections to actively
>> enforce this.
>>
>> Small personal sacrifices (like dealing with commit conflicts) are
>> necessary for a community. Being part of a community is about buying into
>> what the community is about and working towards a common goal. Many times
>> we do things we don't agree with, or make things slightly more difficult
>> for us, for the community as a whole. This thing being OSS shows that we
>> all buy into its importance and are willing to put work into the project.
>>
>> Having a cultural default of "make things nice for others" is good.
>> Enforcing this ideology on others is antithetical to its definition.
>>
>>
>>
>> On Sat, Nov 4, 2023 at 9:02 AM Robert Muir  wrote:
>>
>>> This isn't a community issue, it is me avoiding useless unnecessary
>>> merge conflicts. Word "community" is invoked here to try to make it
>>> out, like you can hold a vote about what git commands i should type on
>>> my computer? You know that isn't gonna work. have some humility.
>>>
>>> thread moved to spam.
>>>
>>> On Sat, Nov 4, 2023 at 8:36 AM Mike Drob  wrote:
>>> >
>>> > We all agree on using Java though, and using a specific version, and
>>> > even the style output from gradle tidy. Is that nanny state or community
>>> > consensus?
>>> >
>>> > On Sat, Nov 4, 2023 at 7:29 AM Robert Muir  wrote:
>>> >>
>>> >> example of a nanny state IMO, trying to dictate what git commands to
>>> >> use, or what editor to use. Maybe this works for you in your corporate
>>> >> hellholes, but I think some folks have a bit of a power issue, are
>>> >> accustomed to dictating this stuff to their employees and so on, but
>>> >> this is open-source. I don't report to you, i dont use the editor you
>>> >> tell me, or the git commands you tell me.
>>> >>
>>> >> On Sat, Nov 4, 2023 at 8:21 AM Uwe Schindler  wrote:
>>> >> >
>>> >> > Hi,
>>> >> >
>>> >> > I just wanted to give your attention to the following discussion:
>>> >> > https://github.com/apache/lucene/pull/12737#issuecomment-1793426911
>>> >> >
>>> >> >  To my knowledge the Lucene (and Solr) community decided a while back
>>> >> > to disable merging and only allow squashing of PRs. Robert always did
>>> >> > this, but because of a one-time problem with two branches he was working
>>> >> > on in parallel, he suddenly changed his mind and did merges on his own,
>>> >> > not squashing the branch and pushing to ASF Git.
>>> >> >
>>> >> > I am also not a fan of removing all history, but especially for heavy
>>> >> > committing branches like the given PR, I think we should invite our
>>> >> > committers to also adhere to community standards everyone else
>>> >> > practices. I would agree with merging those 

Re: Healthy PR Approaches from Apache Beam

2023-11-02 Thread Michael McCandless
Thanks for raising this Stefan.  This is an impressive approach to more
rigorously responding on PRs and taking them through their lifecycle,
giving a better community experience especially for newcomers.  I love
their docs too.

Those graphs are awesome!  Much better than the simple PR open/closed count
chart we have in our nightlies:
https://home.apache.org/~mikemccand/lucenebench/github_pr_counts.html

I just made a pass through some of our PRs (sorted oldest to newest, and
sorry for all the dev list noise!) and it's sad how many PRs we (Lucene dev
community) really should have responded to in a timely manner, but failed
to.  I think something like the Apache Beam bot could help with this,
though we don't really document attaching labels to newly opened PRs.

I wonder what baby step we could adopt from Beam's approach to PRs?  Maybe
open an issue on GitHub so we can discuss?

Mike McCandless

http://blog.mikemccandless.com


On Tue, Oct 31, 2023 at 5:39 AM Stefan Vodita 
wrote:

> Hi all,
>
> I recently learned a few interesting things that the Beam project [1] does
> to promote and maintain good interactions on PRs.
>
> 1. Community metrics dashboard [2]. The graphs are pretty and insightful.
>    You can see things like the number of open PRs across time or the mean
>    time to first interaction on a new PR.
>
> 2. Life cycle management for PRs.
>    a. A bot labels the PR and assigns reviewers based on the labels
>       (example [3]).
>    b. Authors can run and re-run the pre-commit checks (doc [4]).
>    c. If the PR is not reviewed within 3 business days, the author is
>       encouraged to notify the mailing list (doc [5]).
>    d. If the PR doesn't have activity, the bot comments on it, warning
>       that it will be closed (example [6]).
>
> It's hard for me to tell which of these ideas would translate well to the
> Lucene community, but we can try out something small, like an automated
> comment on stale PRs.
>
>
> Stefan
>
>
> [1] https://github.com/apache/beam
> [2] http://35.193.202.176/d/code_velocity/code-velocity?orgId=1
> [3] https://github.com/apache/beam/pull/26424#issuecomment-1522788593
> [4] https://github.com/apache/beam/blob/master/CONTRIBUTING.md#create-a-pull-request
> [5] https://github.com/apache/beam/blob/master/CONTRIBUTING.md#get-reviewed
> [6] https://github.com/apache/beam/pull/26424#issuecomment-1671254755
>
>


Re: [JENKINS] Lucene-9.x-Linux (64bit/openj9/jdk-17.0.5) - Build # 13732 - Unstable!

2023-10-31 Thread Michael McCandless
Wow, thank you Adrien!  Cascaded merges count as uncommitted changes ...

Mike McCandless

http://blog.mikemccandless.com


On Tue, Oct 31, 2023 at 4:51 AM Adrien Grand  wrote:

> I pushed a fix for these failures:
> https://github.com/apache/lucene/commit/85f5d3bb0bf84fed46ca4c093c1aa084e4a43873
>
> On Fri, Oct 27, 2023 at 9:55 AM Policeman Jenkins Server <
> jenk...@thetaphi.de> wrote:
>
>> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/13732/
>> Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:gencon
>>
>> 1 tests failed.
>> FAILED:  org.apache.lucene.index.TestIndexWriter.testHasUncommittedChanges
>>
>> Error Message:
>> java.lang.AssertionError
>>
>> Stack Trace:
>> java.lang.AssertionError
>> at
>> __randomizedtesting.SeedInfo.seed([63AADDD55C51D4C2:45E1C0A475266832]:0)
>> at app//org.junit.Assert.fail(Assert.java:87)
>> at app//org.junit.Assert.assertTrue(Assert.java:42)
>> at app//org.junit.Assert.assertFalse(Assert.java:65)
>> at app//org.junit.Assert.assertFalse(Assert.java:75)
>> at
>> app//org.apache.lucene.index.TestIndexWriter.testHasUncommittedChanges(TestIndexWriter.java:2400)
>> at 
>> java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at java.base@17.0.5
>> /jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>> at java.base@17.0.5
>> /jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base@17.0.5
>> /java.lang.reflect.Method.invoke(Method.java:568)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>> at
>> app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>> at
>> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>> at
>> app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>> at
>> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
>> at
>> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>> at
>> app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>> at
>> app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
>> at
>> app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
>> at
>> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>> at
>> app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>> at
>> app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>> at
>> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at
>> app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>> at
>> app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at
>> app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>> at
>> 

Re: [JENKINS] Lucene » Lucene-Coverage-main - Build # 937 - Unstable!

2023-10-29 Thread Michael McCandless
OK I pushed a fix:
https://github.com/apache/lucene/commit/11436a848cbcc8302b31ca01e63e57dd35a93b1e

Mike

On Sat, Oct 28, 2023 at 3:10 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/937/
>
> 1 tests failed.
> FAILED:  org.apache.lucene.util.automaton.TestAutomaton.testRandomFinite
>
> Error Message:
> java.lang.IllegalArgumentException: This builder doesn't allow terms that
> are larger than 1000 characters, got [e3 8e be e3 8f 80 e3 8c 92 e3 8c a5
> e3 8f b4 e3 8d 89 e3 8c 89 e3 8f 82 e3 8f b4 e3 8c b5 f0 90 8a a2 f0 90 8a
> bb f0 90 8b 83 f0 90 8a ab f0 90 8a b9 f0 90 8b 87 f0 90 8b 89 f0 90 8a b1
> f0 90 8a a3 f0 90 8a a2 f0 90 8a b3 f0 90 8a aa f0 90 8a ad f0 90 8a b4 f0
> 90 91 98 f0 90 91 98 f0 90 91 ad f0 90 91 aa f0 90 91 9d f0 90 91 b8 f0 90
> 91 92 f0 90 91 a0 f0 90 91 af f0 90 91 9e f0 90 91 a3 f0 90 91 94 f0 90 91
> 9d f0 90 91 99 ea 9a 99 ea 99 88 ea 99 b9 ea 99 af ea 99 9d ea 99 95 ea 99
> 8d ea 99 8b ea 9a 9f ea 9a 90 ea 99 bb ea 99 a8 ea 99 b8 ea 99 8e ef b8 81
> ef b8 85 ef b8 87 ef b8 8c ef b8 8b ef b8 8f ef b8 8e ef b8 8f ef b8 82 ef
> b8 82 ef b8 86 ef b8 84 ef b8 8c ef b8 8b f0 90 83 bc f0 90 83 85 f0 90 83
> aa f0 90 83 b5 f0 90 82 92 f0 90 83 8e f0 90 82 b2 f0 90 83 a1 f0 90 82 aa
> d4 a3 d4 91 d4 84 d4 9a d4 90 d4 9d e3 bb a2 e4 b1 92 e3 a9 ac e4 9f 8a e4
> ab 80 e3 a0 b2 e4 9b ad e4 97 bc e4 af a4 e4 b3 a9 e3 9c ba e3 a6 b8 e4 a8
> a9 e4 8b 8e f0 90 ad b0 f0 90 ad be f0 90 ad b5 f0 90 ad b2 f0 90 ad ac f0
> 90 ad b4 f0 90 ad ae f0 90 ad b7 f0 90 ad aa f0 90 ad a3 f0 90 ad b5 f0 90
> ad a1 f0 90 ad b6 f0 90 ad a1 f0 90 ad ad f0 90 ad b3 f0 90 ad aa ea 9b 94
> ea 9b ba ea 9a a1 e1 83 a4 e1 83 a5 e1 82 b9 e1 82 b7 e1 83 86 e1 83 b5 e1
> 83 ab e1 83 bd e1 83 95 e1 82 be e1 83 a4 e1 83 8c e1 83 9d e2 bd 94 e2 be
> 98 e2 be 9c e2 be a5 e2 be a6 e2 be ba e2 bc ae e2 bc b6 f0 90 ad aa e1 a5
> b6 e1 a5 9e e1 a5 98 e1 a5 93 e1 a5 ab e1 a5 a1 e1 a5 90 e1 a5 b7 e1 a5 9d
> f3 b0 9d ad f3 bb b2 bf f3 b9 90 b3 f3 b5 86 8c f3 b8 9e a4 f3 b8 a6 8b f3
> b4 ae a2 f3 b5 95 b5 f3 bf 8f 8d f3 bd 8e a7 f3 b4 ba 81 f3 b4 81 be f3 b4
> 93 b2 f3 bb bf 84 f3 b0 aa 8e f3 b9 8d 9c f3 b5 a2 85 f3 b7 8d 93 f0 92 91
> b6 f0 92 90 8e f0 92 91 8d f0 92 91 8b f0 92 91 b2 f0 92 91 8c f0 92 90 86
> f0 9d 8d 8c f0 9d 8c 85 f0 9d 8d 94 f0 9d 8c be f0 9d 8c b0 f0 9d 8c b1 f0
> 9d 8c a5 f0 9d 8d 8d f0 9d 8c 81 f0 9d 8c b2 f0 9d 8d 93 f0 9d 8c b7 f0 9d
> 8d 85 f0 9d 8c bd f0 9d 8c 87 e1 b1 a5 e1 b1 b8 e1 b1 b9 e1 b1 a0 e1 b1 a2
> e1 b1 a6 e1 b1 b3 e1 b1 a7 e1 b1 b3 e1 b1 ab e1 b1 ac c7 af c7 93 c8 ae c7
> b2 c6 9d c8 be c8 a5 c7 81 c8 b6 c9 8a c9 81 e1 86 89 e1 84 9d e1 87 ad e1
> 86 b2 e1 86 a0 e1 87 92 e1 87 ad e1 85 87 e1 86 a2 e1 86 ae e1 84 86 e1 84
> 98 e1 85 b5 e1 85 b8 e1 87 b3 e1 85 be e1 86 b0 e1 84 98 e1 87 8a e1 87 97
> f0 9f 85 b9 f0 9f 85 a3 f0 9f 86 ab f0 9f 84 a6 f0 9f 85 b7 f0 9d 8c b0 e0
> ba a6 e0 ba 84 e0 ba 8b e0 ba 94 e0 bb 8f e0 bb ba e0 ba ad e0 bb a6 e0 bb
> 88 e0 ba a3 e0 bb 93 e0 ba 99 e0 bb 92 e0 ba b3 e0 ba 88 e0 bb b5 e0 ba 87
> e0 ba bc e1 8f 81 e1 8f 94 e1 8f ab e1 8f 98 e1 8e b0 e1 8f a0 e1 8f a7 e1
> 8e bd e1 8e b4 e1 8f a5 e1 8f 9d e1 8f b2 e1 8f bb e1 8e b8 f0 90 b1 80 f0
> 90 b0 8e f0 90 b0 a8 f0 90 b0 99 f0 90 b1 8a f0 90 b1 81 f0 90 b1 87 f0 90
> b1 87 f0 90 b1 8d f0 90 b0 af f0 90 b1 80 f0 90 b0 8d f0 90 b0 b7 f0 90 b1
> 84 f0 90 b0 8c f0 90 b0 b2 ef b9 9b ef b9 99 ef b9 91 ef b9 9e ef b9 9e ef
> b9 a7 ef b9 93 ef b9 9f e0 ac 8c e0 ad bb e0 ac 82 e0 ad 8a e0 ac 84 e0 ac
> bb e0 ad 8b e0 ac b1 e0 ad a7 e0 ac a4 e0 ad 94 e0 ad be e0 ad 88 e0 ad 8d]
>
> Stack Trace:
> java.lang.IllegalArgumentException: This builder doesn't allow terms that
> are larger than 1000 characters, got [e3 8e be e3 8f 80 e3 8c 92 e3 8c a5
> e3 8f b4 e3 8d 89 e3 8c 89 e3 8f 82 e3 8f b4 e3 8c b5 f0 90 8a a2 f0 90 8a
> bb f0 90 8b 83 f0 90 8a ab f0 90 8a b9 f0 90 8b 87 f0 90 8b 89 f0 90 8a b1
> f0 90 8a a3 f0 90 8a a2 f0 90 8a b3 f0 90 8a aa f0 90 8a ad f0 90 8a b4 f0
> 90 91 98 f0 90 91 98 f0 90 91 ad f0 90 91 aa f0 90 91 9d f0 90 91 b8 f0 90
> 91 92 f0 90 91 a0 f0 90 91 af f0 90 91 9e f0 90 91 a3 f0 90 91 94 f0 90 91
> 9d f0 90 91 99 ea 9a 99 ea 99 88 ea 99 b9 ea 99 af ea 99 9d ea 99 95 ea 99
> 8d ea 99 8b ea 9a 9f ea 9a 90 ea 99 bb ea 99 a8 ea 99 b8 ea 99 8e ef b8 81
> ef b8 85 ef b8 87 ef b8 8c ef b8 8b ef b8 8f ef b8 8e ef b8 8f ef b8 82 ef
> b8 82 ef b8 86 ef b8 84 ef b8 8c ef b8 8b f0 90 83 bc f0 90 83 85 f0 90 83
> aa f0 90 83 b5 f0 90 82 92 f0 90 83 8e f0 90 82 b2 f0 90 83 a1 f0 90 82 aa
> d4 a3 d4 91 d4 84 d4 9a d4 90 d4 9d e3 bb a2 e4 b1 92 e3 a9 ac e4 9f 8a e4
> ab 80 e3 a0 b2 e4 9b ad e4 97 bc e4 af a4 e4 b3 a9 e3 9c ba e3 a6 b8 e4 a8
> a9 e4 8b 8e f0 90 ad b0 f0 90 ad be f0 90 ad b5 f0 90 ad b2 f0 90 ad ac f0
> 90 ad b4 f0 90 ad ae f0 90 ad b7 f0 90 ad aa f0 90 ad a3 f0 90 ad b5 f0 90
> ad a1 f0 90 ad b6 f0 90 ad a1 f0 90 ad ad f0 90 ad b3 f0 90 ad aa ea 9b 94
> ea 9b ba ea 9a a1 e1 83 a4 e1 83 a5 e1 82 b9 

Re: [JENKINS] Lucene » Lucene-Coverage-main - Build # 937 - Unstable!

2023-10-29 Thread Michael McCandless
This repros for me ... silly test bug.  I'll commit a fix.

Mike

On Sat, Oct 28, 2023 at 3:10 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/937/
>
> 1 tests failed.
> FAILED:  org.apache.lucene.util.automaton.TestAutomaton.testRandomFinite
>

Re: Welcome Guo Feng to the Lucene PMC

2023-10-25 Thread Michael McCandless
Welcome Feng!

Mike McCandless

http://blog.mikemccandless.com


On Wed, Oct 25, 2023 at 5:05 AM Michael Sokolov  wrote:

> Welcome, gf2121!
>
> On Wed, Oct 25, 2023, 3:03 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Congratulations and welcome, Feng!
>>
>> On Tue, 24 Oct 2023 at 22:35, Adrien Grand  wrote:
>>
>>> I'm pleased to announce that Guo Feng has accepted an invitation to join
>>> the Lucene PMC!
>>>
>>> Congratulations Feng, and welcome aboard!
>>>
>>> --
>>> Adrien
>>>
>>


Re: Could we allow an IndexInput to read from a still writing IndexOutput?

2023-10-23 Thread Michael McCandless
Thanks everyone!  Responses below:

On Thu, Oct 19, 2023 at 11:17 AM Robert Muir  wrote:

> what will happen on windows?
>
> sorry, could not resist.
>

LOL, yeah, sigh.

On Thu, Oct 19, 2023 at 10:36 PM Dawid Weiss  wrote:

>
> I think there is a certain beauty (of tape-backed storage flavor...) in
> existing abstractions and I wouldn't change them unless absolutely
> necessary (FST construction isn't the dominant cost in indexing). Also,
> random seeks all over the place may be really problematic in certain
> scenarios (as is opening a written-to file for reading, as Robert
> mentioned).
>

I do agree.  I love how minimalistic the IO semantics Lucene actually
requires are.

> Failing that, our plan B is to wastefully duplicate the byte[] slices
> from the already written bytes into our own private (heap resident, boo)
> copy, which would use quite a bit more RAM while building the FST, and make
> less minimal FSTs for a given RAM budget.
>
> Well, this node cache doesn't have to be on heap... It can be a plain
> temporary file (with full random access). It's a scratch-only structure
> which you can delete after the fst is written. It does add I/O overhead but
> doesn't interfere with the rest of the code in Lucene. Perhaps, instead of
> changing IndexInput and IndexOutput, one could start with a plain temp file
> (NIO API)?
>

That's an interesting option.  I had ruled out "bypassing Directory
abstraction and going straight to JDK IO APIs", but maybe it's OK to do
so.  I like this option Dawid!
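Dawid's suggestion of a plain temp file with full random access could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name ScratchNodeFile and its append/read methods are hypothetical and not part of Lucene, and it uses nothing beyond JDK NIO (FileChannel positional reads and writes), deliberately bypassing the Directory abstraction.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical scratch store (NOT a Lucene class): append-only writes plus
// random-access reads over a single temp file, using only JDK NIO.
final class ScratchNodeFile implements AutoCloseable {
  private final FileChannel channel;
  private long length; // number of bytes appended so far

  ScratchNodeFile() {
    try {
      Path path = Files.createTempFile("fst-scratch", ".tmp");
      // DELETE_ON_CLOSE removes the scratch file when the channel is closed.
      channel = FileChannel.open(path, StandardOpenOption.READ,
          StandardOpenOption.WRITE, StandardOpenOption.DELETE_ON_CLOSE);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Appends the bytes at the end of the file and returns their start offset.
  long append(byte[] bytes) {
    long start = length;
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    try {
      while (buf.hasRemaining()) {
        length += channel.write(buf, length); // positional write at the tail
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return start;
  }

  // Reads len bytes starting at offset; the range must already be appended.
  byte[] read(long offset, int len) {
    if (offset < 0 || len < 0 || offset + len > length) {
      throw new IllegalArgumentException("range has not been written yet");
    }
    ByteBuffer buf = ByteBuffer.allocate(len);
    try {
      long pos = offset;
      while (buf.hasRemaining()) {
        int n = channel.read(buf, pos); // positional read, no seek state
        if (n < 0) {
          throw new EOFException("unexpected end of scratch file");
        }
        pos += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.array();
  }

  @Override
  public void close() {
    try {
      channel.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    try (ScratchNodeFile scratch = new ScratchNodeFile()) {
      long first = scratch.append(new byte[] {10, 20, 30});
      scratch.append(new byte[] {40, 50});
      // Read back a range that spans both appends while the file is still open for writing.
      System.out.println(Arrays.toString(scratch.read(first + 1, 3)));
    }
  }
}
```

Because reads and writes both pass an explicit position, the channel keeps no shared seek state, and the file can keep growing between reads. The cost is the extra I/O Dawid mentions; how that compares with on-heap byte[] copies would have to be measured.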


> I also think that the tradeoffs presented in graphs on the fst-node-cache
> issue are not so bad at all. Yes, the FST is not minimal, but the
> construction-space vs output size is quite all right to me.
>

Well, the tradeoffs I posted in this PR (now merged to main, and eventually
to 9.x) hold only if we still buffer the whole FST in RAM, and so
use that as our random-access cache of past FST nodes.  If we succeed in
changing FST writing to fully off-heap (append bytes directly to disk),
then we need to duplicate that random-access RAM somewhere else (maybe a
direct NIO file, maybe just duplicated byte[] copies in the NodeHash).  So
net/net those curves will get worse -- more RAM required to achieve the
same minimality.  I haven't tested just how much worse yet ... I wanted to
probe this possibility (random read access on an appending write file)
first to not wastefully duplicate these bytes in RAM.

On Sat, Oct 21, 2023 at 1:09 AM Uwe Schindler  wrote:

> Hi, the biggest problem is with some IndexInputs that work on FS Cache
> (mmapdir). The file size changes while you are writing therefore it could
> cause strange issues. Especially the mapping of mmap may not see the
> changes you have already written as there is no happens-before relationship.
>
Hmm, I didn't realize Panama's MMap implementation had this limitation.  Or
maybe you are saying this is an OS level limitation?  Because when you map
a region of a file, you must give a bounded range (0 .. file-length), and
then if the file grows, you would have to re-map or make a 2nd, 3rd, ...
map?  Yeah OK this seems problematic indeed.
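The bounded-range point can be seen in a small JDK-only sketch. Note the assumptions: it uses the older FileChannel.map / MappedByteBuffer API rather than the MemorySegment-based code Lucene's MMapDirectory uses, the class name MmapGrowthDemo is made up for illustration, and it shows only the fixed size of a mapping, not the visibility (happens-before) concern Uwe raises.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class: a mapping covers the range requested at map time
// and does not grow when more bytes are appended to the file afterwards.
final class MmapGrowthDemo {

  // Writes all remaining bytes of buf starting at the given file position.
  private static void writeFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
    while (buf.hasRemaining()) {
      pos += ch.write(buf, pos);
    }
  }

  // Returns three capacities: the first mapping before the append,
  // the same mapping after the append, and a fresh mapping of the grown file.
  static int[] capacities() {
    try {
      Path path = Files.createTempFile("mmap-growth", ".tmp");
      path.toFile().deleteOnExit(); // best-effort cleanup of the temp file
      try (FileChannel ch = FileChannel.open(path,
          StandardOpenOption.READ, StandardOpenOption.WRITE)) {
        writeFully(ch, ByteBuffer.wrap(new byte[] {1, 2, 3, 4}), 0);
        MappedByteBuffer first = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        int before = first.capacity();

        // Append four more bytes after the mapping was created.
        writeFully(ch, ByteBuffer.wrap(new byte[] {5, 6, 7, 8}), ch.size());
        int after = first.capacity();

        // Seeing the new bytes requires mapping again with the new length.
        MappedByteBuffer second = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        return new int[] {before, after, second.capacity()};
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(capacities()));
  }
}
```

The first mapping keeps its original size after the append; only the second mapping, created once the file has grown, covers the new bytes. A reader of a still-growing file would therefore have to keep re-mapping or adding mappings, which is the awkwardness described above.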

> So as said by the others, if you need stuff already written, keep it in
> memory (like nodes). We should really not change our IO model for this
> singleton. A 1% slowdown while writing due to some caching or buffering does
> not matter, and is not worth the risk of corrupting indexes or running into
> errors while reading.
>
Yeah OK I'm convinced :)  Let's leave Lucene's IO "WORM" semantics intact,
and either use direct NIO for the suffix hash (NodeHash), or, burn the RAM
in duplicating the FST nodes (and measure the impact on RAM vs minimality).

Thanks everyone,

Mike McCandless

http://blog.mikemccandless.com


Re: Can we get rid of "Approve & Run" on GitHub PRs by new contributors (non-committers)?

2023-10-23 Thread Michael McCandless
Thanks everyone!  Responses below:

On Mon, Oct 16, 2023 at 7:37 AM Uwe Schindler  wrote:

> this seems to be a safety feature and is also enabled in general for
> Github. I found no options in asf.yaml to enable/disable it:
>
OK, thanks for checking Uwe.

> Nevertheless, I see no problem for pressing the button. When I quickly
> review a PR, I generally press the button.
>
I press it as well, but this is just a "best effort" and leaves many runs
unapproved for quite some time.  When I checked a few days ago, there were 52
pending runs (I think for 26 PRs, seems to be 2 runs per unapproved PR),
ranging up to 19 days in age:
https://github.com/apache/lucene/actions?query=is%3Aaction_required  (I
have since approved all of them).  We committers are not consistent in
checking all pending runs ...

> For safety reasons this is required in most projects I was contributing,
> too (not only ASF).
>
But this is a silly way to achieve safety -- it's "assume guilty until
proven innocent" of our newest contributors, when past evidence that I've
seen shows it's 100% the opposite: all of our new contributors opening PRs
are not bad agents.  Can't we instead assume innocent until proven guilty
of our newest contributors?

Sure, those of us with the karma to push the "Approve and run" don't see a
problem: we long ago lost the fresh eyes / Shoshin that new contributors
bring and experience.

Uwe, put yourself in the shoes of a new contributor: you see a small issue,
you know how to make PRs so you make one, submit it, and then no response.
You see that other contributors' PRs quickly get this nice GitHub action
catching problems, but for some reason yours does not (maybe for 19 days).
(I think this "Approve and run" button is only seen by committers.)  You're
not sure what you did wrong, what you should do next.  You feel like this
community doesn't listen to new people's PRs.  Some random time later, say
12 days, the jobs run, and now you see you made some silly mistake and you
fix it and push to your PR.  And, again, nothing happens to confirm you did
fix the problems from the first run, for maybe another 6 days.  You wonder
why you had to wait 12 days to see the first silly mistake and another 6 to
see the next...

We should constantly strive to make the new contributor experience as
wonderful / frictionless / responsive as we can, not the opposite (this
approval step).  Such brave new people are how our community grows.  And we
old timer committers are blind to the pains they feel.

(Separately, we have another problem: gradually growing number of
still-open PRs:
https://home.apache.org/~mikemccand/lucenebench/github_pr_counts.html)

> What's the problem in pressing the button? Of course you take
> responsibility when the crypto miner starts, but if there is a huge PR
> by an external contributor, I would first ask if they could split it into
> smaller pieces. At some point we have to review it, and most external
> people creating huge PRs did bad stuff like pressing the format button in
> their IDE.
>
> I think running "./gradlew precommit" is a must for new contributors. The
> online checks on Github are more for me as reviewer/committer, to make sure
> all is fine before I press the merge button (for many PRs I don't even
> checkout the code after review). So it is fine to not trigger it by
> end-users.
>
New contributors don't necessarily know everything that we old-timers
expect/know, like running precommit (they are not committers, yet!), or
tidy.  That's what's great about our GitHub actions -- they run that for
the contributor and the contributor can quickly see what went wrong.
Inserting committer approval there just gums up that whole nice feedback
loop for a new contributor (up to 19 days!).

> -1 to ask INFRA to enable this.
>
OK, I won't ask INFRA to change anything!

On Mon, Oct 16, 2023 at 7:53 AM Robert Muir  wrote:

> I think running the builds with a timeout is a good thing to do
> anyway, for any CI build. I'm sure github actions has some fancy yaml
> for that, but you can just do "timeout -k 1m 1h ./gradlew..." instead
> of "./gradlew" too.


+1, that seems like a much better way to achieve safety without harming the
new contributor onboarding experience.
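For readers more at home in Java than in shell, the idea behind Robert's "timeout -k 1m 1h" can be sketched with the JDK Process API: wait up to a deadline, ask the process to stop, then force-kill it after a grace period. This is only an illustration; the class name BoundedRun and its parameters are hypothetical, and unlike a real CI timeout it signals only the direct child process, not any descendants it may have spawned.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: run a command with an upper bound on its run time.
final class BoundedRun {

  // Returns the exit code, or -1 if the process had to be stopped at the deadline.
  static int run(long timeoutMillis, long graceMillis, String... command) {
    Process process;
    try {
      process = new ProcessBuilder(command).inheritIO().start();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try {
      if (process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
        return process.exitValue(); // finished within the time limit
      }
      process.destroy(); // polite stop request first
      if (!process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
        process.destroyForcibly(); // hard kill after the grace period
        process.waitFor();
      }
      return -1;
    } catch (InterruptedException e) {
      process.destroyForcibly();
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for process", e);
    }
  }

  public static void main(String[] args) {
    if (args.length == 0) {
      System.err.println("usage: BoundedRun <command> [args...]");
      return;
    }
    // One hour limit, one minute grace period, mirroring the values in the quoted command.
    System.out.println(run(TimeUnit.HOURS.toMillis(1), TimeUnit.MINUTES.toMillis(1), args));
  }
}
```

Either way the effect is the same: a runaway or malicious build can only consume resources up to the configured limit.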

On Mon, Oct 16, 2023 at 11:02 AM Dawid Weiss  wrote:

> I filed a PR here -
> https://github.com/apache/lucene/pull/12687
>

Ooh, thank you Dawid!  And it's now merged, so we now have a decent timeout
protection, so if a bad actor tries to crypto mine or run some distributed
LLM or whatever, at least the wasted resources are bounded by how long a
"typical" legitimate run takes, plus a generous buffer.  So given this
protection, why require the added manual approval step? :)

Net/net I don't think we have to do anything more here ... for now I'll try
to make a periodic effort myself to approve & run these blocked jobs.
Maybe that's enough to create a smoother first-contributor experience.

But I still strongly disagree with intentionally harming the 

Re: Welcome Luca Cavanna to the Lucene PMC

2023-10-22 Thread Michael McCandless
Welcome Luca!

Mike

On Sun, Oct 22, 2023 at 4:34 PM Tomás Fernández Löbbe 
wrote:

> Congratulations Luca!
>
> On Sun, Oct 22, 2023 at 10:51 AM Michael Sokolov 
> wrote:
>
>> Congratulations and welcome, Luca!
>>
>> On Sun, Oct 22, 2023 at 1:42 PM Julie Tibshirani 
>> wrote:
>> >
>> > Congratulations Luca!!
>> >
>> > On Fri, Oct 20, 2023 at 1:45 AM Bruno Roustant <
>> bruno.roust...@gmail.com> wrote:
>> >>
>> >> Welcome, congratulations!
>> >>
>> >> Le ven. 20 oct. 2023 à 10:02, Dawid Weiss  a
>> écrit :
>> >>>
>> >>>
>> >>> Congratulations, Luca!
>> >>>
>> >>> On Fri, Oct 20, 2023 at 7:51 AM Adrien Grand 
>> wrote:
>> 
>>  I'm pleased to announce that Luca Cavanna has accepted an invitation
>> to join the Lucene PMC!
>> 
>>  Congratulations Luca, and welcome aboard!
>> 
>>  --
>>  Adrien
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>


Could we allow an IndexInput to read from a still writing IndexOutput?

2023-10-19 Thread Michael McCandless
Hi Team,

Today, Lucene's Directory abstraction does not allow opening an IndexInput
on a file until the file is fully written and closed via IndexOutput.  We
enforce this in tests, and some of our core Directory implementations
demand this (e.g. caching the file's length on opening an IndexInput).

Yet, most filesystems will easily allow simultaneous read/append of a
single file.  We just don't expose this IO semantics to Lucene, but could
we allow random-access reads with append-only writes on one file?  Is there
a strong reason that we don't allow this?

Quick TL;DR context: we are trying to enable FST compilation to write
off-heap (directly to disk), enabling creating arbitrarily large FSTs with
bounded heap, matching how FSTs can now be read off-heap, and it would be
much much more RAM efficient if we could read/append the same file at once.

Full gory details context: inspired by how Tantivy (awesome and fast Rust
search engine!) writes its FSTs, over in this issue and PR, we (thank you
Dzung Bui / @dungba88!) are trying to fix Lucene's FST building to
immediately stream the FST to disk, instead of buffering the whole thing in
RAM and then writing to disk.

This would allow building arbitrarily large FSTs without using up heap, and
symmetrically matches how we can now read FSTs off-heap, plus FST building
is already (mostly) append-only. This would also allow removing some of the
crazy abstractions we have for writing FST bytes into RAM (FSTStore,
BytesStore).  It would enable interesting things like a Codec whose term
dictionary is stored entirely in an FST (also inspired by Tantivy).

The wrinkle is that, while the FST is building, it sometimes looks back and
reads previously written bytes, to share suffixes and create a minimal (or
near minimal) FST.  So if IndexInput could read those bytes, even as the
FST is still appending to IndexOutput, it would "just work".
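
To make the "read what is still being appended" idea concrete, here is a
minimal sketch using plain java.nio rather than Lucene's Directory API.  The
class and method names are hypothetical, and it only shows that the OS lets
one channel read bytes that another, still-open channel has appended; it says
nothing about how IndexInput/IndexOutput would actually expose this.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class, not part of Lucene.
final class ReadWhileAppending {

  /**
   * Appends data through one channel, then reads readLen bytes starting at
   * readPos through a second channel while the writer is still open.
   */
  static byte[] appendThenReadBack(Path file, byte[] data, int readPos, int readLen)
      throws IOException {
    try (FileChannel out =
            FileChannel.open(
                file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
      // Append-only write, like an IndexOutput that has not been closed yet.
      ByteBuffer src = ByteBuffer.wrap(data);
      while (src.hasRemaining()) {
        out.write(src);
      }
      // Random-access read of already written bytes; the writer stays open.
      ByteBuffer dst = ByteBuffer.allocate(readLen);
      long pos = readPos;
      while (dst.hasRemaining()) {
        int n = in.read(dst, pos);
        if (n < 0) {
          break; // hit end of file before readLen bytes were available
        }
        pos += n;
      }
      return Arrays.copyOf(dst.array(), dst.position());
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("append-demo", ".bin");
    try {
      byte[] got = appendThenReadBack(tmp, new byte[] {10, 20, 30, 40, 50}, 1, 3);
      System.out.println(Arrays.toString(got));
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

The FST builder's "look back" would correspond to the positional read here,
with the caveat that a real Directory implementation may buffer writes and so
would need to flush (or read from its own buffer) before such a read.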

Failing that, our plan B is to wastefully duplicate the byte[] slices from
the already written bytes into our own private (heap resident, boo) copy,
which would use quite a bit more RAM while building the FST, and make less
minimal FSTs for a given RAM budget.  I haven't measured the added wasted
RAM if we have to go this route but I fear it is sizable in practice, i.e.
it strongly negates the whole idea of writing an FST off-heap since it's
effectively storing a possibly large portion of the FST in many duplicated
byte[] fragments (in the NodeHash).

So ... could we somehow relax Lucene's Directory semantics to allow opening
an IndexInput on a still appending IndexOutput, since most filesystems are
fine with this?

Mike McCandless

http://blog.mikemccandless.com


Can we get rid of "Approve & Run" on GitHub PRs by new contributors (non-committers)?

2023-10-16 Thread Michael McCandless
When a non-committer (I think?) opens a PR, one of the committers must
notice it and click Approve & Run so the contributor can find out if
something broke in our automated tests/precommit/linting.

This seems like a waste, and a friction in the worst possible place for our
community: new contributor onboarding experience.

I think we have it to prevent e.g. a crypto mining bot of a PR sneaking in
and taking tons of resources to mine dogecoin or so?

But 1) that doesn't seem to be happening so far, 2) when I hit "Approve &
Run" I never look closely to see if there is in fact a hidden crypto miner
in there, and 3) can't we just put some reasonable timeout on the GitHub
actions to block such abuse?

Is this some sort of requirement by GitHub, or did we choose to turn on
this silly step?

Mike McCandless

http://blog.mikemccandless.com


Re: [VOTE] Release Lucene 9.8.0 RC1

2023-09-25 Thread Michael McCandless
+1

SUCCESS! [0:21:40.221538]

Mike McCandless

http://blog.mikemccandless.com


On Mon, Sep 25, 2023 at 7:01 AM Julie Tibshirani 
wrote:

> +1
>
> SUCCESS! [0:41:00.311710]
>
> Julie
>
> On Mon, Sep 25, 2023 at 9:38 AM Anshum Gupta 
> wrote:
>
>> +1
>>
>> Smoke tester is happy!
>>
>> SUCCESS! [0:46:49.090567]
>>
>> On Thu, Sep 21, 2023 at 10:49 PM Patrick Zhai  wrote:
>>
>>> Please vote for release candidate 1 for Lucene 9.8.0
>>>
>>> The artifacts can be downloaded from:
>>>
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.8.0-RC1-rev-d914b3722bd5b8ef31ccf7e8ddc638a87fd648db
>>>
>>> You can run the smoke tester directly with this command:
>>>
>>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.8.0-RC1-rev-d914b3722bd5b8ef31ccf7e8ddc638a87fd648db
>>>
>>> The vote will be open for at least 72 hours; as there's a weekend, the
>>> vote will last until 2023-09-27 06:00 UTC.
>>>
>>> [ ] +1  approve
>>> [ ] +0  no opinion
>>> [ ] -1  disapprove (and reason why)
>>>
>>> Here is my +1 (non-binding)
>>>
>>
>>
>> --
>> Anshum Gupta
>>
>


Re: [JENKINS] Lucene » Lucene-NightlyTests-9.x - Build # 665 - Unstable!

2023-09-02 Thread Michael McCandless
OK I opened https://github.com/apache/lucene/pull/12535

Mike McCandless

http://blog.mikemccandless.com


On Sat, Sep 2, 2023 at 7:17 AM Michael McCandless 
wrote:

> > The code is just good old socket accept loop as we have all learned it
> in school when we were fighting to write a small echo server with C.
>
> LOL this is all my fault from long ago, showing my poor understanding
> of sockets/networking/C echo servers!!
>
> So it sounds like the client was just super slow in starting up and didn't
> connect to the server within the timeout.
>
> So maybe we just remove the timeout entirely (client will eventually start
> up?), and remove the pointless SO_REUSEADDR?  I'll try to whip up a PR.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Sep 2, 2023 at 6:53 AM Uwe Schindler  wrote:
>
>> Let's fix this issue with bogus socket reuse. I am not sure why it is
>> there. We touched the code last time around 2012
>>
>> Why does it have a timeout in the server at all? Normally the accept() call
>> should have no timeout. If the client does not start fast enough, of course
>> it runs into timeout.
>>
>> The code is just good old socket accept loop as we have all learned it in
>> school when we were fighting to write a small echo server with C. The bug
>> here is the timeout. A timeout should only be in the client and not in the
>> waiting call.
>>
>> Uwe
>>
>> Uwe
>>
>>
>> Am 31. August 2023 14:53:44 MESZ schrieb Robert Muir :
>>
>>> I looked at this lockverifyserver and would say its probably just the
>>> craziness of this code.
>>>
>>> it sets 30 second socket timeout and intentionally calls accept() when
>>> there is nothing yet to accept... well no wonder we see this issue.
>>>
>>> p.s. why does it set SO_REUSEADDR? no reason to do this leniency when
>>> binding to port 0. nuke it.
>>>
>>> On Thu, Aug 31, 2023 at 8:46 AM Robert Muir  wrote:
>>>
>>>>
>>>>  probably a bug in some jvm sockets code that called accept() in its
>>>>  default blocking mode, when there wasn't any connection to accept? in
>>>>  that case accept() call will just block and wait for someone to make a
>>>>  new connection.
>>>>
>>>>  On Thu, Aug 31, 2023 at 8:16 AM Dawid Weiss  wrote:
>>>>
>>>>>
>>>>>
>>>>>  
>>>>> https://ge.apache.org/s/orksynljk2yp6/tests/task/:lucene:core:test/details/org.apache.lucene.store.TestStressLockFactories/testSimpleFSLockFactory?top-execution=1
>>>>>
>>>>>  This test took 31 seconds... An extremely slow vm, perhaps? I don't know 
>>>>> what the default connection timeouts are... it does look weird though.
>>>>>
>>>>>  Dawid
>>>>>
>>>>>  On Thu, Aug 31, 2023 at 1:08 PM Michael McCandless 
>>>>>  wrote:
>>>>>
>>>>>>
>>>>>>  Good grief -- why are we getting SocketTimeoutException in our 
>>>>>> LockVerifyServer's attempt to accept an incoming connection!?  These are 
>>>>>> all processes running on the same host ...
>>>>>>
>>>>>>  Mike McCandless
>>>>>>
>>>>>>  http://blog.mikemccandless.com
>>>>>>
>>>>>>
>>>>>>  On Tue, Aug 29, 2023 at 11:17 PM Apache Jenkins Server 
>>>>>>  wrote:
>>>>>>
>>>>>>>
>>>>>>>  Build: 
>>>>>>> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.x/665/
>>>>>>>
>>>>>>>  2 tests failed.
>>>>>>>  FAILED:  
>>>>>>> org.apache.lucene.store.TestStressLockFactories.testSimpleFSLockFactory
>>>>>>>
>>>>>>>  Error Message:
>>>>>>>  java.net.SocketTimeoutException: Accept timed out
>>>>>>>
>>>>>>>  Stack Trace:
>>>>>>>  java.net.SocketTimeoutException: Accept timed out
>>>>>>>  at 
>>>>>>> __randomizedtesting.SeedInfo.seed([E1AD0D2AD68BA993:F325FE2A6E367AC7]:0)
>>>>>>>  at java.base/java.net.PlainSocketImpl.socketAccept(Native 
>>>>>>> Method)
>>>>>>>  at 
>>>>>>> java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:474)

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.x - Build # 665 - Unstable!

2023-09-02 Thread Michael McCandless
> The code is just good old socket accept loop as we have all learned it in
school when we were fighting to write a small echo server with C.

LOL this is all my fault from long ago, showing my poor understanding of
sockets/networking/C echo servers!!

So it sounds like the client was just super slow in starting up and didn't
connect to the server within the timeout.

So maybe we just remove the timeout entirely (client will eventually start
up?), and remove the pointless SO_REUSEADDR?  I'll try to whip up a PR.
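
For anyone following along, here is a small self-contained sketch of the
behavior being discussed.  It is not the LockVerifyServer code; the class and
method names are hypothetical.  It shows that accept() on a ServerSocket with
SO_TIMEOUT set throws SocketTimeoutException when no client shows up in time,
while accept() without a timeout simply blocks until a (slow) client connects.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not the real LockVerifyServer.
final class AcceptTimeoutDemo {

  /** Returns true if accept() timed out because no client connected in time. */
  static boolean acceptTimesOut(int timeoutMillis) throws IOException {
    // Port 0 asks the OS for any free port, so SO_REUSEADDR is not needed.
    try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
      server.setSoTimeout(timeoutMillis);
      try (Socket ignored = server.accept()) {
        return false; // a client connected (not expected in this demo)
      } catch (SocketTimeoutException e) {
        return true; // same exception type as in the Jenkins failure
      }
    }
  }

  /** accept() with no timeout: blocks until the late client finally connects. */
  static boolean acceptWithoutTimeout(int clientDelayMillis) throws Exception {
    try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
      int port = server.getLocalPort();
      Thread client =
          new Thread(
              () -> {
                try {
                  Thread.sleep(clientDelayMillis); // simulate a slow-starting client
                  try (Socket c = new Socket(InetAddress.getLoopbackAddress(), port)) {
                    // connect and close immediately
                  }
                } catch (Exception e) {
                  throw new RuntimeException(e);
                }
              });
      client.start();
      try (Socket accepted = server.accept()) {
        client.join();
        return accepted != null;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("timed out with no client: " + acceptTimesOut(200));
    System.out.println("accepted slow client: " + acceptWithoutTimeout(500));
  }
}
```

The trade-off of dropping the timeout is that a client that never starts
leaves the server blocked forever, so something else (e.g. the test
framework's own timeout) has to catch that case.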

Mike McCandless

http://blog.mikemccandless.com


On Sat, Sep 2, 2023 at 6:53 AM Uwe Schindler  wrote:

> Let's fix this issue with bogus socket reuse. I am not sure why it is
> there. We touched the code last time around 2012
>
> Why does it have a timeout in the server at all? Normally the accept() call
> should have no timeout. If the client does not start fast enough, of course
> it runs into timeout.
>
> The code is just good old socket accept loop as we have all learned it in
> school when we were fighting to write a small echo server with C. The bug
> here is the timeout. A timeout should only be in the client and not in the
> waiting call.
>
> Uwe
>
> Uwe
>
>
> Am 31. August 2023 14:53:44 MESZ schrieb Robert Muir :
>
>> I looked at this lockverifyserver and would say its probably just the
>> craziness of this code.
>>
>> it sets 30 second socket timeout and intentionally calls accept() when
>> there is nothing yet to accept... well no wonder we see this issue.
>>
>> p.s. why does it set SO_REUSEADDR? no reason to do this leniency when
>> binding to port 0. nuke it.
>>
>> On Thu, Aug 31, 2023 at 8:46 AM Robert Muir  wrote:
>>
>>>
>>>  probably a bug in some jvm sockets code that called accept() in its
>>>  default blocking mode, when there wasn't any connection to accept? in
>>>  that case accept() call will just block and wait for someone to make a
>>>  new connection.
>>>
>>>  On Thu, Aug 31, 2023 at 8:16 AM Dawid Weiss  wrote:
>>>
>>>>
>>>>
>>>>  
>>>> https://ge.apache.org/s/orksynljk2yp6/tests/task/:lucene:core:test/details/org.apache.lucene.store.TestStressLockFactories/testSimpleFSLockFactory?top-execution=1
>>>>
>>>>  This test took 31 seconds... An extremely slow vm, perhaps? I don't know 
>>>> what the default connection timeouts are... it does look weird though.
>>>>
>>>>  Dawid
>>>>
>>>>  On Thu, Aug 31, 2023 at 1:08 PM Michael McCandless 
>>>>  wrote:
>>>>
>>>>>
>>>>>  Good grief -- why are we getting SocketTimeoutException in our 
>>>>> LockVerifyServer's attempt to accept an incoming connection!?  These are 
>>>>> all processes running on the same host ...
>>>>>
>>>>>  Mike McCandless
>>>>>
>>>>>  http://blog.mikemccandless.com
>>>>>
>>>>>
>>>>>  On Tue, Aug 29, 2023 at 11:17 PM Apache Jenkins Server 
>>>>>  wrote:
>>>>>
>>>>>>
>>>>>>  Build: 
>>>>>> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.x/665/
>>>>>>
>>>>>>  2 tests failed.
>>>>>>  FAILED:  
>>>>>> org.apache.lucene.store.TestStressLockFactories.testSimpleFSLockFactory
>>>>>>
>>>>>>  Error Message:
>>>>>>  java.net.SocketTimeoutException: Accept timed out
>>>>>>
>>>>>>  Stack Trace:
>>>>>>  java.net.SocketTimeoutException: Accept timed out
>>>>>>  at 
>>>>>> __randomizedtesting.SeedInfo.seed([E1AD0D2AD68BA993:F325FE2A6E367AC7]:0)
>>>>>>  at java.base/java.net.PlainSocketImpl.socketAccept(Native 
>>>>>> Method)
>>>>>>  at 
>>>>>> java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:474)
>>>>>>  at 
>>>>>> java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
>>>>>>  at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
>>>>>>  at 
>>>>>> org.apache.lucene.store.LockVerifyServer.run(LockVerifyServer.java:62)
>>>>>>  at 
>>>>>> org.apache.lucene.store.TestStressLockFactories.runImpl(TestStressLockFactories.java:53)
>>>>>>

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.x - Build # 665 - Unstable!

2023-08-31 Thread Michael McCandless
Good grief -- why are we getting SocketTimeoutException in our
LockVerifyServer's attempt to accept an incoming connection!?  These are
all processes running on the same host ...

Mike McCandless

http://blog.mikemccandless.com


On Tue, Aug 29, 2023 at 11:17 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.x/665/
>
> 2 tests failed.
> FAILED:
> org.apache.lucene.store.TestStressLockFactories.testSimpleFSLockFactory
>
> Error Message:
> java.net.SocketTimeoutException: Accept timed out
>
> Stack Trace:
> java.net.SocketTimeoutException: Accept timed out
> at
> __randomizedtesting.SeedInfo.seed([E1AD0D2AD68BA993:F325FE2A6E367AC7]:0)
> at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
> at java.base/java.net
> .AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:474)
> at java.base/java.net
> .ServerSocket.implAccept(ServerSocket.java:565)
> at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
> at
> org.apache.lucene.store.LockVerifyServer.run(LockVerifyServer.java:62)
> at
> org.apache.lucene.store.TestStressLockFactories.runImpl(TestStressLockFactories.java:53)
> at
> org.apache.lucene.store.TestStressLockFactories.testSimpleFSLockFactory(TestStressLockFactories.java:104)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at
> 

Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael McCandless
Thanks Michael, very interesting!  I of course agree that Lucene is all you
need, heh ;)

Jimmy Lin also tweeted about the strength of Lucene's HNSW:
https://twitter.com/lintool/status/1681333664431460353?s=20

Mike McCandless

http://blog.mikemccandless.com


On Thu, Aug 31, 2023 at 3:31 AM Michael Wechner 
wrote:

> Hi Together
>
> You might be interested in this paper / article
>
> https://arxiv.org/abs/2308.14963
>
> Thanks
>
> Michael
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: WrongThreadException using the new Panama MMap on Java 19

2023-08-22 Thread Michael McCandless
Thanks Uwe!  Responses inlined below:

On Thu, Aug 17, 2023 at 9:46 AM Uwe Schindler  wrote:

> this error indeed cannot happen as all our segments are shared. It could
> still be some bug in the Java 19 version, did you try Java 21 or Java 20?
>
OK, phew, that is what I thought (from reading Lucene's sources).  We are
testing Java 20 now (though EOL is in ~4 weeks -- hard to keep up!), and
soon Java 21.  Maybe we'll just go straight to Java 21 if there are no
problems.  This exception only happened on a single host (and JVM) across a
great many Lucene instances we run so whatever bug it is: it looks very
rare.  But, for that host, the exception happened a great many times, for
the few hours that this JVM was alive.

> It may also be a Corretto problem, maybe contact their team, maybe they
> have applied some changes. ScopedMemoryAccess is using an extension to the
> original Java memory model internally (I think they changed something in the
> specs), so it changed quite a lot internally. Maybe Corretto has some
> patches for hotspot that make the memory model changes hit us?
>
Good idea -- we will reach out to the Corretto team.  Scary to be using
extensions to JMM which was already complex enough to begin with!

> I don't think the bug is in Lucene's code, because if a thread is shared,
> it is shared. Maybe some other problem could be: Have you maybe
> accidentally closed the IndexInput too early? Normally this should cause an
> IllegalStateException (we have a test for this), but I am not fully sure
> what happens if the shared scope was already closed. I remember there were
> some bugs in 19, but it is already too long ago. So please try with plain
> OpenJDK Java 21 (or 20).
>
I don't think we are closing IndexInputs too early -- we would see many
many exceptions if so, I hope.  We will expedite getting off 19.

> I would like to know more about the speed improvements! In our
> benchmarking they were not so visible (only a slight change), so happy to
> see more.
>
Right, we were surprised too!  We are still trying to isolate where the
gains came from, but they were impactful for us (~5-7% reduction in CPU
time somehow).  We kept Java at 19 (we try not to upgrade both at the same
time!).

Mike McCandless

http://blog.mikemccandless.com

>


WrongThreadException using the new Panama MMap on Java 19

2023-08-17 Thread Michael McCandless
Hi Team,

We hit an interesting and exciting intermittent exception in our
customer-facing product search instance (all Lucene!) at Amazon:

 java.lang.WrongThreadException: Attempted access outside owning thread
  at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:460)
  at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:113)
  at java.base/jdk.internal.misc.ScopedMemoryAccess.getByte(ScopedMemoryAccess.java:518)
  at java.base/java.lang.invoke.VarHandleSegmentAsBytes.get(VarHandleSegmentAsBytes.java:109)
  at java.base/java.lang.foreign.MemorySegment.get(MemorySegment.java:1103)
  at org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl.readByte(MemorySegmentIndexInput.java:485)
  at org.apache.lucene.util.fst.ReverseRandomAccessReader.readByte(ReverseRandomAccessReader.java:33)
  at org.apache.lucene.util.fst.FST.findTargetArc(FST.java:1444)
  at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:511)
  at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:111)
  at org.apache.lucene.index.TermStates.build(TermStates.java:96)


We are using Corretto Java full version:


  openjdk full version "19.0.2+9"


Looking at how Uwe's magic mrjar code works, it doesn't look like we ever
make a thread private MemorySegment?  If so, I don't see how this exception
could be occurring :)  We seem to do this:

final MemorySession session = MemorySession.openShared();

Or, maybe we do sometimes make thread private memory segments, and maybe we
(Amazon's sources) have a silly thread over-sharing bug, but so far I think
that's unlikely -- we are calling TermStates.build from a single thread,
which under the hood clones/slices the MMap IndexInputs to seek the terms
dictionary on each segment and only that one thread ever interacts with
those.  It's all just one thread under TermStates.build.


This only happened on a few hosts and only for a short period of time,
making me suspect some sort of intermittent JVM bug (e.g. HotSpot
miscompilation or so).  It is clearly very rare, so we are still using the
new MMap (which btw seems to be a big performance gain for our service,
which we are still trying to fully understand, more on that later!).


Has anyone else seen such errant exceptions with the new Panama based
MMap?  Are there any known Java issues that smell like this?  (A quick
search on bugs.openjdk.org (
https://bugs.openjdk.org/browse/JDK-8287809?jql=issuetype%20%3D%20Bug%20AND%20text%20~%20WrongThreadException)
did not seem to turn up any obvious candidates).


Thanks,


Mike McCandless

http://blog.mikemccandless.com


Re: Branchless binary search in Java?

2023-08-14 Thread Michael McCandless
Oh I realized this super interesting read (thanks Rob!) was accidentally
sent privately to me, so I'm reply-all'ing explicitly so everyone else gets
to read this too :)

I especially love the charts and math at the end!

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jul 27, 2023 at 8:02 AM Rob Audenaerde 
wrote:

> Super interesting read!
>
> Btw. following the links of the internet I somehow ended up here, which is
> a nice in-depth comparison on binary-search approaches:
>
> https://en.algorithmica.org/hpc/data-structures/binary-search/
>
>
>
> On Thu, Jul 27, 2023 at 1:40 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Hi Team,
>>
>> At Amazon (customer facing product search team) we've been playing with /
>> benchmarking Tantivy (exciting Rust search engine loosely inspired by
>> Lucene: https://github.com/quickwit-oss/tantivy, created by Paul Masurel
>> and developed now by Quickwit and the Tantivy open source dev community) vs
>> Lucene, by building a benchmark that blends Tantivy's search-benchmark-game
>> (https://github.com/quickwit-oss/search-benchmark-game) and Lucene's
>> nightly benchmarks (luceneutil: https://github.com/mikemccand/luceneutil
>> and https://home.apache.org/~mikemccand/lucenebench).
>>
>> It's great fun, and we would love more eyeballs to spot any remaining
>> unfair aspects of the comparison (thank you Uwe and Adrien for catching
>> things already!).  We are trying hard for an apples to apples comparison:
>> same (enwiki) corpus, same queries, confirming we get precisely the same
>> top N hits and total hit counts, latest versions of both engines.
>>
>> Indeed, Tantivy is substantially (2-3X?) faster than Lucene for many
>> queries, and we've been trying to understand why.
>>
>> Sometimes it is due to more efficient algorithms, e.g. the count() API
>> for pure disjunctive queries, which Adrien is now porting to Lucene (thank
>> you!), showing sizable (~80% faster in one query) speedups:
>> https://github.com/apache/lucene/issues/12358.  Other times it may be due
>> to Rust's more efficient/immediate/Python-like GC, or direct access to SIMD
>> (Lucene now has this for aKNN search -- big speedup -- and maybe soon for
>> postings too?), and unsafe code, different skip data / block postings
>> encoding, or ...
>>
>> Specifically, one of the fascinating Tantivy optimizations is the
>> branchless binary search: https://quickwit.io/blog/search-a-sorted-block.
>> Here's another blog post about it (implemented in C++):
>> https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/
>>
>> The idea is to get rustc/gcc to compile down to x86-64's CMOVcc
>> ("conditional move") instruction (I'm not sure if ARM has an equivalent?
>> Maybe "conditional execution"?).  The idea is to avoid a "true" branch of
>> the instruction stream (costly pipeline flush on mis-prediction, which is
>> likely in a binary search or priority queue context) by instead
>> conditionally moving a value from one location to another (register or
>> memory).  Tantivy uses this for skipping through postings, in a single
>> layer in-memory skip list structure (versus Lucene's on-disk
>> (memory-mapped, by default) multi-layer skip list) -- see the above blog
>> post.
>>
>> This made me wonder: does Java's HotSpot compiler use CMOVcc?  I see
>> OpenJDK bug fixes like https://github.com/openjdk/mobile/commit/a03e9220
>> which seem to imply C2 does in fact compile to CMOVcc sometimes.  So then
>> I wondered whether a branchless binary search in Java is a possibility?
>> Has anyone played with this?
>>
>> Before Robert gets too upset :)  Even if we could build such a thing, the
>> right place for such optimizations is likely the JDK itself (see the
>> similar discussion about SIMD-optimized sorting:
>> https://github.com/apache/lucene/issues/12399).  Still, I'm curious if
>> anyone has explored this, and maybe saw some performance gains from way up
>> in javaland where we can barely see the bare metal shining beneath us :)
>>
>> Sorry for the long post!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>


Re: Branchless binary search in Java?

2023-08-01 Thread Michael McCandless
On Sun, Jul 30, 2023 at 11:17 AM Bruno Roustant 
wrote:

> Interesting coincidence, I'm currently working on a learned index on sorted
> keys that can advantageously replace binary search.
> It is very compact (additional space of 2% of the sorted key array, e.g.
> 40KB for 200MB of keys), and it is between 2x and 3x faster than binary
> search for the rank/indexOf methods. By design there are nearly no
> branches: the index of a key is approximated by using hierarchical linear
> segment computation.
> The PGM-Index paper is there: https://pgm.di.unipi.it/
> And my implementation is here, just submitted in HPPC:
> https://github.com/carrotsearch/hppc/pull/39
>

Wow, this looks very relevant to Lucene!  Could this index be used for
faster implementation of our skip lists?  Even though they are static
(computed once at segment-write time) vs dynamic/online that these learned
indices are also able to handle, it looks like learned indices are still
better than simple binary search, and quite a bit more compact, for static
cases.

Our "best linear fit" approximation to compress monotonic longs
(DirectMonotonicWriter/Reader) looks like a simple example of these learned
indices too.
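
As a rough illustration of what "best linear fit" compression means here, the
sketch below predicts each value of a monotonic sequence from a straight line
through its first and last entries and keeps only the (small) residuals.  This
is a simplification under my own assumptions, not the actual
DirectMonotonicWriter code; all names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's DirectMonotonicWriter/Reader.
final class LinearFitSketch {

  /** Average increment per position: slope of the line through first and last values. */
  static double slope(long[] values) {
    int n = values.length;
    return n > 1 ? (double) (values[n - 1] - values[0]) / (n - 1) : 0.0;
  }

  /** Value predicted by the line at position i. */
  static long predict(long first, double slope, int i) {
    return first + Math.round(slope * i);
  }

  /** Residuals (actual minus predicted); these are what would be bit-packed. */
  static long[] encode(long[] values) {
    long first = values.length > 0 ? values[0] : 0L;
    double s = slope(values);
    long[] residuals = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      residuals[i] = values[i] - predict(first, s, i);
    }
    return residuals;
  }

  /** Rebuilds the original values from the line parameters plus residuals. */
  static long[] decode(long first, double slope, long[] residuals) {
    long[] values = new long[residuals.length];
    for (int i = 0; i < residuals.length; i++) {
      values[i] = predict(first, slope, i) + residuals[i];
    }
    return values;
  }

  public static void main(String[] args) {
    long[] values = {100, 110, 121, 130};
    long[] residuals = encode(values);
    System.out.println("residuals = " + Arrays.toString(residuals));
    System.out.println("decoded   = " + Arrays.toString(decode(values[0], slope(values), residuals)));
  }
}
```

The same line can also be read the other way around, as a guess for the
position of a given value, which is the "learned index" angle: the closer the
data is to linear, the smaller the residuals and the tighter the guess.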

Mike McCandless

http://blog.mikemccandless.com


Re: Branchless binary search in Java?

2023-08-01 Thread Michael McCandless
On Fri, Jul 28, 2023 at 7:04 AM Dawid Weiss  wrote:

>> Actually this is exactly the same for Java:
>
> Yup, I know (we all know by now, I guess). People (including me) evidently
> crave this low, iron-level control, while at the same time mostly try to
> dodge writing any software in languages that are designed to be close to
> the hardware. There is a love-hate relationship there that I often find
> amusing.
>

This is indeed amusing :)  Grass is always greener on the other side?

Mike McCandless

http://blog.mikemccandless.com

>


Re: Branchless binary search in Java?

2023-08-01 Thread Michael McCandless
Yeah +1 on waiting/asking/expecting CMOV to be properly utilized by
Hotspot, instead of trying to target the instruction ourselves.  This is
all more of a curiosity / exploration.  I am curious whether "branchless
binary search" is something Arrays.binarySearch already does / compiles
to.  Even in C (much closer to shiny bare metal than javaland!) you must
write the for loop just so to tickle the compiler into producing CMOV.
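
For the curious, here is roughly what the "just so" loop looks like when
transliterated to Java.  It is only a sketch with a hypothetical class name:
the loop runs a fixed number of iterations for a given length and the only
data-dependent choice is a ternary, which C2 may or may not turn into a
conditional move.  I have not checked the generated assembly, so treat the
"branchless" part as an aspiration rather than a guarantee.

```java
// Hypothetical sketch; whether HotSpot emits CMOV for the ternary is not guaranteed.
final class BranchlessSearchSketch {

  /** Returns the index of the first element >= target, or a.length if there is none. */
  static int lowerBound(int[] a, int target) {
    int base = 0;
    int len = a.length;
    while (len > 1) {
      int half = len >>> 1;
      // The only data-dependent decision: a select between two values
      // instead of an if/else that changes control flow.
      base = (a[base + half - 1] < target) ? base + half : base;
      len -= half;
    }
    // One final comparison decides between base and base + 1.
    return (len == 1 && a[base] < target) ? base + 1 : base;
  }

  public static void main(String[] args) {
    int[] sorted = {1, 3, 5, 7, 9};
    for (int target : new int[] {0, 5, 6, 10}) {
      System.out.println("lowerBound(" + target + ") = " + lowerBound(sorted, target));
    }
  }
}
```

Unlike Arrays.binarySearch, this never exits early on an exact match, which
is what keeps the iteration count predictable; confirming what the JIT really
does with it would still need hsdis.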

When Adrien fixed our FOR postings decode to write the java "just so" so
that Hotspot would auto-vectorize properly, it was an impactful performance
jump.  But I agree this is all very brittle: one small change on upgrade to
the JDK, CPU instructions, whatnot, might break the optimization.  How do
we know/track/test that it is even still working?  Maybe we need some crazy
unit test involving hsdis to confirm :)  Cutting over to explicit
vectorized code (Panama) should ensure that, but for postings it looks like
we are still a ways off from that being realistic:
https://github.com/apache/lucene/issues/12477#issuecomment-1658224212 ...
though for KNN we were able to cutover.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jul 28, 2023 at 4:36 AM Uwe Schindler  wrote:

> Actually this is exactly the same for Java: You can try whatever you want,
> the outcome of the dynamic optimization applied by various dynamic building
> blocks (Java bytecode, Java/Hotspot version, command line parameters,
> hardware CPU, virtualization) is not predictable and any change anywhere
> may produce different results. So we should stop arguing about changing
> *our* code to improve assembly code. If we have some code on our side and
> it is not correctly converted to CMOV, we should open a bug report on OpenJDK
> (Chris H. and I can do this easily - and ask for improvement).
>
> As you have seen in my other answer to this thread: Hotspot applies CMOV
> depending on analysis of branches. So in general our code *should* make use
> of CMOV. You can only get certainty by using hsdis and printing the assembly
> for some of our methods which you think should use CMOV. But there's no
> guarantee that it is applied. And as always: It may take a very long time
> until Hotspot replaces the standard branched code by conditional moves (as
> they have significant overhead if used in cases where the result is
>
> With Hotspot you can try to add -XX:ConditionalMoveLimit ("Limit of ops
> to make speculative when using CMOVE") and try with different values (0
> disables, default is 3 on x86 and aarch64, 4 on arm). But as always: Wait
> long enough.
>
> To enforce usage of CMOV (maybe that's the first thing for trying around
> and to look on the type of assembly created; but this may slow down other
> code as CMOV is always used, without analysis):
> -XX:+UseCMoveUnconditionally ("Generates CMove (scalar and vector)
> instructions regardless of profitability analysis.")
>
> Uwe
>
> P.S.: Hotspot also has cmov for vectorized code
> Am 28.07.2023 um 09:08 schrieb Dawid Weiss:
>
>
>
>> Specifically, one of the fascinating Tantivy optimizations is the
>> branchless binary search: https://quickwit.io/blog/search-a-sorted-block
>> .
>>
>
> This is an interesting post, thanks for sharing, Mike. I remember when
> people did such low-level tricks frequently (but on much simpler processors
> and fairly consistent hardware) and it
> always makes me wonder whether all the moving blocks involved here (rust,
> llvm, actual hardware) make it sane - any change in any of these layers may
> affect the outcome (and debugging what actually happened will be a
> nightmare...). I like it though - nice intellectual exercise and some
> assembly dumps for a change. ;)
>
> D.
>
>> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>


Re: Branchless binary search in Java?

2023-08-01 Thread Michael McCandless
On Thu, Jul 27, 2023 at 9:19 AM Uwe Schindler  wrote:

> See Shipilev's blog:
> https://shipilev.net/jvm/anatomy-quarks/30-conditional-moves/
>

Really interesting!  This is an awesome, quick explanation of the tradeoff
CMOV is making (pre-computing both paths) vs branching (have to predict,
with high cost of mis-prediction).  Maybe priority queues (used heavily in
Lucene, e.g. during disjunctive search, merging of terms during indexing or
by MultiTerms at search time, etc.) are a possible place where the branch
is hard to predict (on up or down heaping) and maybe CMOV could help?
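
To make the priority queue case concrete, here is a minimal sketch of a
down-heap step where the child selection is written as a select (ternary)
rather than an if/else. This is not Lucene's PriorityQueue, which is 1-based
and compares objects via lessThan; the class name and the primitive long[]
min-heap are assumptions for illustration. Whether Hotspot actually turns the
ternary into CMOV is not guaranteed and would have to be confirmed with hsdis.

```java
/** Hypothetical min-heap sift-down on primitives; NOT Lucene's PriorityQueue. */
final class SiftDownSketch {
  /** Restores min-heap order below index i, for the heap stored in heap[0..size). */
  static void siftDown(long[] heap, int size, int i) {
    long value = heap[i];
    while (true) {
      int left = 2 * i + 1;
      if (left >= size) {
        break; // i is a leaf
      }
      int right = left + 1;
      // Child selection as a select: this data-dependent choice is the kind of
      // hard-to-predict decision where a conditional move might help.
      int child = (right < size && heap[right] < heap[left]) ? right : left;
      if (heap[child] >= value) {
        break; // heap order already holds
      }
      heap[i] = heap[child]; // pull the smaller child up
      i = child;
    }
    heap[i] = value;
  }
}
```

Note the loop exit conditions are still ordinary branches; only the
left-versus-right choice is expressed as a select here.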

> He also has some examples and also there's a command line option to tell
> hotspot when to use cmov: -XX:ConditionalMoveLimit
>

Yay!  Another fun JVM command-line flag to tweak :)

Mike McCandless

http://blog.mikemccandless.com


Re: Branchless binary search in Java?

2023-08-01 Thread Michael McCandless
On Thu, Jul 27, 2023 at 8:44 AM Uwe Schindler  wrote:

> Hi Mike,
>
> actually Hotspot is using CMOV. Some nodes after bytecode analysis are
> converted to CMoveNode and it has implementations to generate machine code
> (as far as i see) for x86, s390 and arm.
>
> The generic code is here:
> https://github.com/openjdk/jdk/blob/486c7844f902728ce580c3994f58e3e497834952/src/hotspot/share/opto/movenode.cpp
>
Thanks Uwe, this is wild.  I read the comment at the top of that source
file and couldn't understand it!  And it made it sound like there was no
solution (all five bullets have problems).

> Actually it is used in some cases, but I did not find out when it uses it
> to generate instructions from bytecode. It also has some code to optimize
> the cmov away if the result is known before (or not, e.g. for floats it
> does not do this).
>
> I think best would be to ask on the hotspot compiler list on suggestions
> how to write Java code to trigger the JVM to insert the CMOV.
>
Thanks.  I wonder if there is any practical performance impact of targeting
CMOV from javaland...

Mike McCandless

http://blog.mikemccandless.com


Branchless binary search in Java?

2023-07-27 Thread Michael McCandless
Hi Team,

At Amazon (customer facing product search team) we've been playing with /
benchmarking Tantivy (exciting Rust search engine loosely inspired by
Lucene: https://github.com/quickwit-oss/tantivy, created by Paul Masurel
and developed now by Quickwit and the Tantivy open source dev community) vs
Lucene, by building a benchmark that blends Tantivy's search-benchmark-game
(https://github.com/quickwit-oss/search-benchmark-game) and Lucene's
nightly benchmarks (luceneutil: https://github.com/mikemccand/luceneutil
and https://home.apache.org/~mikemccand/lucenebench).

It's great fun, and we would love more eyeballs to spot any remaining
unfair aspects of the comparison (thank you Uwe and Adrien for catching
things already!).  We are trying hard for an apples to apples comparison:
same (enwiki) corpus, same queries, confirming we get precisely the same
top N hits and total hit counts, latest versions of both engines.

Indeed, Tantivy is substantially (2-3X?) faster than Lucene for many
queries, and we've been trying to understand why.

Sometimes it is due to more efficient algorithms, e.g. the count() API
for pure disjunctive queries, which Adrien is now porting to Lucene (thank
you!), showing sizable (~80% faster in one query) speedups:
https://github.com/apache/lucene/issues/12358.  Other times may be due to
Rust's more efficient/immediate/Python-like GC, or direct access to SIMD
(Lucene now has this for aKNN search -- big speedup -- and maybe soon for
postings too?), and unsafe code, different skip data / block postings
encoding, or ...

Specifically, one of the fascinating Tantivy optimizations is the
branchless binary search: https://quickwit.io/blog/search-a-sorted-block.
Here's another blog post about it (implemented in C++):
https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/

The idea is to get rustc/gcc to compile down to x86-64's CMOVcc
("conditional move") instruction (I'm not sure if ARM has an equivalent?
Maybe "conditional execution"?).  The idea is to avoid a "true" branch of
the instruction stream (costly pipeline flush on mis-prediction, which is
likely in a binary search or priority queue context) by instead
conditionally moving a value from one location to another (register or
memory).  Tantivy uses this for skipping through postings, in a single
layer in-memory skip list structure (versus Lucene's on-disk
(memory-mapped, by default) multi-layer skip list) -- see the above blog
post.
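
For reference, the shape of such a search in Java might look like the sketch
below. This is only an illustration of the technique from the blog posts, not
Tantivy's code and not anything in Lucene today; the class and method names
are made up. The loop always runs the same number of iterations for a given
array length, and the only data-dependent decision is a ternary that the JIT
may or may not compile to a conditional move.

```java
/** Hypothetical sketch of a "branchless" lower-bound search over a sorted int[]. */
final class BranchlessSearchSketch {
  /** Returns the first index i with a[i] >= target, or a.length if there is none. */
  static int lowerBound(int[] a, int target) {
    int base = 0;
    int len = a.length;
    while (len > 1) {
      int half = len >>> 1;
      // Select instead of if/else: a candidate for CMOV, with no guarantee
      // that the JIT actually emits one.
      base = a[base + half - 1] < target ? base + half : base;
      len -= half;
    }
    // At most one candidate element is left to check.
    return (len == 1 && a[base] < target) ? base + 1 : base;
  }

  public static void main(String[] args) {
    int[] docs = {1, 3, 5, 7, 9, 11};
    System.out.println(lowerBound(docs, 7));  // index of 7
    System.out.println(lowerBound(docs, 8));  // index of 9, the next larger value
  }
}
```

The trip count depends only on the array length, not on the target, which is
what makes the loop friendly to unrolling for fixed-size blocks.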

This made me wonder: does Java's HotSpot compiler use CMOVcc?  I see JDK
bug fixes like https://github.com/openjdk/mobile/commit/a03e9220 which
seem to imply C2 does in fact compile to CMOVcc sometimes.  So then I
wondered whether a branchless binary search in Java is a possibility?  Has
anyone played with this?

Before Robert gets too upset :)  Even if we could build such a thing, the
right place for such optimizations is likely the JDK itself (see the
similar discussion about SIMD-optimized sorting:
https://github.com/apache/lucene/issues/12399).  Still, I'm curious if
anyone has explored this, and maybe saw some performance gains from way up
in javaland where we can barely see the bare metal shining beneath us :)

Sorry for the long post!

Mike McCandless

http://blog.mikemccandless.com


Re: [VOTE] Release PyLucene 9.7.0-rc1

2023-07-12 Thread Michael McCandless
+1

I ran the same exciting smoke test -- indexing first 100K enwiki docs,
running a few political searches, force merging, searching again.
Everything ran fine!

Arch Linux kernel 6.3.2, Java 17.0.7+7, Python 3.11.3.

Sorry for the delay!

Mike

On Sun, Jul 9, 2023 at 3:28 PM Dawid Weiss  wrote:

>
> +1 to release. Thanks Andi.
>
> On Thu, Jul 6, 2023 at 9:47 AM Andi Vajda  wrote:
>
>>
>> The PyLucene 9.7.0 (rc1) release tracking the recent release of
>> Apache Lucene 9.7.0 is ready.
>>
>> A release candidate is available from:
>> https://dist.apache.org/repos/dist/dev/lucene/pylucene/9.7.0-rc1/
>>
>> PyLucene 9.7.0 is built with JCC 3.13, included in these release
>> artifacts.
>>
>> JCC 3.13 supports Python 3.3 up to Python 3.11.
>> PyLucene may also be built with Python 2 but this configuration is no
>> longer
>> tested.
>>
>> Please vote to release these artifacts as PyLucene 9.7.0.
>> Anyone interested in this release can and should vote !
>>
>> Thanks !
>>
>> Andi..
>>
>> ps: the KEYS file for PyLucene release signing is at:
>> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
>> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>>
>> pps: here is my +1
>>
> --
Mike McCandless

http://blog.mikemccandless.com


Re: [JENKINS] Lucene-9.x-Linux (64bit/hotspot/jdk-17.0.5) - Build # 11322 - Unstable!

2023-06-27 Thread Michael McCandless
Thanks for digging Patrick!

I sort of think MaxScoreBulkScorer should be returning NO_MORE_DOCS in this
case?  But I'm far from an expert.  This may be related to the recent
MAXScore improvements for disjunctions?  (
https://github.com/apache/lucene/commit/8703e449cee0693e50a7922a86c1cbc7dcf95d13
)
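
A minimal sketch of the behavior suggested above, assuming the scorer knows
the segment's maxDoc (the class and method names are hypothetical, and this is
not the actual MaxScoreBulkScorer code): clamp the returned estimate so that a
range end at or beyond maxDoc is reported as NO_MORE_DOCS.

```java
/** Hypothetical illustration of clamping a bulk scorer's "next doc" estimate. */
final class ScoreRangeSketch {
  // Same value as DocIdSetIterator.NO_MORE_DOCS in Lucene.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /**
   * Returns the estimate to report after scoring up to rangeEnd (exclusive):
   * rangeEnd itself while it is still inside the segment, else NO_MORE_DOCS.
   */
  static int nextEstimate(int rangeEnd, int maxDoc) {
    return rangeEnd >= maxDoc ? NO_MORE_DOCS : rangeEnd;
  }
}
```

With maxDoc = 100, a rangeEnd of 50 is returned unchanged, while a rangeEnd of
100 or 120 becomes NO_MORE_DOCS, which is the case the assertion described in
the quoted message below checks for.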

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jun 27, 2023 at 2:34 AM Patrick Zhai  wrote:

> The exception was thrown because TimeLimitingBulkScorer passed in a "max"
> which is larger than the maxDoc in the segment. And then MaxScoreBulkScorer
> directly returns the rangeEnd as the next estimation, and
> finally makes AssertingBulkScorer unhappy because it expects NO_MORE_DOCS
> when "max" or "next" is larger than maxDoc.
>
> I'm not super sure what's the right fix, seems to me neither
> TimeLimitingBulkScorer nor MaxScoreBulkScorer has violated the contract (as
> the javadoc never guarantees that the method will return NO_MORE_DOCS when
> there are no more docs), so perhaps we should just let AssertingBulkScorer
> tolerate the case?
>
> Patrick
>
> On Mon, Jun 26, 2023 at 10:54 PM Policeman Jenkins Server <
> jenk...@thetaphi.de> wrote:
>
>> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/11322/
>> Java: 64bit/hotspot/jdk-17.0.5 -XX:-UseCompressedOops -XX:+UseSerialGC
>>
>> 1 tests failed.
>> FAILED:  org.apache.lucene.expressions.TestExpressionSorts.testQueries
>>
>> Error Message:
>> java.lang.AssertionError
>>
>> Stack Trace:
>> java.lang.AssertionError
>> at
>> __randomizedtesting.SeedInfo.seed([9D337074B96D1F8C:C1BDBCAFA304AA22]:0)
>> at org.apache.lucene.test_framework@9.8.0-SNAPSHOT
>> /org.apache.lucene.tests.search.AssertingBulkScorer.score(AssertingBulkScorer.java:105)
>> at org.apache.lucene.core@9.8.0-SNAPSHOT
>> /org.apache.lucene.search.TimeLimitingBulkScorer.score(TimeLimitingBulkScorer.java:82)
>> at org.apache.lucene.core@9.8.0-SNAPSHOT
>> /org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
>> at org.apache.lucene.core@9.8.0-SNAPSHOT
>> /org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:776)
>> at org.apache.lucene.test_framework@9.8.0-SNAPSHOT
>> /org.apache.lucene.tests.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:78)
>> at org.apache.lucene.core@9.8.0-SNAPSHOT
>> /org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:694)
>> at org.apache.lucene.core@9.8.0-SNAPSHOT
>> /org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:688)
>> at org.apache.lucene.core@9.8.0-SNAPSHOT
>> /org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:668)
>> at org.apache.lucene.core@9.8.0-SNAPSHOT
>> /org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:571)
>> at
>> org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:119)
>> at
>> org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:113)
>> at
>> org.apache.lucene.expressions.TestExpressionSorts.testQueries(TestExpressionSorts.java:92)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>> at
>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>> at org.apache.lucene.test_framework@9.8.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>> at org.apache.lucene.test_framework@9.8.0-SNAPSHOT
>> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at org.apache.lucene.test_framework@9.8.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>> at org.apache.lucene.test_framework@9.8.0-SNAPSHOT
>> 

Re: [VOTE] Release Lucene 9.7.0 RC1

2023-06-24 Thread Michael McCandless
+1

SUCCESS! [0:16:13.144051]

Mike

On Fri, Jun 23, 2023, 11:48 PM Gautam Worah  wrote:

> SUCCESS! [0:32:53.769993]
>
> +1 (non-binding)
>
> Regards,
> Gautam Worah.
>
>
> On Fri, Jun 23, 2023 at 3:50 PM Mayya Sharipova
>  wrote:
>
>> Thank you  Adrien!
>>
>> SUCCESS! [0:59:16.681584]
>> +1
>>
>> On Fri, Jun 23, 2023 at 3:35 AM Uwe Schindler  wrote:
>>
>>> Hi,
>>>
>>> SUCCESS! [1:04:57.975885]
>>> https://jenkins.thetaphi.de/job/Lucene-Release-Tester/27/console
>>>
>>> Smoke tester ran with Java 11 and Java 17. Unfortunately there's still no
>>> support in the smoke tester for running it with a set of arbitrary JDKs (some limited
>>> conformance tests with gradle should be executed to not make it take
>>> forever). We should open an issue for that, I would have created a PR already
>>> but my Python knowledge is minimal and my brain only supports copypaste!
>>>
>>> I verified in addition the following:
>>>
>>>- Changes for completeness; I also updated the release notes
>>>(function query support for vectors was missing)
>>>- I regenerated the JDK 21 API signatures with latest JDK21 EA build
>>>28, no changes - all fine.
>>>- I started Luke with Java 21, MMapDirectory was using memory
>>>segments.
>>>- I did not specifically test Java 20/21 vector support (see
>>>smoketester issue above).
>>>
>>> +1 to release!
>>>
>>> Uwe
>>> Am 21.06.2023 um 16:36 schrieb Adrien Grand:
>>>
>>> Please vote for release candidate 1 for Lucene 9.7.0
>>>
>>> The artifacts can be downloaded from:
>>>
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.7.0-RC1-rev-ccf4b198ec328095d45d2746189dc8ca633e8bcf
>>>
>>> You can run the smoke tester directly with this command:
>>>
>>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>>
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.7.0-RC1-rev-ccf4b198ec328095d45d2746189dc8ca633e8bcf
>>>
>>> The vote will be open for at least 72 hours i.e. until 2023-06-24 15:00
>>> UTC.
>>>
>>> [ ] +1  approve
>>> [ ] +0  no opinion
>>> [ ] -1  disapprove (and reason why)
>>>
>>> Here is my +1
>>>
>>> --
>>> Adrien
>>>
>>> --
>>> Uwe Schindler
>>> Achterdiek 19, D-28357 Bremen
>>> https://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>>


Re: Welcome Chris Hegarty to the Lucene PMC

2023-06-19 Thread Michael McCandless
Welcome aboard Chris!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jun 19, 2023 at 7:16 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Congratulations Chris!
>
> On Mon, 19 Jun, 2023, 3:23 pm Adrien Grand,  wrote:
>
>> I'm pleased to announce that Chris Hegarty has accepted an invitation to
>> join the Lucene PMC!
>>
>> Congratulations Chris, and welcome aboard!
>>
>> --
>> Adrien
>>
>

