Re: Lucene 10.0 and 9.12 blockers

2024-08-22 Thread Greg Miller

To make sure we don't lose track of this,
https://github.com/apache/lucene/issues/13671 is a blocker for 10.0.
There's a quick path forward here if we get into a time-crunch situation;
we can revert some recent changes to drill-sideways (specifically
DrillSideways and DrillSidewaysQuery) along with the new method that was
added in IndexSearcher and ship—but then the newly-added sandbox faceting
code (GH#13568) won't be able to work with drill-sideways.

As a preferred solution, with the time we have before the freeze, I'd like
to see if we can find a path forward that allows the new faceting code to
work with drill-sideways while not adding API surface area to
IndexSearcher. I've marked GH#13671 with the 10.0 milestone. Not sure if
there's a better way to capture it as a blocker.

Cheers,
-Greg

On Thu, Aug 22, 2024 at 6:06 AM Chris Hegarty wrote:

>
>
> > On 15 Aug 2024, at 10:52, Chris Hegarty wrote:
> >
> >> ...
> >>
> >> Chris, Uwe: I also wanted to check with you if this timeline works well
> >> with regards to supporting Java 23 in 9.last and 10.0?
> >
> > Yes, this works for JDK 23.
> >
> > While JDK 23 is scheduled to ship on 17th Sep, there is already an
> > initial release candidate, which is all we really need for testing and
> > verification. The Memory Segment code should be fine as-is, but the code
> > using the Panama Vector API will minimally need to be tested and the
> > runtime check bumped. I’ll get this done soon.
>
> Done. Unless I missed something. See
> https://github.com/apache/lucene/pull/13678
>
> -Chris
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

Baffling performance regression measured by luceneutil

2024-08-15 Thread Greg Miller

Hi folks-

Egor Potemkin and I have been digging into a baffling performance
regression we're seeing in response to a one-line change that doesn't
rationally seem like it should have any performance impact whatsoever.
There's more background on why we're trying to understand this, but I'll
save the broader context for now and just focus on the confusing issue
we're trying to understand.

Inside IndexSearcher, we've staged a change that initializes an ArrayList
of Collectors slightly earlier than what we do today (see:
https://github.com/apache/lucene/pull/13657/files). We end up with code
that looks like this (note the isolated line that's initializing
`collectors`):

```
  public <C extends Collector, T> T search(Query query, CollectorManager<C, T> collectorManager)
      throws IOException {
    final LeafSlice[] leafSlices = getSlices();
    final C firstCollector = collectorManager.newCollector();
    query = rewrite(query, firstCollector.scoreMode().needsScores());
    final Weight weight = createWeight(query, firstCollector.scoreMode(), 1);

    final List<C> collectors = new ArrayList<>(leafSlices.length);

    return search(weight, collectorManager, firstCollector, collectors, leafSlices);
  }
```

What's baffling is that if we initialize the `collectors` list _after_ the
call to `createWeight` (as shown here), there's no performance impact at
all (as expected). But if all we do is initialize `collectors` _before_ the
call to `createWeight`, we see a very significant regression on LowTerm,
MedTerm, HighTerm tasks in luceneutil (e.g., 15% - 30%). At the other end,
we see a significant improvement to OrHighNotLow, OrHighNotMed,
OrHighNotHigh (e.g., 7% - 15%). (This is running wikimedium10m on an
x86-based AWS ec2 host, but results reproduced separately for Egor and in
our nightly benchmark runs; full luceneutil output at the bottom of this
email [1]). Some additional context and conversation is captured in this
"demo" PR: https://github.com/apache/lucene/pull/13657.

My only hunch here is this has something to do with hotspot's decision
making or some other such runtime optimization, but I'm getting out of my
depth and hoping someone in this community will have ideas on ways to
continue this investigation. Anyone have a clue what might be going on? Or
any suggestions on other things to look at? This isn't a purely academic
exercise for what it's worth. This oddity has caused us to duplicate some
code in IndexSearcher to work with a new sandbox faceting module, so it
would be nice to figure this out so we can remove the code duplication.
(The code duplication is pretty minor, but it's still really frustrating
and it's a trap waiting to be hit by someone in the future who tries to
consolidate the code duplication and runs into this.)
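
For anyone sanity-checking the numbers quoted above against the luceneutil output in [1], here is a minimal sketch of how the "Pct diff" column appears to relate to the two QPS columns. It assumes the column is simply the relative change in mean QPS; `PctDiff` and `pctDiff` are hypothetical names for illustration, not part of luceneutil.

```java
// Hypothetical helper (not luceneutil code): relative QPS change in percent.
final class PctDiff {
  // Assumes "Pct diff" = 100 * (candidate - baseline) / baseline.
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS pairs copied from the MedTerm, HighTerm and LowTerm rows in [1].
    System.out.printf("MedTerm  %.1f%%%n", pctDiff(513.21, 369.43));
    System.out.printf("HighTerm %.1f%%%n", pctDiff(523.20, 402.11));
    System.out.printf("LowTerm  %.1f%%%n", pctDiff(837.70, 715.94));
  }
}
```

Under that assumption the three term tasks come out at roughly -28.0%, -23.1% and -14.5%, which matches the reported rows.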

Thanks for reading, and thanks in advance for any ideas!

Cheers,
-Greg


[1] Full Lucene util output:
```
                     Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                p-value
                  MedTerm        513.21   (4.9%)                   369.43   (4.8%)  -28.0% ( -35% -  -19%)    0.000
                 HighTerm        523.20   (6.9%)                   402.11   (5.0%)  -23.1% ( -32% -  -12%)    0.000
                  LowTerm        837.70   (3.9%)                   715.94   (3.9%)  -14.5% ( -21% -   -6%)    0.000
BrowseDayOfYearSSDVFacets         11.97  (18.9%)                    11.31  (11.9%)   -5.5% ( -30% -   31%)    0.273
     MedTermDayTaxoFacets         23.03   (4.9%)                    21.95   (6.4%)   -4.7% ( -15% -    6%)    0.009
               HighPhrase        143.93   (8.3%)                   139.35   (4.7%)   -3.2% ( -14% -   10%)    0.136
                   Fuzzy2         53.03   (9.0%)                    51.50   (7.3%)   -2.9% ( -17% -   14%)    0.265
              MedSpanNear         50.70   (5.1%)                    49.26   (3.0%)   -2.8% ( -10% -    5%)    0.032
                LowPhrase         70.38   (4.9%)                    68.60   (5.3%)   -2.5% ( -12% -    8%)    0.118
                MedPhrase         88.15   (5.2%)                    86.03   (4.2%)   -2.4% ( -11% -    7%)    0.105
   OrHighMedDayTaxoFacets          7.01   (5.5%)                     6.86   (5.4%)   -2.0% ( -12% -    9%)    0.237
             HighSpanNear         28.95   (2.7%)                    28.42   (2.9%)   -1.8% (  -7% -    3%)    0.043
          MedSloppyPhrase        201.71   (3.3%)                   198.58   (3.1%)   -1.6% (  -7% -    4%)    0.124
     BrowseDateTaxoFacets         23.97  (28.7%)                    23.62  (22.8%)   -1.5% ( -41% -   70%)    0.858
  AndHighMedDayTaxoFacets         32.81   (5.8%)                    32.35   (7.1%)   -1.4% ( -13% -   12%)    0.493
 AndHighHighDayTaxoFacets         27.86   (6.1%)                    27.50   (6.5%)   -1.3% ( -13% -   12%)    0.507
          LowSloppyPhrase        149.20   (2.9%)                   147.50   (3.0%)   -1.1% (  -6% -    4%)    0.227
     HighTermTitleBDVSort         66.72   (6.6%)                    66.04   (5.7%)   -1.0% ( -12% -   12%)    0.604
              AndHighHigh        187.45   (7.4%)                   185.75   (6.7%)   -0.9%
```

Re: AbstractMultiTermQueryConstantScoreWrapper cost estimates (https://github.com/apache/lucene/issues/13029)

2024-08-02 Thread Greg Miller

Hey Froh-

I got some time to look through your PR (most of the time was actually
refreshing my memory on the change history leading up to your PR and
digesting the issue described). I think this makes a ton of sense. If I'm
understanding properly, the latest version of your PR essentially takes
advantage of Mayya's recent change (
https://github.com/apache/lucene/pull/13454) in the score supplier behavior
that is now doing _some_ up-front work to iterate the first <= 16 terms
when building the scoreSupplier and computes a more accurate/reasonable
cost based on that already-done work. Am I getting this right? If so, this
seems like it has no downsides and all upside.
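
To make sure I'm describing the same idea, here is a rough sketch of how I read it. This is not the code in the PR or in Lucene; `MultiTermCostSketch`, `estimateCost`, and the `fallbackCost` parameter are hypothetical names, and the per-term doc frequencies are passed in as a plain iterator rather than a real terms enum.

```java
import java.util.PrimitiveIterator;

// Hypothetical sketch, not Lucene's implementation: reuse the work done while
// visiting the first <= 16 terms to produce a tighter cost estimate.
final class MultiTermCostSketch {
  // The "<= 16 terms" bound mentioned above.
  static final int MAX_TERMS_UP_FRONT = 16;

  /**
   * @param docFreqs per-term document frequencies, in term order (assumed input)
   * @param fallbackCost coarse estimate used when more than 16 terms match (assumed input)
   */
  static long estimateCost(PrimitiveIterator.OfLong docFreqs, long fallbackCost) {
    long visitedCost = 0;
    for (int i = 0; i < MAX_TERMS_UP_FRONT && docFreqs.hasNext(); i++) {
      visitedCost += docFreqs.nextLong();
    }
    // If terms remain, we only saw a prefix, so keep the coarse estimate.
    // Otherwise the sum of doc frequencies bounds the number of matching docs.
    return docFreqs.hasNext() ? fallbackCost : visitedCost;
  }
}
```

With two matching terms of doc frequency 3 and 5, this returns 8 instead of the coarse fallback, which is the "few terms" case the thread is about.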

I'll do a proper pass through the PR here shortly, but I love the idea
(assuming I'm understanding it properly on a Friday afternoon after a
long-ish week...).

Cheers,
-Greg

On Thu, Aug 1, 2024 at 7:47 PM Greg Miller  wrote:

> Hi Froh-
>
> Thanks for raising this and sorry I missed your tag in GH#13201 back in
> June (had some vacation and was generally away). I'd be interested to see
> what others think as well, but I'll at least commit to looking through your
> PR tomorrow or Monday to get a better handle on what's being proposed. We
> went through a few iterations of this originally before we landed on the
> current version. One promising approach was to have a more intelligent
> query that would load some number of terms up-front to get a better cost
> estimate before making a decision, but it required a custom query
> implementation that generally didn't get favorable feedback (it's nice to
> be able to use the existing IndexOrDocValuesQuery abstraction instead). I
> can dig up some of that conversation if it's helpful, but I'll better
> understand what you've got in mind first.
>
> Unwinding a bit though, I'm also in favor in general that we should be
> able to do a better job estimating cost here. I think the tricky part is
> how we go about doing that effectively. Thanks again for kicking off this
> thread!
>
> Cheers,
> -Greg
>
> On Thu, Aug 1, 2024 at 5:58 PM Michael Froh  wrote:
>
>> Hi there,
>>
>> For a few months, some of us have been running into issues with the cost
>> estimate from AbstractMultiTermQueryConstantScoreWrapper. (
>> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L300
>> )
>>
>> In https://github.com/apache/lucene/issues/13029, the problem was raised
>> in terms of queries not being cached, because the estimated cost was too
>> high.
>>
>> We've also run into problems in OpenSearch, since we started wrapping
>> MultiTermQueries in IndexOrDocValuesQuery. The MTQ gets an exaggerated cost
>> estimate, so IndexOrDocValuesQuery decides it should be a DV query, even
>> though the MTQ would really only match a handful of docs (and should be
>> the lead iterator).
>>
>> I opened a PR back in March (https://github.com/apache/lucene/pull/13201)
>> to try to handle the case where a MultiTermQuery matches a small number of
>> terms. Since Mayya pulled the rewrite logic that expands up to 16 terms (to
>> rewrite as a Boolean disjunction) earlier in the workflow (in
>> https://github.com/apache/lucene/pull/13454), we get the better cost
>> estimate for MTQs on few terms "for free".
>>
>> What do folks think?
>>
>> Thanks,
>> Froh
>>
>

Re: AbstractMultiTermQueryConstantScoreWrapper cost estimates (https://github.com/apache/lucene/issues/13029)

2024-08-01 Thread Greg Miller

Hi Froh-

Thanks for raising this and sorry I missed your tag in GH#13201 back in
June (had some vacation and was generally away). I'd be interested to see
what others think as well, but I'll at least commit to looking through your
PR tomorrow or Monday to get a better handle on what's being proposed. We
went through a few iterations of this originally before we landed on the
current version. One promising approach was to have a more intelligent
query that would load some number of terms up-front to get a better cost
estimate before making a decision, but it required a custom query
implementation that generally didn't get favorable feedback (it's nice to
be able to use the existing IndexOrDocValuesQuery abstraction instead). I
can dig up some of that conversation if it's helpful, but I'll better
understand what you've got in mind first.

Unwinding a bit though, I'm also in favor in general that we should be able
to do a better job estimating cost here. I think the tricky part is how we
go about doing that effectively. Thanks again for kicking off this thread!

Cheers,
-Greg

On Thu, Aug 1, 2024 at 5:58 PM Michael Froh  wrote:

> Hi there,
>
> For a few months, some of us have been running into issues with the cost
> estimate from AbstractMultiTermQueryConstantScoreWrapper. (
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L300
> )
>
> In https://github.com/apache/lucene/issues/13029, the problem was raised
> in terms of queries not being cached, because the estimated cost was too
> high.
>
> We've also run into problems in OpenSearch, since we started wrapping
> MultiTermQueries in IndexOrDocValuesQuery. The MTQ gets an exaggerated cost
> estimate, so IndexOrDocValuesQuery decides it should be a DV query, even
> though the MTQ would really only match a handful of docs (and should be
> the lead iterator).
>
> I opened a PR back in March (https://github.com/apache/lucene/pull/13201)
> to try to handle the case where a MultiTermQuery matches a small number of
> terms. Since Mayya pulled the rewrite logic that expands up to 16 terms (to
> rewrite as a Boolean disjunction) earlier in the workflow (in
> https://github.com/apache/lucene/pull/13454), we get the better cost
> estimate for MTQs on few terms "for free".
>
> What do folks think?
>
> Thanks,
> Froh
>
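
As a rough illustration of the failure mode described in the quoted message, the sketch below models the index-versus-doc-values choice as a comparison between the index query's estimated cost and the lead cost. The 8x penalty against doc values (`>>> 3`) is how I remember the IndexOrDocValuesQuery heuristic, so treat the exact threshold as an assumption; `IndexOrDocValuesChoiceSketch` and `choose` are hypothetical names.

```java
// Hypothetical model of the index-vs-doc-values decision, not Lucene's code.
final class IndexOrDocValuesChoiceSketch {
  enum Choice { INDEX, DOC_VALUES }

  // Assumed heuristic: the index query wins unless its estimated cost is
  // more than 8x the cost of the clause leading the iteration.
  static Choice choose(long indexQueryCost, long leadCost) {
    long threshold = indexQueryCost >>> 3;
    return threshold <= leadCost ? Choice.INDEX : Choice.DOC_VALUES;
  }

  public static void main(String[] args) {
    long leadCost = 1_000;
    // An MTQ that really matches ~10 docs, costed accurately.
    System.out.println(choose(10, leadCost));
    // The same MTQ with an exaggerated cost estimate.
    System.out.println(choose(1_000_000, leadCost));
  }
}
```

In this model the accurate estimate keeps the index query, while the exaggerated one flips the choice to doc values even though the MTQ would have been the cheaper lead iterator.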

Re: Welcome Armin Braun as Lucene committer

2024-07-26 Thread Greg Miller

Welcome Armin!

On Fri, Jul 26, 2024 at 10:51 AM Patrick Zhai  wrote:

> Congrats and welcome, Armin!
>
> On Fri, Jul 26, 2024, 10:30 Vigya Sharma  wrote:
>
>> Congratulations and welcome, Armin! Volunteering as a firefighter is
>> amazing, respect!
>>
>> On Fri, Jul 26, 2024 at 1:46 AM Ignacio Vera  wrote:
>>
>>> Welcome Armin!
>>>
>>> On Fri, Jul 26, 2024 at 10:16 AM Chris Hegarty
>>>  wrote:
>>> >
>>> > Welcome Armin!
>>> >
>>> > -Chris.
>>> >
>>> > > On 26 Jul 2024, at 05:24, Anshum Gupta 
>>> wrote:
>>> > >
>>> > > Congratulations and welcome, Armin!
>>> > >
>>> > > On Thu, Jul 25, 2024 at 2:10 AM Luca Cavanna 
>>> wrote:
>>> > > I'm pleased to announce that Armin Braun has accepted the PMC's
>>> invitation to become a Lucene committer.
>>> > >
>>> > > Armin, the tradition is that new committers introduce themselves
>>> with a brief bio.
>>> > >
>>> > > Thanks for your contributions so far and looking forward to the
>>> upcoming ones :)
>>> > >
>>> > > Congratulations and welcome!
>>> > >
>>> > >
>>> > > --
>>> > > Anshum Gupta
>>> >
>>> >
>>
>> --
>> - Vigya
>>
>

Re: Please help me find a good first issue

2024-07-26 Thread Greg Miller

Hi Lucas!

Thanks for your interest and for reaching out. Sounds like your background
could provide you with a useful set of fresh eyes to view our codebase
through!

This question comes up a lot (good starter issues) and I don't think our
answers are ever all that satisfying. I will share a few thoughts though
(all just my personal opinions):

1. Issues that familiarize you with parts of the codebase you're interested
in are always useful to seek out. Sounds like a lot of your interest may
gravitate towards the "core" module? Just a guess.
2. Looking at active PRs that are being iterated on is actually a really
good way to get started. It will be slow going at first, but will force you
to go understand parts of the codebase. Said differently, it can be useful
to focus on providing feedback on changes other people are working on at
first as opposed to setting off and making changes (although I know that's
less fun a lot of the time).

For something more concrete, Mike M. (long time community member) has a
nice tool for searching issues and PRs. I added some filters to look for
open issues that have had no comments and are not assigned to anyone (good
sign they are not being worked on) along with a couple other filters. This
could provide a good list to start with:

https://githubsearch.mikemccandless.com/search.py?chg=page&text=&a1=1&a2=undefined&page=0&searcher=36672&sort=recentlyUpdated&format=list&id=vjf5tu0klway&dd=status%3AOpen&dd=issue_or_pr%3AIssue&dd=comment_count%3A0&dd=issue_type%3Aenhancement&dd=assignee%3AUnassigned&newText=

Just scanning through that list briefly, here are some that jumped out as
possibly good starting points (but I didn't look in detail, so I would
suggest asking on the issue whether it's still relevant and checking that
nobody is working on it).

* https://github.com/apache/lucene/issues/13207
* https://github.com/apache/lucene/issues/13598
* https://github.com/apache/lucene/issues/13084
* https://github.com/apache/lucene/issues/12919

Best of luck and have fun!

Cheers,
-Greg

On Wed, Jul 24, 2024 at 2:56 AM Lucas Wolf  wrote:

> Hi everyone,
>
> My name is Lucas and I am interested in contributing to Lucene.
>
> I have read through the issues list on GitHub but felt that I was lacking
> a bit of context on what is achievable/impactful to tackle as a newcomer.
> Perhaps someone here can help me out. :)
>
> My background is mostly in main-memory relational database (performance)
> engineering in C++. However, I recently became interested in JVM/OpenJDK
> internals and am looking for a project to put my knowledge to good use.
>
> I'm generally open to anything, except perhaps Vector Search, as that
> would likely pose a conflict of interest with my day job.
>
> Thanks!
>
> Best,
> Lucas Wolf
>

New Lucene PMC Chair: Chris Hegarty

2024-01-19 Thread Greg Miller

Hello Lucene developers-

I wanted to let you know that the Lucene PMC has elected a new Chair—Chris
Hegarty—and the board has approved the appointment. It's been an honor to
fill this role for the past year, but it's time to pass the torch to
someone new.

Chris- thank you for stepping up for this role and congratulations!

Cheers,
-Greg

Re: Welcome Stefan Vodita as Lucene committer

2024-01-19 Thread Greg Miller

Welcome Stefan! Glad to have you!

On Fri, Jan 19, 2024 at 08:00 Michael Sokolov  wrote:

> Hello Stefan, welcome!
>
> On Fri, Jan 19, 2024 at 10:41 AM Martin Gainty 
> wrote:
>
>> Congratulations Stefan!
>>
>> I look forward to reading your posts
>>
>> ~martin
>> --
>> *From:* Michael McCandless 
>> *Sent:* Thursday, January 18, 2024 10:53 AM
>> *To:* dev@lucene.apache.org 
>> *Subject:* Welcome Stefan Vodita as Lucene committer
>>
>> Hi Team,
>>
>> I'm pleased to announce that Stefan Vodita has accepted the Lucene PMC's
>> invitation to become a committer!
>>
>> Stefan, the tradition is that new committers introduce themselves with a
>> brief bio.
>>
>> Congratulations, welcome, and thank you for all your improvements to
>> Lucene and our community,
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>

Re: [VOTE] Release Lucene 9.9.1 RC1

2023-12-13 Thread Greg Miller

SUCCESS! [2:27:01.875939]

+1

Thanks!
-Greg

On Wed, Dec 13, 2023 at 3:58 AM Chris Hegarty wrote:

> And (short) release note:
>
>   https://cwiki.apache.org/confluence/display/LUCENE/ReleaseNote9_9_1
>
> -Chris.
>
> > On 13 Dec 2023, at 11:55, Chris Hegarty 
> wrote:
> >
> > Hi,
> >
> > Please vote for release candidate 1 for Lucene 9.9.1
> >
> > The artifacts can be downloaded from:
> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.1-RC1-rev-eee32cbf5e072a8c9d459c349549094230038308
> >
> > You can run the smoke tester directly with this command:
> >
> > python3 -u dev-tools/scripts/smokeTestRelease.py \
> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.9.1-RC1-rev-eee32cbf5e072a8c9d459c349549094230038308
> >
> > The vote will be open for at least 72 hours i.e. until 2023-12-16 12:00
> UTC.
> >
> > [ ] +1  approve
> > [ ] +0  no opinion
> > [ ] -1  disapprove (and reason why)
> >
> > Here is my +1
> >
> > -Chris.
> >
>
>

Re: [JENKINS] Lucene » Lucene-Check-9.x - Build # 7134 - Unstable!

2023-12-11 Thread Greg Miller

I was able to repro locally and track this down to an invalid test
assumption that incorrectly relies on doc ordering. We can relax some of
the randomness in the test case to ensure the doc ordering, but I think the
right thing to do is to relax the test assertions instead. The failing
assertion isn’t really necessary to the test, and I like keeping the
randomness.
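
For context on the general pattern (not necessarily what the fix itself does), one common way to keep an assertion while dropping the dependence on doc ordering is to compare the collected doc IDs as sets. The sketch below assumes distinct doc IDs; `OrderInsensitiveCheck` and `sameDocs` are hypothetical names, not code from the test.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical helper, not from TestDrillSideways: compares collected doc IDs
// without assuming the order in which documents were visited.
final class OrderInsensitiveCheck {
  // Assumes each doc ID appears at most once per list.
  static boolean sameDocs(List<Integer> expected, List<Integer> actual) {
    return expected.size() == actual.size()
        && new HashSet<>(expected).equals(new HashSet<>(actual));
  }

  public static void main(String[] args) {
    // Same docs, different visit order.
    System.out.println(sameDocs(List.of(1, 2, 3), List.of(3, 1, 2)));
  }
}
```

A check like this passes regardless of how randomized indexing orders the docs, which is the property an order-dependent `assertEquals` lacks.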

I pushed a fix for this on main and branch_9x (captured in
https://github.com/apache/lucene/pull/12920). If anyone disagrees with the
approach to the fix, I'm happy to iterate.

Cheers,
-Greg

On Mon, Dec 11, 2023 at 12:49 Greg Miller  wrote:

> Shoot, looking into this. This is a new test I added a month ago in
> GH#12640. Not sure if we hit a random bug in the test or if the bug fix I
> made in that change was incomplete somehow and this uncovered it.
>
> Cheers,
> -Greg
>
> On Mon, Dec 11, 2023 at 11:17 AM Apache Jenkins Server <
> jenk...@builds.apache.org> wrote:
>
>> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/7134/
>>
>> 2 tests failed.
>> FAILED:
>> org.apache.lucene.facet.TestDrillSideways.testCollectionTerminated
>>
>> Error Message:
>> java.lang.AssertionError: expected:<1> but was:<2>
>>
>> Stack Trace:
>> java.lang.AssertionError: expected:<1> but was:<2>
>> at
>> __randomizedtesting.SeedInfo.seed([18C2A46594D4CAB4:8AD53C0B3D6F9A5]:0)
>> at junit@4.13.1/org.junit.Assert.fail(Assert.java:89)
>> at junit@4.13.1/org.junit.Assert.failNotEquals(Assert.java:835)
>> at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:647)
>> at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:633)
>> at
>> org.apache.lucene.facet.TestDrillSideways.testCollectionTerminated(TestDrillSideways.java:332)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
>> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>> at junit@4.13.1
>> /org.junit.rules.RunRules.evaluate(RunRules.java:20)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>> at randomizedtesting.runner@2.8.1
>> /com.ca

Re: [JENKINS] Lucene » Lucene-Check-9.x - Build # 7134 - Unstable!

2023-12-11 Thread Greg Miller

Shoot, looking into this. This is a new test I added a month ago in
GH#12640. Not sure if we hit a random bug in the test or if the bug fix I
made in that change was incomplete somehow and this uncovered it.

Cheers,
-Greg

On Mon, Dec 11, 2023 at 11:17 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/7134/
>
> 2 tests failed.
> FAILED:  org.apache.lucene.facet.TestDrillSideways.testCollectionTerminated
>
> Error Message:
> java.lang.AssertionError: expected:<1> but was:<2>
>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
> at
> __randomizedtesting.SeedInfo.seed([18C2A46594D4CAB4:8AD53C0B3D6F9A5]:0)
> at junit@4.13.1/org.junit.Assert.fail(Assert.java:89)
> at junit@4.13.1/org.junit.Assert.failNotEquals(Assert.java:835)
> at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:647)
> at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:633)
> at
> org.apache.lucene.facet.TestDrillSideways.testCollectionTerminated(TestDrillSideways.java:332)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at junit@4.13.1
> /org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.test_framework@9.10.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.

Re: [JENKINS-EA] Lucene-main-Windows (64bit/hotspot/jdk-22-ea+26) - Build # 13501 - Unstable!

2023-12-08 Thread Greg Miller

Saw this a couple of times, so I reproduced it locally. Since it repro'd for
me as well (against main), I opened an issue:
https://github.com/apache/lucene/issues/12896

On Fri, Dec 8, 2023 at 11:00 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Windows/13501/
> Java: 64bit/hotspot/jdk-22-ea+26 -XX:-UseCompressedOops -XX:+UseParallelGC
>
> 2 tests failed.
> FAILED:
> org.apache.lucene.search.join.TestParentBlockJoinByteKnnVectorQuery.testScoringWithMultipleChildren
>
> Error Message:
> java.lang.AssertionError: expected:<1> but was:<2>
>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
> at
> __randomizedtesting.SeedInfo.seed([98B9F347D87BF07E:B6A53152CB7F318C]:0)
> at junit@4.13.1/org.junit.Assert.fail(Assert.java:89)
> at junit@4.13.1/org.junit.Assert.failNotEquals(Assert.java:835)
> at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:647)
> at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:633)
> at
> org.apache.lucene.search.join.ParentBlockJoinKnnVectorQueryTestCase.testScoringWithMultipleChildren(ParentBlockJoinKnnVectorQueryTestCase.java:205)
> at
> org.apache.lucene.search.join.TestParentBlockJoinByteKnnVectorQuery.testScoringWithMultipleChildren(TestParentBlockJoinByteKnnVectorQuery.java:25)
> at
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
> at java.base/java.lang.reflect.Method.invoke(Method.java:580)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at junit@4.13.1
> /org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at randomizedtesting.runner@2.

Re: [VOTE] Release Lucene 9.9.0 RC1

2023-12-05 Thread Greg Miller

Thanks Adrien, that makes sense. I was wondering how we'd ensure all API
breakages in a major release were covered with deprecation messages. Sounds
like this is the answer.

Cheers,
-Greg

On Thu, Nov 30, 2023 at 11:14 AM Adrien Grand  wrote:

> My expectation is that we will do a 9.x minor at about the same time as
> 10.0 anyway, this is what we have done in the past for new majors. This
> will give an opportunity to make sure we have deprecation warnings for all
> breaking changes in 10.0.
>
> Le jeu. 30 nov. 2023, 10:43, Chris Hegarty
>  a écrit :
>
>> For clarity, consider this vote cancelled. A new vote has been started on
>> an RC2 build.
>>
>> On 30 Nov 2023, at 16:22, Greg Miller  wrote:
>>
>> If we're spinning a new RC, I'd like to ask this group if it would make
>> sense to pull this very small method deprecation in:
>> https://github.com/apache/lucene/pull/12854
>>
>> If there's a chance we don't release a 9.10 and go directly to 10.0, this
>> would be our last opportunity to mark it deprecated on a 9.x version so we
>> can actually remove it in 10.0. It's really minor though, so I don't want
>> to create churn, but if we can get it into 9.9 without much issue, it would
>> be nice. If folks agree, I can get it merged onto 9.9.
>>
>>
>> Thanks for raising the issue. I don’t have a strong opinion on whether or
>> not to do the deprecation in this release, and since you say that it is
>> minor, then I don’t see that it necessitates another respin.
>>
>> Since I had already started an RC2 build, then I just continued with it
>> (and since the above issue is not yet reviewed ). If others feel like the
>> deprecation should absolutely be in, then we can do an RC3.
>>
>> -Chris.
>>
>> Cheers,
>> -Greg
>>
>> On Thu, Nov 30, 2023 at 7:58 AM Michael Sokolov 
>> wrote:
>>
>>> for the sake of posterity, I did get a successful smoketest:
>>>
>>> SUCCESS! [1:00:06.512261]
>>>
>>> but +0 to release I guess since it's moot...
>>>
>>> On Thu, Nov 30, 2023 at 10:38 AM Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>> On Thu, Nov 30, 2023 at 9:56 AM Chris Hegarty
>>>>  wrote:
>>>>
>>>> P.S. I’m less sure about this, but the RC 2 starts a 72hr voting time
>>>>> again? (Just so I know what TTL to put on that)
>>>>>
>>>>
>>>> Yeah a new 72 hour clock starts with each new RC :)
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>
>>

Re: [VOTE] Release Lucene 9.9.0 RC1

2023-11-30 Thread Greg Miller

> Thanks for raising the issue. I don’t have a strong opinion on whether or
> not to do the deprecation in this release, and since you say that it is
> minor, then I don’t see that it necessitates another respin. Since I had
> already started an RC2 build, then I just continued with it (and since the
> above issue is not yet reviewed). If others feel like the deprecation
> should absolutely be in, then we can do an RC3.

++, makes total sense. Not worth stalling the RC. If RC2 fails to go
forward for some other reason, I'd like to see if I can get this into RC3,
but I wouldn't block RC2 for this minor change. Thanks!

On Thu, Nov 30, 2023 at 10:43 AM Chris Hegarty
 wrote:

> For clarity, consider this vote cancelled. A new vote has been started on
> an RC2 build.
>
> On 30 Nov 2023, at 16:22, Greg Miller  wrote:
>
> If we're spinning a new RC, I'd like to ask this group if it would make
> sense to pull this very small method deprecation in:
> https://github.com/apache/lucene/pull/12854
>
> If there's a chance we don't release a 9.10 and go directly to 10.0, this
> would be our last opportunity to mark it deprecated on a 9.x version so we
> can actually remove it in 10.0. It's really minor though, so I don't want
> to create churn, but if we can get it into 9.9 without much issue, it would
> be nice. If folks agree, I can get it merged onto 9.9.
>
>
> Thanks for raising the issue. I don’t have a strong opinion on whether or
> not to do the deprecation in this release, and since you say that it is
> minor, then I don’t see that it necessitates another respin.
>
> Since I had already started an RC2 build, then I just continued with it
> (and since the above issue is not yet reviewed ). If others feel like the
> deprecation should absolutely be in, then we can do an RC3.
>
> -Chris.
>
> Cheers,
> -Greg
>
> On Thu, Nov 30, 2023 at 7:58 AM Michael Sokolov 
> wrote:
>
>> for the sake of posterity, I did get a successful smoketest:
>>
>> SUCCESS! [1:00:06.512261]
>>
>> but +0 to release I guess since it's moot...
>>
>> On Thu, Nov 30, 2023 at 10:38 AM Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> On Thu, Nov 30, 2023 at 9:56 AM Chris Hegarty
>>>  wrote:
>>>
>>> P.S. I’m less sure about this, but the RC 2 starts a 72hr voting time
>>>> again? (Just so I know what TTL to put on that)
>>>>
>>>
>>> Yeah a new 72 hour clock starts with each new RC :)
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>
>

Re: [VOTE] Release Lucene 9.9.0 RC1

2023-11-30 Thread Greg Miller

If we're spinning a new RC, I'd like to ask this group if it would make
sense to pull this very small method deprecation in:
https://github.com/apache/lucene/pull/12854

If there's a chance we don't release a 9.10 and go directly to 10.0, this
would be our last opportunity to mark it deprecated on a 9.x version so we
can actually remove it in 10.0. It's really minor though, so I don't want
to create churn, but if we can get it into 9.9 without much issue, it would
be nice. If folks agree, I can get it merged onto 9.9.

Cheers,
-Greg

On Thu, Nov 30, 2023 at 7:58 AM Michael Sokolov  wrote:

> for the sake of posterity, I did get a successful smoketest:
>
> SUCCESS! [1:00:06.512261]
>
> but +0 to release I guess since it's moot...
>
> On Thu, Nov 30, 2023 at 10:38 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> On Thu, Nov 30, 2023 at 9:56 AM Chris Hegarty
>>  wrote:
>>
>> P.S. I’m less sure about this, but the RC 2 starts a 72hr voting time
>>> again? (Just so I know what TTL to put on that)
>>>
>>
>> Yeah a new 72 hour clock starts with each new RC :)
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>

Re: Welcome Patrick Zhai to the Lucene PMC

2023-11-10 Thread Greg Miller

Congrats and welcome Patrick!

On Fri, Nov 10, 2023 at 12:05 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> I'm happy to announce that Patrick Zhai has accepted an invitation to join
> the Lucene Project Management Committee (PMC)!
>
> Congratulations Patrick, thank you for all your hard work improving
> Lucene's community and source code, and welcome aboard!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>

Re: LeafCollector#finish idempotency?

2023-10-09 Thread Greg Miller

Thanks Adrien & Mike! I hadn't seen the Solr email thread, but that's a
good reference (as is the AssertingLeafCollector implementation). I've
opened two follow-up PRs as a result:

1. Fix a buried drill-sideways bug where #finish can be called more than
once: https://github.com/apache/lucene/pull/12642
2. Add a small note to LeafCollector#finish javadoc:
https://github.com/apache/lucene/pull/12643

Cheers,
-Greg

On Mon, Oct 9, 2023 at 1:45 PM Mike Drob  wrote:

> Not sure if you saw this, Greg, but Alex ran into a similar question
> recently from Solr.
> https://lists.apache.org/thread/1gs3nsv1mcns1czdtdnqyz84f31tqm2x
>
> On Mon, Oct 9, 2023 at 10:47 AM Adrien Grand  wrote:
>
>> Hi Greg,
>>
>> I agree that LeafCollector implementations should be able to assume that
>> finish() only gets called once. The test framework already makes this
>> assumption:
>> https://github.com/apache/lucene/blob/dfff1e635805ffc61dd6029a8060e2635bfcbdb9/lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingLeafCollector.java#L95-L100
>> .
>>
>> On Mon, Oct 9, 2023 at 5:38 PM Greg Miller  wrote:
>>
>>> Hey folks-
>>>
>>> I'm curious if anyone has thoughts around idempotency concerns related
>>> to the LeafCollector#finish API added in GH#12380
>>> <https://github.com/apache/lucene/pull/12380>. My expectation would be
>>> that LeafCollector implementations should be able to assume #finish will
>>> only get called once. In fact, it looks like FacetsCollector is already
>>> making that assumption.
>>>
>>> Is this in line with other folks' expectations? If so, I'm going to 1)
>>> address a small bug related to drill-sideways that results in #finish being
>>> called multiple times on one of the collectors, and 2) propose some
>>> additional javadoc on LeafCollector#finish clarifying this.
>>>
>>> Make sense?
>>>
>>> Cheers,
>>> -Greg
>>>
>>
>>
>> --
>> Adrien
>>
>

LeafCollector#finish idempotency?

2023-10-09 Thread Greg Miller

Hey folks-

I'm curious if anyone has thoughts around idempotency concerns related to
the LeafCollector#finish API added in GH#12380
(https://github.com/apache/lucene/pull/12380). My expectation would be that
LeafCollector implementations should be able to assume #finish will only
get called once. In fact, it looks like FacetsCollector is already making
that assumption.

Is this in line with other folks' expectations? If so, I'm going to 1)
address a small bug related to drill-sideways that results in #finish being
called multiple times on one of the collectors, and 2) propose some
additional javadoc on LeafCollector#finish clarifying this.

Make sense?

Cheers,
-Greg
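
A minimal sketch of the "finish is called once" expectation described above, in plain Java so it runs without Lucene on the classpath. SimpleLeafCollector, BufferingCollector and FinishOnceCollector are hypothetical stand-ins, not Lucene classes; the buffering collector only illustrates why a collector that flushes state in finish() is not safe to finish twice.

```java
import java.util.ArrayList;
import java.util.List;

final class FinishOnceSketch {

  /** Hypothetical stand-in for the slice of LeafCollector discussed here. */
  interface SimpleLeafCollector {
    void collect(int doc);

    void finish();
  }

  /** Buffers docs and flushes them in finish(); assumes finish() runs once. */
  static final class BufferingCollector implements SimpleLeafCollector {
    private final List<Integer> buffered = new ArrayList<>();
    final List<Integer> flushed = new ArrayList<>();

    @Override
    public void collect(int doc) {
      buffered.add(doc);
    }

    @Override
    public void finish() {
      // Not idempotent: a second call flushes the same docs again.
      flushed.addAll(buffered);
    }
  }

  /** Wrapper that fails fast when the once-only contract is violated. */
  static final class FinishOnceCollector implements SimpleLeafCollector {
    private final SimpleLeafCollector in;
    private boolean finished;

    FinishOnceCollector(SimpleLeafCollector in) {
      this.in = in;
    }

    @Override
    public void collect(int doc) {
      if (finished) {
        throw new IllegalStateException("collect() called after finish()");
      }
      in.collect(doc);
    }

    @Override
    public void finish() {
      if (finished) {
        throw new IllegalStateException("finish() called more than once");
      }
      finished = true;
      in.finish();
    }
  }

  public static void main(String[] args) {
    BufferingCollector raw = new BufferingCollector();
    raw.collect(1);
    raw.collect(2);
    raw.finish();
    raw.finish(); // simulates the double-finish bug
    System.out.println("flushed without guard: " + raw.flushed);
  }
}
```

Wrapping a collector in something like FinishOnceCollector during tests turns a silent double flush into an immediate failure, which is the kind of check that makes such a bug easy to find.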

IndexOrDocValuesQuery vs. "Index or Nothing Query"?

2023-10-03 Thread Greg Miller

Hi folks-

I've got what I suspect is a fairly uncommon use-case, but I wanted to
reach out to this group to see if it resonates with anyone else. I'll avoid
going into all the details for now to keep this email terse and to the
point, but I'm happy to elaborate further on the use-case if helpful.

Does anyone have a use-case for an "index or nothing" query that behaves
similarly to IndexOrDocValuesQuery but has a no-op query as the doc-values
query (i.e., MatchAllDocsQuery)? We have a fairly common use-case in
Amazon's Product Search engine where we want to optionally—based on query
heuristics—add a numeric range query, using the points index, to the
first-phase execution. We don't ever want to add a doc values-based
approximation though. Essentially, we have this separate way of doing more
costly post-filtering that we don't directly model as a single doc
values-based query. In cases where our numeric range approximation is very
selective, we'd like to add a points-based index query to the approximation
phase to help narrow down candidates, but in cases where our numeric range
is not very selective (relative to the rest of the query), we'd like to
skip using an approximation clause altogether.

We used to use IndexOrDocValuesQuery for this with a MatchAllDocsQuery
provided as the doc values-based query, but the behavior changed in Lucene
9.1 (GH#715) to rewrite the
query into a MatchAllDocsQuery if either provided query was MatchAllDocs,
which kills our ability to do this. I understand why we would assume that
both queries provided to IndexOrDocValuesQuery are functionally
equivalent (and rewrite if either is MatchAllDocs), but it leads us down a
path of implementing something similar but with a well-understood
MatchAllDocs fallback behavior.

I'm guessing the answer is "no," but does anyone have a similar use-case to
this? Would there be any interest in an IndexOrNothingQuery (maybe in
sandbox)? Does anyone have other use-cases for IndexOrDocValuesQuery where
the two provided queries are _not_ functionally equivalent by design?

Cheers,
-Greg
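
To make the "index or nothing" idea concrete, here is a rough sketch of the decision only, in plain Java. It is not how IndexOrDocValuesQuery chooses between its two queries, and every name in it (IndexOrNothingSketch, Clause, choose, maxCostRatio) is hypothetical. The assumption is that we have a cost estimate for the range clause and one for the rest of the query, and add the points-based clause only when the range is selective by comparison.

```java
final class IndexOrNothingSketch {

  /** What to add to the first-phase (approximation) query. */
  enum Clause {
    INDEX_FILTER, // add the points-based range clause
    NO_CLAUSE // add nothing; rely on the separate post-filtering step
  }

  /**
   * @param rangeCost estimated number of docs matching the numeric range
   * @param leadCost estimated number of docs matched by the rest of the query
   * @param maxCostRatio hypothetical tuning knob: how large the range may be,
   *     relative to the rest of the query, and still be worth adding
   */
  static Clause choose(long rangeCost, long leadCost, double maxCostRatio) {
    if (rangeCost < 0 || leadCost < 0 || maxCostRatio <= 0) {
      throw new IllegalArgumentException("costs must be >= 0 and ratio > 0");
    }
    // A selective range narrows candidates cheaply; an unselective one only
    // adds work to the approximation phase, so it is skipped altogether.
    return rangeCost <= leadCost * maxCostRatio ? Clause.INDEX_FILTER : Clause.NO_CLAUSE;
  }

  public static void main(String[] args) {
    System.out.println(choose(1_000, 100_000, 1.0));
    System.out.println(choose(5_000_000, 100_000, 1.0));
  }
}
```

The point of the sketch is that the fallback is "no clause at all" rather than a doc-values equivalent of the range, which is why the two alternatives are not functionally equivalent by design.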

Re: Lucene 9.8 Release

2023-09-22 Thread Greg Miller

Thanks Patrick! I added one more bullet point for this recent commit to
make expression evaluation lazier:
https://github.com/apache/lucene/pull/12560. I thought the change would be
fairly minor, but in the benchmarks we run internally for Amazon's Product
Search engine, we actually saw a ~23% redline QPS improvement with this
change (and 16% avg latency reduction). Granted, we make pretty heavy use
of expressions, so mileage may vary, but other Lucene users may find nice
wins with this change (I also added a note to that PR).

Cheers,
-g

On Thu, Sep 21, 2023 at 10:50 PM Patrick Zhai  wrote:

> Thank you Uwe!
>
> On Thu, Sep 21, 2023 at 3:27 PM Uwe Schindler  wrote:
>
>> Hi,
>>
>> I also enabled Jenkins jobs for the 9.8 branch today (a bit late, sorry).
>> See https://jenkins.thetaphi.de for the randomized jobs.
>>
>> Uwe
>> Am 21.09.2023 um 19:05 schrieb Patrick Zhai:
>>
>> Thanks Adrien,
>> I plan to start creating the RC tonight, I *think* I have finished all
>> the PGP key set up so
>> I hope it won't be too hard :)
>>
>> On Thu, Sep 21, 2023, 04:10 Adrien Grand  wrote:
>>
>>> Thanks Patrick. I expanded a bit on the optimization section to
>>> highlight the sort of speedup that nightly benchmarks reported, and
>>> moved this section first as I suspect that users would be especially
>>> interested in these speedups.
>>>
>>> Out of curiosity, do you know when you plan on creating a release
>>> candidate?
>>>
>>> On Thu, Sep 21, 2023 at 7:40 AM Patrick Zhai  wrote:
>>> >
>>> > Hi all,
>>> > Here's the draft release note:
>>> https://cwiki.apache.org/confluence/display/LUCENE/Draft+Release+Notes+9.8
>>> >
>>> > Please feel free to edit if you feel like to add anything
>>> >
>>> > Best
>>> > Patrick
>>> >
>>> > On Tue, Sep 19, 2023 at 12:05 AM Adrien Grand 
>>> wrote:
>>> >>
>>> >> Thanks Patrick, this PR is now merged.
>>> >>
>>> >> On Tue, Sep 19, 2023 at 6:22 AM Patrick Zhai 
>>> wrote:
>>> >> >
>>> >> > Update:
>>> >> > Will wait https://github.com/apache/lucene/pull/12568 to be merged
>>> to cut the branch
>>> >> >
>>> >> >
>>> >> > On Mon, Sep 18, 2023 at 11:00 AM Michael Sokolov <
>>> msoko...@gmail.com> wrote:
>>> >> >>
>>> >> >> +1 for a release soon, and thanks for volunteering, Patrick!
>>> >> >>
>>> >> >> On Tue, Sep 12, 2023 at 2:08 AM Patrick Zhai 
>>> wrote:
>>> >> >> >
>>> >> >> > Hi all,
>>> >> >> > It's been a while since the last release and we have quite a few
>>> good changes including new APIs, improvements and bug fixes. Should we
>>> release the 9.8?
>>> >> >> >
>>> >> >> > If there's no objections I volunteer to be the release manager
>>> and will cut the feature branch a week from now, which is Sep. 18th PST.
>>> >> >> >
>>> >> >> > Best
>>> >> >> > Patrick
>>> >> >>
>>> >> >>
>>> >> >>
>>> >>
>>> >>
>>> >> --
>>> >> Adrien
>>> >>
>>> >>
>>>
>>>
>>> --
>>> Adrien
>>>
>>>
>>> --
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>

Re: Expressions greedy advanceExact implementation

2023-09-15 Thread Greg Miller

Reviving this thread with another thought…

I think we can improve on this last solution and lazily advance an
expression’s referenced double values without needing to push complexity
down into the compiled expression.

What if we do something like this?
https://github.com/apache/lucene/pull/12560

Cheers,
-Greg
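
For readers following the thread, a small sketch of what "lazily advancing" a referenced value can look like. This is plain Java and not the code from the linked PR: SimpleDoubleValues is a hypothetical stand-in for the two DoubleValues methods discussed in the thread, and treating a missing value as 0.0 is an assumption made for the sketch. advanceExact() only records the doc and returns true; the real advance happens on the first doubleValue() call and is cached for that doc.

```java
final class LazyDoubleValuesSketch {

  /** Hypothetical stand-in for the two methods discussed in the thread. */
  interface SimpleDoubleValues {
    boolean advanceExact(int doc);

    double doubleValue();
  }

  /** Array-backed values that count advances, so laziness is observable. */
  static final class CountingValues implements SimpleDoubleValues {
    private final double[] values;
    private int current = -1;
    int advances;

    CountingValues(double[] values) {
      this.values = values;
    }

    @Override
    public boolean advanceExact(int doc) {
      advances++;
      current = doc;
      return doc >= 0 && doc < values.length;
    }

    @Override
    public double doubleValue() {
      return values[current];
    }
  }

  /** Defers the underlying advance until the value is actually requested. */
  static final class LazyValues implements SimpleDoubleValues {
    private final SimpleDoubleValues in;
    private int pendingDoc = -1;
    private boolean computed;
    private double cached;

    LazyValues(SimpleDoubleValues in) {
      this.in = in;
    }

    @Override
    public boolean advanceExact(int doc) {
      pendingDoc = doc;
      computed = false;
      return true; // always true, matching how expressions behave per the thread
    }

    @Override
    public double doubleValue() {
      if (!computed) {
        // Assumption for this sketch: a doc without a value evaluates to 0.0.
        cached = in.advanceExact(pendingDoc) ? in.doubleValue() : 0.0;
        computed = true;
      }
      return cached;
    }
  }

  /** Mimics "condition ? f1 : f2": both operands advanced, one evaluated. */
  static double ternary(boolean condition, SimpleDoubleValues f1, SimpleDoubleValues f2, int doc) {
    f1.advanceExact(doc);
    f2.advanceExact(doc);
    return condition ? f1.doubleValue() : f2.doubleValue();
  }

  public static void main(String[] args) {
    CountingValues f1 = new CountingValues(new double[] {1, 2, 3});
    CountingValues f2 = new CountingValues(new double[] {10, 20, 30});
    double result = ternary(true, new LazyValues(f1), new LazyValues(f2), 1);
    System.out.println(result + " f1 advances=" + f1.advances + " f2 advances=" + f2.advances);
  }
}
```

Because the wrapper sits around each referenced value, the expression code itself does not need to know about advancing or caching; the operand that is never read is never advanced.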



On Wed, Oct 26, 2022 at 11:06 Michael Sokolov  wrote:

> Thanks, yeah I thought so too. Merged
>
> On Wed, Oct 26, 2022 at 10:31 AM Robert Muir  wrote:
> >
> > I think deferring the advance call like this is fine and harmless,
> > only because this DoubleValues "caches" the result for the current
> > doc, so its idempotent anyway.
> >
> > Yes, about "advancing all the operands" as I mentioned, expressions
> > has no clue about this. If you wanted to change it, you'd have to push
> > both advancing AND caching down lower into the actual compiled
> > expression code.
> >
> > I think this would add way too much complexity, especially when it
> > would only improve the ternary "if" feature in such cases.
> >
> > On Wed, Oct 26, 2022 at 10:23 AM Michael Sokolov 
> wrote:
> > >
> > > see https://github.com/apache/lucene/pull/11878 ... it doesn't do what
> > > I initially asked for (still advances all of the operands), but it
> > > delays until doubleValue() is called, which is safe and could have
> > > some impact
> > >
> > > On Wed, Oct 26, 2022 at 9:58 AM Michael Sokolov 
> wrote:
> > > >
> > > > Hi, yes, makes sense Mikhail, that will address most of the problem.
> > > > But I also think, given the way Expressions work (they always return
> > > > true from advanceExact) there is no reason for them to advance their
> > > > operands. This shifts the burden/concern from the developer who no
> > > > longer has to think as hard about this :)  - let me post a PR that
> > > > shows
> > > >
> > > > On Wed, Oct 26, 2022 at 3:52 AM Mikhail Khludnev 
> wrote:
> > > > >
> > > > > Hello, Michael.
> > > > > I suppose you can bind f2 to custom lazy implementation of
> DoubleValuesSource, which defer advanceExact() by storing doc num and
> returning true always, and actually advancing on doubleValue() only.
> > > > >
> > > > > On Tue, Oct 25, 2022 at 8:13 PM Michael Sokolov <
> msoko...@gmail.com> wrote:
> > > > >>
> > > > >> ExpressionFunctionValueSource lazily evaluates in doubleValues: an
> > > > >> expression like
> > > > >>
> > > > >>condition ? f1 : f2
> > > > >>
> > > > >> will only evaluate one of f1 or f2.
> > > > >>
> > > > >> At the same time, the advanceExact() call is greedy -- when you
> > > > >> advance that expression it will also advance both f1 and f2. But
> > > > >> here's the thing: it always returns true, regardless of whether
> f1 and
> > > > >> f2 advance. Which makes sense from the point of view of the lazy
> > > > >> evaluation -- if condition is true we don't care whether f2
> advances
> > > > >> or not.
> > > > >>
> > > > >> My question is whether we could defer these child advanceExact
> calls
> > > > >> until ExpressionFunctionValues.doubleValue()?
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours
> > > > > Mikhail Khludnev
> > >
> > >
> >
> >
>
>
>

Re: Lucene 9.8 Release

2023-09-12 Thread Greg Miller

+1 thanks Patrick!

On Mon, Sep 11, 2023 at 11:31 PM Adrien Grand  wrote:

> Thanks Patrick for volunteering as release manager!
>
> Le mar. 12 sept. 2023, 08:07, Patrick Zhai  a écrit :
>
>> Hi all,
>> It's been a while since the last release and we have quite a few good
>> changes including new APIs, improvements and bug fixes. Should we release
>> the 9.8?
>>
>> If there's no objections I volunteer to be the release manager and will
>> cut the feature branch a week from now, which is Sep. 18th PST.
>>
>> Best
>> Patrick
>>
>

Re: Welcome Chris Hegarty to the Lucene PMC

2023-06-19 Thread Greg Miller

Congrats! Welcome Chris!

On Mon, Jun 19, 2023 at 5:02 AM Michael Sokolov  wrote:

> Welcome Chris!
>
> On Mon, Jun 19, 2023, 7:31 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Welcome aboard Chris!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Mon, Jun 19, 2023 at 7:16 AM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> Congratulations Chris!
>>>
>>> On Mon, 19 Jun, 2023, 3:23 pm Adrien Grand,  wrote:
>>>
>>>> I'm pleased to announce that Chris Hegarty has accepted an invitation
>>>> to join the Lucene PMC!
>>>>
>>>> Congratulations Chris, and welcome aboard!
>>>>
>>>> --
>>>> Adrien
>>>>
>>>

Re: TermInSetQuery: seekExact vs. seekCeil

2023-05-09 Thread Greg Miller

Thanks for the feedback Robert. This approach sounds like a better path to
follow. I'll explore it. I agree that we should provide default behavior
that is overall best for our users, and not for one specific use-case such
as Amazon search :).

Mike- TermInSetQuery used to use seekExact, and now uses seekCeil. We
haven't used intersect... yet.

Thanks again for the feedback.

Cheers,
-Greg

On Tue, May 9, 2023 at 11:09 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Besides not being able to use the bloom filter, seekCeil is also just more
> costly than seekExact since it is essentially both .seekExact and .next in
> a single operation.
>
> Are either of the two approaches using the intersect method of TermsEnum?
> It might be faster if the number of terms is over some threshold.
>
> It would require building an Automaton out of the set of terms, which is
> fast with DaciukMihovAutomatonBuilder.  Hmm, I think we should rename this
> class maybe.  I'll open an issue.  Naming is the hardest part!
>
> The Codec can implement this quite efficiently since it can do the
> ping-pong skipping Patrick is referring to on a byte-by-byte basis in each
> of the sources of Term iteration.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, May 5, 2023 at 9:34 PM Patrick Zhai  wrote:
>
>> Hi Greg
>> IMO I still think the seekCeil is a better solution for the default
>> posting format, as it could potentially save time on traversing the FST by
>> doing the ping-pong skipping.
>> I can see that in the case of using bloom filter the seekExact might be
>> better but I'm not sure whether there is a better way than overriding the
>> `getTermsEnum`...
>>
>> Patrick
>>
>> On Fri, May 5, 2023 at 4:45 PM Greg Miller  wrote:
>>
>>> Hi folks-
>>>
>>> Back in GH#12156 (https://github.com/apache/lucene/pull/12156), we
>>> rewrote TermInSetQuery to extend MultiTermQuery. With this change,
>>> TermInSetQuery can now leverage the various "rewrite methods" available to
>>> MultiTermQuery, allowing users to customize the query evaluation strategy
>>> (e.g., postings vs. doc values, etc.), which was a nice win. In the
>>> benchmarks we ran, we didn't see any performance issues.
>>>
>>> In anticipation of 9.6 releasing, I've pulled this change into the
>>> Lucene snapshot we use for Amazon product search, and started running some
>>> additional benchmarks, which have surfaced an interesting issue. One
>>> use-case we have for TermInSetQuery creates a term disjunction over a field
>>> that's using bloom filtering (i.e., BloomFilterPostingsFormat). Because
>>> bloom filtering can only help with seekExact and not seekCeil, we're seeing
>>> a performance regression (primarily in red-line QPS).
>>>
>>> One way I can think to address this is to move back to a seekExact
>>> approach when creating the filtered TermsEnum used by MultiTermQuery (for
>>> the TermInSetQuery implementation). Because TermInSetQuery can provide all
>>> of its terms up-front, we can have a simpler term intersection
>>> implementation that relies on seekExact over seekCeil. Here's a quick take
>>> on what I'm thinking:
>>> https://github.com/gsmiller/lucene/commit/e527c5d9b26ee53826b56b270d7c96db18bfaee5.
>>> I've tested this internally and confirmed it solves our QPS regression
>>> problem.
>>>
>>> I'm curious if anyone has an objection to moving back to a seekExact
>>> term intersection approach for TermInSetQuery, or has alternative ideas. I
>>> wonder if I'm overlooking some important factors and focusing too much on
>>> this specific case where the bloom filter interaction is hurting
>>> performance? It seems like seekCeil could provide benefits in some cases
>>> over seekExact by skipping over multiple query terms at a time, so that's a
>>> possible consideration. If we solve for the most common cases by default, I
>>> suppose advanced users could always override TermInSetQuery#getTermsEnum as
>>> necessary (we could take this approach internally for example to work with
>>> our bloom filtering if the best default is to leverage seekCeil). I can
>>> easily turn my quick solution into a PR, but before I do, I wanted to poll
>>> this group for thoughts on the approach or other alternatives I might be
>>> overlooking. Thanks in advance!
>>>
>>> Cheers,
>>> -Greg
>>>
>>

Re: TermInSetQuery: seekExact vs. seekCeil

2023-05-09 Thread Greg Miller

Thanks Patrick. I tend to agree with you for the default behavior. Bloom
filter usage seems like a bit of a less-common case on the surface at least
(e.g., it's expected behavior for query terms to not be present in a given
segment with enough frequency to justify the additional codec layer). A
primary key-like field is sort of the exception here, where a
TermInSetQuery can be useful for allow/block-listing semantics—and where
bloom filtering can be helpful. As an aside, given that TermInSetQuery
already has some semi-special logic for recognizing primary key-like
fields, and with this additional consideration, it makes me wonder if a
special-purpose IDSet query or something might make sense at some point.

For now though, I like the idea of leaving TermInSetQuery as-is, since
users can extend it and change the behavior of getTermsEnum if they really need
to.

Cheers,
-Greg
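
The trade-off in this thread can be sketched with a sorted set standing in for a segment's term dictionary. This is plain Java, not Lucene code: TermSeekSketch and its methods are hypothetical, NavigableSet#ceiling plays the role of seekCeil, Set#contains plays the role of seekExact, and tinyBloom is a toy one-hash filter standing in for bloom filtering. It shows the "ping-pong" skipping that seekCeil allows, and the cheap pre-check that only an exact lookup can use.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.Predicate;

final class TermSeekSketch {

  /** seekCeil style: each seek may let us skip several query terms at once. */
  static List<String> intersectWithCeil(NavigableSet<String> dict, List<String> sortedQueryTerms) {
    List<String> hits = new ArrayList<>();
    int i = 0;
    while (i < sortedQueryTerms.size()) {
      String queryTerm = sortedQueryTerms.get(i);
      String found = dict.ceiling(queryTerm); // smallest dictionary term >= queryTerm
      if (found == null) {
        break; // dictionary exhausted, no later query term can match
      }
      if (found.equals(queryTerm)) {
        hits.add(queryTerm);
        i++;
      } else {
        // Ping-pong: skip every query term that sorts before the landed term.
        while (i < sortedQueryTerms.size() && sortedQueryTerms.get(i).compareTo(found) < 0) {
          i++;
        }
      }
    }
    return hits;
  }

  /** Toy one-hash filter: no false negatives, some false positives. */
  static Predicate<String> tinyBloom(Collection<String> terms, int bits) {
    BitSet set = new BitSet(bits);
    for (String t : terms) {
      set.set(Math.floorMod(t.hashCode(), bits));
    }
    return term -> set.get(Math.floorMod(term.hashCode(), bits));
  }

  /** seekExact style: one lookup per query term, filter consulted first. */
  static List<String> intersectWithExact(
      NavigableSet<String> dict, Predicate<String> mightContain, List<String> queryTerms) {
    List<String> hits = new ArrayList<>();
    for (String queryTerm : queryTerms) {
      // The filter can rule a term out before the dictionary is touched.
      if (mightContain.test(queryTerm) && dict.contains(queryTerm)) {
        hits.add(queryTerm);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    NavigableSet<String> dict = new TreeSet<>(List.of("apple", "cherry", "mango"));
    List<String> query = List.of("apple", "banana", "blueberry", "mango", "zebra");
    System.out.println(intersectWithCeil(dict, query));
    System.out.println(intersectWithExact(dict, tinyBloom(dict, 64), query));
  }
}
```

Both methods return the same matches; they differ in where the work goes. The ceiling-based version can jump past runs of absent query terms, while the exact version pays one probe per term but can answer many of them from the filter alone, which is the case the bloom-filtered field in this thread cares about.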

On Fri, May 5, 2023 at 6:33 PM Patrick Zhai  wrote:

> Hi Greg
> IMO I still think the seekCeil is a better solution for the default
> posting format, as it could potentially save time on traversing the FST by
> doing the ping-pong skipping.
> I can see that in the case of using bloom filter the seekExact might be
> better but I'm not sure whether there is a better way than overriding the
> `getTermsEnum`...
>
> Patrick
>
> On Fri, May 5, 2023 at 4:45 PM Greg Miller  wrote:
>
>> Hi folks-
>>
>> Back in GH#12156 (https://github.com/apache/lucene/pull/12156), we
>> rewrote TermInSetQuery to extend MultiTermQuery. With this change,
>> TermInSetQuery can now leverage the various "rewrite methods" available to
>> MultiTermQuery, allowing users to customize the query evaluation strategy
>> (e.g., postings vs. doc values, etc.), which was a nice win. In the
>> benchmarks we ran, we didn't see any performance issues.
>>
>> In anticipation of 9.6 releasing, I've pulled this change into the Lucene
>> snapshot we use for Amazon product search, and started running some
>> additional benchmarks, which have surfaced an interesting issue. One
>> use-case we have for TermInSetQuery creates a term disjunction over a field
>> that's using bloom filtering (i.e., BloomFilterPostingsFormat). Because
>> bloom filtering can only help with seekExact and not seekCeil, we're seeing
>> a performance regression (primarily in red-line QPS).
>>
>> One way I can think to address this is to move back to a seekExact
>> approach when creating the filtered TermsEnum used by MultiTermQuery (for
>> the TermInSetQuery implementation). Because TermInSetQuery can provide all
>> of its terms up-front, we can have a simpler term intersection
>> implementation that relies on seekExact over seekCeil. Here's a quick take
>> on what I'm thinking:
>> https://github.com/gsmiller/lucene/commit/e527c5d9b26ee53826b56b270d7c96db18bfaee5.
>> I've tested this internally and confirmed it solves our QPS regression
>> problem.
>>
>> I'm curious if anyone has an objection to moving back to a seekExact term
>> intersection approach for TermInSetQuery, or has alternative ideas. I
>> wonder if I'm overlooking some important factors and focusing too much on
>> this specific case where the bloom filter interaction is hurting
>> performance? It seems like seekCeil could provide benefits in some cases
>> over seekExact by skipping over multiple query terms at a time, so that's a
>> possible consideration. If we solve for the most common cases by default, I
>> suppose advanced users could always override TermInSetQuery#getTermsEnum as
>> necessary (we could take this approach internally for example to work with
>> our bloom filtering if the best default is to leverage seekCeil). I can
>> easily turn my quick solution into a PR, but before I do, I wanted to poll
>> this group for thoughts on the approach or other alternatives I might be
>> overlooking. Thanks in advance!
>>
>> Cheers,
>> -Greg
>>
>

Re: How to create a local build that targets Java 11, when building with 17?

2023-05-05 Thread Greg Miller

Hi Jonathan-

The main branch is the tip of development, and what will eventually become
10.0. It can use a later version of Java, make (some)
non-backwards-compatible API changes, etc. branch_9x tracks the latest 9.x
release, and must run on the version of Java supported by 9.x releases,
must be API backwards-compatible, etc. The general approach is to make
changes against main, and then backport those changes to branch_9x in a 9.x
friendly way if possible. Sometimes a change on main is complex enough that
backporting in a 9.x friendly manner isn't really feasible, in which case
the change will be released with 10.0. I'm sure I'm leaving out some
details, but hopefully this is helpful. You may also find this reference
useful:
https://cwiki.apache.org/confluence/display/LUCENE/BackwardsCompatibility

Cheers,
-Greg

On Fri, May 5, 2023 at 12:00 PM Jonathan Ellis  wrote:

> Thanks.  What are the rules for what should go into main vs branch_9x?
>
> On Fri, May 5, 2023 at 1:54 PM Dawid Weiss  wrote:
>
>>
>> The main branch is on Java 17, see build.gradle:
>>
>>   // Minimum Java version required to compile and run Lucene.
>>   minJavaVersion = JavaVersion.VERSION_17
>>
>> Also, don't use the default gradle task created by convention; use this
>> one:
>>
>> ./gradlew mavenToLocal
>>
>> it's an alias but it publishes only a subset of relevant projects, not
>> all of them.
>>
>> Dawid
>>
>> On Fri, May 5, 2023 at 8:03 PM Jonathan Ellis  wrote:
>>
>>> Actually my hack doesn't work, the manifest file changes but the .class
>>> files do not.
>>>
>>> On Fri, May 5, 2023 at 12:38 PM Jonathan Ellis 
>>> wrote:
>>>
>>>> `./gradlew publishToMavenLocal` gives me Java 17 class files by
>>>> default, which surprises me since AFAIK 11 is still the minimum to run
>>>> Lucene.
>>>>
>>>> I hacked it to work by editing javac.gradle
>>>> sourceCompatibility = JavaVersion.VERSION_11
>>>> targetCompatibility = JavaVersion.VERSION_11
>>>>
>>>> Is there a cleaner way to do this?
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> co-founder, http://www.datastax.com
>>>> @spyced
>>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com
>>> @spyced
>>>
>>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>

TermInSetQuery: seekExact vs. seekCeil

2023-05-05 Thread Greg Miller

Hi folks-

Back in GH#12156 (https://github.com/apache/lucene/pull/12156), we rewrote
TermInSetQuery to extend MultiTermQuery. With this change, TermInSetQuery
can now leverage the various "rewrite methods" available to MultiTermQuery,
allowing users to customize the query evaluation strategy (e.g., postings
vs. doc values, etc.), which was a nice win. In the benchmarks we ran, we
didn't see any performance issues.

In anticipation of 9.6 releasing, I've pulled this change into the Lucene
snapshot we use for Amazon product search, and started running some
additional benchmarks, which have surfaced an interesting issue. One
use-case we have for TermInSetQuery creates a term disjunction over a field
that's using bloom filtering (i.e., BloomFilterPostingsFormat). Because
bloom filtering can only help with seekExact and not seekCeil, we're seeing
a performance regression (primarily in red-line QPS).

One way I can think to address this is to move back to a seekExact approach
when creating the filtered TermsEnum used by MultiTermQuery (for the
TermInSetQuery implementation). Because TermInSetQuery can provide all of
its terms up-front, we can have a simpler term intersection implementation
that relies on seekExact over seekCeil. Here's a quick take on what I'm
thinking:
https://github.com/gsmiller/lucene/commit/e527c5d9b26ee53826b56b270d7c96db18bfaee5.
I've tested this internally and confirmed it solves our QPS regression
problem.
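
To make the trade-off concrete, here is a minimal, self-contained sketch.
It is not the code in the commit above and it does not use Lucene's
TermsEnum API: ToyTermsDict, TermIntersection and the dictionaryLookups
counter are hypothetical names, and the HashSet is only a stand-in for a
bloom filter (unlike a real one, it never reports false positives).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a terms dictionary; hypothetical, not Lucene's TermsEnum. */
final class ToyTermsDict {
  private final TreeSet<String> terms = new TreeSet<>();
  // Stand-in for a bloom filter. It is exact here, so no false positives.
  private final Set<String> filter = new HashSet<>();
  // Counts accesses to the (assumed expensive) sorted dictionary.
  int dictionaryLookups = 0;

  ToyTermsDict(List<String> indexedTerms) {
    terms.addAll(indexedTerms);
    filter.addAll(indexedTerms);
  }

  /** Exact lookup: the filter rejects absent terms before the dictionary is touched. */
  boolean seekExact(String term) {
    if (!filter.contains(term)) {
      return false;
    }
    dictionaryLookups++;
    return terms.contains(term);
  }

  /** Ceiling lookup: smallest indexed term >= target, or null. The filter cannot help. */
  String seekCeil(String target) {
    dictionaryLookups++;
    return terms.ceiling(target);
  }
}

final class TermIntersection {
  /** Intersects sorted query terms with the dictionary using only seekExact. */
  static List<String> viaSeekExact(ToyTermsDict dict, List<String> sortedQueryTerms) {
    List<String> matches = new ArrayList<>();
    for (String q : sortedQueryTerms) {
      if (dict.seekExact(q)) {
        matches.add(q);
      }
    }
    return matches;
  }

  /** Intersects using seekCeil, skipping query terms below the term the seek landed on. */
  static List<String> viaSeekCeil(ToyTermsDict dict, List<String> sortedQueryTerms) {
    List<String> matches = new ArrayList<>();
    int i = 0;
    while (i < sortedQueryTerms.size()) {
      String q = sortedQueryTerms.get(i);
      String found = dict.seekCeil(q);
      if (found == null) {
        break; // no indexed term >= q, so later (larger) query terms cannot match either
      }
      if (found.equals(q)) {
        matches.add(q);
        i++;
      } else {
        // Skip every query term that sorts before the term we landed on.
        while (i < sortedQueryTerms.size() && sortedQueryTerms.get(i).compareTo(found) < 0) {
          i++;
        }
      }
    }
    return matches;
  }
}
```

With indexed terms {apple, mango, zebra} and query terms {banana, cherry,
mango, pear}, both paths find only "mango". The seekExact path touches the
dictionary once, while the seekCeil path touches it three times. When most
query terms are absent from the field, the filter-friendly path does less
work, which is the shape of the regression described above.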

I'm curious if anyone has an objection to moving back to a seekExact term
intersection approach for TermInSetQuery, or has alternative ideas. I
wonder if I'm overlooking some important factors and focusing too much on
this specific case where the bloom filter interaction is hurting
performance? It seems like seekCeil could provide benefits in some cases
over seekExact by skipping over multiple query terms at a time, so that's a
possible consideration. If we solve for the most common cases by default, I
suppose advanced users could always override TermInSetQuery#getTermsEnum as
necessary (we could take this approach internally for example to work with
our bloom filtering if the best default is to leverage seekCeil). I can
easily turn my quick solution into a PR, but before I do, I wanted to poll
this group for thoughts on the approach or other alternatives I might be
overlooking. Thanks in advance!

Cheers,
-Greg

Re: [VOTE] Release Lucene 9.6.0 RC2

2023-05-04 Thread Greg Miller

+1 SUCCESS! [1:02:49.795869]

Cheers,
-Greg

On Wed, May 3, 2023 at 5:14 PM Michael McCandless 
wrote:

> Er, +1 too ;)
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, May 3, 2023 at 8:07 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> SUCCESS! [0:25:52.108112]
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, May 3, 2023 at 2:50 PM Mayya Sharipova
>>  wrote:
>>
>>> +1
>>> SUCCESS! [0:34:12.604559]
>>>
>>> On Wed, May 3, 2023 at 7:38 AM Ignacio Vera  wrote:
>>>
>>>> +1
>>>>
>>>> SUCCESS! [0:42:35.102645]
>>>>
>>>> On Wed, May 3, 2023 at 12:45 PM Alan Woodward
>>>> wrote:
>>>>
>>>>> Please vote for release candidate 2 for Lucene 9.6.0
>>>>>
>>>>> The artifacts can be downloaded from:
>>>>>
>>>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.6.0-RC2-rev-f94cd1750d198cd0294fb1d967c4e511a7035f1e
>>>>>
>>>>> You can run the smoke tester directly with this command:
>>>>>
>>>>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>>>>
>>>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.6.0-RC2-rev-f94cd1750d198cd0294fb1d967c4e511a7035f1e
>>>>>
>>>>> Given weekends and UK holidays, the vote will be open until next
>>>>> Monday, i.e. until 2023-05-08 11:00 UTC.
>>>>>
>>>>> [ ] +1  approve
>>>>> [ ] +0  no opinion
>>>>> [ ] -1  disapprove (and reason why)
>>>>>
>>>>> Here is my +1

Fwd: TAC supporting Berlin Buzzwords

2023-03-27 Thread Greg Miller

[forwarding to dev@ and java-users@]

For anyone interested, please see the following note from Gavin on ASF
Travel Assistance for Berlin Buzzwords.

Cheers,
-Greg

---------- Forwarded message ---------
From: Gavin McDonald 
Date: Fri, Mar 24, 2023 at 2:57 AM
Subject: TAC supporting Berlin Buzzwords
To: 


PMCs,

Please forward to your dev and user lists.

Hi All,

The ASF Travel Assistance Committee is supporting taking up to six (6)
people to attend Berlin Buzzwords in June this year.

This includes Conference passes, and travel & accommodation as needed.

Please see our website at https://tac.apache.org for more information and
how to apply.

Applications close on 15th April.

Good luck to those that apply.

Gavin McDonald (VP TAC)

Re: First time contribution

2023-03-27 Thread Greg Miller

I've said this in the PR to some degree, but wanted to also respond here:

+1 to everything Mike said. THANK YOU for the debugging, filing a super
thorough bug report and PR for a fix. DrillSideways is not the easiest
place to start with Lucene, and kudos for jumping right into it! Great to
see another active participant in Lucene (and in faceting / DrillSideways)!

Cheers,
-Greg

On Thu, Mar 23, 2023 at 1:23 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Ahhh the best bugs come down to tiny fixes!  It could have been worse: it
> could have been a single character fix ;)
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Mar 23, 2023 at 4:09 PM Frederic Thevenet 
> wrote:
>
>> Thanks Michael!
>>
>> Well yeah, it did take me a couple of late night hacking sessions to get
>> to the bottom of this one!
>> The fact that all I got to show for my efforts is literally *a single
>> word* change is both disheartening and kinda brilliant at the same time
>> ;-)
>>
>> --
>> Cheers,
>> Frederic
>>
>> On 23/03/2023 19:51, Michael McCandless wrote:
>>
>> Thank you Frederic!  Welcome, and it's great to e-meet you.
>>
>> Debugging DrillSideways must've been great fun ;)
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Mar 23, 2023 at 1:36 PM Frederic Thevenet 
>> wrote:
>>
>>> Hi!
>>>
>>> My name is Frederic Thevenet and I am the maintainer of a FOSS time
>>> series browser and log viewer tool called binjr [0] which uses Lucene to
>>> do some pretty neat things.
>>>
>>> As part of that, I became aware of a rather nasty bug that would cause
>>> searches made via DrillSideways to miss documents that should match the
>>> query [1].
>>> After a lot of digging, I believe I found the issue and therefore
>>> submitted a PR to hopefully fix it [2].
>>>
>>> This is my first attempt at contributing to this project and although I
>>> did read the contribution guidelines over on github, it didn't seem to
>>> contain much, other than opening a PR.
>>>
>>> So I thought I'd start with a short introduction here, thinking it
>>> wouldn't hurt :-)
>>>
>>> Please let me know if I have missed anything, and looking forward to
>>> getting a review on that PR.
>>>
>>> --
>>> Cheers,
>>> Frederic
>>>
>>> [0] https://github.com/binjr/binjr
>>> [1] https://github.com/apache/lucene/issues/12211
>>> [2] https://github.com/apache/lucene/pull/12212
>>>
>>

Re: Lucene PMC Chair Greg Miller

2023-03-08 Thread Greg Miller

Thanks everyone! And thanks Bruno!

On Tue, Mar 7, 2023 at 12:32 PM Houston Putman  wrote:

> Thanks Bruno, and good luck Greg!
>
> - Houston
>
> On Tue, Mar 7, 2023 at 3:29 PM Gus Heck  wrote:
>
>> Congratulations Greg and thanks Bruno!
>>
>> On Tue, Mar 7, 2023 at 3:13 PM Tomás Fernández Löbbe <
>> tomasflo...@gmail.com> wrote:
>>
>>> Thanks Bruno! and Congratulations Greg!
>>>
>>> On Tue, Mar 7, 2023 at 10:49 AM Patrick Zhai  wrote:
>>>
>>>> Thank you Bruno and Greg!
>>>>
>>>> On Tue, Mar 7, 2023, 10:40 Mikhail Khludnev  wrote:
>>>>
>>>>> Thank you, Bruno. Congratulations, Greg.
>>>>>
>>>>> On Mon, Mar 6, 2023 at 8:16 PM Bruno Roustant 
>>>>> wrote:
>>>>>
>>>>>> Hello Lucene developers,
>>>>>>
>>>>>> Lucene Program Management Committee has elected a new chair, Greg
>>>>>> Miller, and the Board has approved.
>>>>>>
>>>>>> Greg, thank you for stepping up, and congratulations!
>>>>>>
>>>>>>
>>>>>> - Bruno
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>> https://t.me/MUST_SEARCH
>>>>> A caveat: Cyrillic!
>>>>>
>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-19) - Build # 40479 - Still Unstable!

2023-03-06 Thread Greg Miller

I’m sure this was my fault. I’ll look into it.

Cheers,
-Greg

On Mon, Mar 6, 2023 at 13:32 Policeman Jenkins Server 
wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/40479/
> Java: 64bit/hotspot/jdk-19 -XX:-UseCompressedOops -XX:+UseSerialGC
>
> 1 tests failed.
> FAILED:
> org.apache.lucene.facet.TestLongValueFacetCounts.testRandomSingleValued
>
> Error Message:
> java.lang.IllegalArgumentException: topN must be > 0 (got: 0)
>
> Stack Trace:
> java.lang.IllegalArgumentException: topN must be > 0 (got: 0)
> at
> __randomizedtesting.SeedInfo.seed([E3377D571BFDC4CD:A61F45C328116EF2]:0)
> at org.apache.lucene.facet.Facets.validateTopN(Facets.java:82)
> at
> org.apache.lucene.facet.LongValueFacetCounts.getTopChildren(LongValueFacetCounts.java:378)
> at
> org.apache.lucene.facet.TestLongValueFacetCounts.testRandomSingleValued(TestLongValueFacetCounts.java:402)
> at
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
> at java.base/java.lang.reflect.Method.invoke(Method.java:578)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at junit@4.13.1
> /org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at randomizedtesting.runner@2.8.1
> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
> /org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at org.apache.lucene.test_fra

Re: Welcome Ben Trent as Lucene committer

2023-01-30 Thread Greg Miller

Congrats and welcome Ben!

On Mon, Jan 30, 2023 at 12:26 PM Alessandro Benedetti 
wrote:

> Welcome, Ben!
> --
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io
> LinkedIn | Twitter | Youtube | Github
>
>
> On Mon, 30 Jan 2023 at 10:14, Alan Woodward  wrote:
>
>> Congratulations Ben!
>>
>> > On 27 Jan 2023, at 15:18, Adrien Grand  wrote:
>> >
>> > I'm pleased to announce that Ben Trent has accepted the PMC's
>> > invitation to become a committer.
>> >
>> > Ben, the tradition is that new committers introduce themselves with a
>> > brief bio.
>> >
>> > Congratulations and welcome!
>> >
>> > --
>> > Adrien
>> >
>> >
>>
>>

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-17.0.3) - Build # 39124 - Still Unstable!

2022-12-30 Thread Greg Miller

Ah, it does indeed. Thank you for the quick fix Patrick!

Cheers,
-Greg

On Fri, Dec 30, 2022 at 10:01 AM Patrick Zhai  wrote:

> Seems related to the commit just pushed? I put a quick fix:
> https://github.com/apache/lucene/pull/12049
>
> On Fri, Dec 30, 2022 at 9:35 AM Policeman Jenkins Server <
> jenk...@thetaphi.de> wrote:
>
>> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/39124/
>> Java: 64bit/jdk-17.0.3 -XX:+UseCompressedOops -XX:+UseG1GC
>>
>> 2 tests failed.
>> FAILED:
>> org.apache.lucene.facet.rangeonrange.TestRangeOnRangeFacetCounts.testRandomMultiDimDoubles
>>
>> Error Message:
>> java.lang.IllegalArgumentException: DoubleRange does not support greater
>> than 4 dimensions
>>
>> Stack Trace:
>> java.lang.IllegalArgumentException: DoubleRange does not support greater
>> than 4 dimensions
>> at
>> __randomizedtesting.SeedInfo.seed([978417F05E30CBDA:D9132730633ABE7C]:0)
>> at org.apache.lucene.core@10.0.0-SNAPSHOT
>> /org.apache.lucene.document.DoubleRange.checkArgs(DoubleRange.java:117)
>> at org.apache.lucene.core@10.0.0-SNAPSHOT
>> /org.apache.lucene.document.DoubleRange.encode(DoubleRange.java:123)
>> at org.apache.lucene.core@10.0.0-SNAPSHOT
>> /org.apache.lucene.document.DoubleRangeDocValuesField.<init>(DoubleRangeDocValuesField.java:33)
>> at
>> org.apache.lucene.facet.rangeonrange.TestRangeOnRangeFacetCounts.testRandomMultiDimDoubles(TestRangeOnRangeFacetCounts.java:1340)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> at
>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>> at
>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
>> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>> at junit@4.13.1
>> /org.junit.rules.RunRules.evaluate(RunRules.java:20)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
>> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
>> /org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>> at randomizedtesting.runner@2.8.1
>> /com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT
>> /org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>> at randomizedtesting.runner@2.8.1
>>

Re: Request for naming help

2022-12-30 Thread Greg Miller

OK, great! Thanks Marc. I plan on merging the PR today.

Cheers,
-Greg

On Thu, Dec 29, 2022 at 3:23 PM Marc D'Mello  wrote:

> Hi Greg,
>
> I'm also OK merging as is since this is a new feature and doesn't affect
> any of the current functionality. I also think there are no glaring issues
> with the API in its current state. However, I do think that merging the
> range and rangeonrange functionality makes sense and I like Adrien's
> suggestion of providing factory methods. I think if we merge in its current
> state we should create a new issue to refactor the range and
> rangeonrange faceting package into one and follow the RangeFieldQuery model
> more closely.
>
> On Thu, Dec 29, 2022 at 2:58 PM Greg Miller  wrote:
>
>> Hey Marc-
>>
>> I don't want to speak for Adrien as he might have something different in
>> mind, but I think that's more-or-less the idea. I'm not sure the factory
>> methods belong on the LongRange/DoubleRange classes, or if separate classes
>> should be created for this purpose (which is more how I thought of it)?
>>
>> To do this cleanly though, I'd really like us to try to consolidate all
>> the "range related" faceting functionality into one java package and
>> consolidate the API a bit. As part of this, I think we can be a little
>> smarter about not duplicating the "range" classes themselves.
>>
>> All this said, given that I think your "range on range" faceting PR is
>> ready to be merged as it currently exists, and has been through a number of
>> iterations already, I'm OK if we want to merge that work as it stands and
>> follow up with revisiting the API/naming/etc. as a future project. What do
>> you think?
>>
>> Cheers,
>> -Greg
>>
>> On Tue, Dec 13, 2022 at 7:23 PM Marc D'Mello  wrote:
>>
>>> Hi,
>>>
>>> I'm a bit unsure about what is being suggested. Is the idea to rename
>>> range#LongRange and rangeonrange#LongRange to LongFieldFacets and
>>> LongRangeFacets respectively and stick the static getters in there? In that
>>> case, I also think that the idea makes a lot of sense and that it would
>>> match our current range query API much better.
>>>
>>> In addition, looking at document#LongRange, there are queries like
>>> newContainsQuery() and newWithinQuery() that we can probably mimic to
>>> avoid exposing RangeFieldQuery.QueryType to the user.
>>>
>>> On Tue, Dec 13, 2022 at 5:04 PM Greg Miller  wrote:
>>>
>>>> Thanks for the suggestion Adrien. I like this idea! Marc- what do you
>>>> think?
>>>>
>>>> We might need to rework the package structure under the facets module
>>>> to make this clean, but that might not be a terrible thing anyway. The
>>>> existing sub-packages will make it challenging to get the visibility right.
>>>> I think it would be ideal to flatten the package so we can reduce
>>>> visibility of the class definitions and only expose the factory methods.
>>>>
>>>> Cheers,
>>>> -Greg
>>>>
>>>> On Tue, Dec 13, 2022 at 01:18 Adrien Grand  wrote:
>>>>
>>>>> I wonder if the facets actually require a different name, since they
>>>>> look to me like a generalization of range facets for range fields,
>>>>> while we previously only supported range facets on numeric fields. We
>>>>> could keep calling them range facets?
>>>>>
>>>>> Maybe we could use the same model we used for queries by not exposing
>>>>> query classes to users and providing factory methods, e.g. we could
>>>>> have something like:
>>>>>
>>>>> public class LongFieldFacets {
>>>>>
>>>>>   public static Facets getRangeFacetCounts(
>>>>>       String field, FacetsCollector hits, LongRange... ranges) {
>>>>>     return new LongRangeFacetCounts(...);
>>>>>   }
>>>>>
>>>>> }
>>>>>
>>>>> public class LongRangeFacets {
>>>>>
>>>>>   // same function name
>>>>>   public static Facets getRangeFacetCounts(
>>>>>       String field,
>>>>>       FacetsCollector hits,
>>>>>       RangeFieldQuery.QueryType queryType,
>>>>>       LongRange... ranges) {
>>>>>     return new LongRangeOnRangeFacetCounts(...);
>>>>>   }
>>>>>
>>>>> }
>>>>>
>>>>> We'd still need to gi

Re: Request for naming help

2022-12-29 Thread Greg Miller

Hey Marc-

I don't want to speak for Adrien as he might have something different in
mind, but I think that's more-or-less the idea. I'm not sure the factory
methods belong on the LongRange/DoubleRange classes, or if separate classes
should be created for this purpose (which is more how I thought of it)?

To do this cleanly though, I'd really like us to try to consolidate all the
"range related" faceting functionality into one java package and
consolidate the API a bit. As part of this, I think we can be a little
smarter about not duplicating the "range" classes themselves.

All this said, given that I think your "range on range" faceting PR is
ready to be merged as it currently exists, and has been through a number of
iterations already, I'm OK if we want to merge that work as it stands and
follow up with revisiting the API/naming/etc. as a future project. What do
you think?

Cheers,
-Greg

On Tue, Dec 13, 2022 at 7:23 PM Marc D'Mello  wrote:

> Hi,
>
> I'm a bit unsure about what is being suggested. Is the idea to rename
> range#LongRange and rangeonrange#LongRange to LongFieldFacets and
> LongRangeFacets respectively and stick the static getters in there? In that
> case, I also think that the idea makes a lot of sense and that it would
> match our current range query API much better.
>
> In addition, looking at document#LongRange, there are queries like
> newContainsQuery() and newWithinQuery() that we can probably mimic to
> avoid exposing RangeFieldQuery.QueryType to the user.
>
> On Tue, Dec 13, 2022 at 5:04 PM Greg Miller  wrote:
>
>> Thanks for the suggestion Adrien. I like this idea! Marc- what do you
>> think?
>>
>> We might need to rework the package structure under the facets module to
>> make this clean, but that might not be a terrible thing anyway. The
>> existing sub-packages will make it challenging to get the visibility right.
>> I think it would be ideal to flatten the package so we can reduce
>> visibility of the class definitions and only expose the factory methods.
>>
>> Cheers,
>> -Greg
>>
>> On Tue, Dec 13, 2022 at 01:18 Adrien Grand  wrote:
>>
>>> I wonder if the facets actually require a different name, since they
>>> look to me like a generalization of range facets for range fields,
>>> while we previously only supported range facets on numeric fields. We
>>> could keep calling them range facets?
>>>
>>> Maybe we could use the same model we used for queries by not exposing
>>> query classes to users and providing factory methods, e.g. we could
>>> have something like:
>>>
>>> public class LongFieldFacets {
>>>
>>>   public static Facets getRangeFacetCounts(
>>>       String field, FacetsCollector hits, LongRange... ranges) {
>>>     return new LongRangeFacetCounts(...);
>>>   }
>>>
>>> }
>>>
>>> public class LongRangeFacets {
>>>
>>>   // same function name
>>>   public static Facets getRangeFacetCounts(
>>>       String field,
>>>       FacetsCollector hits,
>>>       RangeFieldQuery.QueryType queryType,
>>>       LongRange... ranges) {
>>>     return new LongRangeOnRangeFacetCounts(...);
>>>   }
>>>
>>> }
>>>
>>> We'd still need to give a name for these classes, but the name would
>>> be less important since these class names would be only for ourselves.
>>> Users would never see them and refer to this new functionality as
>>> range facets on range fields?
>>>
>>> On Mon, Dec 12, 2022 at 10:11 PM Gus Heck  wrote:
>>> >
>>> > In that case, maybe "Range Logic Faceting" ?
>>> >
>>> > Relation seems too broad and too overloaded elsewhere, makes me think
>>> of RDBMS, related-ness, joins and such via word associations.
>>> >
>>> > On Mon, Dec 12, 2022 at 3:27 PM Greg Miller 
>>> wrote:
>>> >>
>>> >> Thanks for the suggestion! I like the descriptiveness of it. My only
>>> hesitation is that it supports more than range intersection based on the
>>> provided QueryType instance (e.g., within, contains). I _imagine_ that
>>> intersection will be most common, but I don’t really know of course. I
>>> thought about generalizing your suggestion to something like “Range
>>> Relation Faceting,” but fear that would be confusing.
>>> >>
>>> >> Thanks again!
>>> >>
>>> >> Cheers,
>>> >> -Greg
>>> >>
>>> >> On Mon, Dec 12, 2022 at 10:19 Gus Heck  wrote:
>

Re: Request for naming help

2022-12-13 Thread Greg Miller

Thanks for the suggestion Adrien. I like this idea! Marc- what do you think?

We might need to rework the package structure under the facets module to
make this clean, but that might not be a terrible thing anyway. The
existing sub-packages will make it challenging to get the visibility right.
I think it would be ideal to flatten the package so we can reduce
visibility of the class definitions and only expose the factory methods.
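
As a rough illustration of that shape, here is a self-contained toy
sketch. None of these names exist in Lucene: ToyFacets, ToyLongRangeCounts
and ToyLongFieldFacets are hypothetical, and the counting logic is
deliberately trivial. In a real layout the interface and the factory class
would be public in their own files, while the implementation class would
stay package-private in the same (flattened) package.

```java
/** What callers program against (hypothetical stand-in for a Facets-like type). */
interface ToyFacets {
  /** Returns how many values fell into the range at the given index. */
  int count(int rangeIndex);
}

/** Implementation detail: package-private, so code outside the package cannot name it. */
final class ToyLongRangeCounts implements ToyFacets {
  private final int[] counts;

  ToyLongRangeCounts(long[] values, long[][] ranges) {
    counts = new int[ranges.length];
    for (long v : values) {
      for (int i = 0; i < ranges.length; i++) {
        // Each range is an inclusive {min, max} pair.
        if (v >= ranges[i][0] && v <= ranges[i][1]) {
          counts[i]++;
        }
      }
    }
  }

  @Override
  public int count(int rangeIndex) {
    return counts[rangeIndex];
  }
}

/** The only entry point users would see: a static factory returning the interface type. */
final class ToyLongFieldFacets {
  private ToyLongFieldFacets() {} // no instances

  public static ToyFacets getRangeFacetCounts(long[] values, long[][] ranges) {
    return new ToyLongRangeCounts(values, ranges);
  }
}
```

The point is only that the return type is the interface, so the name of
the implementation class stops mattering to users and can be changed
freely later.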

Cheers,
-Greg

On Tue, Dec 13, 2022 at 01:18 Adrien Grand  wrote:

> I wonder if the facets actually require a different name, since they
> look to me like a generalization of range facets for range fields,
> while we previously only supported range facets on numeric fields. We
> could keep calling them range facets?
>
> Maybe we could use the same model we used for queries by not exposing
> query classes to users and providing factory methods, e.g. we could
> have something like:
>
> public class LongFieldFacets {
>
>   public static Facets getRangeFacetCounts(
>       String field, FacetsCollector hits, LongRange... ranges) {
>     return new LongRangeFacetCounts(...);
>   }
>
> }
>
> public class LongRangeFacets {
>
>   // same function name
>   public static Facets getRangeFacetCounts(
>       String field,
>       FacetsCollector hits,
>       RangeFieldQuery.QueryType queryType,
>       LongRange... ranges) {
>     return new LongRangeOnRangeFacetCounts(...);
>   }
>
> }
>
> We'd still need to give a name for these classes, but the name would
> be less important since these class names would be only for ourselves.
> Users would never see them and refer to this new functionality as
> range facets on range fields?
>
> On Mon, Dec 12, 2022 at 10:11 PM Gus Heck  wrote:
> >
> > In that case, maybe "Range Logic Faceting" ?
> >
> > Relation seems too broad and too overloaded elsewhere, makes me think of
> RDBMS, related-ness, joins and such via word associations.
> >
> > On Mon, Dec 12, 2022 at 3:27 PM Greg Miller  wrote:
> >>
> >> Thanks for the suggestion! I like the descriptiveness of it. My only
> hesitation is that it supports more than range intersection based on the
> provided QueryType instance (e.g., within, contains). I _imagine_ that
> intersection will be most common, but I don’t really know of course. I
> thought about generalizing your suggestion to something like “Range
> Relation Faceting,” but fear that would be confusing.
> >>
> >> Thanks again!
> >>
> >> Cheers,
> >> -Greg
> >>
> >> On Mon, Dec 12, 2022 at 10:19 Gus Heck  wrote:
> >>>
> >>> Maybe "Range Intersect Faceting"?
> >>>
> >>> On Mon, Dec 12, 2022 at 1:11 PM Greg Miller 
> wrote:
> >>>>
> >>>> Folks-
> >>>>
> >>>> Naming is hard! (But you all know that already).
> >>>>
> >>>> Marc D'Mello and I have been working on a new faceting implementation
> that's meant to complement Lucene's existing range-relation queries (e.g.,
> LongRange#newIntersectsQuery, DoubleRange#newContainsQuery,
> LongRangeDocValuesField#newSlowIntersectsQuery, etc.). Well, I should say
> Marc is working on the change and I'm just providing nit-picky feedback on
> his PR, which is here: https://github.com/apache/lucene/pull/11901. The
> general idea of this feature is to allow users to get facet counts for
> these sorts of range-relation filters before they're applied. For example,
> if a user is indexing ranges with their documents, they may have a set of
> query-ranges they want to facet on, based on some range relationship (e.g.,
> intersection, contains, etc.).
> >>>>
> >>>> As a concrete example, imagine that documents contain a price range
> (maybe a document represents some e-commerce product but the price varies
> based on some configuration options), and a user wants to build a price
> range filter that applies filtering based on whether-or-not the two ranges
> intersect (i.e., DoubleRange#newIntersectsQuery to apply a price range
> filter). This user wants faceting capabilities over the different price
> ranges they want to make available, so they need a way to facet over a list
> of provided query-ranges, based on the "intersect" relationship with the
> doc-encoded ranges. That's what Marc's "RangeOnRange" faceting is trying to
> accomplish.
> >>>>
> >>>> In my opinion, the PR is really close to being ready (thanks again
> Marc!), but I'm wondering if we can come up with a more descriptive name.
> As it currently stands, the feature is

Re: Request for naming help

2022-12-12 Thread Greg Miller

Thanks for the suggestion! I like the descriptiveness of it. My only
hesitation is that it supports more than range intersection based on the
provided QueryType instance (e.g., within, contains). I _imagine_ that
intersection will be most common, but I don’t really know of course. I
thought about generalizing your suggestion to something like “Range
Relation Faceting,” but fear that would be confusing.

Thanks again!

Cheers,
-Greg

On Mon, Dec 12, 2022 at 10:19 Gus Heck  wrote:

> Maybe "Range Intersect Faceting"?
>
> On Mon, Dec 12, 2022 at 1:11 PM Greg Miller  wrote:
>
>> Folks-
>>
>> Naming is hard! (But you all know that already).
>>
>> Marc D'Mello and I have been working on a new faceting implementation
>> that's meant to complement Lucene's existing range-relation queries (e.g.,
>> LongRange#newIntersectsQuery, DoubleRange#newContainsQuery,
>> LongRangeDocValuesField#newSlowIntersectsQuery, etc.). Well, I should say
>> Marc is working on the change and I'm just providing nit-picky feedback on
>> his PR, which is here: https://github.com/apache/lucene/pull/11901. The
>> general idea of this feature is to allow users to get facet counts for
>> these sorts of range-relation filters before they're applied. For example,
>> if a user is indexing ranges with their documents, they may have a set of
>> query-ranges they want to facet on, based on some range relationship (e.g.,
>> intersection, contains, etc.).
>>
>> As a concrete example, imagine that documents contain a price range
>> (maybe a document represents some e-commerce product but the price varies
>> based on some configuration options), and a user wants to build a price
>> range filter that applies filtering based on whether-or-not the two ranges
>> intersect (i.e., DoubleRange#newIntersectsQuery to apply a price range
>> filter). This user wants faceting capabilities over the different
>> price ranges they want to make available, so they need a way to facet over
>> a list of provided query-ranges, based on the "intersect" relationship with
>> the doc-encoded ranges. That's what Marc's "RangeOnRange" faceting is
>> trying to accomplish.
>>
>> In my opinion, the PR is really close to being ready (thanks again
>> Marc!), but I'm wondering if we can come up with a more descriptive name.
>> As it currently stands, the feature is termed "RangeOnRange Faceting,"
>> which feels just a bit wonky to me. That said, I can't really come up with
>> anything better.
>>
>> ** Does anyone have suggestions on a better name? **
>>
>> Any / all suggestions appreciated! (And of course, any other input on the
>> PR is welcome if anyone is interested).
>>
>> Cheers,
>> -Greg
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Request for naming help

2022-12-12 Thread Greg Miller

Folks-

Naming is hard! (But you all know that already).

Marc D'Mello and I have been working on a new faceting implementation
that's meant to complement Lucene's existing range-relation queries (e.g.,
LongRange#newIntersectsQuery, DoubleRange#newContainsQuery,
LongRangeDocValuesField#newSlowIntersectsQuery, etc.). Well, I should say
Marc is working on the change and I'm just providing nit-picky feedback on
his PR, which is here: https://github.com/apache/lucene/pull/11901. The
general idea of this feature is to allow users to get facet counts for
these sorts of range-relation filters before they're applied. For example,
if a user is indexing ranges with their documents, they may have a set of
query-ranges they want to facet on, based on some range relationship (e.g.,
intersection, contains, etc.).

As a concrete example, imagine that documents contain a price range (maybe
a document represents some e-commerce product but the price varies based on
some configuration options), and a user wants to build a price range filter
that applies filtering based on whether-or-not the two ranges intersect
(i.e., DoubleRange#newIntersectsQuery to apply a price range filter). This
user wants faceting capabilities over the different price ranges they want
to make available, so they need a way to facet over a list of provided
query-ranges, based on the "intersect" relationship with the doc-encoded
ranges. That's what Marc's "RangeOnRange" faceting is trying to accomplish.
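
To make the price-range example concrete, here is a minimal conceptual
sketch. It does not use Lucene and is not the API from the PR: the class
name, the Range helper, and countIntersecting are all hypothetical names
made up for illustration. It only shows the counting semantics for the
"intersect" relationship between doc-encoded ranges and query-ranges.

```java
import java.util.Arrays;
import java.util.List;

/** Conceptual sketch only: plain Java, no Lucene, hypothetical names throughout. */
final class RangeOnRangeSketch {

  /** A closed interval [min, max], standing in for a doc-encoded or query range. */
  static final class Range {
    final double min;
    final double max;

    Range(double min, double max) {
      this.min = min;
      this.max = max;
    }

    /** Two closed intervals intersect when each one starts before the other ends. */
    boolean intersects(Range other) {
      return min <= other.max && other.min <= max;
    }
  }

  /** For each query range, count the documents whose range intersects it. */
  static int[] countIntersecting(List<Range> docRanges, List<Range> queryRanges) {
    int[] counts = new int[queryRanges.size()];
    for (Range doc : docRanges) {
      for (int i = 0; i < counts.length; i++) {
        if (doc.intersects(queryRanges.get(i))) {
          counts[i]++;
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Three products, each with a price range that depends on configuration.
    List<Range> docs =
        Arrays.asList(new Range(10, 20), new Range(15, 40), new Range(50, 60));
    // Price buckets the user wants facet counts for.
    List<Range> buckets =
        Arrays.asList(new Range(0, 12), new Range(18, 55), new Range(70, 80));
    System.out.println(Arrays.toString(countIntersecting(docs, buckets))); // [1, 3, 0]
  }
}
```

Swapping intersects for a "contains" or "within" check would give the
other relationships mentioned above; a real implementation would read the
ranges from doc values rather than from an in-memory list.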

In my opinion, the PR is really close to being ready (thanks again Marc!),
but I'm wondering if we can come up with a more descriptive name. As it
currently stands, the feature is termed "RangeOnRange Faceting," which
feels just a bit wonky to me. That said, I can't really come up with
anything better.

** Does anyone have suggestions on a better name? **

Any / all suggestions appreciated! (And of course, any other input on the
PR is welcome if anyone is interested).

Cheers,
-Greg

Re: IntelliJ Project Generation?

2022-11-21 Thread Greg Miller

Thank you both. I missed that updated info in the CONTRIBUTING guide.

Cheers,
-Greg

On Mon, Nov 21, 2022 at 10:45 AM Dawid Weiss  wrote:

>
> https://github.com/apache/lucene/blob/main/CONTRIBUTING.md#ide-support
>>
>
> Precisely. I also highly recommend using intellij compiler instead of full
> gradle integration - it works much faster for me (but both modes work).
>
> Dawid
>

IntelliJ Project Generation?

2022-11-21 Thread Greg Miller

Hi folks-

Apologies if I missed a discussion somewhere (I tried searching the list
and issues, but came up short). Was support for generating IntelliJ project
files removed as a gradle task at some point? We used to support generation
of both Eclipse and IntelliJ project files, but I only see Eclipse support
under `./gradlew tasks` now. I need to re-setup my IntelliJ project and
just noticed this convenient functionality missing.

Again, apologies in advance if I'm overlooking something obvious here.

Cheers,
-Greg

Re: Dense union of doc IDs

2022-11-04 Thread Greg Miller

+1 to exploring this idea. A couple additional thoughts...

I can imagine real world use cases that would benefit from this beyond the
super pathological N-1 case. With dense terms, I can believe that the
disjunction would start to accumulate "blocks" of documents that all match
as the bitset gets populated. As that starts to happen, it could be very
beneficial to `advance` other term postings beyond the "block." For
example, in an ecommerce search engine, a disjunction of "product types"
would likely exhibit this behavior (where some product types are likely
pretty dense).
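
As a rough illustration of the "skip over blocks that already match"
idea, here is a small sketch in plain Java. It is not Lucene code: sorted
int arrays stand in for postings, java.util.BitSet stands in for the
accumulating bitset, and a binary search stands in for `advance`. The
class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Conceptual sketch only: plain Java stand-ins for postings and the doc ID bitset. */
final class DenseUnionSketch {
  final BitSet bits;
  /** Number of posting entries actually looked at, to make the skipping visible. */
  int visited;

  DenseUnionSketch(int maxDoc) {
    this.bits = new BitSet(maxDoc);
  }

  /** OR one term's sorted postings into the bitset, skipping runs of already-set bits. */
  void add(int[] postings) {
    int i = 0;
    while (i < postings.length) {
      visited++;
      int doc = postings[i];
      if (bits.get(doc)) {
        // Everything in [doc, target) already matches, so jump past the run.
        int target = bits.nextClearBit(doc);
        i = advance(postings, i + 1, target);
      } else {
        bits.set(doc);
        i++;
      }
    }
  }

  /** Index of the first entry >= target at or after 'from' (binary search as a stand-in). */
  static int advance(int[] postings, int from, int target) {
    int idx = Arrays.binarySearch(postings, from, postings.length, target);
    return idx >= 0 ? idx : -idx - 1;
  }

  public static void main(String[] args) {
    // Eight docs, three terms, each matching all docs but one.
    DenseUnionSketch union = new DenseUnionSketch(8);
    union.add(new int[] {0, 1, 2, 3, 4, 5, 6});
    union.add(new int[] {1, 2, 3, 4, 5, 6, 7});
    union.add(new int[] {0, 1, 2, 3, 5, 6, 7});
    System.out.println(union.bits.cardinality()); // 8
    System.out.println(union.visited); // 10, out of 21 postings in total
  }
}
```

The first term still pays full price; the savings only show up once the
bitset has long runs of set bits, which is why dense terms are the
interesting case.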

Also, it would be nice to try this same idea in TermInSetQuery. It's very
similar, but still has its own implementation.

Thanks for raising this idea Froh! I'd be excited to see what you come up
with, and may have a use-case in mind to benchmark with if you end up with
a patch to test.

Cheers,
-Greg

On Fri, Nov 4, 2022 at 6:46 AM Michael Sokolov  wrote:

> It sounds like a lot of complexity to handle an unusual edge case, but
> ... I guess this actually happened? Can you give any sense of the
> end-user behavior that caused it?
>
> On Fri, Nov 4, 2022 at 2:26 AM Patrick Zhai  wrote:
> >
> > Hi Froh,
> >
> > The idea sounds reasonable to me, although I wonder whether using
> CONSTANT_SCORE_BOOLEAN_REWRITE would help with your case since that dense
> union case should be already handled by disjunction query I suppose?
> > But that boolean rewrite is subject to max clause limit so it may have
> some other problems depending on the use case I guess.
> >
> > Best
> > Patrick
> >
> >
> > On Thu, Nov 3, 2022 at 5:15 PM Michael Froh  wrote:
> >>
> >> Hi,
> >>
> >> I was recently poking around in the createWeight implementation for
> MultiTermQueryConstantScoreWrapper to get to the bottom of some slow
> queries, and I realized that the worst-case performance could be pretty
> bad, but (maybe) possible to optimize for.
> >>
> >> Imagine if we have a segment with N docs and our MultiTermQuery expands
> to hit M terms, where each of the M terms matches N-1 docs. (If we matched
> all N docs, then Greg Miller's recent optimization to replace the
> MultiTermQuery with a TermQuery would kick in.) In this case, my
> understanding is that we would iterate through all the terms and pass each
> one's postings to a DocIdSetBuilder, which will iterate through the
> postings to set bits. This whole thing would be O(MN), I think.
> >>
> >> I was thinking that it would be cool if the DocIdSetBuilder could
> detect long runs of set bits and advance() each DISI to skip over them
> (since they're guaranteed not to contribute anything new to the union). In
> the worst case that I described above, I think it would make the whole
> thing O(M log N) (assuming advance() takes log time).
> >>
> >> At the risk of overcomplicating things, maybe DocIdSetBuilder could use
> a third ("dense") BulkAdder implementation that kicks in once enough bits
> are set, to efficiently implement the "or" operation to skip over known
> (sufficiently long) runs of set bits?
> >>
> >> Would something like that be useful? Is the "dense union of doc IDs"
> case common enough to warrant it?
> >>
> >> Thanks,
> >> Froh
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

Re: Welcome Vigya Sharma as Lucene committer

2022-07-28 Thread Greg Miller

Welcome and congrats Vigya! Well earned!

Cheers,
-Greg

On Thu, Jul 28, 2022 at 12:45 PM Vigya Sharma  wrote:
>
> Thanks everyone for the warm welcome. It is an honor to be invited as a 
> Lucene committer, and I look forward to contributing more to the community.
>
> A little bit about me - I currently work for the Product Search team at 
> Amazon, and am based out of the San Francisco Bay Area in California, US.
> I am interested in a wide variety of computer science areas, and, in the last 
> few years, have focused more on distributed systems, concurrency, system 
> software and performance. Outside of tech., I like spending my time outdoors 
> - running, skiing, and long road trips. I completed my first marathon (the 
> SFMarathon) last week, and now, getting this invitation has made this month a 
> highlight of the year.
>
> I had known that Lucene powers some of the most popular search and analytics 
> use cases across the globe, but as I've gotten more involved, the depth and 
> breadth of this software has blown my mind. I am deeply impressed by what 
> this community has built, and how it continues to work together and grow. It 
> is a great honor to be trusted with committer privileges, and I look forward 
> to learning and contributing to multiple different parts of the library.
>
> Thank you,
> Vigya
>
>
> On Thu, Jul 28, 2022 at 12:20 PM Anshum Gupta  wrote:
>>
>> Congratulations and welcome, Vigya!
>>
>> On Thu, Jul 28, 2022 at 12:34 AM Adrien Grand  wrote:
>>>
>>> I'm pleased to announce that Vigya Sharma has accepted the PMC's
>>> invitation to become a committer.
>>>
>>> Vigya, the tradition is that new committers introduce themselves with a
>>> brief bio.
>>>
>>> Congratulations and welcome!
>>>
>>> --
>>> Adrien
>>
>>
>>
>> --
>> Anshum Gupta
>
>
>
> --
> - Vigya

Re: New branch and feature freeze for Lucene 9.3.0

2022-07-22 Thread Greg Miller

OK, as frustrating as this is, it looks like there was still another
bug in this same test uncovered last night. I have another PR open to
patch this into 9.3 here: https://github.com/apache/lucene/pull/1044

I also reopened https://issues.apache.org/jira/browse/LUCENE-10659 as
a proposed 9.3 blocker to pick up this fix. If the test remains
problematic after this patch, I'll just revert it out of 9.3 until it
proves more stable. Apologies.

Cheers,
-Greg

On Thu, Jul 21, 2022 at 7:59 AM Greg Miller  wrote:
>
> Ack, thanks Ignacio for the quick approval. Just merged the fix onto
> the 9.3 branch.
>
> Cheers,
> -Greg
>
> On Thu, Jul 21, 2022 at 12:32 AM Ignacio Vera  wrote:
> >
> > Hi Greg,
> >
> > Yes please fix the test in branch 9.3, I have approved the PR.
> >
> > On Thu, Jul 21, 2022 at 12:08 AM Greg Miller  wrote:
> >>
> >> Thanks Ignacio! I just created
> >> https://issues.apache.org/jira/browse/LUCENE-10659 as a proposed
> >> blocker for 9.3. It's a small bug fix for a unit test I recently
> >> introduced on the 9x branch (one of the last things to get pulled into
> >> the 9.3 candidate). I think we ought to fix this test before cutting a
> >> 9.3 release. There's a PR associated with the issue already (the
> >> change is already patched into main/branch_9x).
> >>
> >> Cheers,
> >> -Greg
> >>
> >> On Wed, Jul 20, 2022 at 4:42 AM Ignacio Vera  wrote:
> >> >
> >> > Please find here the draft for the release highlights. I have probably 
> >> > missed things that should be included so please feel free to add them.
> >> >
> >> > https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=217391905
> >> >
> >> > On Wed, Jul 20, 2022 at 10:17 AM Ignacio Vera  wrote:
> >> >>
> >> >> NOTICE:
> >> >>
> >> >> Branch branch_9_3 has been cut and versions updated to 9.4 on stable
> >> >> branch.
> >> >>
> >> >> Please observe the normal rules:
> >> >>
> >> >> * No new features may be committed to the branch.
> >> >> * Documentation patches, build patches and serious bug fixes may be
> >> >>   committed to the branch. However, you should submit all patches you
> >> >>   want to commit to Jira first to give others the chance to review
> >> >>   and possibly vote against the patch. Keep in mind that it is our
> >> >>   main intention to keep the branch as stable as possible.
> >> >> * All patches that are intended for the branch should first be committed
> >> >>   to the unstable branch, merged into the stable branch, and then into
> >> >>   the current release branch.
> >> >> * Normal unstable and stable branch development may continue as usual.
> >> >>   However, if you plan to commit a big change to the unstable branch
> >> >>   while the branch feature freeze is in effect, think twice: can't the
> >> >>   addition wait a couple more days? Merges of bug fixes into the branch
> >> >>   may become more difficult.
> >> >> * Only Jira issues with Fix version 9.3 and priority "Blocker" will delay
> >> >>   a release candidate build.
> >> >>
> >> >> The only step missing is to add the Jenkins job on the release branch,
> >> >> which is something I don't really know how to do; hopefully someone
> >> >> can help here.
> >> >>
> >> >>
> >> >> I am planning to build the first RC next Monday if there are no issues.
> >>

Re: [JENKINS-EA] Lucene-main-Linux (64bit/jdk-19-ea+32) - Build # 35897 - Unstable!

2022-07-22 Thread Greg Miller

Sorry all- looks like there's still a bug in this test. I'm working on
a fix now and will get it patched. I've reopened LUCENE-10659 as a 9.3
blocker to make sure we get the fix in. I'll have a PR up soon.
Apologies.
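
For anyone reading along: the quoted trace ends in
RandomGenerator.nextInt rejecting its arguments with "bound must be
greater than origin". The sketch below is not the test code and makes no
claim about what line 47 of TestDisiPriorityQueue actually does; it only
illustrates, with hypothetical helper names, how a two-argument nextInt
call fails when a randomly chosen range turns out to be empty, and one
way to guard against it.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustration only: hypothetical helpers, unrelated to the actual test source. */
final class RandomBoundSketch {

  /** Returns "ok" if nextInt(origin, bound) accepts the range, else the exception message. */
  static String describeFailure(int origin, int bound) {
    try {
      ThreadLocalRandom.current().nextInt(origin, bound);
      return "ok";
    } catch (IllegalArgumentException e) {
      // Thrown whenever bound <= origin, i.e. the half-open range [origin, bound) is empty.
      return e.getMessage();
    }
  }

  /** Guarded variant: falls back to origin when the range is empty. */
  static int nextIntOrOrigin(int origin, int bound) {
    if (bound <= origin) {
      return origin;
    }
    return ThreadLocalRandom.current().nextInt(origin, bound);
  }

  public static void main(String[] args) {
    System.out.println(describeFailure(0, 10)); // range is non-empty
    System.out.println(describeFailure(5, 5));  // empty range, prints the exception message
    System.out.println(nextIntOrOrigin(5, 5));  // guarded call returns origin
  }
}
```

Failures like this tend to be seed-dependent, since the empty range only
appears when an earlier random draw lands on an edge value.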

Cheers,
-Greg

On Fri, Jul 22, 2022 at 1:34 PM Policeman Jenkins Server
 wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/35897/
> Java: 64bit/jdk-19-ea+32 -XX:-UseCompressedOops -XX:+UseG1GC
>
> 1 tests failed.
> FAILED:  org.apache.lucene.search.TestDisiPriorityQueue.testRandom
>
> Error Message:
> java.lang.IllegalArgumentException: bound must be greater than origin
>
> Stack Trace:
> java.lang.IllegalArgumentException: bound must be greater than origin
> at 
> __randomizedtesting.SeedInfo.seed([3098DA49DB159ABA:42D4FF466A752CC9]:0)
> at 
> java.base/jdk.internal.util.random.RandomSupport.checkRange(RandomSupport.java:236)
> at 
> java.base/java.util.random.RandomGenerator.nextInt(RandomGenerator.java:678)
> at 
> org.apache.lucene.search.TestDisiPriorityQueue.testRandom(TestDisiPriorityQueue.java:47)
> at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
> at java.base/java.lang.reflect.Method.invoke(Method.java:578)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at 
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at 
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at 
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at 
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at 
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at 
> com.carrotsearch.randomizedtesting.Threa

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-17.0.3) - Build # 35849 - Unstable!

2022-07-21 Thread Greg Miller

Ha, yes-- it's now patched on 9.3 (in addition to 9x / main). Thanks Mike!

Cheers,
-Greg

On Thu, Jul 21, 2022 at 3:31 AM Michael McCandless
 wrote:
>
> Oh, nevermind!  I see the PR/blocker issue, thanks Greg.
>
> EventuallyConsistentMikeException!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Jul 21, 2022 at 6:28 AM Michael McCandless 
>  wrote:
>>
>> Should this maybe also be backported to the 9.3.0 branch?  Did the original 
>> change land before that branch was cut?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, Jul 20, 2022 at 5:46 PM Greg Miller  wrote:
>>>
>>> OK, I think these test failures should now be resolved (on both main
>>> and branch_9x). But I'll keep an eye on nightly builds/tests.
>>>
>>> Cheers,
>>> -g
>>>
>>> On Wed, Jul 20, 2022 at 9:17 AM Greg Miller  wrote:
>>> >
>>> > I'll dig into this soon. Looks like a new test I recently added hit an
>>> > issue. Apologies.
>>> >
>>> > Cheers,
>>> > -g
>>> >
>>> > On Wed, Jul 20, 2022 at 2:32 AM Policeman Jenkins Server
>>> >  wrote:
>>> > >
>>> > > Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/35849/
>>> > > Java: 64bit/jdk-17.0.3 -XX:+UseCompressedOops -XX:+UseSerialGC
>>> > >
>>> > > 1 tests failed.
>>> > > FAILED:  org.apache.lucene.search.TestDisiPriorityQueue.testRandom
>>> > >
>>> > > Error Message:
>>> > > java.lang.IllegalArgumentException: bound must be greater than origin
>>> > >
>>> > > Stack Trace:
>>> > > java.lang.IllegalArgumentException: bound must be greater than origin
>>> > > at 
>>> > > __randomizedtesting.SeedInfo.seed([3E14AC43B544B726:4C58894C04240155]:0)
>>> > > at 
>>> > > java.base/jdk.internal.util.random.RandomSupport.checkRange(RandomSupport.java:232)
>>> > > at 
>>> > > java.base/java.util.random.RandomGenerator.nextInt(RandomGenerator.java:679)
>>> > > at 
>>> > > org.apache.lucene.search.TestDisiPriorityQueue.testRandom(TestDisiPriorityQueue.java:47)
>>> > > at 
>>> > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
>>> > > Method)
>>> > > at 
>>> > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>>> > > at 
>>> > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> > > at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>>> > > at 
>>> > > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>>> > > at 
>>> > > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>>> > > at 
>>> > > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>>> > > at 
>>> > > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>>> > > at 
>>> > > org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>>> > > at 
>>> > > org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>>> > > at 
>>> > > org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>>> > > at 
>>> > > org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>>> > > at 
>>> > > org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>>> > > at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>> > > at 
>>> > > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>> > > at 
>>> > > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>>> > > at 
>>> > &

Re: New branch and feature freeze for Lucene 9.3.0

2022-07-21 Thread Greg Miller

Ack, thanks Ignacio for the quick approval. Just merged the fix onto
the 9.3 branch.

Cheers,
-Greg

On Thu, Jul 21, 2022 at 12:32 AM Ignacio Vera  wrote:
>
> Hi Greg,
>
> Yes please fix the test in branch 9.3, I have approved the PR.
>
> On Thu, Jul 21, 2022 at 12:08 AM Greg Miller  wrote:
>>
>> Thanks Ignacio! I just created
>> https://issues.apache.org/jira/browse/LUCENE-10659 as a proposed
>> blocker for 9.3. It's a small bug fix for a unit test I recently
>> introduced on the 9x branch (one of the last things to get pulled into
>> the 9.3 candidate). I think we ought to fix this test before cutting a
>> 9.3 release. There's a PR associated with the issue already (the
>> change is already patched into main/branch_9x).
>>
>> Cheers,
>> -Greg
>>
>> On Wed, Jul 20, 2022 at 4:42 AM Ignacio Vera  wrote:
>> >
>> > Please find here the draft for the release highlights. I have probably 
>> > missed things that should be included so please feel free to add them.
>> >
>> > https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=217391905
>> >
>> > On Wed, Jul 20, 2022 at 10:17 AM Ignacio Vera  wrote:
>> >>
>> >> NOTICE:
>> >>
>> >> Branch branch_9_3 has been cut and versions updated to 9.4 on stable
>> >> branch.
>> >>
>> >> Please observe the normal rules:
>> >>
>> >> * No new features may be committed to the branch.
>> >> * Documentation patches, build patches and serious bug fixes may be
>> >>   committed to the branch. However, you should submit all patches you
>> >>   want to commit to Jira first to give others the chance to review
>> >>   and possibly vote against the patch. Keep in mind that it is our
>> >>   main intention to keep the branch as stable as possible.
>> >> * All patches that are intended for the branch should first be committed
>> >>   to the unstable branch, merged into the stable branch, and then into
>> >>   the current release branch.
>> >> * Normal unstable and stable branch development may continue as usual.
>> >>   However, if you plan to commit a big change to the unstable branch
>> >>   while the branch feature freeze is in effect, think twice: can't the
>> >>   addition wait a couple more days? Merges of bug fixes into the branch
>> >>   may become more difficult.
>> >> * Only Jira issues with Fix version 9.3 and priority "Blocker" will delay
>> >>   a release candidate build.
>> >>
>> >> The only step missing is to add the Jenkins job on the release branch,
>> >> which is something I don't really know how to do; hopefully someone
>> >> can help here.
>> >>
>> >>
>> >> I am planning to build the first RC next Monday if there are no issues.
>>

Re: New branch and feature freeze for Lucene 9.3.0

2022-07-20 Thread Greg Miller

Thanks Ignacio! I just created
https://issues.apache.org/jira/browse/LUCENE-10659 as a proposed
blocker for 9.3. It's a small bug fix for a unit test I recently
introduced on the 9x branch (one of the last things to get pulled into
the 9.3 candidate). I think we ought to fix this test before cutting a
9.3 release. There's a PR associated with the issue already (the
change is already patched into main/branch_9x).

Cheers,
-Greg

On Wed, Jul 20, 2022 at 4:42 AM Ignacio Vera  wrote:
>
> Please find here the draft for the release highlights. I have probably missed 
> things that should be included so please feel free to add them.
>
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=217391905
>
> On Wed, Jul 20, 2022 at 10:17 AM Ignacio Vera  wrote:
>>
>> NOTICE:
>>
>> Branch branch_9_3 has been cut and versions updated to 9.4 on stable branch.
>>
>> Please observe the normal rules:
>>
>> * No new features may be committed to the branch.
>> * Documentation patches, build patches and serious bug fixes may be
>>   committed to the branch. However, you should submit all patches you
>>   want to commit to Jira first to give others the chance to review
>>   and possibly vote against the patch. Keep in mind that it is our
>>   main intention to keep the branch as stable as possible.
>> * All patches that are intended for the branch should first be committed
>>   to the unstable branch, merged into the stable branch, and then into
>>   the current release branch.
>> * Normal unstable and stable branch development may continue as usual.
>>   However, if you plan to commit a big change to the unstable branch
>>   while the branch feature freeze is in effect, think twice: can't the
>>   addition wait a couple more days? Merges of bug fixes into the branch
>>   may become more difficult.
>> * Only Jira issues with Fix version 9.3 and priority "Blocker" will delay
>>   a release candidate build.
>>
>> The only step missing is to add the Jenkins job on the release branch,
>> which is something I don't really know how to do; hopefully someone can
>> help here.
>>
>>
>> I am planning to build the first RC next Monday if there are no issues.

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-17.0.3) - Build # 35849 - Unstable!

2022-07-20 Thread Greg Miller

OK, I think these test failures should now be resolved (on both main
and branch_9x). But I'll keep an eye on nightly builds/tests.

Cheers,
-g

On Wed, Jul 20, 2022 at 9:17 AM Greg Miller  wrote:
>
> I'll dig into this soon. Looks like a new test I recently added hit an
> issue. Apologies.
>
> Cheers,
> -g
>
> On Wed, Jul 20, 2022 at 2:32 AM Policeman Jenkins Server
>  wrote:
> >
> > Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/35849/
> > Java: 64bit/jdk-17.0.3 -XX:+UseCompressedOops -XX:+UseSerialGC
> >
> > 1 tests failed.
> > FAILED:  org.apache.lucene.search.TestDisiPriorityQueue.testRandom
> >
> > Error Message:
> > java.lang.IllegalArgumentException: bound must be greater than origin
> >
> > Stack Trace:
> > java.lang.IllegalArgumentException: bound must be greater than origin
> > at 
> > __randomizedtesting.SeedInfo.seed([3E14AC43B544B726:4C58894C04240155]:0)
> > at 
> > java.base/jdk.internal.util.random.RandomSupport.checkRange(RandomSupport.java:232)
> > at 
> > java.base/java.util.random.RandomGenerator.nextInt(RandomGenerator.java:679)
> > at 
> > org.apache.lucene.search.TestDisiPriorityQueue.testRandom(TestDisiPriorityQueue.java:47)
> > at 
> > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> > Method)
> > at 
> > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> > at 
> > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> > at 
> > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> > at 
> > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> > at 
> > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> > at 
> > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> > at 
> > org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> > at 
> > org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> > at 
> > org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> > at 
> > org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> > at 
> > org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> > at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> > at 
> > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> > at 
> > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> > at 
> > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> > at 
> > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> > at 
> > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> > at 
> > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> > at 
> > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> > at 
> > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> > at 
> > org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> > at 
> > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> > at 
> > org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> > at 
> > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> > at 
> > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> > at 
> > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> > at 
> > com.carrotsearch.randomizedtestin

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-17.0.3) - Build # 35849 - Unstable!

2022-07-20 Thread Greg Miller

I'll dig into this soon. Looks like a new test I recently added hit an
issue. Apologies.
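
For anyone reading along: the exception in the report below is what a two-argument nextInt(origin, bound) call throws when bound is not strictly greater than origin. Here's a minimal JDK-only sketch of that failure mode. The class and method names are hypothetical, and this is not the code in TestDisiPriorityQueue, just an illustration of the error and one possible guard.

```java
import java.util.concurrent.ThreadLocalRandom;

public class NextIntRangeSketch {
  /** Returns true if nextInt(origin, bound) rejects the given range. */
  public static boolean rejects(int origin, int bound) {
    try {
      ThreadLocalRandom.current().nextInt(origin, bound);
      return false;
    } catch (IllegalArgumentException e) {
      // Thrown whenever bound <= origin, i.e. the range is empty.
      return true;
    }
  }

  /** Hypothetical guard: only draw a random value when the range is non-empty. */
  public static int nextIntOrOrigin(int origin, int bound) {
    return bound > origin ? ThreadLocalRandom.current().nextInt(origin, bound) : origin;
  }

  public static void main(String[] args) {
    System.out.println(rejects(1, 1));         // true: empty range
    System.out.println(rejects(1, 2));         // false: valid range
    System.out.println(nextIntOrOrigin(1, 1)); // 1: falls back to origin
  }
}
```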

Cheers,
-g

On Wed, Jul 20, 2022 at 2:32 AM Policeman Jenkins Server
 wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/35849/
> Java: 64bit/jdk-17.0.3 -XX:+UseCompressedOops -XX:+UseSerialGC
>
> 1 tests failed.
> FAILED:  org.apache.lucene.search.TestDisiPriorityQueue.testRandom
>
> Error Message:
> java.lang.IllegalArgumentException: bound must be greater than origin
>
> Stack Trace:
> java.lang.IllegalArgumentException: bound must be greater than origin
> at 
> __randomizedtesting.SeedInfo.seed([3E14AC43B544B726:4C58894C04240155]:0)
> at 
> java.base/jdk.internal.util.random.RandomSupport.checkRange(RandomSupport.java:232)
> at 
> java.base/java.util.random.RandomGenerator.nextInt(RandomGenerator.java:679)
> at 
> org.apache.lucene.search.TestDisiPriorityQueue.testRandom(TestDisiPriorityQueue.java:47)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at 
> org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at 
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at 
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at 
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at 
> org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(Th

Re: Store arrays in DocValues and keep the original order

2022-06-30 Thread Greg Miller

You're correct that these doc value fields are primarily meant for sorting,
as well as some other use-cases like faceting. And what you've discovered
is also correct, that these fields don't maintain the original ordering,
and SORTED_SET dedupes values (
https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/index/DocValuesType.html
).
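
To make the behavior concrete, here's a JDK-only simulation of what the two types keep for one document's values. It doesn't touch Lucene at all; the class and method names are made up for illustration, and it only mimics the per-document semantics (values come back sorted, and for SORTED_SET also deduplicated).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class DocValuesOrderSketch {
  /** Mimics SORTED_NUMERIC: values come back sorted, duplicates kept. */
  public static long[] asSortedNumeric(long... values) {
    long[] copy = values.clone();
    Arrays.sort(copy);
    return copy;
  }

  /** Mimics SORTED_SET: values come back sorted and deduplicated. */
  public static List<String> asSortedSet(String... values) {
    return new ArrayList<>(new TreeSet<>(Arrays.asList(values)));
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(asSortedNumeric(3, 1, 1, 2))); // [1, 1, 2, 3]
    System.out.println(asSortedSet("c", "a", "a", "b"));              // [a, b, c]
  }
}
```

In both cases the original positions ([3, 1, 1, 2] and ["c", "a", "a", "b"]) can't be recovered from what is stored.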

There's no technical reason new doc value types couldn't be added that
maintain original ordering and don't dedupe, but whether-or-not there are
enough use-cases to support that need is a question that would need to be
considered. +1 to Shai's suggestion to build on BinaryDocValues. By
extending BinaryDocValuesField, you can encode the doc values however you
like. An example of this can be seen here:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/IntRangeDocValuesField.java
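
As a rough sketch of the kind of encoding I mean, the snippet below packs an int array into a byte[] (a length prefix, then the values in their original order) and reads it back. It's JDK-only; the class name and byte layout are assumptions for illustration, not what IntRangeDocValuesField does. In Lucene you'd presumably wrap the resulting bytes in a BytesRef and hand that to a BinaryDocValuesField.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class OrderedIntArrayCodec {
  /** Encodes values as [count][v0][v1]..., keeping order and duplicates. */
  public static byte[] encode(int[] values) {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (values.length + 1));
    buf.putInt(values.length);
    for (int v : values) {
      buf.putInt(v);
    }
    return buf.array();
  }

  /** Decodes a byte[] produced by encode back into the original array. */
  public static int[] decode(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int[] values = new int[buf.getInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = buf.getInt();
    }
    return values;
  }

  public static void main(String[] args) {
    int[] field1 = {3, 1, 1, 2};
    int[] roundTripped = decode(encode(field1));
    System.out.println(Arrays.toString(roundTripped)); // [3, 1, 1, 2]
    System.out.println(roundTripped[0]);               // 3 (subscript access works)
  }
}
```

The trade-off is that the codec treats the bytes as opaque, so anything like sorting or faceting on these values has to decode them first.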

Hope this helps.

Cheers,
-Greg

On Tue, Jun 28, 2022 at 5:52 AM Shai Erera  wrote:

> Depending on what you use the field for, you can use BinaryDocValuesField
> which encodes a byte[] and lets you store the data however you want. But
> how are you using these fields later at search time?
>
> On Tue, Jun 28, 2022 at 3:46 PM linfeng lu  wrote:
>
>> Hi~
>>
>> We are trying to build an OLAP database based on lucene, and we heavily
>> use lucene's *DocValues* (as our column store).
>>
>> We try to use DocValues to store array-type fields. For example, if
>> we want to store *field1* and *field2* from this JSON document in
>> *DocValues*, SORTED_NUMERIC and SORTED_SET seem to be our
>> only options.
>>
>> {
>>   "field1": [ 3, 1, 1, 2 ],
>>   "field2": [ "c", "a", "a", "b" ]
>> }
>>
>>
>> When we store *field1* in SORTED_NUMERIC and *field2* in SORTED_SET, we
>> will get this result:
>>
>> field1:
>>
>>    - origin: [3, 1, 1, 2]
>>    - in SORTED_NUMERIC: [1, 1, 2, 3]
>>
>> field2:
>>
>>    - origin: ["c", "a", "a", "b"]
>>    - in SORTED_SET: ords [0, 1, 2], terms ["a", "b", "c"]
>>
>>
>> The original ordering relationship of the elements in the array is lost.
>>
>> We're guessing that lucene's DocValues are designed primarily for sorting
>> and aggregation, so the original order of elements may not matter.
>>
>> But in our use case, it is important to keep the original order of
>> the elements in the array (we allow users to access the elements in the
>> array using the subscript operator).
>>
>> We wonder if lucene has plans to add new types of DocValues that can
>> store arrays and keep the original order of elements in the array?
>>
>> Thanks!
>>
>

Re: Welcome Greg Miller to the Lucene PMC

2022-06-09 Thread Greg Miller

Thanks everyone!

On Wed, Jun 8, 2022 at 1:46 AM Martin Gainty  wrote:
>
> Welcome Greg!
>
> ~martin
> 
> From: Gus Heck 
> Sent: Tuesday, June 7, 2022 9:29 PM
> To: dev 
> Subject: Re: Welcome Greg Miller to the Lucene PMC
>
> Welcome Greg :)
>
>
> On Tue, Jun 7, 2022 at 5:43 PM David Smiley  wrote:
>
> Welcome Greg!
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Jun 7, 2022 at 2:44 AM Adrien Grand  wrote:
>
> I'm pleased to announce that Greg Miller has accepted an invitation to join 
> the Lucene PMC!
>
> Congratulations Greg, and welcome aboard!
>
> --
> Adrien
>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Welcome Lu Xugang as Lucene committer

2022-06-01 Thread Greg Miller

Welcome Lu Xugang!

On Wed, Jun 1, 2022 at 2:04 PM Mayya Sharipova
 wrote:
>
> Congratulations and welcome, Lu Xugang!
>
> On Wed, Jun 1, 2022 at 7:53 AM Jan Høydahl  wrote:
>>
>> Welcome and congratulations!
>>
>> Jan
>>
>> > 1. jun. 2022 kl. 09:07 skrev Adrien Grand :
>> >
>> > I'm pleased to announce that Lu Xugang has accepted the PMC's
>> > invitation to become a committer.
>> >
>> > Xugang, the tradition is that new committers introduce themselves with a
>> > brief bio.
>> >
>> > Congratulations and welcome!
>> >
>> > --
>> > Adrien

Re: Welcome Chris Hegarty as Lucene committer

2022-06-01 Thread Greg Miller

Welcome Chris!

On Wed, Jun 1, 2022 at 2:04 PM Mayya Sharipova
 wrote:
>
> Welcome and congratulations, Chris!
>
> On Wed, Jun 1, 2022 at 7:53 AM Jan Høydahl  wrote:
>>
>> Welcome Chris!
>>
>> Jan
>>
>> > 1. jun. 2022 kl. 09:04 skrev Adrien Grand :
>> >
>> > I'm pleased to announce that Chris Hegarty has accepted the PMC's
>> > invitation to become a committer.
>> >
>> > Chris, the tradition is that new committers introduce themselves with a
>> > brief bio.
>> >
>> > Congratulations and welcome!
>> >
>> > --
>> > Adrien

Re: [JENKINS] Lucene » Lucene-Check-9.2 - Build # 65 - Unstable!

2022-05-29 Thread Greg Miller

OK, unfortunately I didn't see a quick fix (at least with the limited
time I had to dig in), so I opened an issue to track:
https://issues.apache.org/jira/browse/LUCENE-10595

Cheers,
-g

On Thu, May 26, 2022 at 9:32 PM Greg Miller  wrote:
>
> I was able to repro this locally and will try to see if I can put
> together a fix in the next couple days (or will at least create a Jira
> to track once I figure out what's going on).
>
> Cheers,
> -g
>
> On Mon, May 23, 2022 at 4:02 AM Apache Jenkins Server
>  wrote:
> >
> > Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.2/65/
> >
> > 1 tests failed.
> > FAILED:  
> > org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom
> >
> > Error Message:
> > java.lang.IndexOutOfBoundsException
> >
> > Stack Trace:
> > java.lang.IndexOutOfBoundsException
> > at 
> > __randomizedtesting.SeedInfo.seed([91EC8BE9DE2A5BAB:E3A0AEE66F4AEDD8]:0)
> > at java.base/java.nio.Buffer.checkBounds(Buffer.java:714)
> > at java.base/java.nio.HeapByteBuffer.get(HeapByteBuffer.java:179)
> > at 
> > org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.store.ByteBuffersDataInput.readBytes(ByteBuffersDataInput.java:155)
> > at 
> > org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.store.ByteBuffersIndexInput.readBytes(ByteBuffersIndexInput.java:85)
> > at 
> > org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.store.MockIndexInputWrapper.readBytes(MockIndexInputWrapper.java:149)
> > at 
> > org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.decompressBlock(Lucene90DocValuesProducer.java:1234)
> > at 
> > org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.next(Lucene90DocValuesProducer.java:1092)
> > at 
> > org.apache.lucene.search.grouping.TermGroupFacetCollector$MV$SegmentResult.nextTerm(TermGroupFacetCollector.java:438)
> > at 
> > org.apache.lucene.search.grouping.GroupFacetCollector.mergeSegmentResults(GroupFacetCollector.java:97)
> > at 
> > org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom(TestGroupFacetCollector.java:429)
> > at 
> > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> > Method)
> > at 
> > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at 
> > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > at 
> > randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> > at 
> > randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> > at 
> > randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> > at 
> > randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> > at 
> > org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> > at 
> > org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> > at 
> > org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> > at 
> > org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> > at 
> > org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> > at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
> > at 
> > randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> > at 
> > randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> > at 
> > randomizedtesting.runner@2.7.6/com.car

Re: [JENKINS] Lucene » Lucene-Check-9.2 - Build # 65 - Unstable!

2022-05-26 Thread Greg Miller

I was able to repro this locally and will try to see if I can put
together a fix in the next couple days (or will at least create a Jira
to track once I figure out what's going on).

Cheers,
-g

On Mon, May 23, 2022 at 4:02 AM Apache Jenkins Server
 wrote:
>
> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.2/65/
>
> 1 tests failed.
> FAILED:  org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom
>
> Error Message:
> java.lang.IndexOutOfBoundsException
>
> Stack Trace:
> java.lang.IndexOutOfBoundsException
> at 
> __randomizedtesting.SeedInfo.seed([91EC8BE9DE2A5BAB:E3A0AEE66F4AEDD8]:0)
> at java.base/java.nio.Buffer.checkBounds(Buffer.java:714)
> at java.base/java.nio.HeapByteBuffer.get(HeapByteBuffer.java:179)
> at 
> org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.store.ByteBuffersDataInput.readBytes(ByteBuffersDataInput.java:155)
> at 
> org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.store.ByteBuffersIndexInput.readBytes(ByteBuffersIndexInput.java:85)
> at 
> org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.store.MockIndexInputWrapper.readBytes(MockIndexInputWrapper.java:149)
> at 
> org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.decompressBlock(Lucene90DocValuesProducer.java:1234)
> at 
> org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.next(Lucene90DocValuesProducer.java:1092)
> at 
> org.apache.lucene.search.grouping.TermGroupFacetCollector$MV$SegmentResult.nextTerm(TermGroupFacetCollector.java:438)
> at 
> org.apache.lucene.search.grouping.GroupFacetCollector.mergeSegmentResults(GroupFacetCollector.java:97)
> at 
> org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom(TestGroupFacetCollector.java:429)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at 
> org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at 
> org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at 
> org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at 
> org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util

Re: Adding a new PointDocValuesField

2022-05-26 Thread Greg Miller

> Users don't deal with low level docvalues codec APIs, so I see this
> "as a user" as irrelevant, sorry. Higher-level classes (e.g. Field
> class) could impl it this way as implementation detail.

Hmm, that's a different perspective than I had, but I understand where
you're coming from and I think I agree. I think I'm so used to
directly interacting with doc values that I hadn't considered this
point of view (that users shouldn't commonly be interacting with DVs
directly). As long as we provide a higher-level Field class that
abstracts the implementation details, I think I'm on the same page
with you here.

> +1 to build a field class in sandbox, using BDV behind the scenes. I
> don't want to add any new DV types, trust me. I am just especially
> opinionated against multidimensional stuff pushed down to docvalues
> level, when it makes no sense from a DV perspective (column stride
> fields). If you have 3 dimensions of numbers, at a low level it would
> just make 3 columns at the end of the day anyway: IMO it would only
> make codec code more complicated with no benefit. So that's why I was
> listing out other alternatives.

Got it. +1 from me as well. I think we're in agreement. Thanks for the
discussion!

Cheers,
-g

On Thu, May 26, 2022 at 9:04 AM Robert Muir  wrote:
>
> On Thu, May 26, 2022 at 11:49 AM Greg Miller  wrote:
> >
> > I agree that technically it's just as good. I also think it's less
> > clear for a user. The concept of "points" is something we've
> > established in Lucene, so I think it makes sense for users to think
> > about indexing points as a doc value as opposed to having to manage
> > multiple fields for all their dimensions in this sort of unsorted
> > field. But that's just my opinion as a user. But that's maybe a bit
> > philosophical at this point and I think we can "agree to disagree" for
> > now because...
>
> Users don't deal with low level docvalues codec APIs, so I see this
> "as a user" as irrelevant, sorry. Higher-level classes (e.g. Field
> class) could impl it this way as implementation detail.
>
> >
> > ... just to be clear, I'm _not_ suggesting we add a new doc value type
> > at this time. I'm not even necessarily advocating that we ever add it.
> > I think it's perfectly reasonable to define a new Field class that
> > builds on top of BDV (as Marc has done in his PR) that allows users to
> > add "point" fields to their documents that get indexed as doc values
> > (using BDV). This is very similar to LatLonDocValuesField,
> > LongRangeDocValuesField, etc. Is that an acceptable approach to you,
> > or are you advocating that we shouldn't do that and should instead
> > create these new "unsorted" numeric fields now? I'm even fine if we
> > put this in the sandbox module for now while we "kick the tires." In
> > fact, I think I'd advocate for that.
>
> +1 to build a field class in sandbox, using BDV behind the scenes. I
> don't want to add any new DV types, trust me. I am just especially
> opinionated against multidimensional stuff pushed down to docvalues
> level, when it makes no sense from a DV perspective (column stride
> fields). If you have 3 dimensions of numbers, at a low level it would
> just make 3 columns at the end of the day anyway: IMO it would only
> make codec code more complicated with no benefit. So that's why I was
> listing out other alternatives.

Re: Adding a new PointDocValuesField

2022-05-26 Thread Greg Miller

I agree that technically it's just as good. I also think it's less
clear for a user. The concept of "points" is something we've
established in Lucene, so I think it makes sense for users to think
about indexing points as a doc value as opposed to having to manage
multiple fields for all their dimensions in this sort of unsorted
field. That's just my opinion as a user, though, and it's maybe a bit
philosophical at this point. I think we can "agree to disagree" for
now because...

... just to be clear, I'm _not_ suggesting we add a new doc value type
at this time. I'm not even necessarily advocating that we ever add it.
I think it's perfectly reasonable to define a new Field class that
builds on top of BDV (as Marc has done in his PR) that allows users to
add "point" fields to their documents that get indexed as doc values
(using BDV). This is very similar to LatLonDocValuesField,
LongRangeDocValuesField, etc. Is that an acceptable approach to you,
or are you advocating that we shouldn't do that and should instead
create these new "unsorted" numeric fields now? I'm even fine if we
put this in the sandbox module for now while we "kick the tires." In
fact, I think I'd advocate for that.

Thanks again for the feedback. It forced a deep examination of this
idea, which I appreciate.

Cheers,
-g

On Wed, May 25, 2022 at 11:41 AM Robert Muir  wrote:
>
> On Wed, May 25, 2022 at 2:08 PM Greg Miller  wrote:
> >
> >
> > I guess with an “unsorted” numeric DV type we could get there with aligned 
> > indices, as you describe, but that seems less appealing than supporting 
> > multi-dim points directly.
> >
>
> Name one technical reason why?
> Unsorted would be exactly just as good, except also more general
> purpose. The number of docvalues types should be kept to a strict
> minimum, and should be generally useful to a variety of common
> use-cases. Each type has a huge maintenance cost, and never goes away.
> Every codec must implement every type.

Re: Adding a new PointDocValuesField

2022-05-25 Thread Greg Miller

I appreciate all the feedback, but disagree that we can accomplish what
we’re trying to do here with the existing fields.

It’s not sufficient to AND together multiple fields for this use-case
because the different dimensions can be multi-valued and not all
combinations are valid. To go back to my example, imagine wiper
blades that fit 2010 Ford vehicles and 2011 Chevy vehicles but not 2010
Chevy or 2011 Ford. You have to index the combinations, not the separate
component values. I can’t see a way to retain this information with
separate fields. Am I missing something? I guess with an “unsorted” numeric
DV type we could get there with aligned indices, as you describe, but that
seems less appealing than supporting multi-dim points directly.
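
Here's a small JDK-only sketch of the problem, using plain sets as stand-ins for indexed fields. Everything in it (class name, the "year|make" string encoding) is made up for illustration and has nothing to do with how Lucene would actually store this; it just shows why ANDing separate fields gives a false positive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FitmentSketch {
  // Separate "fields": each dimension of the wiper-blade fitment indexed on its own.
  static final Set<Integer> YEARS = new HashSet<>(Arrays.asList(2010, 2011));
  static final Set<String> MAKES = new HashSet<>(Arrays.asList("Ford", "Chevy"));

  // Combined "points": each valid (year, make) pair indexed together.
  static final Set<String> POINTS =
      new HashSet<>(Arrays.asList(point(2010, "Ford"), point(2011, "Chevy")));

  static String point(int year, String make) {
    return year + "|" + make;
  }

  /** ANDs the two separate fields together. */
  public static boolean matchesSeparateFields(int year, String make) {
    return YEARS.contains(year) && MAKES.contains(make);
  }

  /** Checks the combined two-dimensional point. */
  public static boolean matchesPoints(int year, String make) {
    return POINTS.contains(point(year, make));
  }

  public static void main(String[] args) {
    System.out.println(matchesSeparateFields(2011, "Ford")); // true (false positive)
    System.out.println(matchesPoints(2011, "Ford"));         // false
    System.out.println(matchesPoints(2010, "Ford"));         // true
  }
}
```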

I’m in agreement though that there isn’t a compelling need to add a new
field type for this. I have no problem building on BDV and putting this in
the sandbox module to start. Makes sense to me. It sounds like we’d have
consensus to take that approach and re-evaluate if there are future needs?
Any objections?

Cheers,
-g


On Wed, May 25, 2022 at 10:05 Marc D'Mello  wrote:

> But adding a new type should be the last resort.
>
>
> I did not realize that was the case, that's good to know. It seems like I
> should just use BDV (which does make the code change easier/faster so I
> have no issues with it).
>
> As for Patrick's suggestion of using separate numeric fields instead of
> packing them together, that actually does sound like an interesting idea, I
> think the biggest issue with it though would be implementing a multivalued
> version of this. As Robert pointed out, we would need an UnsortedNumericDV.
>
> Thanks for all the feedback!
>
>
> On Wed, May 25, 2022 at 8:17 AM Robert Muir  wrote:
>
>> On Wed, May 25, 2022 at 12:17 AM Greg Miller  wrote:
>> >
>> >  A "two separate field approach" would
>> > consist of indexing year and make separately, and you'd lose the
>> > information that only certain combinations are valid. Am I overlooking
>> > something with your suggestion? Maybe there's something we can do with
>> > Lucene already that solves for this case and I'm just not aware of it?
>> > That's entirely possible and I'd love to learn more if there is!
>>
>> This makes no sense to me. If there are two dimensions, there's no
>> difference in faceting code calling fieldA.value and fieldB.value,
>> than calling field.valueA and field.valueB.
>>
>> In other words, doesn't make any sense to needlessly "pack dimensions
>> together" at docvalues level, especially for what should be a
>> column-stride field. There's really no difference from the app
>> perspective. Any issues you have here seem to be issues around facet
>> module and not docvalues...
>>
>> >
>> > As for MultiRangeQuery and the mention of sandbox modules, I think
>> > that's a bit of a different use-case. MultiRangeQuery lets you filter
>> > by a disjunction of ranges. The "multi" part doesn't relate to
>> > "multiple values in a doc" (but it does support that, as do the
>> > "standard" range queries).
>> >
>> > Where I see a gap right now, beyond just faceting, is that we can
>> > represent N-dim points in the points index and filter on them (using
>> > the points index), but we have no doc values equivalent. This means,
>> > 1) we can't facet, and 2) we can't create a "slow" query that does
>> > post-filtering instead of using the points index (which could be a
>> > very real advantage in cases with a sparse match set but a dense
>> > points index). So I like the idea of creating that concept and being
>> > able to facet and filter on it. Whether-or-not this is a "formal" doc
>> > values type or sits on top of BDV, I have less of a strong opinion.
>>
>> We shouldn't add new docvalues types because of "slow queries", I'm
>> really against that. The root problem is that points impl can't filter
>> well (like the inverted index can), and as a hack, docvalues "picks up
>> the slack". If it's becoming a major issue, address this with points
>> directly?
>>
>> >
>> > And finally... it really should be multi-valued. The points index
>> > supports multiple points-per-field within a single document. Seems
>> > like a big gap that we wouldn't support that with a doc value field.
>> > Because BDV is inherently single-valued, I propose we come up with an
>> > encoding scheme that encodes multiple p

Re: Adding a new PointDocValuesField

2022-05-25 Thread Greg Miller

> then use LatLonDocValuesField

Right! Actually, LatLonDocValuesField is a good example of what we're
trying to do here, but specialized to the 2D, lat/long case. It stores
a doc value representation of a lat/long point that can be used for
"slow" queries—which complement the points-based queries—(e.g.,
LLDVF#newSlowBoxQuery, LLDVF#newSlowDistanceQuery, etc.). It could
also support faceting (although I don't think an implementation
exists?). And, it's multi-valued (which it achieves by packing a
lat/long tuple into a single long value and then encoding with
SORTED_NUMERIC. I think this is actually a great example we could
follow here, and supports the idea of _not_ adding a specific DV type,
but rather building on top of BDV.
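
For reference, the packing trick looks roughly like this. It's a generic JDK-only sketch with hypothetical names: two ints go into the high and low halves of one long. As far as I recall the real field first quantizes lat/lon to ints before packing, and the bit layout below is my assumption rather than something copied from the Lucene source.

```java
public class PackedPairSketch {
  /** Packs two ints into one long: hi in the upper 32 bits, lo in the lower 32. */
  public static long pack(int hi, int lo) {
    return (((long) hi) << 32) | (lo & 0xFFFFFFFFL);
  }

  /** Recovers the value stored in the upper 32 bits. */
  public static int unpackHi(long packed) {
    return (int) (packed >>> 32);
  }

  /** Recovers the value stored in the lower 32 bits. */
  public static int unpackLo(long packed) {
    return (int) packed;
  }

  public static void main(String[] args) {
    long packed = pack(-5, 7);
    System.out.println(unpackHi(packed)); // -5
    System.out.println(unpackLo(packed)); // 7
  }
}
```

This only works because two 32-bit dimensions fit in one 64-bit value, which is exactly what stops it from generalizing to arbitrary N.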

In our use-case, we'd like to generalize to N-dims and not make
assumptions about lat/long data. Because of that, I don't see a way to
pack our dims into a single long value and build on SORTED_NUMERIC, so
I think we need to have a different encoding scheme on top of BDV.
Patrick, maybe this is what you were getting at in your last comment,
but please let me know if I'm mis-interpreting.

So, circling back to Marc's original question, I would suggest we
_not_ introduce a new doc values type (at least at this time), and
build on BDV.

Cheers,
-g

On Wed, May 25, 2022 at 5:23 AM Robert Muir  wrote:
>
> On Wed, May 25, 2022 at 8:04 AM Michael Sokolov  wrote:
> >
> > Also, there should be examples from other fields. Suppose you are
> > indexing map data and want to support a UI that shows "hot spots" on
> > the map where there is a lot of let's say ... activity of some sort.
> > You'd like to facet on 2-d areas.
>
> then use LatLonDocValuesField
>

Re: Adding a new PointDocValuesField

2022-05-24 Thread Greg Miller

Thanks for the comments Patrick, but I'm not sure I'm fully
understanding the suggestion here. I don't see a path forward that
uses different fields, but maybe I'm missing something. Imagine you're
running an ecommerce site selling automotive parts and you need to
index fitment information that consists of the year + make of vehicles
a part fits. Imagine a set of wiper blades fit 2010 Ford vehicles and
2011 Chevy vehicles (but _not_ 2011 Ford or 2010 Chevy). And let's say
we want to facet on products that fit a 2011 Ford. We need to make
sure this product does _not_ count. We can achieve this with points in
two dimensions (year + make), but not as two separate fields (at least
as far as I can come up with). A "two separate field approach" would
consist of indexing year and make separately, and you'd lose the
information that only certain combinations are valid. Am I overlooking
something with your suggestion? Maybe there's something we can do with
Lucene already that solves for this case and I'm just not aware of it?
That's entirely possible and I'd love to learn more if there is!
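
Here's a toy version of the fitment example in plain Java, just to show where the information gets lost. No Lucene involved; the class and method names are invented for this sketch.

```java
import java.util.List;

final class FitmentExample {
  private FitmentExample() {}

  /** One (year, make) combination that a part is known to fit. */
  static final class Fitment {
    final int year;
    final String make;

    Fitment(int year, String make) {
      this.year = year;
      this.make = make;
    }
  }

  /** "Two separate fields": year and make are matched independently of each other. */
  static boolean matchesSeparateFields(List<Fitment> fitments, int year, String make) {
    boolean yearMatches = false;
    boolean makeMatches = false;
    for (Fitment f : fitments) {
      yearMatches |= f.year == year;
      makeMatches |= f.make.equals(make);
    }
    return yearMatches && makeMatches;
  }

  /** "One 2-D point per fitment": both dimensions must match on the same point. */
  static boolean matchesAsPoints(List<Fitment> fitments, int year, String make) {
    for (Fitment f : fitments) {
      if (f.year == year && f.make.equals(make)) {
        return true;
      }
    }
    return false;
  }

  /** The wiper blades from the example: they fit 2010 Ford and 2011 Chevy only. */
  static List<Fitment> wiperBlades() {
    return List.of(new Fitment(2010, "Ford"), new Fitment(2011, "Chevy"));
  }
}
```

Asking for a 2011 Ford, matchesSeparateFields says the wiper blades fit (the year comes from the Chevy entry and the make from the Ford entry), while matchesAsPoints correctly says they don't.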

As for MultiRangeQuery and the mention of sandbox modules, I think
that's a bit of a different use-case. MultiRangeQuery lets you filter
by a disjunction of ranges. The "multi" part doesn't relate to
"multiple values in a doc" (but it does support that, as do the
"standard" range queries).

Where I see a gap right now, beyond just faceting, is that we can
represent N-dim points in the points index and filter on them (using
the points index), but we have no doc values equivalent. This means,
1) we can't facet, and 2) we can't create a "slow" query that does
post-filtering instead of using the points index (which could be a
very real advantage in cases with a sparse match set but a dense
points index). So I like the idea of creating that concept and being
able to facet and filter on it. Whether-or-not this is a "formal" doc
values type or sits on top of BDV, I have less of a strong opinion.

And finally... it really should be multi-valued. The points index
supports multiple points-per-field within a single document. Seems
like a big gap that we wouldn't support that with a doc value field.
Because BDV is inherently single-valued, I propose we come up with an
encoding scheme that encodes multiple points on top of that "single"
BDV entry. This is where building on BDV started to feel a little icky
to me and it seemed like it might be a good use-case for actually
formalizing a format/encoding, but again, no strong preference. We
could certainly do something more quickly on top of BDV and formalize
an encoding later if/as necessary.
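
To make "multiple points on top of a single BDV entry" a bit more tangible, here's one possible layout, purely as a strawman and not a proposed format: every dimension is an 8-byte big-endian long, points are written back-to-back, and the number of points is derived from the byte length. The class name and the fixed 8-bytes-per-dim choice are assumptions for the sketch.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec: several N-dim points in one byte[] (i.e. one binary doc value). */
final class MultiPointCodec {
  private MultiPointCodec() {}

  /** Encodes points (each a long[numDims]) back-to-back, 8 bytes per dimension. */
  static byte[] encode(long[][] points, int numDims) {
    ByteBuffer buf = ByteBuffer.allocate(points.length * numDims * Long.BYTES);
    for (long[] point : points) {
      if (point.length != numDims) {
        throw new IllegalArgumentException(
            "expected " + numDims + " dims, got " + point.length);
      }
      for (long value : point) {
        buf.putLong(value); // ByteBuffer defaults to big-endian
      }
    }
    return buf.array();
  }

  /** Decodes the layout written by encode; the point count comes from the length. */
  static long[][] decode(byte[] bytes, int numDims) {
    if (numDims <= 0 || bytes.length % (numDims * Long.BYTES) != 0) {
      throw new IllegalArgumentException("length does not match " + numDims + " dims");
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    long[][] points = new long[bytes.length / (numDims * Long.BYTES)][numDims];
    for (long[] point : points) {
      for (int d = 0; d < numDims; d++) {
        point[d] = buf.getLong();
      }
    }
    return points;
  }
}
```

A real encoding would probably want variable-width or delta-coded values, but even this naive version shows that the "single" binary value can carry a multi-valued field.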

Thanks again for the discussion so far Marc, Patrick and Rob!

Cheers,
-Greg

On Tue, May 24, 2022 at 10:35 AM Patrick Zhai  wrote:
>
> As pointed out by Rob in the issue
>
>> I would also suggest to start with the simple 
>> separate-numeric-docvalues-fields case and use similar logic as the 
>> org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc
>
>
> I think that's a preferable solution to me, because:
> 1. It does not couple the dimensions together so that people can combine them 
> freely
> 2. It might be able to be compressed better
>
> Best
>
> On Tue, May 24, 2022 at 9:08 AM Marc D'Mello  wrote:
>>
>> Hi,
>>
>> Thanks for the responses! For Patrick's question, right now in faceting we 
>> don't have any good way to AND between two fields. I think the original 
>> hyper rectangle issue has a good example of a use case: 
>> https://issues.apache.org/jira/browse/LUCENE-10274.
>>
>> As for Robert's point, this feature would also allow us to use 
>> MultiRangeQuery in IndexOrDocValuesQuery, but MultiRangeQuery is itself in 
>> the sandbox module so I'm assuming that's a pretty exotic use case as well. 
>> I personally have no issues using BinaryDocValues for this, I was just 
>> wondering if it would be better to create a dedicated doc values, but it 
>> seems that is not the case.
>>
>> Thanks,
>> Marc
>>
>> On Tue, May 24, 2022 at 1:27 AM Robert Muir  wrote:
>>>
>>> This seems like a really exotic feature to add a dedicated docvalues field for.
>>>
>>> We should let BINARY be the catchall for stuff like this.
>>>
>>> On Mon, May 23, 2022 at 10:17 PM Marc D'Mello  wrote:
>>> >
>>> > Hi,
>>> >
>>> > Some background: I've been working on this PR to add hyper rectangle 
>>> > faceting capabilities to Lucene facets and I needed to create a new doc 
>>> > values field to support this feature. Initially, I had a field that just 
>>> > extended BinaryDocValues, but then a discussion came up about whether to 
>>> > add a completely new DocValues field, maybe something like 
>>> > PointDocValuesField (and SortedPointDocValuesField as the multivalued 
>>> > version) to add first class support for this new field. Here is the link 
>>> > to the discussion. I think there are a few benefits to this:
>>> >
>>> > Formalize how we would store points as doc values rather than just 
>>> > packing point

Re: Need for multiple entry points in HnswGraphSearcher?

2022-02-21 Thread Greg Miller

Ah, thanks Mayya. I'd overlooked this one line of code:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java#L160

That makes sense. Thanks again for the explanation!
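
For my own notes, the flow as I now understand it looks roughly like the toy sketch below. To be clear, this is not the Lucene code: "vectors" are single doubles, each level is a plain int[][] adjacency list, all names are made up, and unlike real HNSW it uses the same beam width on every level to keep things short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.ToDoubleFunction;

final class LayeredSearchSketch {
  private LayeredSearchSketch() {}

  /** Best-first search of one level, seeded with every entry point in eps. */
  static int[] searchLevel(double query, int[] eps, int ef, double[] values, int[][] neighbors) {
    if (ef <= 0) {
      throw new IllegalArgumentException("ef must be > 0");
    }
    ToDoubleFunction<Integer> dist = n -> Math.abs(values[n] - query);
    Comparator<Integer> closestFirst = Comparator.comparingDouble(dist);
    PriorityQueue<Integer> candidates = new PriorityQueue<>(closestFirst);
    PriorityQueue<Integer> results = new PriorityQueue<>(closestFirst.reversed());
    Set<Integer> visited = new HashSet<>();
    for (int ep : eps) {
      if (visited.add(ep)) {
        candidates.add(ep);
        results.add(ep);
        if (results.size() > ef) results.poll(); // drop the farthest
      }
    }
    while (!candidates.isEmpty()) {
      int c = candidates.poll();
      // Stop once the best remaining candidate is worse than the worst kept result.
      if (results.size() >= ef && dist.applyAsDouble(c) > dist.applyAsDouble(results.peek())) {
        break;
      }
      for (int nb : neighbors[c]) {
        if (!visited.add(nb)) continue;
        if (results.size() < ef || dist.applyAsDouble(nb) < dist.applyAsDouble(results.peek())) {
          candidates.add(nb);
          results.add(nb);
          if (results.size() > ef) results.poll();
        }
      }
    }
    List<Integer> sorted = new ArrayList<>(results);
    sorted.sort(closestFirst);
    int[] out = new int[sorted.size()];
    for (int i = 0; i < out.length; i++) out[i] = sorted.get(i);
    return out;
  }

  /** Walks from the top level down; each level's results seed the next level. */
  static int[] descend(
      double query, int entryPoint, int beamWidth, double[] values, int[][][] neighborsByLevel) {
    int[] eps = {entryPoint};
    for (int level = neighborsByLevel.length - 1; level >= 0; level--) {
      eps = searchLevel(query, eps, beamWidth, values, neighborsByLevel[level]);
    }
    return eps;
  }
}
```

The part I'd missed is the last method: after the first level, eps holds up to beamWidth nodes rather than one, which is why searchLevel takes an array.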

Cheers,
-Greg

On Mon, Feb 21, 2022 at 1:19 AM Mayya Sharipova
 wrote:
>
> Hello Greg,
> This code is a close implementation of the original HNSW paper.
>
> Multiple entry points in HnswGraphSearcher are used during graph
> construction in HnswGraphBuilder for levels <= nodeLevel.
> In this case the number of entry points is equal to beamWidth.
> The closest neighbors found in the previous layer are used as entry points
> for the current layer.
>
> I hope this answers your question.
>
>
>
> On Sat, Feb 19, 2022 at 3:59 PM Greg Miller  wrote:
>>
>> Hi folks-
>>
>> I've been poking around some of the HNSW code out of curiosity and I
>> noticed that the HnswGraphSearcher#searchLevel methods accept an array
>> of entry point nodes (int[] eps), but as far as I can tell, only one
>> entry point is ever provided (from both HnswGraphSearcher and
>> HnswGraphBuilder). Is there actually a use-case for executing a search
>> using multiple entry points? I ask purely out of curiosity and trying
>> to build up my own understanding in this space.
>>
>> I tried searching the dev list and Jira issues for anywhere this might
>> have been discussed and came up empty, but apologies if there's a
>> thread about this somewhere I missed.
>>
>> Cheers,
>> -Greg
>>

Need for multiple entry points in HnswGraphSearcher?

2022-02-19 Thread Greg Miller

Hi folks-

I've been poking around some of the HNSW code out of curiosity and I
noticed that the HnswGraphSearcher#searchLevel methods accept an array
of entry point nodes (int[] eps), but as far as I can tell, only one
entry point is ever provided (from both HnswGraphSearcher and
HnswGraphBuilder). Is there actually a use-case for executing a search
using multiple entry points? I ask purely out of curiosity and trying
to build up my own understanding in this space.

I tried searching the dev list and Jira issues for anywhere this might
have been discussed and came up empty, but apologies if there's a
thread about this somewhere I missed.

Cheers,
-Greg

Re: Hunspell regression tests still using JDK 11?

2022-02-05 Thread Greg Miller

Thanks for the pointer Dawid! I created a PR to bump the JDK version
to 17 but the github checks don't appear to have fired for the change,
so I'm not sure it's correct. If anyone has tips on testing this
change (e.g., getting the github checks to fire or otherwise), please
let me know. I'll keep messing with this on Monday.

https://github.com/apache/lucene/pull/651

Cheers,
-Greg

On Sat, Feb 5, 2022 at 3:15 AM Dawid Weiss  wrote:
>
> Hi Greg,
>
> I think it's a workflow definition, here:
> https://github.com/apache/lucene/blob/main/.github/workflows/hunspell.yml#L22
>
> Dawid
>
> On Sat, Feb 5, 2022 at 5:57 AM Greg Miller  wrote:
> >
> > Hey everyone-
> >
> > It looks like a recent PR I posted triggered the run of Hunspell
> > regression tests in github. I'm not sure how these get run, but they
> > appear to be failing because they're trying to run with JDK 11 instead
> > of 17. Any pointers to where I might go to fix this? I'm happy to try
> > to update the setup for these regression tests, but I'm not sure where
> > to start to be honest.
> >
> > https://github.com/apache/lucene/runs/5074324463?check_suite_focus=true
> >
> > Thanks in advance!
> >
> > Cheers,
> > -Greg
> >

Hunspell regression tests still using JDK 11?

2022-02-04 Thread Greg Miller

Hey everyone-

It looks like a recent PR I posted triggered the run of Hunspell
regression tests in github. I'm not sure how these get run, but they
appear to be failing because they're trying to run with JDK 11 instead
of 17. Any pointers to where I might go to fix this? I'm happy to try
to update the setup for these regression tests, but I'm not sure where
to start to be honest.

https://github.com/apache/lucene/runs/5074324463?check_suite_focus=true

Thanks in advance!

Cheers,
-Greg

Re: [JENKINS] Lucene-9.x-Linux (64bit/jdk-17.0.1) - Build # 1077 - Unstable!

2022-02-02 Thread Greg Miller

I'll push a fix for this shortly. I fixed this situation on main but
looks like I didn't backport the unit test fix to 9.x. I'll take care
of it now.

Cheers,
-Greg

On Wed, Feb 2, 2022 at 12:06 PM Policeman Jenkins Server
 wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/1077/
> Java: 64bit/jdk-17.0.1 -XX:-UseCompressedOops -XX:+UseG1GC
>
> 1 tests failed.
> FAILED:  
> org.apache.lucene.sandbox.search.TestCombinedFieldQuery.testScoringWithMultipleFieldTermsMatch
>
> Error Message:
> java.lang.IllegalArgumentException: numHits must be > 0; please use 
> TotalHitCountCollector if you just need the total hit count
>
> Stack Trace:
> java.lang.IllegalArgumentException: numHits must be > 0; please use 
> TotalHitCountCollector if you just need the total hit count
> at 
> __randomizedtesting.SeedInfo.seed([4F7F708E5A0C8F17:44BB93F6D8DC5513]:0)
> at 
> org.apache.lucene.core@9.1.0-SNAPSHOT/org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:237)
> at 
> org.apache.lucene.core@9.1.0-SNAPSHOT/org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:226)
> at 
> org.apache.lucene.sandbox.search.TestCombinedFieldQuery.testScoringWithMultipleFieldTermsMatch(TestCombinedFieldQuery.java:238)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at 
> org.apache.lucene.test_framework@9.1.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at 
> org.apache.lucene.test_framework@9.1.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.test_framework@9.1.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at 
> org.apache.lucene.test_framework@9.1.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.test_framework@9.1.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at 
> org.apache.lucene.test_framework@9.1.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.test_framework@9.1.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.rules

Re: Welcome Guo Feng as Lucene committer

2022-01-25 Thread Greg Miller

Congrats Feng! Well deserved!

Cheers,
-Greg

On Tue, Jan 25, 2022 at 6:40 AM Steve Rowe  wrote:
>
> Congrats and welcome Feng!
>
> --
> Steve
>
> > On Jan 25, 2022, at 4:09 AM, Adrien Grand  wrote:
> >
> > I'm pleased to announce that Guo Feng has accepted the PMC's
> > invitation to become a committer.
> >
> > Feng, the tradition is that new committers introduce themselves with a
> > brief bio.
> >
> > Congratulations and welcome!
> >
> > --
> > Adrien
> >

Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-20 Thread Greg Miller

Congrats and welcome!

On Mon, Dec 20, 2021 at 6:08 AM Jan Høydahl  wrote:
>
> Welcome Patrick!
>
> Jan
>
> On 19 Dec 2021, at 10:11, Dawid Weiss wrote:
>
> Hello everyone!
>
> Please welcome Haoyu Zhai as the latest Lucene committer. You may also
> know Haoyu as Patrick - this is perhaps his kind gesture to those of
> us whose tongues are less flexible in pronouncing difficult first
> names. :)
>
> It's a tradition to briefly introduce yourself to the group, Patrick.
> Welcome and thank you!
>
> Dawid
>

Re: Any potential benefits to a SSDV#bulkLookupOrd(long ord) impl?

2021-12-17 Thread Greg Miller

On Thu, Dec 16, 2021 at 4:56 PM Robert Muir  wrote:
>
> On Thu, Dec 16, 2021 at 5:57 PM Greg Miller  wrote:
> >
> > This is separate from adding hierarchical support. I'm probably not
> > communicating the current state well, but here's where SSDV faceting
> > does ordinal lookups:
> > https://github.com/apache/lucene/blob/c64e5fe84c4990968844193e3a62f4ebbba638ea/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L148
> >
> > So this is done for every returned value, which as you describe,
> > scales with the requested top-n. For getAllDims, this logic is
> > executed for every dimension.
> >
> > I don't think these lookups are avoidable since we provide the path
> > for each returned value, and in order to get the path, we need to
> > dereference the ordinal.
> >
>
> OK I get it. I think the strangeness (compared to e.g. solr faceting)
> is that we're mixing ordinals from different fields ("dims") all into
> one DV field? And then we have a trappy method to do top-N for all
> possible dims in this single packed field (what if there are
> thousands???).

Right. The "get all dims" functionality does have a bit of a "trappy"
feel to it for this reason. I think there are situations where "dim
mixing" can be beneficial; if you actually do need facet counts for
most (or all) of your dims, I can see the benefit of iterating the
FacetsCollector once, counting everything in the SSDV field in the
same pass, then getting the results you need. But this is very
suboptimal if you have a large number of dims and only need faceting
on a small number of them (burn a bunch of up-front cost counting dims
you don't care about). I tried to address this by providing
StringValueFacetCounts (LUCENE-9950), which essentially chucks the
concept of "dim" altogether and assumes the field itself is the dim
(sounds like what solr does, but I need to get more familiar with that
impl). When I introduced StringValueFacetCounts, I was hesitant to
suggest deprecating what SSDV faceting does since I think there are
valid applications for wanting to pack many dims into a single field.
For what it's worth, taxonomy-based faceting operates in the same way,
defaulting to packing all the dims into one doc value field.

Anyway, not really sure where I'm going with all this except to say +1
to getAllDims being potentially trappy. I can see users thinking,
"well, I need to grab counts for a few different dims so I'll just
call getAllDims then pull out what I want instead of calling
getTopChildren for each dim." Hopefully they're not doing this, but it
would be an easy trap to fall into.

I suppose the last thing I'd say is that there are valid use-cases for
wanting the "top" dims along with their "top" children, and getAllDims
provides a reasonable way to do this. For example, in Amazon's product
search, we have a large number of different dims but only want to show
a small subset to customers on a search page. One way to go about
this would be to determine the "top" dims for the match set along with
the "top n" values under each; getAllDims is helpful for this but has
a bit of an unpleasant side-effect that it unnecessarily resolves the
paths for all children for all dims. As I think about this, I wonder
if a getTopDims method would be more useful that lets the user specify
the number of dims they want back along with the number of children
for each? I'll open a Jira for that.
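
Roughly what I have in mind, as a sketch only: the name getTopDims, the signature, and the Map-based input below are all invented for illustration and say nothing about how it would actually be wired into the facet module. Dims are ranked by their total count, and children are only gathered for the dims that make the cut.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopDimsSketch {
  private TopDimsSketch() {}

  /**
   * counts maps dim -> (child label -> count). Returns the topNDims dims by total
   * count, each mapped to its topNChildren child labels by count.
   */
  static Map<String, List<String>> getTopDims(
      Map<String, Map<String, Integer>> counts, int topNDims, int topNChildren) {
    List<Map.Entry<String, Map<String, Integer>>> dims = new ArrayList<>(counts.entrySet());
    // Highest total first; ties broken by dim name so the order is deterministic.
    dims.sort((a, b) -> {
      int cmp = Integer.compare(total(b.getValue()), total(a.getValue()));
      return cmp != 0 ? cmp : a.getKey().compareTo(b.getKey());
    });

    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, Integer>> dim :
        dims.subList(0, Math.min(topNDims, dims.size()))) {
      List<Map.Entry<String, Integer>> children = new ArrayList<>(dim.getValue().entrySet());
      children.sort((a, b) -> {
        int cmp = Integer.compare(b.getValue(), a.getValue());
        return cmp != 0 ? cmp : a.getKey().compareTo(b.getKey());
      });
      List<String> labels = new ArrayList<>();
      for (Map.Entry<String, Integer> child :
          children.subList(0, Math.min(topNChildren, children.size()))) {
        labels.add(child.getKey());
      }
      result.put(dim.getKey(), labels);
    }
    return result;
  }

  private static int total(Map<String, Integer> children) {
    int sum = 0;
    for (int count : children.values()) {
      sum += count;
    }
    return sum;
  }
}
```

In a real implementation the ranking would presumably happen in ordinal space, so that paths only get resolved for the children of the dims that are actually returned.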

>

Re: Any potential benefits to a SSDV#bulkLookupOrd(long ord) impl?

2021-12-16 Thread Greg Miller

On Thu, Dec 16, 2021 at 2:29 PM Robert Muir  wrote:
>
> On Thu, Dec 16, 2021 at 5:05 PM Greg Miller  wrote:
> >
> > On Thu, Dec 16, 2021 at 1:31 PM Robert Muir  wrote:
> > >
> > > On Thu, Dec 16, 2021 at 3:53 PM Greg Miller  wrote:
> > > >
> > >
> > > > TaxonomyReader was recently updated
> > > > to support bulk ordinal resolution (LUCENE-9476), but SSDV faceting is
> > > > stuck looking up paths one-at-a-time via SSDV#lookupOrd(ord). This
> > > > results in a separate TermsEnum#seekExact() call down in
> > > > Lucene90DocValuesProducer for each ordinal being returned.
> > > >
> > >
> > > I'm confused, where do we do gazillions of lookupOrd(), we should not
> > > be doing that. The ordinals should be used for all the heavy-duty
> > > work, and at the very end, only the top-10 or whatever resolved back
> > > to strings with lookupOrd. Think of it kinda like the stored fields :)
> >
> > This is right, but we still need to do the lookup for each value being
> > returned (which is bounded by the top-n param supplied by the user).
> > In getAllDims, we'll do "n" lookups for every dimension indexed. So
> > while we're working in "ordinal space" for doing all the counting and
> > such, there could still be a somewhat sizable number of ordinals that
> > need to be looked up after counting. This is where taxo-faceting leans
> > on bulk lookups.
>
> OK I need to understand this better, because I don't see why it is
> necessary to do it this way. It definitely is very different from the
> way solr wiki page documents hierarchical faceting. Maybe we should
> adopt their approach?

This is separate from adding hierarchical support. I'm probably not
communicating the current state well, but here's where SSDV faceting
does ordinal lookups:
https://github.com/apache/lucene/blob/c64e5fe84c4990968844193e3a62f4ebbba638ea/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L148

So this is done for every returned value, which as you describe,
scales with the requested top-n. For getAllDims, this logic is
executed for every dimension.

I don't think these lookups are avoidable since we provide the path
for each returned value, and in order to get the path, we need to
dereference the ordinal.

>

Re: Any potential benefits to a SSDV#bulkLookupOrd(long ord) impl?

2021-12-16 Thread Greg Miller

On Thu, Dec 16, 2021 at 1:31 PM Robert Muir  wrote:
>
> On Thu, Dec 16, 2021 at 3:53 PM Greg Miller  wrote:
> >
>
> > TaxonomyReader was recently updated
> > to support bulk ordinal resolution (LUCENE-9476), but SSDV faceting is
> > stuck looking up paths one-at-a-time via SSDV#lookupOrd(ord). This
> > results in a separate TermsEnum#seekExact() call down in
> > Lucene90DocValuesProducer for each ordinal being returned.
> >
>
> I'm confused, where do we do gazillions of lookupOrd(), we should not
> be doing that. The ordinals should be used for all the heavy-duty
> work, and at the very end, only the top-10 or whatever resolved back
> to strings with lookupOrd. Think of it kinda like the stored fields :)

This is right, but we still need to do the lookup for each value being
returned (which is bounded by the top-n param supplied by the user).
In getAllDims, we'll do "n" lookups for every dimension indexed. So
while we're working in "ordinal space" for doing all the counting and
such, there could still be a somewhat sizable number of ordinals that
need to be looked up after counting. This is where taxo-faceting leans
on bulk lookups.

We also call lookupOrd for _every_ ordinal in the given field when
building the state (see the ctor logic in
DefaultSortedSetDocValuesReaderState). I'm not as concerned about this
since state building only needs to happen when the index changes.

>
> > Having no knowledge about the actual data representation behind the
> > TermsDict in an SSDV field, I'm wondering if someone here can provide
> > a high-level sense of whether-or-not there might be an advantage to
> > looking up ordinals in bulk. I'm going to dig into the code anyway
> > (curious!), but thought I'd raise the idea/question here as well
> > regarding whether-or-not a bulk lookup might be advantageous in
> > general for SSDV fields. Any thoughts?
>
> I don't think we should provide such an API, because the operation is
> slow and should not be done in "bulk" anyway. Number of lookups should
> be low (e.g. 10, 50, whatever the user's top-N is). If you want to
> optimize it, sort them in ascending order and look that up first, but
> honestly in most cases, that probably isn't even worth it.

That's fair. I can see the argument for not wanting to encourage
unnecessary lookups with a "bulk" operation. Thanks for the feedback.
I'll think about this a little more when I have some time to dig into
the code, but what you're saying sounds reasonable.
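
For what it's worth, here's my mental model of the "count in ordinal space, resolve only the winners" flow, including the sort-before-lookup tweak. It's a standalone sketch: lookupOrd here is a stand-in LongFunction rather than the real SortedSetDocValues method, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.LongFunction;

final class TopNOrdinals {
  private TopNOrdinals() {}

  /** Returns the labels of the topN highest-count ordinals, highest count first. */
  static String[] topLabels(int[] countsByOrd, int topN, LongFunction<String> lookupOrd) {
    // Lowest count first; on equal counts the larger ordinal is considered weaker.
    Comparator<Integer> weakestFirst = (a, b) -> {
      int cmp = Integer.compare(countsByOrd[a], countsByOrd[b]);
      return cmp != 0 ? cmp : Integer.compare(b, a);
    };

    // Selection happens purely on ordinals and counts; no lookups yet.
    PriorityQueue<Integer> heap = new PriorityQueue<>(weakestFirst);
    for (int ord = 0; ord < countsByOrd.length; ord++) {
      if (countsByOrd[ord] == 0) continue;
      heap.add(ord);
      if (heap.size() > topN) heap.poll(); // evict the current weakest
    }

    // Resolve only the winners, visiting them in ascending ordinal order.
    List<Integer> winners = new ArrayList<>(heap);
    List<Integer> ascending = new ArrayList<>(winners);
    Collections.sort(ascending);
    Map<Integer, String> labels = new HashMap<>();
    for (int ord : ascending) {
      labels.put(ord, lookupOrd.apply(ord));
    }

    // Emit in rank order (strongest first).
    winners.sort(weakestFirst.reversed());
    String[] out = new String[winners.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = labels.get(winners.get(i));
    }
    return out;
  }
}
```

The number of lookupOrd calls is bounded by topN regardless of how many ordinals were counted, which is the point Rob is making.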

>

Any potential benefits to a SSDV#bulkLookupOrd(long ord) impl?

2021-12-16 Thread Greg Miller

Hey folks-

I've been chatting with Marc D'Mello a bit about the SSDV faceting
he's working on (LUCENE-10250) (disclaimer: we both work on Amazon's
Product Search engine). We're trying to figure out where
taxonomy-based faceting has a performance advantage over SSDV, and it
occurred to me that the way the two approaches resolve the paths for
given ordinals is a bit different. TaxonomyReader was recently updated
to support bulk ordinal resolution (LUCENE-9476), but SSDV faceting is
stuck looking up paths one-at-a-time via SSDV#lookupOrd(ord). This
results in a separate TermsEnum#seekExact() call down in
Lucene90DocValuesProducer for each ordinal being returned.

Having no knowledge about the actual data representation behind the
TermsDict in an SSDV field, I'm wondering if someone here can provide
a high-level sense of whether-or-not there might be an advantage to
looking up ordinals in bulk. I'm going to dig into the code anyway
(curious!), but thought I'd raise the idea/question here as well
regarding whether-or-not a bulk lookup might be advantageous in
general for SSDV fields. Any thoughts?

Cheers,
-Greg

Re: Welcome Julie Tibshirani to the Lucene PMC

2021-11-30 Thread Greg Miller

Congrats Julie!

On Tue, Nov 30, 2021 at 2:20 PM Michael Sokolov  wrote:
>
> Welcome, Julie!
>
>  I think Adrien already added you to the PMC LDAP group, but I'll double-check
>
> On Tue, Nov 30, 2021, 2:11 PM Anshum Gupta  wrote:
>>
>> Congratulations and welcome, Julie!
>>
>> On Tue, Nov 30, 2021 at 1:49 PM Adrien Grand  wrote:
>>>
>>> I'm pleased to announce that Julie Tibshirani has accepted an invitation to 
>>> join the Lucene PMC!
>>>
>>> Congratulations Julie, and welcome aboard!
>>>
>>> --
>>> Adrien
>>
>>
>>
>> --
>> Anshum Gupta

Re: [VOTE] Release Lucene 9.0.0 RC3

2021-11-30 Thread Greg Miller

OK, thanks Adrien. I went ahead and backported to 9.0.

Cheers,
-Greg

On Tue, Nov 30, 2021 at 1:33 PM Adrien Grand  wrote:
>
> I'm good with getting safe bug fixes in as we respin, +1 to backport this fix 
> to 9.0.
>
> On Tue, Nov 30, 2021 at 10:25 PM Greg Miller  wrote:
>>
>> If we're going to respin, I'd like to propose we pick up the bug fix
>> in https://issues.apache.org/jira/browse/LUCENE-10232. I certainly
>> wouldn't respin just to get this fix, but if we're going to anyway, it
>> would be nice to grab it.
>>
>> Here's a PR to do so if the group thinks it makes sense:
>> https://github.com/apache/lucene/pull/495
>>
>> Cheers,
>> -Greg
>>
>> On Mon, Nov 29, 2021 at 2:02 PM Adrien Grand  wrote:
>> >
>> > You could send a heads up to dev@ to make this more visible but I don't 
>> > think we need a vote.
>> >
>> > Thanks Uwe and Dawid for taking care of this.
>> >
>> > On Mon, Nov 29, 2021 at 22:25, Uwe Schindler wrote:
>> >>
>> >> Hi,
>> >>
>> >> Dawid and I changed the gradle build to change the module names to be 
>> >> according to above. With the new gradle task the automatically assigned 
>> >> module names from the gradle projects are now:
>> >>
>> >> > Task :showModuleNames
>> >> lucene-benchmark-10.0.0-SNAPSHOT.jar           -> org.apache.lucene.benchmark
>> >> lucene-backward-codecs-10.0.0-SNAPSHOT.jar     -> org.apache.lucene.backward_codecs
>> >> lucene-classification-10.0.0-SNAPSHOT.jar      -> org.apache.lucene.classification
>> >> lucene-codecs-10.0.0-SNAPSHOT.jar              -> org.apache.lucene.codecs
>> >> lucene-core-10.0.0-SNAPSHOT.jar                -> org.apache.lucene.core
>> >> lucene-demo-10.0.0-SNAPSHOT.jar                -> org.apache.lucene.demo
>> >> lucene-expressions-10.0.0-SNAPSHOT.jar         -> org.apache.lucene.expressions
>> >> lucene-facet-10.0.0-SNAPSHOT.jar               -> org.apache.lucene.facet
>> >> lucene-grouping-10.0.0-SNAPSHOT.jar            -> org.apache.lucene.grouping
>> >> lucene-highlighter-10.0.0-SNAPSHOT.jar         -> org.apache.lucene.highlighter
>> >> lucene-join-10.0.0-SNAPSHOT.jar                -> org.apache.lucene.join
>> >> lucene-luke-10.0.0-SNAPSHOT.jar                -> org.apache.lucene.luke
>> >> lucene-memory-10.0.0-SNAPSHOT.jar              -> org.apache.lucene.memory
>> >> lucene-misc-10.0.0-SNAPSHOT.jar                -> org.apache.lucene.misc
>> >> lucene-monitor-10.0.0-SNAPSHOT.jar             -> org.apache.lucene.monitor
>> >> lucene-queries-10.0.0-SNAPSHOT.jar             -> org.apache.lucene.queries
>> >> lucene-queryparser-10.0.0-SNAPSHOT.jar         -> org.apache.lucene.queryparser
>> >> lucene-replicator-10.0.0-SNAPSHOT.jar          -> org.apache.lucene.replicator
>> >> lucene-sandbox-10.0.0-SNAPSHOT.jar             -> org.apache.lucene.sandbox
>> >> lucene-spatial-extras-10.0.0-SNAPSHOT.jar      -> org.apache.lucene.spatial_extras
>> >> lucene-spatial3d-10.0.0-SNAPSHOT.jar           -> org.apache.lucene.spatial3d
>> >> lucene-suggest-10.0.0-SNAPSHOT.jar             -> org.apache.lucene.suggest
>> >> lucene-test-framework-10.0.0-SNAPSHOT.jar      -> org.apache.lucene.test_framework
>> >> lucene-analysis-common-10.0.0-SNAPSHOT.jar     -> org.apache.lucene.analysis.common
>> >> lucene-analysis-icu-10.0.0-SNAPSHOT.jar        -> org.apache.lucene.analysis.icu
>> >> lucene-analysis-kuromoji-10.0.0-SNAPSHOT.jar   -> org.apache.lucene.analysis.kuromoji
>> >> lucene-analysis-morfologik-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.morfologik
>> >> lucene-analysis-nori-10.0.0-SNAPSHOT.jar       -> org.apache.lucene.analysis.nori
>> >> lucene-analysis-opennlp-10.0.0-SNAPSHOT.jar

Re: [VOTE] Release Lucene 9.0.0 RC3

2021-11-30 Thread Greg Miller

If we're going to respin, I'd like to propose we pick up the bug fix
in https://issues.apache.org/jira/browse/LUCENE-10232. I certainly
wouldn't respin just to get this fix, but if we're going to anyway, it
would be nice to grab it.

Here's a PR to do so if the group thinks it makes sense:
https://github.com/apache/lucene/pull/495

Cheers,
-Greg

On Mon, Nov 29, 2021 at 2:02 PM Adrien Grand  wrote:
>
> You could send a heads up to dev@ to make this more visible but I don't think 
> we need a vote.
>
> Thanks Uwe and Dawid for taking care of this.
>
> Le lun. 29 nov. 2021 à 22:25, Uwe Schindler  a écrit :
>>
>> Hi,
>>
>> Dawid and I changed the gradle build to change the module names to be 
>> according to above. With the new gradle task the automatically assigned 
>> module names from the gradle projects are now:
>>
>> > Task :showModuleNames
>> lucene-benchmark-10.0.0-SNAPSHOT.jar -> org.apache.lucene.benchmark
>> lucene-backward-codecs-10.0.0-SNAPSHOT.jar -> org.apache.lucene.backward_codecs
>> lucene-classification-10.0.0-SNAPSHOT.jar -> org.apache.lucene.classification
>> lucene-codecs-10.0.0-SNAPSHOT.jar -> org.apache.lucene.codecs
>> lucene-core-10.0.0-SNAPSHOT.jar -> org.apache.lucene.core
>> lucene-demo-10.0.0-SNAPSHOT.jar -> org.apache.lucene.demo
>> lucene-expressions-10.0.0-SNAPSHOT.jar -> org.apache.lucene.expressions
>> lucene-facet-10.0.0-SNAPSHOT.jar -> org.apache.lucene.facet
>> lucene-grouping-10.0.0-SNAPSHOT.jar -> org.apache.lucene.grouping
>> lucene-highlighter-10.0.0-SNAPSHOT.jar -> org.apache.lucene.highlighter
>> lucene-join-10.0.0-SNAPSHOT.jar -> org.apache.lucene.join
>> lucene-luke-10.0.0-SNAPSHOT.jar -> org.apache.lucene.luke
>> lucene-memory-10.0.0-SNAPSHOT.jar -> org.apache.lucene.memory
>> lucene-misc-10.0.0-SNAPSHOT.jar -> org.apache.lucene.misc
>> lucene-monitor-10.0.0-SNAPSHOT.jar -> org.apache.lucene.monitor
>> lucene-queries-10.0.0-SNAPSHOT.jar -> org.apache.lucene.queries
>> lucene-queryparser-10.0.0-SNAPSHOT.jar -> org.apache.lucene.queryparser
>> lucene-replicator-10.0.0-SNAPSHOT.jar -> org.apache.lucene.replicator
>> lucene-sandbox-10.0.0-SNAPSHOT.jar -> org.apache.lucene.sandbox
>> lucene-spatial-extras-10.0.0-SNAPSHOT.jar -> org.apache.lucene.spatial_extras
>> lucene-spatial3d-10.0.0-SNAPSHOT.jar -> org.apache.lucene.spatial3d
>> lucene-suggest-10.0.0-SNAPSHOT.jar -> org.apache.lucene.suggest
>> lucene-test-framework-10.0.0-SNAPSHOT.jar -> org.apache.lucene.test_framework
>> lucene-analysis-common-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.common
>> lucene-analysis-icu-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.icu
>> lucene-analysis-kuromoji-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.kuromoji
>> lucene-analysis-morfologik-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.morfologik
>> lucene-analysis-nori-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.nori
>> lucene-analysis-opennlp-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.opennlp
>> lucene-analysis-phonetic-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.phonetic
>> lucene-analysis-smartcn-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.smartcn
>> lucene-analysis-stempel-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.stempel
>>
>> The module names on the right can now be used in Java source code to refer 
>> in Java 11 to the module. Those are now "automatic module names" (because 
>> the lucene behind is not completely modularized). In later Lucene 9.x 
>> versions we will add full module support and only expose APIs for external 
>> consumption and hide all internal lucene packages.
>>
>> The 9.0 release should make sure that the module names are at least 
>> "defined", so we can use them later in module-info.java.
>>
>> Should I send a vote thread about this to the mailing list separately?
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> > -Original Message-
>> > From: Dawid Weiss 
>> > Sent: Monday, November 29, 2021 7:36 PM
>> > To: Lucene Dev 
>> > Subject: Re: [VOTE] Release Lucene 9.0.0 RC3
>> >
>> > Here is the change adding the 'org.apache.*' prefix, Uwe:
>> > https://github.com/apache/lucene/pull/487
>> >
>> > I verified that Luke starts in the rebuilt distribution and that
>> > module names show org.apache.* prefixes. Dashes are not allowed in
>> > modules so Lucene artifacts using them (spatial-extras,
>> > test-framework, backward-codecs) use an underscore in place of the
>>

Re: Anyone familiar (or use) MultiRangeQuery?

2021-11-27 Thread Greg Miller

Thanks everyone!

> I suspect that the fact that it doesn't work with multi-dimensional
> points is a bug that hasn't been found yet because it's been mostly
> discussed in the context of 1D fields?

This seems plausible. It also made me think that writing a "duel" test
that compares randomized scenarios against a disjunction of "standard"
PointRangeQueries would be a good idea, so I went ahead and added that
to my PR (https://github.com/apache/lucene/pull/437).

It seems to me that this is in fact a bug, so I'd suggest we move
forward with fixing it. But if anyone disagrees, let's discuss :)

Cheers,
-Greg

On Thu, Nov 25, 2021 at 9:47 AM Adrien Grand  wrote:
>
> I think Greg is right and this query is supposed to be a
> specialization for a disjunction of multiple range queries. It helps
> because you need to visit the index of the BKD tree and build a bit
> set once for the entire disjunction instead of once per range.
>
> I suspect that the fact that it doesn't work with multi-dimensional
> points is a bug that hasn't been found yet because it's been mostly
> discussed in the context of 1D fields?
>
> On Mon, Nov 22, 2021 at 5:13 PM Michael Sokolov  wrote:
> >
> > I did a little git spelunking and found this PR
> > https://github.com/apache/lucene-solr/pull/794 where it was
> > introduced. It does sound to me as if the intent was to match on
> > multiple multi-dimensional ranges (ie hypercubes), not on any
> > dimension among multiple ranges? Why would anyone ever want to do
> > that? On the other hand a lot of people looked at it ... so maybe
> > we're missing something here?
> >
> > On Sun, Nov 21, 2021 at 11:14 AM Greg Miller  wrote:
> > >
> > > Hi folks-
> > >
> > > Is anyone familiar with MultiRangeQuery (found in
> > > o.a.l.sandbox.search)? I was playing around with it recently since it
> > > might be a good fit for a use-case I'm working on for Amazon's Product
> > > Search engine, but it looks like it has a pretty fundamental bug in
> > > how it works. That or I'm completely mis-understanding what the query
> > > is meant to do.
> > >
> > > My understanding is that this query should consider documents to be a
> > > match if they contain a point that is found in _any_ of the ranges
> > > represented by this query (i.e., it's a disjunction over a set of
> > > query ranges). But... it appears that the query incorrectly considers
> > > a document to be a match if its point matches on any single dimension
> > > of any range (where it should be requiring all dimensions in a
> > > particular range to match).
> > >
> > > I added a unit test to demonstrate this bug along with a proposed fix
> > > over here: https://github.com/apache/lucene/pull/437
> > >
> > > If anyone is familiar with this query (or better yet, uses it), I'd be
> > > really interested in your input.
> > >
> > > Cheers,
> > > -Greg
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: dev-h...@lucene.apache.org
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
>
> --
> Adrien
>
>


Anyone familiar (or use) MultiRangeQuery?

2021-11-21 Thread Greg Miller

Hi folks-

Is anyone familiar with MultiRangeQuery (found in
o.a.l.sandbox.search)? I was playing around with it recently since it
might be a good fit for a use-case I'm working on for Amazon's Product
Search engine, but it looks like it has a pretty fundamental bug in
how it works. That or I'm completely mis-understanding what the query
is meant to do.

My understanding is that this query should consider documents to be a
match if they contain a point that is found in _any_ of the ranges
represented by this query (i.e., it's a disjunction over a set of
query ranges). But... it appears that the query incorrectly considers
a document to be a match if its point matches on any single dimension
of any range (where it should be requiring all dimensions in a
particular range to match).
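
To make the difference concrete, here is a minimal sketch in plain Java. It
is only a model of the two matching rules described above, not the code in
MultiRangeQuery or in the PR; the class and method names are made up for
illustration.

```java
import java.util.List;

/** Plain-Java model of the two matching rules; hypothetical names, not Lucene code. */
final class MultiRangeSemantics {

    /** An inclusive box: lower[d] <= value <= upper[d] for each dimension d. */
    static final class Range {
        final long[] lower;
        final long[] upper;

        Range(long[] lower, long[] upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    /** Intended rule: the point lies inside at least one range on ALL dimensions. */
    static boolean matchesExpected(long[] point, List<Range> ranges) {
        for (Range r : ranges) {
            boolean allDims = true;
            for (int d = 0; d < point.length; d++) {
                if (point[d] < r.lower[d] || point[d] > r.upper[d]) {
                    allDims = false;
                    break;
                }
            }
            if (allDims) {
                return true;
            }
        }
        return false;
    }

    /** Rule described as the bug: one matching dimension of any range is enough. */
    static boolean matchesAnyDimension(long[] point, List<Range> ranges) {
        for (Range r : ranges) {
            for (int d = 0; d < point.length; d++) {
                if (point[d] >= r.lower[d] && point[d] <= r.upper[d]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(
            new Range(new long[] {0, 0}, new long[] {10, 10}),
            new Range(new long[] {20, 20}, new long[] {30, 30}));
        long[] point = {5, 15}; // inside the first range on dimension 0 only
        System.out.println("expected rule:      " + matchesExpected(point, ranges));
        System.out.println("any-dimension rule: " + matchesAnyDimension(point, ranges));
    }
}
```

The point (5, 15) is outside both boxes, so the intended rule rejects it,
while the any-dimension rule accepts it because dimension 0 falls inside the
first range.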

I added a unit test to demonstrate this bug along with a proposed fix
over here: https://github.com/apache/lucene/pull/437

If anyone is familiar with this query (or better yet, uses it), I'd be
really interested in your input.

Cheers,
-Greg


Re: [JENKINS] Lucene » Lucene-Check-main - Build # 3872 - Unstable!

2021-11-21 Thread Greg Miller

Took a glance at this failure and it looks like the test is creating a
TopScoreDocCollector with a "numHits" of 0 (with this random seed).
Way back in LUCENE-2785, TSDC started requiring > 0 (pushing users to
use TotalHitCountCollector instead of passing 0).

Looks like this run just got really unlucky with the random value it
picked up for numHits, but I'll go ahead and tweak the test to make
sure numHits is at least 1.
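
For illustration only, one way to keep a randomized value away from 0 is to
shift the range by one. The helper below is a hypothetical, stdlib-only
sketch; it is not the actual change made to TestCombinedFieldQuery, and
Lucene's test framework has its own randomization utilities.

```java
import java.util.Random;

/** Hypothetical helper showing a random pick that can never be 0. */
final class NumHitsPicker {

    /** Returns a value in [1, maxInclusive]; maxInclusive must be at least 1. */
    static int randomNumHits(Random random, int maxInclusive) {
        // nextInt(bound) yields [0, bound), so adding 1 shifts it to [1, bound].
        return 1 + random.nextInt(maxInclusive);
    }

    public static void main(String[] args) {
        Random random = new Random(42L); // arbitrary seed for the demo
        for (int i = 0; i < 5; i++) {
            System.out.println(randomNumHits(random, 10));
        }
    }
}
```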

Cheers,
-Greg

On Sat, Nov 20, 2021 at 4:44 PM Apache Jenkins Server
 wrote:
>
> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/3872/
>
> 1 tests failed.
> FAILED:  
> org.apache.lucene.sandbox.search.TestCombinedFieldQuery.testScoringWithMultipleFieldTermsMatch
>
> Error Message:
> java.lang.IllegalArgumentException: numHits must be > 0; please use 
> TotalHitCountCollector if you just need the total hit count
>
> Stack Trace:
> java.lang.IllegalArgumentException: numHits must be > 0; please use 
> TotalHitCountCollector if you just need the total hit count
> at 
> __randomizedtesting.SeedInfo.seed([25A4846FCF0B6C50:2E6067174DDBB654]:0)
> at 
> org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:237)
> at 
> org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:226)
> at 
> org.apache.lucene.sandbox.search.TestCombinedFieldQuery.testScoringWithMultipleFieldTermsMatch(TestCombinedFieldQuery.java:238)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evalu

Re: [JENKINS] Lucene » Lucene-Check-9.0 - Build # 148 - Unstable!

2021-11-19 Thread Greg Miller

This was my fault. I've just merged a fix. Apologies!

Cheers,
-Greg

On Fri, Nov 19, 2021 at 11:34 AM Apache Jenkins Server
 wrote:
>
> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.0/148/
>
> 1 tests failed.
> FAILED:  
> org.apache.lucene.facet.taxonomy.directory.TestBackwardsCompatibility.testCreateNewTaxonomy
>
> Error Message:
> java.lang.IllegalArgumentException: docs out of order: previous docId=1 
> current docId=0
>
> Stack Trace:
> java.lang.IllegalArgumentException: docs out of order: previous docId=1 
> current docId=0
> at 
> __randomizedtesting.SeedInfo.seed([1408C42B3DC09BAC:9C5DDC81CE36660B]:0)
> at 
> org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels$FacetLabelReader.nextFacetLabel(TaxonomyFacetLabels.java:146)
> at 
> org.apache.lucene.facet.taxonomy.directory.TestBackwardsCompatibility.createNewTaxonomyIndex(TestBackwardsCompatibility.java:228)
> at 
> org.apache.lucene.facet.taxonomy.directory.TestBackwardsCompatibility.testCreateNewTaxonomy(TestBackwardsCompatibility.java:78)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.

Re: Lucene 9.0 release candidate

2021-11-19 Thread Greg Miller

Heads up that both LUCENE-10122 and LUCENE-10062 have been merged onto
branch_9_0 now. @Adrien Grand I know you're aware already, but
following up here just for completeness. Thanks!

-Greg

On Mon, Nov 15, 2021 at 11:17 AM Robert Muir  wrote:
>
> On Mon, Nov 15, 2021 at 2:02 PM Michael McCandless
>  wrote:
> >
> >
> > Yeah I love that idea, but that's not what Patrick's PR explored (yet?).
> >
> > His explored switching away from custom token positions to NumericDocValues 
> > to store the same data (ordinal -> parent mapping), but it still loaded all 
> > of those into massive heap-resident int[].
> >
> > I agree it would be awesome to try avoiding those big int[] and reading 
> > live from NumericDocValues during faceting!  It would require some re-work 
> > of the faceting code to e.g. sort the ordinals to (efficiently) visit 
> > them in forward iterator-friendly order.
> >
> > But that is a different change and probably we should not hold 9.0 for it?
> >
>
> Agreed: I was confused about the scope of the change.


Re: Lucene 9.0 release candidate

2021-11-15 Thread Greg Miller

+1, yeah let's see if we can get these in. It would be nice to not
carry the back-compat logic into 10. I'll prioritize these
today/tomorrow; I think we should be able to turn them around pretty
quickly. I'll update on this thread when PRs have been merged.

Cheers,
-Greg

On Mon, Nov 15, 2021 at 6:20 AM Adrien Grand  wrote:
>
> Thanks Dawid.
>
> @Greg Miller What do you think about getting these two PRs in for 9.0?
>
> On Sun, Nov 14, 2021 at 9:25 AM Dawid Weiss  wrote:
>>
>> I think we should wait for these two changes.
>>
>> I also think we should add automatic bundle names to all JARs starting
>> with 9.0. Even if they're not proper modules yet, it'd clarify what
>> the module names would be for all of the 9x line. I think a short name
>> of lucene.* is sufficient (we don't need to prefix with
>> org.apache.lucene) so that we have modules like lucene.core,
>> lucene.analysis.kuromoji, etc.  I can add this - already have a local
>> patch that does it and enables Luke to become a first-class module,
>> for example.
>>
>> Dawid
>>
>> On Sat, Nov 13, 2021 at 8:49 PM Adrien Grand  wrote:
>> >
>> > Hello,
>> >
>> > I plan to build a RC for Lucene 9.0 in the next few days.
>> >
>> > We don't have blockers left, but there are two faceting changes that look 
>> > like we could save some backward compatibility logic in 10.x by folding 
>> > them into 9.0:
>> >  - LUCENE-10062: https://github.com/apache/lucene/pull/264
>> >  - LUCENE-10122: https://github.com/apache/lucene/pull/420
>> >
>> > I'm interested in thoughts regarding whether I should wait for these 
>> > changes.
>> >
>> > --
>> > Adrien
>>
>>
>
>
> --
> Adrien


Re: Multi-valued xxValue / xxValueSource implementations?

2021-10-27 Thread Greg Miller

Thanks Robert for all your thoughts and context!

> I feel that things like facets apis should really try to move to lower-level 
> apis (DoubleValuesSource, SortedSetDocValues, etc)

Yeah I think this direction generally makes sense. All the cases I can
think of where a user might want to provide custom values (e.g.,
filtering, transforming, etc.) could be solved by allowing users to
pass their own xxDocValues instance into faceting implementations. For
example, if a user wanted to provide some filtering or transformation
on long values before counting them with LongValueFacetCounts, they
could do so by creating their own SortedNumericDocValues /
NumericDocValues implementations and passing them in if the faceting
implementations supported this.

The only possible gap I see here is that implementing xxDocValues
requires the ability to provide iteration over the documents
themselves, whereas xxValuesSource doesn't. So if there was some case
where a user wanted to provide multi-valued data but couldn't provide
document iteration, that might be an issue. It's a bit of a funny
limitation since faceting doesn't need the value source to lead
iteration, so I could see a multi-valued version of something like
LongValuesSource maybe being a better fit.
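
As a rough sketch of what such a pull-style, multi-valued source could look
like, the plain-Java model below defines a hypothetical MultiLongValues
interface (positioned by the caller, so it never leads iteration), a
filtering wrapper, and a simple counter. None of these types exist in
Lucene; they only illustrate the shape of the idea.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongPredicate;

/** Stdlib-only model; all names here are hypothetical, not Lucene API. */
final class MultiValuedSourceSketch {

    /** Multi-valued per-document source that the caller positions explicitly. */
    interface MultiLongValues {
        /** Positions on the given doc; returns false if it has no values. */
        boolean advanceExact(int doc);

        /** Number of values for the current doc. */
        int valueCount();

        /** Next value for the current doc; call at most valueCount() times. */
        long nextValue();
    }

    /** Array-backed source, standing in for values read from an index. */
    static MultiLongValues fromArrays(long[][] valuesByDoc) {
        return new MultiLongValues() {
            private long[] current = new long[0];
            private int upto;

            public boolean advanceExact(int doc) {
                current = valuesByDoc[doc];
                upto = 0;
                return current.length > 0;
            }

            public int valueCount() {
                return current.length;
            }

            public long nextValue() {
                return current[upto++];
            }
        };
    }

    /** Wraps a source and keeps only the values accepted by the predicate. */
    static MultiLongValues filtered(MultiLongValues in, LongPredicate keep) {
        return new MultiLongValues() {
            private long[] kept = new long[0];
            private int count;
            private int upto;

            public boolean advanceExact(int doc) {
                count = 0;
                upto = 0;
                if (!in.advanceExact(doc)) {
                    return false;
                }
                int n = in.valueCount();
                if (kept.length < n) {
                    kept = new long[n];
                }
                for (int i = 0; i < n; i++) {
                    long v = in.nextValue();
                    if (keep.test(v)) {
                        kept[count++] = v;
                    }
                }
                return count > 0;
            }

            public int valueCount() {
                return count;
            }

            public long nextValue() {
                return kept[upto++];
            }
        };
    }

    /** Counts value occurrences; the matching docs lead iteration, not the source. */
    static Map<Long, Integer> count(MultiLongValues values, int[] matchingDocs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (int doc : matchingDocs) {
            if (values.advanceExact(doc)) {
                for (int i = 0, n = values.valueCount(); i < n; i++) {
                    counts.merge(values.nextValue(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[][] docs = {{1, 2}, {}, {2, 3, 4}};
        MultiLongValues evens = filtered(fromArrays(docs), v -> v % 2 == 0);
        System.out.println(count(evens, new int[] {0, 1, 2}));
    }
}
```

The source only needs to answer "what values does this doc have?", which is
why it does not have to provide document iteration of its own.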

Cheers,
-Greg

On Tue, Oct 26, 2021 at 8:03 PM Robert Muir  wrote:
>
> On Tue, Oct 26, 2021 at 8:01 PM Robert Muir  wrote:
> >
> > Hi Greg, I think the general issue is one of the API, the ValueSource
> > seems really geared at returning values from single-valued fields.
>
> I think really, this is the core issue. This ValueSource thing was
> created before the days of docvalues, in a lot of cases will do
> inefficient things depending on how you hold it.
>
> I feel that things like facets apis should really try to move to
> lower-level apis (DoubleValuesSource, SortedSetDocValues, etc)
>
> Reverse the problem around from push to a pull, now if you want to
> give "computed field" or similar inputs to faceting (e.g. some kind of
> filtering-on-the-fly), you have the chance to implement it
> efficiently.
> The expressions module switched away from this ValueSource to a
> DoubleValues/DoubleValuesSource already, though I didn't follow
> specific reasons why.
> Maybe similar approaches apply to all the numerics.
>
> As far as the strings, personally, I'm not sure what a ValueSource API
> that "filters/transforms" terms should look like. Seems slow no matter
> how you do it. But maybe fresh ideas are needed.
>
>


Multi-valued xxValue / xxValueSource implementations?

2021-10-26 Thread Greg Miller

Hi folks-

Out of curiosity, is there a reason Lucene doesn't have
implementations for concepts like DoubleValues / DoubleValuesSource
that support multiple values per document? Or maybe something like
this does exist in Lucene that I'm not aware of? I can't believe this
hasn't been a topic of discussion at least once, but I couldn't turn
up a past Jira issue.

I ask because most of the faceting implementations in Lucene allow the
user to provide their own xxValuesSource to use instead of assuming
the data is in an indexed field, but there's an inherent limitation
here forcing documents to have a single value. The faceting
implementations have all been updated to operate correctly for
multi-valued documents when referencing an indexed field, but there's
a bit of a gap here if the user wants to supply their own source.

Many thanks!

Cheers,
-Greg


Re: Should Queries be able to throw CollectionTerminationException?

2021-10-05 Thread Greg Miller

+1 I like this as well. Thanks for the suggestion! If IndexSearcher is
aware of timeouts, I think it can "do the right thing" with respect to
caching as well as properly establish the right "Relation" in
"TotalHits" if it times out (maybe there's some callback or something
allowing IndexSearcher to report a timeout to collectors / collector
managers?).

I went ahead and created an issue to explore this further:
https://issues.apache.org/jira/browse/LUCENE-10151

Thanks again!
-Greg

On Tue, Oct 5, 2021 at 8:20 AM Michael McCandless
 wrote:
>
> On Tue, Oct 5, 2021 at 11:13 AM Adrien Grand  wrote:
>
>> Maybe one clean way to make it happen would be to make timeouts an 
>> IndexSearcher feature. Whenever a timeout is set, IndexSearcher could split 
>> the doc ID space into ranges of X docs and check the timeout between every 
>> range. This way, the CollectionTerminatedException wouldn't be raised by a 
>> query, IndexSearcher would be in full control of terminating the query 
>> prematurely based on the configured timeout.
>
>
> +1, I like this idea!  It'd make timeouts more of a first class feature, and 
> then the overhead should be very small if we check only after each block of 
> partitioned docid space.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
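
The windowed check proposed in the quoted messages can be sketched in plain
Java as follows. This is a model under stated assumptions, not IndexSearcher
code: the method name, the injected clock, and the window size are all
hypothetical, and real scoring is replaced by a per-doc callback.

```java
import java.util.function.IntConsumer;
import java.util.function.LongSupplier;

/** Stdlib-only model of windowed timeout checks; hypothetical names, not Lucene API. */
final class WindowedTimeoutSketch {

    /**
     * Visits doc IDs in [0, maxDoc) in windows of windowSize docs and reads the
     * clock only between windows. Returns true if every doc was visited, false
     * if the loop stopped early because the deadline had passed.
     */
    static boolean searchWithTimeout(
            int maxDoc, int windowSize, long deadlineNanos, LongSupplier clock, IntConsumer visitDoc) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be > 0");
        }
        int start = 0;
        while (start < maxDoc) {
            // One clock read per window keeps the overhead small.
            if (clock.getAsLong() - deadlineNanos >= 0) {
                return false; // timed out: whatever was collected so far is partial
            }
            int end = (int) Math.min((long) maxDoc, (long) start + windowSize);
            for (int doc = start; doc < end; doc++) {
                visitDoc.accept(doc);
            }
            start = end;
        }
        return true;
    }

    public static void main(String[] args) {
        long deadline = System.nanoTime() + 50_000_000L; // 50 ms budget, arbitrary
        int[] visited = {0};
        boolean complete =
            searchWithTimeout(1_000_000, 4096, deadline, System::nanoTime, doc -> visited[0]++);
        System.out.println("complete=" + complete + " visited=" + visited[0]);
    }
}
```

Because the loop itself decides when to stop, no exception has to escape
from a query or collector, and the boolean result is the natural place to
record that the hit count is only a lower bound.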


Should Queries be able to throw CollectionTerminationException?

2021-10-05 Thread Greg Miller

Hi folks-

I've run into a bit of an interesting situation when attempting to
enforce a query evaluation time budget in our Lucene application
(Amazon Product Search), and I'm curious if it's something others have
run into or have thoughts on. There's a reasonable chance this
use-case is fairly specific to our application, but if others have
seen similar use-cases, then maybe there's a general solution worth
pursuing here in Lucene itself?

We'd like to enforce a strict time budget for query evaluation, even
at the cost of potentially missing some matches that have yet to be
seen. One tempting solution is to enforce this in our leaf collectors
each time they see a hit by throwing a CollectionTerminatedException,
which is handled nicely by IndexSearcher (we use concurrent search so
this more-or-less enforces an overall time budget). Our queries follow
a two-phase matching approach, and we've run into some interesting
edge-cases where the "approximation" phase may produce a very large
set of match candidates but the "confirmation" phase only confirms
matches on a very small fraction of them. In extreme cases, the entire
index could match in the "approximation" phase and none of the hits
could be "confirmed" in the second phase check.

This creates an interesting issue where the query may evaluate for a
long time before the leaf collectors see hits (or they may never see
hits). This boils down to the BulkScorer running a loop over all
"approximate" candidates and then attempting to "confirm" each before
the leaf collector "sees" anything (it could also happen in a case of
many first phase matches with many of those hits having been deleted).
In these cases, we can run significantly over our time budget.
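
A small simulation may help show why a check that only runs when the
collector sees a hit can miss the budget. The sketch below is plain Java, not
BulkScorer code; it treats every doc as an approximate match and measures the
budget in confirmation checks instead of wall-clock time so the outcome is
deterministic. All names are hypothetical.

```java
import java.util.function.IntPredicate;

/** Stdlib-only simulation of where the budget check sits; not Lucene code. */
final class TwoPhaseBudgetSketch {

    /** Budget is consulted only when a candidate is confirmed (collector-side check). */
    static int runWithCollectorCheck(int maxDoc, IntPredicate confirm, int budget) {
        int work = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            work++; // cost of one second-phase confirmation
            if (confirm.test(doc)) {
                // The collector sees a hit; only now can it notice the overrun.
                if (work > budget) {
                    break;
                }
            }
        }
        return work;
    }

    /** Budget is consulted for every approximate candidate (query-side check). */
    static int runWithApproximationCheck(int maxDoc, IntPredicate confirm, int budget) {
        int work = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (work >= budget) {
                break;
            }
            work++;
            if (confirm.test(doc)) {
                // A confirmed hit would be handed to the collector here.
            }
        }
        return work;
    }

    public static void main(String[] args) {
        IntPredicate neverConfirms = doc -> false; // extreme case: nothing is confirmed
        System.out.println(runWithCollectorCheck(1_000_000, neverConfirms, 1_000));
        System.out.println(runWithApproximationCheck(1_000_000, neverConfirms, 1_000));
    }
}
```

With a confirmation step that never succeeds, the collector-side variant
checks all 1,000,000 candidates, while the query-side variant stops after
1,000.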

One solution I've come up with is to create a top-level Query
implementation that enforces the time budget each time it produces
"approximate" matches. This more-or-less works for our use-case, but
has some "rough edges" as a general solution. What I've observed is
that Lucene really only supports collectors / leaf collectors throwing
CollectionTerminatedException and doesn't necessarily support Query
implementations doing this. One of the most glaring issues is that the
LRU query caching (if enabled) doesn't handle the exception, so if a
Query were to throw when pre-populating the cache bitsets, it would
terminate the entire search (in a pretty ungraceful way).

I'm also aware of ExitableDirectoryReader but it's trickier to manage
for our use case since we read from the index outside of the main
query evaluation phase for other purposes. I'm sure there's a solution
where we maintain multiple Readers, etc.

So... I'm interested if anyone else has run into a similar use-case.
Does anyone have thoughts on alternative solutions? Is there any
appetite to augment Lucene to allow for queries to signal early
termination by throwing CollectionTerminatedExceptions? I suspect
ExitableDirectoryReader probably provides a good enough solution for
others in this situation, but I wanted to raise the topic and see what
other folks here think.

Cheers,
-Greg


Re: Soften Jira's note when opening new issues?

2021-09-23 Thread Greg Miller

Hi Adrien- that's totally fair. There are probably better places for
the additional content I'm proposing. A couple things along these
lines:

1. Do you think it would be worth linking this guide from the JIRA
message (maybe after updating it)?
https://cwiki.apache.org/confluence/display/lucene/HowToContribute. It
could be a nice hook for new users to learn more (and it's what we
link from our README). Maybe it would make the message too long
though?
2. I just put up a very brief PR to add my proposed "friendly message"
to the README before linking off to the above-mentioned guide:
https://github.com/apache/lucene/pull/318.

Back to your original proposal though, I'll add my +1 as I think it's
a big improvement from the current messaging. Thanks for bringing this
up!

Cheers,
-Greg

On Wed, Sep 22, 2021 at 9:23 AM Walter Underwood  wrote:
>
> Hmm. How is this? It is a single longer sentence, but essentially a string of 
> simple ones.
>
> If you want help or have a feature idea, please ask on the mailing list or 
> IRC channel before submitting a Jira issue.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Sep 22, 2021, at 9:18 AM, Adrien Grand  wrote:
>
> Greg, I understand and agree with the intent, but I also would like to keep 
> this as short as possible since the screen to create a new issue in JIRA is 
> already quite intimidating with all its text boxes, and the current version 
> is already taking two lines even though it's short. Maybe this is the sort of 
> thing that we could try to better emphasize in our project's README?
>
> On Wed, Sep 22, 2021 at 6:07 PM Walter Underwood  
> wrote:
>>
>> Two excellent points. So it could be:
>>
>> Are you looking for support for Lucene? Have you seen unexpected behavior? 
>> Have an idea for a new feature or improvement? Please ask for help on the 
>> Lucene user mailing list or the IRC channel. If it is a new problem or idea, 
>> then you can submit a Jira issue.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> On Sep 22, 2021, at 6:38 AM, Greg Miller  wrote:
>>
>> Love this idea!
>>
>> I wonder if there's a way to make the messaging clear that ideas for
>> new features/improvements are also always welcome? When I read the
>> current language, I interpret it as bug reporting. Maybe adding a
>> leading sentence would help?
>>
>> ```
>> Bug reports, improvements and new feature ideas are always welcome!
>> Please note, this project has a user mailing list and an IRC channel
>> for support. If you are looking for support, or if you are not sure
>> whether the behavior that you are observing is expected or not, please
>> discuss it there first.
>> ```
>>
>> Cheers,
>> -Greg
>>
>> On Wed, Sep 22, 2021 at 5:35 AM Adrien Grand  wrote:
>>
>>
>> Hi Walter,
>>
>> Though it doesn't invalidate your comment, I was considering changing the 
>> message only for the Lucene JIRA, at least for now.
>>
>> On Tue, Sep 21, 2021 at 5:08 PM Walter Underwood  
>> wrote:
>>
>>
>> Here is one with shorter, less complex sentences and clear calls to action.
>>
>> Are you looking for support for Solr? Have you seen unexpected behavior? 
>> Please ask for help on the Solr user mailing list or the IRC channel. If it 
>> is a new problem, then you can submit a Jira issue.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> On Sep 21, 2021, at 12:23 AM, Adrien Grand  wrote:
>>
>> I think you made a good point, Alexandre. Would something like this read 
>> better:
>>
>> ```
>> This project has a user mailing list and an IRC channel for support. If you 
>> are looking for support, or if you are not sure whether the behavior that 
>> you are observing is expected or not, please discuss it there first.
>> ```
>>
>> On Mon, Sep 20, 2021 at 2:22 PM Alexandre Rafalovitch  
>> wrote:
>>
>>
>> +1.
>> Ideally, the final version could still be several shorter sentences. To 
>> avoid needing to be a programmer to parse the deeply nested, if totally 
>> logical, structure.
>>
>> On Mon., Sep. 20, 2021, 4:33 a.m. Adrien Grand,  wrote:
>>
>>
>> Hello,
>>
>> Jira gives the following note when opening an issue:
>>
>> ```
>> This project has a user mailing list and an IRC channel for support. Please 

Re: Soften Jira's note when opening new issues?

2021-09-22 Thread Greg Miller

Love this idea!

I wonder if there's a way to make the messaging clear that ideas for
new features/improvements are also always welcome? When I read the
current language, I interpret it as bug reporting. Maybe adding a
leading sentence would help?

```
Bug reports, improvements and new feature ideas are always welcome!
Please note, this project has a user mailing list and an IRC channel
for support. If you are looking for support, or if you are not sure
whether the behavior that you are observing is expected or not, please
discuss it there first.
```

Cheers,
-Greg

On Wed, Sep 22, 2021 at 5:35 AM Adrien Grand  wrote:
>
> Hi Walter,
>
> Though it doesn't invalidate your comment, I was considering changing the 
> message only for the Lucene JIRA, at least for now.
>
> On Tue, Sep 21, 2021 at 5:08 PM Walter Underwood  
> wrote:
>>
>> Here is one with shorter, less complex sentences and clear calls to action.
>>
>> Are you looking for support for Solr? Have you seen unexpected behavior? 
>> Please ask for help on the Solr user mailing list or the IRC channel. If it 
>> is a new problem, then you can submit a Jira issue.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> On Sep 21, 2021, at 12:23 AM, Adrien Grand  wrote:
>>
>> I think you made a good point, Alexandre. Would something like this read 
>> better:
>>
>> ```
>> This project has a user mailing list and an IRC channel for support. If you 
>> are looking for support, or if you are not sure whether the behavior that 
>> you are observing is expected or not, please discuss it there first.
>> ```
>>
>> On Mon, Sep 20, 2021 at 2:22 PM Alexandre Rafalovitch  
>> wrote:
>>>
>>> +1.
>>> Ideally, the final version could still be several shorter sentences. To 
>>> avoid needing to be a programmer to parse the deeply nested, if totally 
>>> logical, structure.
>>>
>>> On Mon., Sep. 20, 2021, 4:33 a.m. Adrien Grand,  wrote:

>>>> Hello,
>>>>
>>>> Jira gives the following note when opening an issue:
>>>>
>>>> ```
>>>> This project has a user mailing list and an IRC channel for support.
>>>> Please ensure that you have discussed your problem using one of those
>>>> resources BEFORE creating this ticket.
>>>> ```
>>>>
>>>> This can be quite intimidating for someone who has never worked with us
>>>> before, and we don't apply this logic for ourselves, for instance I feel
>>>> free to open JIRAs without discussing them first on IRC or dev@l.a.o.
>>>> Given that we are not seeing much irrelevant traffic on JIRA, I'd like to
>>>> soften the message to something like below:
>>>>
>>>> ```
>>>> If you are looking for support, or if you are not sure whether the
>>>> behavior that you are observing is expected or not, please discuss your
>>>> problem on the user mailing-list instead before creating a ticket.
>>>> ```
>>>>
>>>> What do you think?
>>>>
>>>> --
>>>> Adrien
>>
>>
>>
>> --
>> Adrien
>>
>>
>
>
> --
> Adrien

Re: [JENKINS-EA] Lucene-main-Linux (64bit/jdk-18-ea+8) - Build # 31306 - Still Unstable!

2021-08-26 Thread Greg Miller

I believe this should now be fixed both on main and branch_8x. I'll
keep an eye on the builds. Apologies again.

Cheers,
-Greg

On Thu, Aug 26, 2021 at 1:08 PM Greg Miller  wrote:
>
> Many apologies but this was me. Looks like I broke some of the
> existing tests with a recent push. I'll have a fix in shortly.
>
> Cheers,
> -Greg
>
> -- Forwarded message -
> From: Policeman Jenkins Server 
> Date: Thu, Aug 26, 2021 at 12:37 PM
> Subject: [JENKINS-EA] Lucene-main-Linux (64bit/jdk-18-ea+8) - Build #
> 31306 - Still Unstable!
> To: 
>
>
> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/31306/
> Java: 64bit/jdk-18-ea+8 -XX:-UseCompressedOops -XX:+UseZGC
>
> 1 tests failed.
> FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom
>
> Error Message:
> java.lang.AssertionError: expected:<1.0308181> but was:<0.6931471>
>
> Stack Trace:
> java.lang.AssertionError: expected:<1.0308181> but was:<0.6931471>
>     at __randomizedtesting.SeedInfo.seed([14BFA3D5D943A2BD:66F386DA682314CE]:0)
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:577)
>     at org.junit.Assert.assertEquals(Assert.java:701)
>     at org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1634)
>     at org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1304)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>     at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>     at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>     at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>     at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>     at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>     at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

Fwd: [JENKINS-EA] Lucene-main-Linux (64bit/jdk-18-ea+8) - Build # 31306 - Still Unstable!

2021-08-26 Thread Greg Miller

Many apologies but this was me. Looks like I broke some of the
existing tests with a recent push. I'll have a fix in shortly.

Cheers,
-Greg

-- Forwarded message -
From: Policeman Jenkins Server 
Date: Thu, Aug 26, 2021 at 12:37 PM
Subject: [JENKINS-EA] Lucene-main-Linux (64bit/jdk-18-ea+8) - Build #
31306 - Still Unstable!
To: 


Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/31306/
Java: 64bit/jdk-18-ea+8 -XX:-UseCompressedOops -XX:+UseZGC

1 tests failed.
FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom

Error Message:
java.lang.AssertionError: expected:<1.0308181> but was:<0.6931471>

Stack Trace:
java.lang.AssertionError: expected:<1.0308181> but was:<0.6931471>
    at __randomizedtesting.SeedInfo.seed([14BFA3D5D943A2BD:66F386DA682314CE]:0)
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:577)
    at org.junit.Assert.assertEquals(Assert.java:701)
    at org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1634)
    at org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1304)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtes

Re: [JENKINS-MAVEN] Lucene » Lucene-Solr-Maven-8.x #379: POMs out of sync

2021-08-23 Thread Greg Miller

No worries. Thanks for the confirmation!

Cheers,
-Greg

On Mon, Aug 23, 2021 at 12:21 PM Houston Putman  wrote:
>
> Yes it would. I forgot to remove the non-needed dependency from everywhere it 
> seems. I'll get it working soon. Sorry for the annoyance.
>
> On Mon, Aug 23, 2021 at 3:02 PM Greg Miller  wrote:
>>
>> Would this also account for the ant precommit check failing? I'm not
>> able to repro this locally, but am seeing this failure in a PR I just
>> published:
>>
>> check-lib-versions:
>>      [echo] Lib versions check under: /home/runner/work/lucene-solr/lucene-solr/lucene/..
>> [libversions] :: loading settings :: file = /home/runner/work/lucene-solr/lucene-solr/lucene/top-level-ivy-settings.xml
>> [libversions] ORPHAN coordinate key '/com.amazonaws/aws-java-sdk-bom' in ivy-versions.properties is not found in any ivy.xml file.
>> [libversions] Found 0 indirect dependency version conflicts.
>> [libversions] Checked that ivy-versions.properties and ivy-ignore-conflicts.properties have lexically sorted '/org/name' keys and no duplicates or orphans.
>> [libversions] Scanned 51 ivy.xml files for rev="${/org/name}" format.
>> [libversions] Completed in 2.83s., 1 error(s).
>>
>> BUILD FAILED
>> /home/runner/work/lucene-solr/lucene-solr/build.xml:121: The following error occurred while executing this line:
>> /home/runner/work/lucene-solr/lucene-solr/lucene/build.xml:108: The following error occurred while executing this line:
>> /home/runner/work/lucene-solr/lucene-solr/lucene/tools/custom-tasks.xml:108: Lib versions check failed. Check the logs.
>>
>> Cheers,
>> -Greg
>>
>>
>> On Mon, Aug 23, 2021 at 9:37 AM Houston Putman  
>> wrote:
>> >
>> > Thanks for calling this out Mike.
>> >
>> > It's related to SOLR-15089. I'll get it sorted out, as well as some test 
>> > flakiness that was introduced by that ticket.
>> >
>> > - Houston
>> >
>> > On Mon, Aug 23, 2021 at 6:56 AM Michael McCandless 
>> >  wrote:
>> >>
>> >> This is the actual error -- ring any bells?:
>> >>
>> >> /home/jenkins/jenkins-slave/workspace/Lucene/Lucene-Solr-Maven-8.x/lucene/common-build.xml:726:
>> >>  Could not resolve artifacts: Could not find artifact 
>> >> com.amazonaws:aws-java-sdk-bom:jar:1.12.42 in central 
>> >> (https://repo1.maven.org/maven2/)
>> >>
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Fri, Aug 20, 2021 at 8:45 PM Apache Jenkins Server 
>> >>  wrote:
>> >>>
>> >>> Lucene » Lucene-Solr-Maven-8.x - Build # 379 - Still Failing:
>> >>>
>> >>> Check console output at 
>> >>> https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Maven-8.x/379/ 
>> >>> to view the results.
>> >>>
>> >>> -
>> >>> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
>> >>> For additional commands, e-mail: builds-h...@lucene.apache.org
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>

Re: [JENKINS-MAVEN] Lucene » Lucene-Solr-Maven-8.x #379: POMs out of sync

2021-08-23 Thread Greg Miller

Would this also account for the ant precommit check failing? I'm not
able to repro this locally, but am seeing this failure in a PR I just
published:

check-lib-versions:
     [echo] Lib versions check under: /home/runner/work/lucene-solr/lucene-solr/lucene/..
[libversions] :: loading settings :: file = /home/runner/work/lucene-solr/lucene-solr/lucene/top-level-ivy-settings.xml
[libversions] ORPHAN coordinate key '/com.amazonaws/aws-java-sdk-bom' in ivy-versions.properties is not found in any ivy.xml file.
[libversions] Found 0 indirect dependency version conflicts.
[libversions] Checked that ivy-versions.properties and ivy-ignore-conflicts.properties have lexically sorted '/org/name' keys and no duplicates or orphans.
[libversions] Scanned 51 ivy.xml files for rev="${/org/name}" format.
[libversions] Completed in 2.83s., 1 error(s).

BUILD FAILED
/home/runner/work/lucene-solr/lucene-solr/build.xml:121: The following error occurred while executing this line:
/home/runner/work/lucene-solr/lucene-solr/lucene/build.xml:108: The following error occurred while executing this line:
/home/runner/work/lucene-solr/lucene-solr/lucene/tools/custom-tasks.xml:108: Lib versions check failed. Check the logs.
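
For anyone unfamiliar with what the ORPHAN error means, here's how I read the log, as a rough self-contained sketch. This is not the actual libversions task: the class and method names are made up, and I'm assuming an "orphan" is simply a key in ivy-versions.properties that no ivy.xml references via rev="${/org/name}".

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of an orphan-key check; not the real libversions code. */
final class OrphanKeySketch {

  // Matches rev="${/org/name}" and captures the "/org/name" coordinate key.
  private static final Pattern REV_REF =
      Pattern.compile("rev=\"\\$\\{(/[^/}]+/[^}]+)\\}\"");

  /** Returns the version keys that no ivy.xml content refers to. */
  static Set<String> findOrphans(Set<String> versionKeys, List<String> ivyXmlContents) {
    Set<String> referenced = new TreeSet<>();
    for (String xml : ivyXmlContents) {
      Matcher m = REV_REF.matcher(xml);
      while (m.find()) {
        referenced.add(m.group(1));
      }
    }
    // Whatever is declared but never referenced is an orphan.
    Set<String> orphans = new TreeSet<>(versionKeys);
    orphans.removeAll(referenced);
    return orphans;
  }
}
```

Under that reading, removing a dependency from every ivy.xml while leaving its key in ivy-versions.properties would produce exactly the error above.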

Cheers,
-Greg

On Mon, Aug 23, 2021 at 9:37 AM Houston Putman  wrote:
>
> Thanks for calling this out Mike.
>
> It's related to SOLR-15089. I'll get it sorted out, as well as some test 
> flakiness that was introduced by that ticket.
>
> - Houston
>
> On Mon, Aug 23, 2021 at 6:56 AM Michael McCandless 
>  wrote:
>>
>> This is the actual error -- ring any bells?:
>>
>> /home/jenkins/jenkins-slave/workspace/Lucene/Lucene-Solr-Maven-8.x/lucene/common-build.xml:726:
>>  Could not resolve artifacts: Could not find artifact 
>> com.amazonaws:aws-java-sdk-bom:jar:1.12.42 in central 
>> (https://repo1.maven.org/maven2/)
>>
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Aug 20, 2021 at 8:45 PM Apache Jenkins Server 
>>  wrote:
>>>
>>> Lucene » Lucene-Solr-Maven-8.x - Build # 379 - Still Failing:
>>>
>>> Check console output at 
>>> https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Maven-8.x/379/ to 
>>> view the results.
>>>
>>> -
>>> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: builds-h...@lucene.apache.org

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-11.0.6) - Build # 31140 - Unstable!

2021-08-07 Thread Greg Miller

Fix has been pushed. Will keep an eye on the nightly runs to make sure
everything is good now (but I expect it should be).

Cheers,
-Greg

On Sat, Aug 7, 2021 at 09:08 Greg Miller  wrote:

> FYI, I cut LUCENE-10047 to track this, but it's an easy/small fix. I'll
> push shortly to ensure these builds start passing again.
>
> Cheers,
> -Greg
>
> On Sat, Aug 7, 2021 at 8:39 AM Policeman Jenkins Server <
> jenk...@thetaphi.de> wrote:
>
>> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/31140/
>> Java: 64bit/jdk-11.0.6 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC
>>
>> 1 tests failed.
>> FAILED:  org.apache.lucene.facet.TestLongValueFacetCounts.testRandomMultiValued
>>
>> Error Message:
>> java.lang.AssertionError: all docs, sort facets by value: counts[333] expected:<16> but was:<14>
>>
>> Stack Trace:
>> java.lang.AssertionError: all docs, sort facets by value: counts[333] expected:<16> but was:<14>
>>     at __randomizedtesting.SeedInfo.seed([39FEA95A68DD83A6:2735C56AB2F2607E]:0)
>>     at org.junit.Assert.fail(Assert.java:89)
>>     at org.junit.Assert.failNotEquals(Assert.java:835)
>>     at org.junit.Assert.assertEquals(Assert.java:647)
>>     at org.apache.lucene.facet.TestLongValueFacetCounts.assertSame(TestLongValueFacetCounts.java:615)
>>     at org.apache.lucene.facet.TestLongValueFacetCounts.testRandomMultiValued(TestLongValueFacetCounts.java:461)
>>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>>     at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>>     at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>>     at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>>     at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>>     at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>>     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>     at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>>     at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>>     at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>>     at com

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-11.0.6) - Build # 31140 - Unstable!

2021-08-07 Thread Greg Miller

FYI, I cut LUCENE-10047 to track this, but it's an easy/small fix. I'll
push shortly to ensure these builds start passing again.

Cheers,
-Greg

On Sat, Aug 7, 2021 at 8:39 AM Policeman Jenkins Server 
wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/31140/
> Java: 64bit/jdk-11.0.6 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC
>
> 1 tests failed.
> FAILED:  org.apache.lucene.facet.TestLongValueFacetCounts.testRandomMultiValued
>
> Error Message:
> java.lang.AssertionError: all docs, sort facets by value: counts[333] expected:<16> but was:<14>
>
> Stack Trace:
> java.lang.AssertionError: all docs, sort facets by value: counts[333] expected:<16> but was:<14>
>     at __randomizedtesting.SeedInfo.seed([39FEA95A68DD83A6:2735C56AB2F2607E]:0)
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:647)
>     at org.apache.lucene.facet.TestLongValueFacetCounts.assertSame(TestLongValueFacetCounts.java:615)
>     at org.apache.lucene.facet.TestLongValueFacetCounts.testRandomMultiValued(TestLongValueFacetCounts.java:461)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>     at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>     at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>     at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>     at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>     at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>     at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>     at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>     at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evalu

Re: [DrillSidewaysScorer] performance degradation

2021-07-19 Thread Greg Miller

Interesting. Thanks for raising this Grigoriy! Yes, please open an issue to
track this. It would be nice if we could optimize DrillSidewaysScorer to
not compute the score for "near misses"!
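
For the record, here's the shape of the optimization as I understand it, as a toy model rather than the real DrillSidewaysScorer (all names below are hypothetical, and "near miss" is taken to mean a doc that fails exactly one drill-down dimension). The only point is that the scoring function gets called for true hits and never for near misses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntToDoubleFunction;

/** Hypothetical model of query-first collection; not Lucene's DrillSidewaysScorer. */
final class LazyScoreSketch {

  /** Counts how often the (possibly expensive) scoring function is invoked. */
  static final class CountingScorer implements IntToDoubleFunction {
    int calls = 0;

    @Override
    public double applyAsDouble(int docID) {
      calls++;
      return 1.0 / (1 + docID); // arbitrary placeholder score
    }
  }

  /** True hits with their scores, plus near-miss doc IDs per failed dimension. */
  static final class Result {
    final List<Integer> hitDocs = new ArrayList<>();
    final List<Double> hitScores = new ArrayList<>();
    final List<List<Integer>> nearMisses = new ArrayList<>();
  }

  static Result collect(int[] baseDocs, List<IntPredicate> dims, IntToDoubleFunction scorer) {
    Result r = new Result();
    for (int i = 0; i < dims.size(); i++) {
      r.nearMisses.add(new ArrayList<>());
    }
    for (int docID : baseDocs) {
      int failedDim = -1;
      int failures = 0;
      // Stop checking once two dimensions have failed: the doc is irrelevant then.
      for (int d = 0; d < dims.size() && failures < 2; d++) {
        if (!dims.get(d).test(docID)) {
          failedDim = d;
          failures++;
        }
      }
      if (failures == 0) {
        // True hit: this is the only place the score is computed.
        r.hitDocs.add(docID);
        r.hitScores.add(scorer.applyAsDouble(docID));
      } else if (failures == 1) {
        // Near miss: counted for the failed dimension, no score needed.
        r.nearMisses.get(failedDim).add(docID);
      }
    }
    return r;
  }
}
```

With a heavy custom score() like the one described, the savings should scale with the number of near misses, which is the case the proposed diff targets.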

Cheers,
-Greg

On Sun, Jul 18, 2021 at 3:52 PM Grigorii Troitckii 
wrote:

> Hello, community,
>
> *Question*
> Is it ok if I create a Jira issue and pull-request with the following diff?
>
> *Diff*
> @@ -195,11 +195,8 @@ class DrillSidewaysScorer extends BulkScorer {
>
>collectDocID = docID;
>
> -  // TODO: we could score on demand instead since we are
> -  // daat here:
> -  collectScore = baseScorer.score();
> -
>if (failedCollector == null) {
> +collectScore = baseScorer.score();
>  // Hit passed all filters, so it's "real":
>  collectHit(collector, dims);
>} else {
>
> *Motivation*
> 1. Performance degradation: we have a quite heavy custom implementation of
> score(), so when we started using DrillSideways, this call became top-1 in
> a profiler snapshot (top-3 with default scoring). We tried doUnionScoring
> and doDrillDownAdvanceScoring, but no luck:
> - doUnionScoring scores all baseQuery docIds
> - doDrillDownAdvanceScoring avoids some redundant docId scorings by
>   considering the symmetric difference of the top two iterators' docIds, but
>   still scores some docIds that will be filtered out by the 3rd, 4th, ...
>   dimension iterators
> - doQueryFirstScoring scores near-miss docIds
> The best way is to score only true hits (where baseQuery and all N drill-down
> iterators match). So we suggest a small modification of
> doQueryFirstScoring.
>
> 2. Speaking of doQueryFirstScoring, it doesn't look like we need to
> calculate a score for a near-miss hit, because it won't be used anywhere.
> FacetsCollectorManager creates FacetsCollector with the default constructor
>
> https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollectorManager.java#L35
> so FacetsCollector has keepScores set to false
>
> https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollector.java#L119
> and collectScore is not being used
>
> https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java#L200
>
> Thank you in advance,
> https://hh.ru search team,
> Grigoriy Troitskiy.
>
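
For readers following the thread, here is a minimal sketch of the idea in the diff above. It is not the DrillSidewaysScorer code: the class LazyScoreSketch, its CountingScorer, and the countScoreCalls method are hypothetical names, and drill-down dimensions are modeled as plain predicates on docIDs. It only shows how many score() calls are saved when scoring is deferred until a doc is known to be a true hit.

```java
import java.util.List;
import java.util.function.IntPredicate;

final class LazyScoreSketch {

  /** Stand-in for an expensive scorer; it only counts how often a score is requested. */
  static final class CountingScorer {
    int calls;

    float score(int docID) {
      calls++;
      return 1.0f; // placeholder value, the sketch only cares about the call count
    }
  }

  /**
   * Walks the base-query hits and returns the number of score() calls made.
   * scoreNearMisses=true models scoring before the hit/near-miss decision,
   * scoreNearMisses=false models scoring only docs that pass every dimension.
   */
  static int countScoreCalls(int[] baseHits, List<IntPredicate> dims, boolean scoreNearMisses) {
    CountingScorer scorer = new CountingScorer();
    for (int docID : baseHits) {
      int failedDims = 0;
      for (IntPredicate dim : dims) {
        if (!dim.test(docID) && ++failedDims > 1) {
          break; // two or more dimensions failed: not even a near miss
        }
      }
      if (failedDims > 1) {
        continue; // doc is dropped without ever being scored
      }
      if (failedDims == 0 || scoreNearMisses) {
        scorer.score(docID);
      }
    }
    return scorer.calls;
  }
}
```

With six base hits and two dimensions ("even docID" and "docID divisible by 3"), only doc 0 is a true hit and docs 2, 3 and 4 are near misses, so the deferred variant scores one doc where the eager variant scores four.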

Re: Two-phase range queries?

2021-06-29 Thread Greg Miller

Thanks Robert/Adrien for the quick thoughts!

Robert- Just to clarify a little bit (slightly more caffeinated now): For
the cases where an indexed rectangle fits entirely within the query
rectangle, these would be "confirmed" matches up-front, relying on the
efficiency of the BKD-tree data structure (as you describe). The second
phase confirmation I had in mind was for the cases where the query rectangle
overlaps, or sits inside, the indexed rectangle, and the individual points/docs
within
the indexed rectangle need to be checked against the boundaries of the
query rectangle. That step seemed like a good candidate for a second phase,
effectively avoiding that check if it doesn't end up being necessary. But,
if that is relatively rare compared to the case where an indexed shape fits
entirely within the query shape, then maybe there isn't value.

Adrien- Oh cool! Was not aware of this actually, but after reading the blog
and looking at the code a little bit, this is in the neighborhood of what I
was thinking. I suppose what I had in mind is a little bit of a hybrid of
the two approaches currently implemented. I wonder if, in the case where
the range query is used to "verify" matches (to borrow from your
terminology), there's a benefit to still using the BKD-tree to quickly
eliminate docs that don't match before using the doc values to confirm
matches for "approximate" matches. Said differently, what if there was a
range query implementation that collected "definite" matches as well as
"potential" matches using the BKD-tree up-front (so definite matches are
docs where the indexed shape is entirely contained in the query shape and
potential matches are cases where the indexed shape overlaps the query
shape), then in a second phase used the doc values to confirm/reject
"potential" matches? Just a thought.

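To make the "definite" vs. "potential" split a bit more concrete, here is a rough one-dimensional sketch. Everything in it is an assumption for illustration: TwoPhaseRangeSketch, Cell, Candidates, approximate and confirm are made-up names, the "tree" is just a flat list of leaf cells with min/max values, and doc values are a plain array indexed by docID. It is not how PointRangeQuery or IndexOrDocValuesQuery are implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoPhaseRangeSketch {

  /** A leaf cell of a toy 1-D tree: the value bounds of the cell and the docs it holds. */
  static final class Cell {
    final long min;
    final long max;
    final int[] docIDs;

    Cell(long min, long max, int[] docIDs) {
      this.min = min;
      this.max = max;
      this.docIDs = docIDs;
    }
  }

  /** Output of the up-front phase: docs known to match, and docs that still need a check. */
  static final class Candidates {
    final List<Integer> definite = new ArrayList<>();
    final List<Integer> potential = new ArrayList<>();
  }

  /** Phase one: classify whole cells against the query range [lower, upper]. */
  static Candidates approximate(List<Cell> cells, long lower, long upper) {
    Candidates result = new Candidates();
    for (Cell cell : cells) {
      if (cell.max < lower || cell.min > upper) {
        continue; // cell is disjoint from the query range: all of its docs are rejected
      }
      boolean fullyInside = cell.min >= lower && cell.max <= upper;
      for (int docID : cell.docIDs) {
        (fullyInside ? result.definite : result.potential).add(docID);
      }
    }
    return result;
  }

  /** Phase two: per-doc check, meant to run only for potential docs that are actually reached. */
  static boolean confirm(long[] docValues, int docID, long lower, long upper) {
    long value = docValues[docID];
    return value >= lower && value <= upper;
  }
}
```

The point of the split is that confirm() is only paid for docs in overlapping cells, and only if the rest of the query ever advances to them.
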
Unwinding a bit, these responses are exactly what I was looking for. As a
starting point, I wanted to understand if this had been explored, and it
looks like it certainly has—so thanks for all the pointers!

Cheers,
-Greg

On Tue, Jun 29, 2021 at 7:40 AM Adrien Grand  wrote:

> Hi Greg,
>
> Have you looked at IndexOrDocValuesQuery? It dynamically chooses between
> computing the range up-front using the BKD tree and running the range query
> using doc values depending on the estimated cost of the range query
> (computed by counting the number of leaf nodes of the BKD tree that have
> matches, which is cheap to compute) vs. the cost of the overall query.
> Hopefully the javadocs are not too bad, and I wrote a small blog
> <https://www.elastic.co/blog/better-query-planning-for-range-queries-in-elasticsearch>
> about this query a few years ago.
>
>
> On Tue, Jun 29, 2021 at 3:30 PM Greg Miller  wrote:
>
>> Hi folks-
>>
>> I've been spending a little time getting familiar with the BKD-tree-based
>> range query support currently implemented in Lucene, and wonder if there's
>> ever been a discussion around supporting two-phase iteration in this space.
>> If I'm understanding the current implementation properly (specifically
>> looking at PointRangeQuery), it appears that all matches are determined
>> up-front by 1) identifying segments of the tree that contain candidate
>> matches (i.e., containing part of the query range), and then 2) confirming
>> whether-or-not the contained points actually fall within the range. I'm
>> also a little low on coffee this morning so it's entirely possible I'm
>> misunderstanding the current implementation (please correct me if so).
>>
>> With this approach, it seems like we could potentially be doing quite a
>> bit of wasted effort in some situations. I have no thoughts on how to
>> actually implement this yet, but I wonder if we could support two-phase
>> iteration by 1) returning all docs with points contained in candidate
>> BKD-tree segments as an approximation, and then 2) only checking the points
>> against the query range when confirming matches in the second phase? I
>> think the idea would extend to LatLonPointDistanceQuery as well (and maybe
>> others?).
>>
>> I did a Jira search for a related issue but came up empty. Anyone know if
>> this idea has been discussed previously, or if there's some inherent flaw
>> with the approach that would make it a non-starter? I don't really have any
>> cycles to work on this at the moment, but can at least open a Jira issue to
>> track if it seems like a reasonable thing to explore.
>>
>> Cheers,
>> -Greg
>>
>
>
> --
> Adrien
>
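
As a rough illustration of the cost-based choice Adrien describes above, the sketch below compares an estimated cost for the range query with the cost of the overall query. The names CostBasedChoiceSketch, Strategy and choose, and the docValuesPenalty parameter, are assumptions made up for this example; the actual decision logic lives inside IndexOrDocValuesQuery and may differ.

```java
final class CostBasedChoiceSketch {

  enum Strategy {
    POINTS_UP_FRONT,   // compute all range matches from the tree before iterating
    DOC_VALUES_VERIFY  // let another clause lead and check each candidate via doc values
  }

  /**
   * Picks a strategy from two estimates.
   *
   * @param rangeCost estimated number of docs matching the range (hypothetical input)
   * @param leadCost estimated number of docs the rest of the query will visit
   * @param docValuesPenalty assumed factor expressing how much costlier a per-doc
   *     doc-values check is than reading a match from the tree; must be positive
   */
  static Strategy choose(long rangeCost, long leadCost, int docValuesPenalty) {
    if (docValuesPenalty <= 0) {
      throw new IllegalArgumentException("docValuesPenalty must be positive");
    }
    // If the range is cheap relative to the lead clause, resolve it up front;
    // otherwise only verify the few docs the lead clause produces.
    return rangeCost / docValuesPenalty <= leadCost
        ? Strategy.POINTS_UP_FRONT
        : Strategy.DOC_VALUES_VERIFY;
  }
}
```

For example, a range expected to match a million docs combined with a lead clause matching a hundred would pick DOC_VALUES_VERIFY, while a range matching five hundred docs would pick POINTS_UP_FRONT under the same assumed penalty of 8.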

Two-phase range queries?

2021-06-29 Thread Greg Miller

Hi folks-

I've been spending a little time getting familiar with the BKD-tree-based
range query support currently implemented in Lucene, and wonder if there's
ever been a discussion around supporting two-phase iteration in this space.
If I'm understanding the current implementation properly (specifically
looking at PointRangeQuery), it appears that all matches are determined
up-front by 1) identifying segments of the tree that contain candidate
matches (i.e., containing part of the query range), and then 2) confirming
whether-or-not the contained points actually fall within the range. I'm
also a little low on coffee this morning so it's entirely possible I'm
misunderstanding the current implementation (please correct me if so).

With this approach, it seems like we could potentially be doing quite a bit
of wasted effort in some situations. I have no thoughts on how to actually
implement this yet, but I wonder if we could support two-phase iteration by
1) returning all docs with points contained in candidate BKD-tree segments
as an approximation, and then 2) only checking the points against the query
range when confirming matches in the second phase? I think the idea would
extend to LatLonPointDistanceQuery as well (and maybe others?).

I did a Jira search for a related issue but came up empty. Anyone know if
this idea has been discussed previously, or if there's some inherent flaw
with the approach that would make it a non-starter? I don't really have any
cycles to work on this at the moment, but can at least open a Jira issue to
track if it seems like a reasonable thing to explore.

Cheers,
-Greg

Re: Welcome Mayya Sharipova to the Lucene PMC

2021-06-28 Thread Greg Miller

Congratulations Mayya!

On Mon, Jun 28, 2021 at 8:04 AM Nhat Nguyen 
wrote:

> Congrats Mayya!
>
> On Mon, Jun 28, 2021 at 10:50 AM Dawid Weiss 
> wrote:
>
>>
>> Congratulations, Mayya!
>>
>> Dawid
>>
>> On Mon, Jun 28, 2021 at 3:17 PM Robert Muir  wrote:
>>
>>> I am pleased to announce that Mayya has accepted an invitation to join
>>> the Lucene PMC!
>>>
>>> Congratulations, and welcome aboard!
>>>

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-15) - Build # 30669 - Unstable!

2021-06-24 Thread Greg Miller

I'm looking into this. I think randomized testing uncovered a concurrency
bug in some of my DrillSideways work.

Cheers,
-Greg

On Thu, Jun 24, 2021 at 2:40 AM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/30669/
> Java: 64bit/jdk-15 -XX:-UseCompressedOops -XX:+UseParallelGC
>
> 3 tests failed.
> FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom
>
> Error Message:
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
>
> Stack Trace:
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
> at
> __randomizedtesting.SeedInfo.seed([F04F27A09C77D609:820302AF2D17607A]:0)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:707)
> at
> org.apache.lucene.facet.DrillSideways.searchSequentially(DrillSideways.java:519)
> at
> org.apache.lucene.facet.DrillSideways.search(DrillSideways.java:460)
> at
> org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1168)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:564)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedt

Re: Boolean Scorer

2021-06-15 Thread Greg Miller

Thanks for this explanation Adrien! I'd been wondering about this a bit
myself since seeing that DrillSideways also implements a TAAT (term-at-a-time)
approach (in addition to a doc-at-a-time approach). This really helps clear
that up. Appreciate you taking the time to explain!

Cheers,
-Greg

On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand  wrote:

> Hello Arihant,
>
> The Scorer for disjunctions uses a heap data structure that needs to be
> reordered upon every hit. While reordering heaps is efficient as it runs in
> logarithmic time, the fact that it needs to run on every document might add
> non-negligible overhead. BooleanScorer tries to work around this overhead
> by scoring large windows of documents in a more TAAT (term-at-a-time)
> fashion so that Lucene only needs to reorder the heap every 2048 doc IDs
> (the hardcoded window size).
>
> This paper gives a bit more context:
> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
> see section 4 in particular.
>
> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar  wrote:
>
>> Hi,
>>
>> I am new here. I would like to know what exact optimisation is
>> carried out in the “BooleanScorer.java” code, which led to a separate class for
>> resolving Boolean queries over documents in bulk. I could not find any material
>> in the documentation for this either, hence I decided to ask here.
>>
>>
>> Thanking you in advance,
>>
>> Arihant.
>>
>
>
> --
> Adrien
>
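
To illustrate the windowing described above, here is a simplified sketch of scoring a disjunction one window of doc IDs at a time. It is not the BooleanScorer source: WindowedDisjunctionSketch and its score method are hypothetical, postings are plain sorted int arrays, "scores" are just counts of matching clauses, and for simplicity every clause is visited in every window instead of keeping a heap of clauses that is reordered once per window.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowedDisjunctionSketch {

  /** Window size mentioned in the thread. */
  static final int WINDOW_SIZE = 2048;

  /**
   * Returns {docID, matchingClauseCount} pairs in increasing docID order.
   * Each postings[i] must be sorted and contain only docIDs below maxDoc.
   */
  static List<int[]> score(int[][] postings, int maxDoc) {
    List<int[]> hits = new ArrayList<>();
    int[] positions = new int[postings.length]; // read position per clause
    int[] counts = new int[WINDOW_SIZE];        // per-window accumulator

    for (int windowBase = 0; windowBase < maxDoc; windowBase += WINDOW_SIZE) {
      int windowEnd = Math.min(windowBase + WINDOW_SIZE, maxDoc);

      // Term-at-a-time inside the window: drain each clause fully before the next one.
      for (int clause = 0; clause < postings.length; clause++) {
        int[] docs = postings[clause];
        int pos = positions[clause];
        while (pos < docs.length && docs[pos] < windowEnd) {
          counts[docs[pos] - windowBase]++;
          pos++;
        }
        positions[clause] = pos;
      }

      // Replay the window in docID order and reset the accumulator for reuse.
      for (int i = 0; i < windowEnd - windowBase; i++) {
        if (counts[i] > 0) {
          hits.add(new int[] {windowBase + i, counts[i]});
          counts[i] = 0;
        }
      }
    }
    return hits;
  }
}
```

The per-document bookkeeping inside a window is a simple array increment, and any cross-clause coordination happens once per window rather than once per hit.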

Re: A couple backport questions

2021-06-07 Thread Greg Miller

Thanks Mike!

Matching the original styling seems reasonable to me. Shouldn't be too
difficult (at least for the backports I had in mind). I also really
appreciate the consistent styling enforced (and made easy!) with 9.0!

As for combining a couple small changes, I'll go ahead with this
approach. I think it makes sense in the case I have in mind, but I'll
put the PR out there and see if folks agree/disagree. Will definitely
reference the original issues in the commit message.

If anyone else has feedback (or tips/tricks on their current approach
to backporting), I'm happy to hear them. Thanks!

Cheers,
-Greg

On Sun, Jun 6, 2021 at 7:14 AM Michael McCandless
 wrote:
>
> Great questions!
>
> On Fri, Jun 4, 2021 at 1:37 PM Greg Miller  wrote:
>>
>> Hey folks-
>>
>> I have a couple (hopefully quick) questions about backporting best
>> practices (from the lucene/main branch to lucene-solr/branch_8x).
>> Really appreciate the help!
>>
>> 1. A lot of code reformatting has happened on lucene/main (e.g.,
>> spotlessApply). When pulling over changes into branch_8x for a given
>> file, is it preferable to pull in the formatting changes as well, or
>> are we trying to leave the formatting of branch_8x alone as much as
>> possible? It generally seems easier to pull the formatting changes
>> over as well, but it can make a PR look more involved than it really
>> is.
>
>
> I think it is best to match 8.x styling if it is not too much trouble?  The 
> "when in Rome ..." argument.  But if that is too much hassle, it's OK to 
> carry over main's styling?
>
> I love that we have moved to fixed styling for 9.0 onwards (thanks Dawid!).
>
>> 2. In a case where multiple small changes were made to the same area
>> of code (e.g., a fast-follow update or a bug fix), is it acceptable to
>> combine those changes into one PR against branch_8x? Or do we want to
>> maintain a more strict 1:1 relationship between a change on main and
>> branch_8x? Seems like it would be OK to combine a couple small changes
>> into a single backport PR, but curious if people feel differently.
>
>
> I think it's fine to combine small changes into single backport, but try to 
> reference all of the original issues in the resulting commit message?
>
> Mike
> --
> Mike McCandless
>
> http://blog.mikemccandless.com

