Re: Lucene 10

2024-06-27 Thread Michael McCandless
Thanks Adrien.  Longish-term planning in open source is such a hard thing,
so I'm glad you are helping to herd us cats ;)

I've also finally switched our nightly benchmarks to use concurrent search
(intra-query concurrency)!  It's annotation GM in the charts.  Some queries
got faster, like BooleanQuery disjunction of two high frequency terms (
https://home.apache.org/~mikemccand/lucenebench/OrHighHigh.html) and some
got slower e.g. simple TermQuery (
https://home.apache.org/~mikemccand/lucenebench/Term.html).  Now as we make
improvements to Lucene's cross-slice / cross-thread search concurrency,
e.g. intra-segment concurrency, we should be able to see the gains in our
nightly benchmarks.  Adding concurrency to Lucene has been such a long and
fun road, and we are really only getting started in search-time concurrency.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jun 26, 2024 at 10:59 AM Adrien Grand wrote:

> Hello everyone,
>
> Time flies, I started this email thread ~3.5 months ago and we now have ~3
> months before September 22nd, where 10.0 will go on feature freeze.
>
> Robert kindly added a description to the GitHub milestone that refers to
> this thread: https://github.com/apache/lucene/milestone/2.
>
> Overall, progress looks rather good to me:
>  - I/O concurrency is progressing nicely
> https://github.com/apache/lucene/issues/13179. In particular I'm hoping
> to merge I/O concurrency for terms dictionary lookups soon.
>  - Ignacio recently merged initial support for sparse indexing.
> https://github.com/apache/lucene/issues/11432 There are follow-ups we
> need to address, but they look reasonable in terms of amount of work and
> uncontroversial.
>
> Some things have got less traction:
>  - We haven't made significant progress on intra-segment search
> concurrency: https://github.com/apache/lucene/issues/9721.
>  - Relatedly, if we think that IndexSearcher should enable concurrency by
> default, a major version is a good time to make such a big change to
> runtime behavior. https://github.com/apache/lucene/issues/11523
>
> In any case, help is welcome. I know people have been creating more issues
> that they attached to the 10.0 milestone, e.g. doing more off-heap scoring
> for vectors https://github.com/apache/lucene/issues/13515 or deprecating
> the COSINE similarity https://github.com/apache/lucene/issues/13281. This
> is great too, the list isn't closed, I'll start thinking harder about which
> changes specifically should block the release as we get closer to September
> (I can't think of any at the moment). In the meantime, it's fine to
> optimistically attach issues to the 10.0 milestone.
>
> On Wed, Mar 20, 2024 at 2:09 PM Adrien Grand wrote:
>
>> Thanks Mike and Dawid for the kind words, and thanks Patrick, Luca and
>> Egor for your interest in decoupling index geometry from search
>> concurrency, this would be a great release highlight if we can get it into
>> Lucene 10!
>>
>> I haven't seen pushback on the proposed schedule so I plan on proceeding
>> with this timeline in mind.
>>
>> If you have changes that you would like to include in Lucene 10.0, please
>> add the 10.0 milestone to them. It's ok to
>> be a bit ambitious at this stage and optimistically mark some changes as
>> scheduled for 10.0, we'll have opportunities for removing items from this
>> list when the date comes closer and some issues are not getting proper
>> traction. I'll take care of that.
>>
>> On Mon, Mar 18, 2024 at 11:39 AM Dawid Weiss wrote:
>>
>>>> [...] but Adrien I don't honestly believe anyone who is
>>>> paying attention thinks that is what you have been doing!
>>>
>>>
>>> +1. I wish I were procrastinating as productively!
>>>
>>> D.
>>>
>>
>>
>> --
>> Adrien
>>
>
>
> --
> Adrien
>


Re: Lucene 10

2024-06-26 Thread Adrien Grand
Hello everyone,

Time flies, I started this email thread ~3.5 months ago and we now have ~3
months before September 22nd, where 10.0 will go on feature freeze.

Robert kindly added a description to the GitHub milestone that refers to
this thread: https://github.com/apache/lucene/milestone/2.

Overall, progress looks rather good to me:
 - I/O concurrency is progressing nicely
https://github.com/apache/lucene/issues/13179. In particular I'm hoping to
merge I/O concurrency for terms dictionary lookups soon.
 - Ignacio recently merged initial support for sparse indexing.
https://github.com/apache/lucene/issues/11432 There are follow-ups we need
to address, but they look reasonable in terms of amount of work and
uncontroversial.

Some things have got less traction:
 - We haven't made significant progress on intra-segment search
concurrency: https://github.com/apache/lucene/issues/9721.
 - Relatedly, if we think that IndexSearcher should enable concurrency by
default, a major version is a good time to make such a big change to
runtime behavior. https://github.com/apache/lucene/issues/11523

In any case, help is welcome. I know people have been creating more issues
that they attached to the 10.0 milestone, e.g. doing more off-heap scoring
for vectors https://github.com/apache/lucene/issues/13515 or deprecating
the COSINE similarity https://github.com/apache/lucene/issues/13281. This
is great too, the list isn't closed, I'll start thinking harder about which
changes specifically should block the release as we get closer to September
(I can't think of any at the moment). In the meantime, it's fine to
optimistically attach issues to the 10.0 milestone.

On Wed, Mar 20, 2024 at 2:09 PM Adrien Grand wrote:

> Thanks Mike and Dawid for the kind words, and thanks Patrick, Luca and
> Egor for your interest in decoupling index geometry from search
> concurrency, this would be a great release highlight if we can get it into
> Lucene 10!
>
> I haven't seen pushback on the proposed schedule so I plan on proceeding
> with this timeline in mind.
>
> If you have changes that you would like to include in Lucene 10.0, please
> add the 10.0 milestone to them. It's ok to
> be a bit ambitious at this stage and optimistically mark some changes as
> scheduled for 10.0, we'll have opportunities for removing items from this
> list when the date comes closer and some issues are not getting proper
> traction. I'll take care of that.
>
> On Mon, Mar 18, 2024 at 11:39 AM Dawid Weiss wrote:
>
>>> [...] but Adrien I don't honestly believe anyone who is
>>> paying attention thinks that is what you have been doing!
>>
>>
>> +1. I wish I were procrastinating as productively!
>>
>> D.
>>
>
>
> --
> Adrien
>


-- 
Adrien


Re: Lucene 10

2024-03-20 Thread Adrien Grand
Thanks Mike and Dawid for the kind words, and thanks Patrick, Luca and Egor
for your interest in decoupling index geometry from search concurrency,
this would be a great release highlight if we can get it into Lucene 10!

I haven't seen pushback on the proposed schedule so I plan on proceeding
with this timeline in mind.

If you have changes that you would like to include in Lucene 10.0, please
add the 10.0 milestone to them. It's ok to be a bit ambitious at this stage and
optimistically mark some changes as scheduled for 10.0, we'll have
opportunities for removing items from this list when the date comes closer
and some issues are not getting proper traction. I'll take care of that.

On Mon, Mar 18, 2024 at 11:39 AM Dawid Weiss wrote:

>> [...] but Adrien I don't honestly believe anyone who is
>> paying attention thinks that is what you have been doing!
>
>
> +1. I wish I were procrastinating as productively!
>
> D.
>


-- 
Adrien


Re: Lucene 10

2024-03-18 Thread Dawid Weiss
>
> [...] but Adrien I don't honestly believe anyone who is
> paying attention thinks that is what you have been doing!


+1. I wish I were procrastinating as productively!

D.


Re: Lucene 10

2024-03-18 Thread Luca Cavanna
Hey Patrick,
your help on search concurrency will be much appreciated :)  I have a
very hacky branch that I'd like to use as a basis for discussing
the issues I found and the adjustments needed. Lots to do there. I will ping
you once I put up a draft PR.

Cheers
Luca

On Fri, Mar 15, 2024 at 9:55 PM Patrick Zhai wrote:

> Thanks Adrien +1 to the timelines.
>
> I'm also willing to work on / review the "Decouple search concurrency from
> index geometry" task. That's a very nice one to have for latency-sensitive
> applications (rather than having to tune the merge policy case by case). But I
> cannot guarantee anything yet, so if others are also working on it I'm happy
> to share ideas and effort (if any).
>
> Patrick
>
> On Thu, Mar 14, 2024 at 12:09 PM Michael Sokolov wrote:
>
>> timing makes sense to me. +1 for having a deadline to reduce
>> procrastination, but Adrien I don't honestly believe anyone who is
>> paying attention thinks that is what you have been doing!
>>
>> On Wed, Mar 13, 2024 at 10:40 AM Adrien Grand wrote:
>> >
>> > Hello everyone!
>> >
>> > It's been ~2.5 years since we released Lucene 9.0 (December 2021) and
>> > I'd like us to start working towards Lucene 10.0. I'm volunteering for
>> > being the release manager and propose the following timeline:
>> >  - ~September 15th: main gets bumped to 11.x, branch_10x gets created
>> >  - ~September 22nd: Do a last 9.x minor release.
>> >  - ~October 1st: Release 10.0.
>> >
>> > This may sound like a long notice period. My motivation is that there
>> > are a few changes I have on my mind that are likely worthy of a major
>> > release, and I plan on taking advantage of a date being set to stop
>> > procrastinating and finally start moving these enhancements forward. These
>> > are not blockers, only my wish list for Lucene 10.0, if they are not ready
>> > in time we can have discussions about letting them slip until the next
>> > major.
>> >  - Greater I/O concurrency. Can Lucene better utilize modern disks that
>> > are plenty concurrent?
>> >  - Decouple search concurrency from index geometry. Can Lucene better
>> > utilize modern CPUs that are plenty concurrent?
>> >  - "Sparse indexing" / "zone indexing" for sorted indexes. This is one
>> > of the most efficient techniques that OLAP databases take advantage of to
>> > make search fast. Let's bring it to Lucene.
>> >
>> > This list isn't meant to be an exhaustive list of release highlights
>> > for Lucene 10, feel free to add your own. There are also a number of
>> > cleanups we may want to consider. I wanted to share this list for
>> > visibility though in case you have thoughts on these enhancements and/or
>> > would like to help.
>> >
>> > --
>> > Adrien
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>


Re: Lucene 10

2024-03-15 Thread Patrick Zhai
Thanks Adrien +1 to the timelines.

I'm also willing to work on / review the "Decouple search concurrency from
index geometry" task. That's a very nice one to have for latency-sensitive
applications (rather than having to tune the merge policy case by case). But I
cannot guarantee anything yet, so if others are also working on it I'm happy
to share ideas and effort (if any).

Patrick

On Thu, Mar 14, 2024 at 12:09 PM Michael Sokolov wrote:

> timing makes sense to me. +1 for having a deadline to reduce
> procrastination, but Adrien I don't honestly believe anyone who is
> paying attention thinks that is what you have been doing!
>
> On Wed, Mar 13, 2024 at 10:40 AM Adrien Grand wrote:
> >
> > Hello everyone!
> >
> > It's been ~2.5 years since we released Lucene 9.0 (December 2021) and
> > I'd like us to start working towards Lucene 10.0. I'm volunteering for
> > being the release manager and propose the following timeline:
> >  - ~September 15th: main gets bumped to 11.x, branch_10x gets created
> >  - ~September 22nd: Do a last 9.x minor release.
> >  - ~October 1st: Release 10.0.
> >
> > This may sound like a long notice period. My motivation is that there
> > are a few changes I have on my mind that are likely worthy of a major
> > release, and I plan on taking advantage of a date being set to stop
> > procrastinating and finally start moving these enhancements forward. These
> > are not blockers, only my wish list for Lucene 10.0, if they are not ready
> > in time we can have discussions about letting them slip until the next
> > major.
> >  - Greater I/O concurrency. Can Lucene better utilize modern disks that
> > are plenty concurrent?
> >  - Decouple search concurrency from index geometry. Can Lucene better
> > utilize modern CPUs that are plenty concurrent?
> >  - "Sparse indexing" / "zone indexing" for sorted indexes. This is one
> > of the most efficient techniques that OLAP databases take advantage of to
> > make search fast. Let's bring it to Lucene.
> >
> > This list isn't meant to be an exhaustive list of release highlights for
> > Lucene 10, feel free to add your own. There are also a number of cleanups
> > we may want to consider. I wanted to share this list for visibility though
> > in case you have thoughts on these enhancements and/or would like to help.
> >
> > --
> > Adrien
>


Re: Lucene 10

2024-03-14 Thread Michael Sokolov
Timing makes sense to me. +1 for having a deadline to reduce
procrastination, but Adrien I don't honestly believe anyone who is
paying attention thinks that is what you have been doing!

On Wed, Mar 13, 2024 at 10:40 AM Adrien Grand wrote:
>
> Hello everyone!
>
> It's been ~2.5 years since we released Lucene 9.0 (December 2021) and I'd 
> like us to start working towards Lucene 10.0. I'm volunteering for being the 
> release manager and propose the following timeline:
>  - ~September 15th: main gets bumped to 11.x, branch_10x gets created
>  - ~September 22nd: Do a last 9.x minor release.
>  - ~October 1st: Release 10.0.
>
> This may sound like a long notice period. My motivation is that there are a 
> few changes I have on my mind that are likely worthy of a major release, and 
> I plan on taking advantage of a date being set to stop procrastinating and 
> finally start moving these enhancements forward. These are not blockers, only 
> my wish list for Lucene 10.0, if they are not ready in time we can have 
> discussions about letting them slip until the next major.
>  - Greater I/O concurrency. Can Lucene better utilize modern disks that are 
> plenty concurrent?
>  - Decouple search concurrency from index geometry. Can Lucene better utilize 
> modern CPUs that are plenty concurrent?
>  - "Sparse indexing" / "zone indexing" for sorted indexes. This is one of the 
> most efficient techniques that OLAP databases take advantage of to make 
> search fast. Let's bring it to Lucene.
>
> This list isn't meant to be an exhaustive list of release highlights for 
> Lucene 10, feel free to add your own. There are also a number of cleanups we 
> may want to consider. I wanted to share this list for visibility though in 
> case you have thoughts on these enhancements and/or would like to help.
>
> --
> Adrien
