Re: 2021-01 Lucene/Solr Committer meeting

2021-01-16 Thread David Smiley
On Sat, Jan 16, 2021 at 4:50 PM Adrien Grand wrote:

> It's not fully clear to me why this question is important in the context
> of 9.0. Is it because we are considering having a long delay between Lucene
> 9.0 and Solr 9.0, and we would like to avoid keeping Solr without a release
> for too long?
>

Yes.

Alternatively, I suppose Solr could try to figure out how to release a
Solr 8.9 using Lucene 8.8 or, if necessary, some 8.9-SNAPSHOT. It would be
a non-standard release, of course, ideally performed by someone who is very
familiar with doing Solr releases, in order to ensure there is no concurrent
Lucene release.

~ David



Re: 2021-01 Lucene/Solr Committer meeting

2021-01-16 Thread Adrien Grand
I don't think you are missing anything, Jan, but I would rather not
introduce exceptions, in order to keep things simple for users.

It's not fully clear to me why this question is important in the context of
9.0. Is it because we are considering having a long delay between Lucene
9.0 and Solr 9.0, and we would like to avoid keeping Solr without a release
for too long?

On Sat, Jan 16, 2021 at 7:04 PM Jan Høydahl wrote:

> Can we not write in the 8.9 release notes that upgrades to 9.0 may work but
> are not officially supported, and that users should wait for 9.1? Then add
> 8.9 back-compat tests to 9.1. Am I missing something?
>
> Jan Høydahl
>

-- 
Adrien


Re: 2021-01 Lucene/Solr Committer meeting

2021-01-16 Thread Jan Høydahl
Can we not write in the 8.9 release notes that upgrades to 9.0 may work but 
are not officially supported, and that users should wait for 9.1? Then add 
8.9 back-compat tests to 9.1. Am I missing something?

Jan Høydahl

> On 16 Jan 2021, at 18:43, Adrien Grand wrote:
> 
>> On Thu, Jan 14, 2021 at 8:02 PM David Smiley wrote:
> 
>> I'm not familiar with why the testing needs to be manual instead of 
>> automated.  After having an RC of 8.9, couldn't we add the back-compat 
>> indices to branch_9x and check that 9.0 is happy with them (running 
>> applicable automated tests) as a precondition for releasing 8.9?  
>> Regardless, you say we've done it before, and we can do it again.  I think 
>> it's likely it'll happen.
> 
> 
> Right, this is what needs to be done (though ideally on 9.x tags rather than 
> branch_9x). I called it manual because it requires special action and is 
> neither checked by Jenkins nor by any of the people who run the smoke tester 
> on release artifacts. My personal take on this is that the risk of having a 
> backward compatibility gap or the cost of investing in automated testing 
> for this outweigh the benefits we can get from releasing new minors of the 
> previous major. I couldn't attend this meeting; what is the motivation for 
> continuing to release 8.x after 9.0 is out?
> 
> For the record, I am less worried about patch releases, which have much 
> smaller scopes and never change file formats. So I'd be fine with doing 
> 8.latest patch releases after 9.0 is out, similarly to how we released 7.7.3 
> after 8.0.
> 
> -- 
> Adrien


Re: 2021-01 Lucene/Solr Committer meeting

2021-01-16 Thread Adrien Grand
On Thu, Jan 14, 2021 at 8:02 PM David Smiley wrote:

> I'm not familiar with why the testing needs to be manual instead of
> automated.  After having an RC of 8.9, couldn't we add the back-compat
> indices to branch_9x and check that 9.0 is happy with them (running
> applicable automated tests) as a precondition for releasing 8.9?
> Regardless, you say we've done it before, and we can do it again.  I think
> it's likely it'll happen.
>

Right, this is what needs to be done (though ideally on 9.x tags rather
than branch_9x). I called it manual because it requires special action and
is neither checked by Jenkins nor by any of the people who run the smoke
tester on release artifacts. My personal take on this is that the risk of
having a backward compatibility gap or the cost of investing in automated
testing for this outweigh the benefits we can get from releasing new minors
of the previous major. I couldn't attend this meeting; what is the
motivation for continuing to release 8.x after 9.0 is out?

For the record, I am less worried about patch releases, which have much
smaller scopes and never change file formats. So I'd be fine with doing
8.latest patch releases after 9.0 is out, similarly to how we released
7.7.3 after 8.0.

-- 
Adrien


Faster advance on Vector Values

2021-01-16 Thread Anand Kotriwal
Hi,

Our team is using the recently introduced Lucene90Codec support for
vectors. We have a use case that requires quickly scanning a segment for
documents that have vectors. While implementing it, we noticed that the
advance method in the class Lucene90VectorReader does a linear search for
the target document. I have a proposal to make it faster: we can implement a
binary search over the "ordToDoc" array, which would make the advance
operation take logarithmic rather than linear time.

I would like to hear ideas and suggestions from the community. I have an
implementation of the above idea on my private fork, and I can open a PR if
the idea sounds reasonable.

Thanks!
Anand Kotriwal
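The binary-search idea above can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the actual Lucene90VectorReader code: the class name OrdToDocIterator is made up, and it assumes ordToDoc is a strictly increasing array mapping each vector ordinal to its docID (one vector per document). advance() runs java.util.Arrays.binarySearch over only the ordinals after the current one, so the iterator stays forward-only and each call costs O(log n) instead of a linear scan.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: a forward-only iterator over the documents that have a
 * vector, backed by a sorted ordToDoc array (vector ordinal -> docID).
 * advance() uses binary search instead of a linear scan.
 */
public class OrdToDocIterator {
  // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
  public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] ordToDoc; // assumed strictly increasing, one docID per ordinal
  private int ord = -1;         // current vector ordinal, -1 before iteration starts
  private int doc = -1;         // current docID

  public OrdToDocIterator(int[] ordToDoc) {
    this.ordToDoc = ordToDoc;
  }

  public int docID() {
    return doc;
  }

  public int nextDoc() {
    return setOrd(ord + 1);
  }

  /** Returns the first docID >= target, searching only ordinals after the current one. */
  public int advance(int target) {
    int from = ord + 1;
    if (from >= ordToDoc.length) {
      return setOrd(ordToDoc.length); // already exhausted
    }
    int idx = Arrays.binarySearch(ordToDoc, from, ordToDoc.length, target);
    // A negative result encodes -(insertionPoint) - 1; the insertion point is the
    // first ordinal whose docID is greater than target (or length if none is).
    return setOrd(idx >= 0 ? idx : -idx - 1);
  }

  private int setOrd(int newOrd) {
    ord = newOrd;
    doc = newOrd < ordToDoc.length ? ordToDoc[newOrd] : NO_MORE_DOCS;
    return doc;
  }

  public static void main(String[] args) {
    OrdToDocIterator it = new OrdToDocIterator(new int[] {2, 5, 9, 14, 30});
    System.out.println(it.advance(6));                   // 9
    System.out.println(it.nextDoc());                    // 14
    System.out.println(it.advance(15));                  // 30
    System.out.println(it.advance(31) == NO_MORE_DOCS);  // true
  }
}
```

Since advance targets are often close to the current document, a linear scan may still be competitive for short jumps, so a real patch would presumably want benchmarks on realistic data before switching.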


Re: Solr Docker discussion

2021-01-16 Thread Jan Høydahl
Great summary Houston!

It could also be that the Docker team is willing to provide a “link” from the 
official _/solr image to apache/solr if we can convince them of its quality. I 
think they already do this for the Elastic images.

Since Docker images contain Linux and Java, which we would not be allowed to 
release as part of Solr, I have seen discussions on various ASF lists arguing 
that the only binaries we actually “release” are the layers built by our 
Dockerfile, i.e. what comes after the (runtime) FROM line. So we should be 
careful about what extra software we add in the Dockerfile. I did a check 
earlier and I think we are in good shape.

We currently rebuild a number of older Solr Docker images every time we release 
a new version, but I don’t think there is any automatic refresh of images 
outside a release. Great idea to kick off image refreshes from Jenkins.

We could also publish nightly “master” images, but since they are not officially 
voted releases, they must be clearly labelled as unofficial and not advertised 
on the web page; they are only for development purposes.

Jan Høydahl

> On 16 Jan 2021, at 00:36, Timothy Potter wrote:
> 
> I'm curious how tags will work when the base image of a released image is 
> updated. The image for a tag should be immutable (IMHO), and I think 
> people would be surprised if 8.8.0 suddenly changed, even for a good 
> reason such as fixing a CVE in the base image. But based on what Kevin said, 
> perhaps there's already a precedent for this with the official images?
> 
> On Fri, Jan 15, 2021 at 1:51 PM Houston Putman wrote:
>> Thanks for bringing up this issue Kevin.
>> 
>> Periodically re-building docker images is certainly a feature we could 
>> support, and probably should to automatically keep up with security fixes. 
>> We could even automate it pretty easily in Jenkins.
>> 
>> We could also add support to the Gradle commands for downloading and 
>> verifying the "official" TGZ to build the image with, instead of building a 
>> TGZ from source. That way, release images are always built from exactly the 
>> same binaries. The Dockerfile wouldn't need to change at all between local 
>> and release builds; it still merely expects a TGZ to be passed in the 
>> context, and Gradle can determine whether it needs to be built from scratch 
>> or downloaded.
>> 
>> This still likely wouldn't be good enough to make the image an "official 
>> docker image" but it gets us to essentially the same end-state image. The 
>> only difference is the downloading and verification are happening in gradle 
>> instead of the Dockerfile.
>> 
>> - Houston
>> 
>> On Fri, Jan 15, 2021 at 2:05 PM Kevin Risden wrote:
>>>> Currently, in the solr-docker-image and in a majority of "Docker Official 
>>>> Images", the officially released Solr binaries are downloaded from mirrors 
>>>> and validated within the Dockerfiles. This makes it easy to assure 
>>>> users that the 9.0 Solr Docker image contains the 9.0 Solr release. This 
>>>> process doesn't fit very well with local builds, because there is nowhere 
>>>> to download local builds from, and validation isn't required.
>>>> 
>>>> The current opinion in the community is to abandon the "Docker Official 
>>>> Images" style process of downloading and validating official binaries, and 
>>>> instead have the release manager use the local-build image creation with 
>>>> the final release source. This should result in the same Docker image in 
>>>> the end; however, there is no trust built into the Docker image itself. 
>>>> Instead, we are likely going to document a way for users to verify the 
>>>> Docker image contents themselves.
>>> 
>>> Before we abandon the official process of downloading/validating official 
>>> binaries, I think there is a good reason to keep the ability to download an 
>>> "official" Apache Solr release and use it in the "official" Solr 
>>> convenience Docker image.
>>> 
>>> Docker images are static, point-in-time copies of an OS and all supporting 
>>> packages (like Java) as of when they were built. Docker images should be 
>>> rebuilt periodically to pick up the latest security and bug fixes in the 
>>> base image, just as any OS should periodically run `apt upgrade` or 
>>> `yum update` to stay up to date.
>>> 
>>> My proposal is to periodically rebuild the "official" Solr convenience 
>>> Docker image from the "official" Solr release to keep the Docker images 
>>> up to date. The idea is that we keep a list of "supported" versions of 
>>> Solr (e.g., 8.5, 8.6, 8.7) whose Docker images are rebuilt periodically 
>>> (e.g., daily or weekly). Once a new release is made (e.g., 8.8), it gets 
>>> added to this rebuild matrix. This ensures that the "official" Solr 
>>> convenience Docker image is reasonably up to date with regard to base 
>>> image security updates.
>>> 
>>> My understanding (from a few years ago) is that the Docker official images 
>>> are rebuilt when the base image is updated. This was automatic from what I 
>>> 

Re: Blog post - Profiling the Lucene nightly benchmarks

2021-01-16 Thread Adrien Grand
This is very cool, thanks for sharing Anton!

On Fri, Jan 15, 2021 at 11:40 PM Anton Hägerstrand wrote:

> Hello everyone!
>
> I recently wrote a blog post which looks into profiling data of the Lucene
> nightly benchmarks. I emailed Michael McCandless (the maintainer of the
> benchmarks) and he suggested that I post about it here, so here we go.
>
> The post is available at https://blunders.io/posts/lucene-bench-2021-01-10.
> I have published some more periodic profiling data at
> https://blunders.io/lucene-bench - this is not really nightly, but one
> might be able to spot changes over time.
>
> If you have any feedback or questions, I'll happily listen and answer.
>
> best regards,
> Anton Hägerstrand
>
> PS. If no one beats me to it, I'll open a PR for the TermGroupSelector
> thing ;)
>