Re: waiting for a PR review regarding the FieldHighlighter.

2024-05-23 Thread David Smiley
I took a look!

On Sat, May 18, 2024 at 5:51 PM 쿨해머  wrote:
>
> Hello. I have submitted a PR that allows users to decide the final sorting 
> criteria for passages in the FieldHighlighter. If anyone is interested, 
> please take a look. I will leave the PR link below.
>
> https://github.com/apache/lucene/pull/13276

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Announcing githubsearch!

2024-02-19 Thread David Smiley
Cool Mike!

On Mon, Feb 19, 2024 at 11:41 AM Michael McCandless
 wrote:
>
> Hi Team,
>
> ~1.5 years ago (August 2022) we migrated our Lucene issue tracking from Jira 
> to GitHub. Thank you Tomoko for all the hard work doing such a complex, 
> multi-phased, high-fidelity migration!
>
> I finally finished also migrating jirasearch to GitHub: 
> githubsearch.mikemccandless.com. It was tricky because GitHub issues/PRs are 
> fundamentally more complex than Jira's data model, and the GitHub REST API is 
> also quite rich / heavily normalized. All of the source code for githubsearch 
> lives here. The UI remains its barebones self ;)
>
> Githubsearch is dog food for us: it showcases Lucene (currently 9.8.0), and 
> many of its fun features like infix autosuggest, block join queries (each 
> comment is a sub-document on the issue/PR), DrillSideways faceting, 
> near-real-time indexing/searching, synonyms (try “oome”), expressions, 
> non-relevance and blended-relevance sort, etc.  (This old blog post goes into 
> detail.)  Plus, it’s meta-fun to use Lucene to search its own issues, to help 
> us be more productive in improving Lucene!  Nicely recursive.
>
> In addition to good ol’ searching by text, githubsearch has some new/fun 
> features:
>
> - Drill down to just PRs or issues
> - Filter by “review requested” for a given user: poor Adrien has 8 (open) now
>   (sorry)! Or see your mentions (Robert is mentioned in 27 open issues/PRs). Or
>   PRs that you reviewed (Uwe has reviewed 9 still-open PRs). Or issues and PRs
>   where a user has had any involvement at all (Dawid has interacted on 197
>   issues/PRs).
> - Find still-open PRs that were created by a New Contributor (an author who has
>   no changes merged into our repository) or Contributor (non-committer who has
>   had some changes merged into our repository) or Member
> - Here are the uber-stale (last touched more than a month ago) open PRs by
>   outside contributors. We should ideally keep this at 0, but it’s 83 now!
> - “Link to this search” to get a short-er, more permanent URL (it is NOT a URL
>   shortener, though!)
> - Save named searches you frequently run (they just save to local cookie state
>   on that one browser)
>
> I’m sure there are exciting bugs, feedback/patches welcome!  If you see 
> problems, please reply to this email or file an issue here.
>
> Note that jirasearch remains running, to search Solr, Tika and Infra issues.
>
> Happy Searching,
>
> Mike McCandless
>
> http://blog.mikemccandless.com




Re: @TimeoutSuite and defaults (RandomizedTesting)

2024-02-17 Thread David Smiley
I found that passing -Ptests.timeoutSuite=500 doesn't have any effect
that I can see; it didn't interrupt the tests.  I needed that trailing
exclamation mark for it to do the interrupt.  Thanks for that tip.  I
don't so much mind this for specific tests that might want to pick
their own timeout (rather rare), but it's troublesome for the vast
majority.  IMO LuceneTestCase shouldn't be declaring a default; it
should be done in a gradle build file instead.  Then, configuration
for a build server (I'm thinking of Crave.io used by Solr PRs) can
specify like 10 minutes because otherwise an unlucky build hogs that
96 core server for hours.  Until then, I'll use an exclamation mark
for that server's config which isn't quite ideal but it's adequate.
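To make the two behaviors Dawid describes in the quoted reply concrete, here is a small model of the precedence. It is a sketch only, not RandomizedTesting's code; `SuiteTimeoutResolver`, `effectiveTimeoutMillis` and the `builtinDefault` fallback are hypothetical names. It also shows why a plain `-Ptests.timeoutSuite=500` appeared to do nothing: `LuceneTestCase` carries `@TimeoutSuite(millis = 2 * TimeUnits.HOUR)` and Solr tests extend it, so the annotation wins unless the value ends in `!`.

```java
import java.util.OptionalLong;

/**
 * Hypothetical model of the suite-timeout precedence described in this thread.
 * Not RandomizedTesting's implementation.
 */
public final class SuiteTimeoutResolver {

  private SuiteTimeoutResolver() {}

  /**
   * @param sysprop value of tests.timeoutSuite, or null if unset (e.g. "600000" or "600000!")
   * @param annotationMillis timeout from a @TimeoutSuite annotation that applies to the class, if any
   * @param builtinDefault placeholder fallback when neither is present
   */
  public static long effectiveTimeoutMillis(
      String sysprop, OptionalLong annotationMillis, long builtinDefault) {
    if (sysprop != null && sysprop.endsWith("!")) {
      // A trailing '!' forces the value onto every suite, annotated or not.
      return Long.parseLong(sysprop.substring(0, sysprop.length() - 1).trim());
    }
    if (annotationMillis.isPresent()) {
      // An annotation (including one on a base class such as LuceneTestCase) beats a plain sysprop.
      return annotationMillis.getAsLong();
    }
    if (sysprop != null && !sysprop.isEmpty()) {
      // A plain sysprop only replaces the default for suites without an annotation.
      return Long.parseLong(sysprop.trim());
    }
    return builtinDefault;
  }

  public static void main(String[] args) {
    long twoHours = 2L * 60 * 60 * 1000;
    // Plain value: the inherited two-hour annotation still applies.
    System.out.println(effectiveTimeoutMillis("500", OptionalLong.of(twoHours), 0L));
    // Trailing '!': the sysprop overrides the annotation.
    System.out.println(effectiveTimeoutMillis("500!", OptionalLong.of(twoHours), 0L));
  }
}
```

The same model suggests why declaring the default in a Gradle build file, rather than through the base-class annotation, would let a plain (non-`!`) value apply to most suites while still honoring explicit per-class annotations.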

On Thu, Feb 15, 2024 at 11:53 AM Dawid Weiss  wrote:
>
>
> Sorry, the docs are not the best, I know.
>
> It's documented here -
> https://github.com/randomizedtesting/randomizedtesting/blob/master/randomized-runner/src/main/java/com/carrotsearch/randomizedtesting/SysGlobals.java#L186-L197
>
> So:
>
> 1) if you pass tests.timeoutSuite=1000 this changes the default value for all 
> classes that don't define any explicit timeout using an annotation; classes 
> that do have an annotation,
> use the annotation's value,
> 2) if you pass tests.timeoutSuite=1000! then this overrides everything - the 
> default value and all annotations.
>
> I vaguely recall option (2) was added specifically for nightlies which bumped 
> the iteration multiplier - this affected tests that normally ran fairly fast
> but during nightly runs could run slower than anticipated.
>
> D.
>
>
> On Thu, Feb 15, 2024 at 3:18 PM David Smiley  wrote:
>>
>> Oh; I didn't know that took precedence -- makes sense.  Hopefully a
>> test subclass (like SolrTestCase) could override it as well.
>>
>> On Mon, Feb 12, 2024 at 2:09 PM Dawid Weiss  wrote:
>> >
>> >
>> > You can override the defaults using sysprops in your CI builds -
>> >
>> > -Ptests.timeoutSuite=1000!
>> >
>> > takes precedence over any annotations (1 second).
>> >
>> > Dawid
>> >
>> > On Mon, Feb 12, 2024 at 7:53 PM David Smiley  wrote:
>> >>
>> >> Looking at LuceneTestCase, I see the annotation from RandomizedTesting:
>> >> @TimeoutSuite(millis = 2 * TimeUnits.HOUR)
>> >> This matches my observations of some builds that timed out, perhaps
>> >> some flaky test hanging in Solr (that extends LuceneTestCase).
>> >> Looking at this annotation, there is further documentation that the
>> >> default can be set via sysprop tests.timeoutSuite.  Wouldn't doing
>> >> that make more sense than hard-coding this figure in LuceneTestCase?
>> >> For example, I'd like to have a normal/default test run have a low
>> >> timeout (10min?) but on a "nightly" run on CI, use much higher.  Not 2
>> >> hours though; individual tests needing so much should have a
>> >> TimeoutSuite applied to them.
>> >>
>> >> ~ David Smiley
>> >> Apache Lucene/Solr Search Developer
>> >> http://www.linkedin.com/in/davidwsmiley
>> >>
>>




Re: @TimeoutSuite and defaults (RandomizedTesting)

2024-02-15 Thread David Smiley
Oh; I didn't know that took precedence -- makes sense.  Hopefully a
test subclass (like SolrTestCase) could override it as well.

On Mon, Feb 12, 2024 at 2:09 PM Dawid Weiss  wrote:
>
>
> You can override the defaults using sysprops in your CI builds -
>
> -Ptests.timeoutSuite=1000!
>
> takes precedence over any annotations (1 second).
>
> Dawid
>
> On Mon, Feb 12, 2024 at 7:53 PM David Smiley  wrote:
>>
>> Looking at LuceneTestCase, I see the annotation from RandomizedTesting:
>> @TimeoutSuite(millis = 2 * TimeUnits.HOUR)
>> This matches my observations of some builds that timed out, perhaps
>> some flaky test hanging in Solr (that extends LuceneTestCase).
>> Looking at this annotation, there is further documentation that the
>> default can be set via sysprop tests.timeoutSuite.  Wouldn't doing
>> that make more sense than hard-coding this figure in LuceneTestCase?
>> For example, I'd like to have a normal/default test run have a low
>> timeout (10min?) but on a "nightly" run on CI, use much higher.  Not 2
>> hours though; individual tests needing so much should have a
>> TimeoutSuite applied to them.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>




@TimeoutSuite and defaults (RandomizedTesting)

2024-02-12 Thread David Smiley
Looking at LuceneTestCase, I see the annotation from RandomizedTesting:
@TimeoutSuite(millis = 2 * TimeUnits.HOUR)
This matches my observations of some builds that timed out, perhaps
some flaky test hanging in Solr (that extends LuceneTestCase).
Looking at this annotation, there is further documentation that the
default can be set via sysprop tests.timeoutSuite.  Wouldn't doing
that make more sense than hard-coding this figure in LuceneTestCase?
For example, I'd like to have a normal/default test run have a low
timeout (10min?) but on a "nightly" run on CI, use much higher.  Not 2
hours though; individual tests needing so much should have a
TimeoutSuite applied to them.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley




Re: ./crave pull .. 'heapdumps/* Fwd: [JENKINS] Solr » Solr-Check-9.x - Build # 5949 - Still Failing!

2023-12-03 Thread David Smiley
I updated the script accordingly and I still see the problem:

https://ci-builds.apache.org/job/Solr/job/Solr-Check-9.x/6008/console

+ status=0
+ ./crave pull --extra-rsync-flags ' --ignore-missing-args'
'**/build/**/test/TEST-*.xml' '**/*.events' 'heapdumps/**'
'**/hs_err_pid*'
Error: rsync: [sender] change_dir "/tmp/src/solr/heapdumps" failed: No
such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1675) [Receiver=3.1.2]
rsync: [Receiver] write error: Broken pipe (32)
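The error above comes from rsync failing to change into the missing `heapdumps` directory, which `--ignore-missing-args` evidently did not cover here. Mikhail's third option in the quoted thread, creating the directory before `crave pull`, avoids depending on that flag. In a Jenkins shell step that is just `mkdir -p heapdumps`; the sketch below shows the same idempotent semantics in Java (`EnsurePullDirs` and `ensureDirs` are hypothetical names). Whether crave then treats an empty `heapdumps/**` match as success is an assumption that would still need a test run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-pull step: make sure optional output directories exist. */
public final class EnsurePullDirs {

  private EnsurePullDirs() {}

  /** Creates each directory under root if missing; like `mkdir -p`, safe to run repeatedly. */
  public static void ensureDirs(Path root, String... dirs) throws IOException {
    for (String dir : dirs) {
      // createDirectories does not fail if the directory already exists.
      Files.createDirectories(root.resolve(dir));
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Path.of(args.length > 0 ? args[0] : ".");
    ensureDirs(root, "heapdumps");
    System.out.println("heapdumps exists: " + Files.isDirectory(root.resolve("heapdumps")));
  }
}
```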

~ David


On Sat, Dec 2, 2023 at 5:55 PM Mikhail Khludnev  wrote:

> Thanks Yuvraaj.
> dev@, how to tweak jenkins script?
>
> On Sat, Dec 2, 2023 at 9:25 PM Yuvraaj Kelkar  wrote:
>
>> The new version of crave is in place and will be used automatically on
>> the next invocation from Jenkins.
>> Can you update the Jenkins script to call crave like this:
>>
>> ./crave pull --extra-rsync-flags ' --ignore-missing-args'
>> '**/build/**/test/TEST-*.xml' '**/*.events' 'heapdumps/**' '**/hs_err_pid*'
>>
>>
>> Release has been marked here:
>> https://github.com/accupara/crave/releases/tag/0.2-6879
>> 
>>
>> Thanks,
>> -Uv
>> On Dec 1 2023, at 11:10 am, Mikhail Khludnev  wrote:
>>
>> Makes sense.
>>
>> On Fri, Dec 1, 2023 at 7:56 PM Yuvraaj Kelkar  wrote:
>>
>> I think the second option is what we'll go for.
>> I'm going to add a flag to pull that will allow the user to specify extra
>> flags to be given to rsync.
>> Then we can call crave pull like this:
>> ./crave pull --extra-rsync-flags ' --ignore-missing-args'
>> '**/build/**/test/TEST-*.xml' '**/*.events' 'heapdumps/**' '**/hs_err_pid*'
>>
>>
>> *** Note the additional space before the hyphen in ' --ignore-missing-args'.
>>
>> This should handle the missing source files/directories.
>>
>> What do you think?
>>
>> Thanks,
>> -Uv
>>
>> On Dec 1 2023, at 12:56 am, Mikhail Khludnev  wrote:
>>
>> Hello Yuvraaj,
>> Thanks for taking care of this. Honestly, it's not my wheelhouse.
>> It seems like there's an assumption that a test running out of heap will
>> create a heapdumps folder and put a file into it. I don't know whether
>> the tests/Gradle can ever dump heap there. At least our tests don't dump
>> heap there now. So whether this folder exists or is absent is not certain.
>> We have a few options:
>>  - drop heapdumps/** from crave pull until someone needs to investigate a
>>    test running out of memory.
>>  - hack crave pull to ignore path wildcards for an absent dir
>>  - execute $mkdir heapdumps or $mkdir -p heapdumps (depending on the
>>    script's error handling mode) before $crave pull
>>
>>
>> On Thu, Nov 30, 2023 at 11:24 PM Yuvraaj Kelkar  wrote:
>>
>> I just started a build with crave:
>> crave run ./gradlew --console=plain check integrationTests
>>
>> And at the end of it, looked for the patterns in the crave pull  command:
>>
>> admin@171074329f9e:/tmp/src/solr$ find . -name '*.events'
>> admin@171074329f9e:/tmp/src/solr$ find . -name 'hs_err_pid*'
>> admin@171074329f9e:/tmp/src/solr$
>> admin@171074329f9e:/tmp/src/solr$ ls -l heapdumps
>> ls: cannot access 'heapdumps': No such file or directory
>>
>>
>> The only thing I could get a lot of output on was
>>
>> admin@171074329f9e:/tmp/src/solr$ find . | grep 'build.*test.TEST' | head
>> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.JsonRequestApiTest.xml
>> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.UsingSolrJRefGuideExamplesTest.xml
>> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.IndexingNestedDocuments.xml
>> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.ZkConfigFilesTest.xml
>> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.JsonRequestApiHeatmapFacetingTest.xml
>> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.exporter.SolrExporterIntegrationTest.xml
>> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.scraper.SolrStandaloneScraperBasicAuthTest.xml
>> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.exporter.MetricsQueryTemplateTest.xml
>> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.scraper.SolrStandaloneScraperTest.xml
>> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.scraper.SolrCloudScraperTest.xml
>>
>>
>> Is 

Re: Code longevity statistics

2023-10-22 Thread David Smiley
IMO another factor is leaving stuff around because it takes effort to
remove old things, effort that isn't fun like making claims to remove
something that someone else may still like/use, and soliciting users "hey,
is XYZ used?".  No fun.  Lucene has several roughly equivalent ways to do
the same thing (like in highlighting).
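Stefan's quoted message contrasts Lucene's roughly linear survival curve with the steeper exponential curves seen in many projects. As a worked illustration (an assumption-laden sketch: both idealized models are fitted only to the single quoted figure of 45% of lines surviving 5 years, not to the attached plots; `CodeRetention` is a hypothetical helper), the exponential fit front-loads the loss:

```java
/** Two idealized code-survival models pinned to one data point (hypothetical helper). */
public final class CodeRetention {

  private CodeRetention() {}

  /** Exponential model S(t) = exp(-lambda * t); solve S(years) = survival for lambda. */
  public static double exponentialRate(double survival, double years) {
    return -Math.log(survival) / years;
  }

  /** Half-life of the exponential model: the t at which S(t) = 0.5. */
  public static double halfLifeYears(double lambda) {
    return Math.log(2) / lambda;
  }

  /** Linear model S(t) = 1 - k * t; solve S(years) = survival for k. */
  public static double linearRate(double survival, double years) {
    return (1 - survival) / years;
  }

  public static void main(String[] args) {
    double lambda = exponentialRate(0.45, 5);
    double k = linearRate(0.45, 5);
    // After 1 year the exponential model has already lost more code than the linear one.
    System.out.printf("exponential: lambda=%.3f/yr, half-life=%.2f yr, S(1)=%.3f%n",
        lambda, halfLifeYears(lambda), Math.exp(-lambda));
    System.out.printf("linear: k=%.3f/yr, S(1)=%.3f%n", k, 1 - k);
  }
}
```

With the same 5-year endpoint, the exponential fit gives a half-life of about 4.3 years and keeps about 85% of lines after one year, while the linear fit keeps 89%; the difference in shape, not the endpoint, is what the plots compare.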

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Sep 27, 2023 at 1:35 PM Stefan Vodita 
wrote:

> Hi all,
>
> I came across an interesting article [1] on patterns of code evolution. I got
> curious, ran their analysis on the Lucene repo, and produced a breakdown of
> lines of code per year and the chance of a line of code still existing after
> 5 years (see attached images).
>
> I think the big drop in the first plot corresponds to Solr moving to its own
> project. Not sure about the other jumps - maybe someone else has insight
> there.
> The second plot shows that there is relatively little churn in Lucene; 45% of
> the code written 5 years ago is still around. For many modern projects, this
> curve is a steeper exponential. In Lucene, it's closer to linear. The article
> argues that this could point to better design and more modularity, which makes
> it so code isn't rewritten much.
>
> Just a fun thing to share!
>
> Stefan
>
> [1] https://erikbern.com/2016/12/05/the-half-life-of-code.html
>


Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread David Smiley
Thanks Michael for sharing your code snippet on how to circumvent the
limit.  My reaction to this is the same as Alessandro's.

I just created a PR to make the limit configurable:
https://github.com/apache/lucene/pull/12306
If there is to be a veto presented to the PR, it should include technical
reasons specific to the PR and be raised on the PR itself.

Afterwards, I leave it to others to move the limit with its configurability
to be enforced in a codec specific way.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, May 17, 2023 at 12:58 PM Mayya Sharipova
 wrote:

> Alessandro,
> Thanks for raising the code of conduct; it is very discouraging and
> intimidating to participate in discussions where such language is used
> especially by senior members.
>
> Michael S.,
> thanks for your suggestion and that's what we used in Elasticsearch to
> raise dims limit, and Alessandro, perhaps, you can use it as well in Solr
> for the time being.
>
> On Wed, May 17, 2023 at 11:03 AM Alessandro Benedetti <
> a.benede...@sease.io> wrote:
>
>> Thanks, Michael,
>> that example backs even more strongly the need for cleaning it up and
>> making the limit configurable without the need for custom field types, I
>> guess. (I was taking a look at the code again, and it seems the limit is
>> also checked twice:
>> in org.apache.lucene.document.KnnByteVectorField#createType and then
>> in org.apache.lucene.document.FieldType#setVectorAttributes, for both byte
>> and float variants.)
>> This should help people vote, great!
>>
>> Cheers
>> --
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io <http://sease.io/>
>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>> <https://twitter.com/seaseltd> | Youtube
>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>> <https://github.com/seaseltd>
>>
>>
>> On Wed, 17 May 2023 at 15:42, Michael Sokolov  wrote:
>>
>>> see https://markmail.org/message/kf4nzoqyhwacb7ri
>>>
>>> On Wed, May 17, 2023 at 10:09 AM David Smiley 
>>> wrote:
>>>
>>>> > easily be circumvented by a user
>>>>
>>>> This is a revelation to me and others, if true.  Michael, please then
>>>> point to a test or code snippet that shows the Lucene user community what
>>>> they want to see so they are unblocked from their explorations of vector
>>>> search.
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Wed, May 17, 2023 at 7:51 AM Michael Sokolov 
>>>> wrote:
>>>>
>>>>> I think I've said before on this list we don't actually enforce the
>>>>> limit in any way that can't easily be circumvented by a user. The codec
>>>>> already supports any size vector - it doesn't impose any limit. The way the
>>>>> API is written you can *already today* create an index with max-int sized
>>>>> vectors and we are committed to supporting that going forward by our
>>>>> backwards compatibility policy as Robert points out. This wasn't
>>>>> intentional, I think, but those are the facts.
>>>>>
>>>>> Given that, I think this whole discussion is not really necessary.
>>>>>
>>>>> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti <
>>>>> a.benede...@sease.io> wrote:
>>>>>
>>>>>> Hi all,
>>>>>> we have finalized all the options proposed by the community and we
>>>>>> are ready to vote for the preferred one and then proceed with the
>>>>>> implementation.
>>>>>>
>>>>>> *Option 1*
>>>>>> Keep it as it is (dimension limit hardcoded to 1024)
>>>>>> *Motivation*:
>>>>>> We are close to improving on many fronts. Given the criticality of
>>>>>> Lucene in computing infrastructure and the concerns raised by one of the
>>>>>> most active stewards of the project, I think we should keep working toward
>>>>>> improving the fea

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread David Smiley
> easily be circumvented by a user

This is a revelation to me and others, if true.  Michael, please then point
to a test or code snippet that shows the Lucene user community what they
want to see so they are unblocked from their explorations of vector search.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, May 17, 2023 at 7:51 AM Michael Sokolov  wrote:

> I think I've said before on this list we don't actually enforce the limit
> in any way that can't easily be circumvented by a user. The codec already
> supports any size vector - it doesn't impose any limit. The way the API is
> written you can *already today* create an index with max-int sized vectors
> and we are committed to supporting that going forward by our backwards
> compatibility policy as Robert points out. This wasn't intentional, I
> think, but those are the facts.
>
> Given that, I think this whole discussion is not really necessary.
>
> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti 
> wrote:
>
>> Hi all,
>> we have finalized all the options proposed by the community and we are
>> ready to vote for the preferred one and then proceed with the
>> implementation.
>>
>> *Option 1*
>> Keep it as it is (dimension limit hardcoded to 1024)
>> *Motivation*:
>> We are close to improving on many fronts. Given the criticality of Lucene
>> in computing infrastructure and the concerns raised by one of the most
>> active stewards of the project, I think we should keep working toward
>> improving the feature as is and move up the limit after we can
>> demonstrate improvement unambiguously.
>>
>> *Option 2*
>> Make the limit configurable, for example through a system property
>> *Motivation*:
>> The system administrator can enforce a limit that its users need to respect,
>> in line with whatever the admin has decided is acceptable for
>> them.
>> The default can stay the current one.
>> This should open the doors for Apache Solr, Elasticsearch, OpenSearch,
>> and any sort of plugin development
>>
>> *Option 3*
>> Move the max dimension limit down to a lower level, into an HNSW-specific
>> implementation. Once there, this limit would not bind any other potential
>> vector engine alternative/evolution.
>> *Motivation:* There seem to be contradictory performance interpretations
>> about the current HNSW implementation. Some consider its performance ok,
>> some not, and it depends on the target data set and use case. Increasing
>> the max dimension limit where it is currently (in top level
>> FloatVectorValues) would not allow potential alternatives (e.g. for other
>> use-cases) to be based on a lower limit.
>>
>> *Option 4*
>> Make it configurable and move it to an appropriate place.
>> In particular, a simple Integer.getInteger("lucene.hnsw.maxDimensions",
>> 1024) should be enough.
>> *Motivation*:
>> Both are good and not mutually exclusive and could happen in any order.
>> Someone suggested perfecting what the _default_ limit should be, but I've
>> not seen an argument _against_ configurability.  Especially in this way --
>> a toggle that doesn't bind Lucene's APIs in any way.
>>
>> I'll keep this [VOTE] open for a week and then proceed to the
>> implementation.
>>
>


Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread David Smiley
We agree backwards compatibility with the index should be maintained and
that CheckIndex should work.  And we agree on a number of other things, but
I want to focus on configurability.
As long as the index contains the number of dimensions actually used in a
specific segment & field, why couldn't CheckIndex work if the dimension
*limit* is configurable?  It's not CheckIndex's job to enforce the limit,
only to check that the data appears consistent / valid, irrespective of how
the number of dimensions came to be specified originally.
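The argument above can be illustrated with a small sketch: a consistency check compares each stored vector against the dimension recorded for that field in the segment, and never consults the indexing-time limit, configurable or not. This is not Lucene's `CheckIndex` code; `VectorConsistencyCheck`, `check` and the in-memory `float[][]` input are hypothetical stand-ins for reading a segment.

```java
/** Hypothetical consistency check: validates stored data against segment metadata only. */
public final class VectorConsistencyCheck {

  private VectorConsistencyCheck() {}

  /** Throws if any stored vector's length disagrees with the dimension recorded for the field. */
  public static void check(String field, int recordedDims, float[][] storedVectors) {
    if (recordedDims <= 0) {
      throw new IllegalStateException(field + ": invalid recorded dimension " + recordedDims);
    }
    for (int doc = 0; doc < storedVectors.length; doc++) {
      if (storedVectors[doc].length != recordedDims) {
        throw new IllegalStateException(
            field + ": doc " + doc + " has " + storedVectors[doc].length
                + " dims, segment metadata says " + recordedDims);
      }
    }
    // Note: no comparison against any global or configured maximum happens here.
  }

  public static void main(String[] args) {
    // 2048-dim vectors pass as long as they match the recorded dimension,
    // whatever limit was in force when they were indexed.
    check("embedding", 2048, new float[][] {new float[2048], new float[2048]});
    System.out.println("consistent");
  }
}
```

This addresses only the CheckIndex question; it does not speak to Robert's separate concern, in the quoted reply, that a larger limit becomes part of the default codec's backwards-compatibility commitment.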

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, May 16, 2023 at 10:58 PM Robert Muir  wrote:

> My problem is that it impacts the default codec which is supported by our
> backwards compatibility policy for many years. We can't just let the user
> determine backwards compatibility with a sysprop. How will CheckIndex work?
> We have to have bounds and also allow for more performant implementations
> that might have different limitations. And I'm pretty sure we want a faster
> implementation than what we have in the future, and it will probably have
> different limits.
>
> For other codecs, it is fine to have a different limit as I already said,
> as it is implementation dependent. And honestly the stuff in lucene/codecs
> can be more "Fast and loose" because it doesn't require the extensive index
> back compat guarantee.
>
> Again, penultimate concern is that index back compat guarantee. When it
> comes to limits, the proper way is not to just keep bumping them without
> technical reasons, instead the correct approach is to fix the technical
> problems and make them irrelevant. Great example here (merged this
> morning):
> https://github.com/apache/lucene/commit/f53eb28af053d7612f7e4d1b2de05d33dc410645
>
>
> On Tue, May 16, 2023 at 10:49 PM David Smiley  wrote:
>
>> Robert, I have not heard from you (or anyone) an argument against System
>> property based configurability (as I described in Option 4 via a System
>> property).  Uwe notes wisely some care must be taken to ensure it actually
>> works.  Sure, of course.  What concerns do you have with this?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Tue, May 16, 2023 at 9:50 PM Robert Muir  wrote:
>>
>>> By the way, I agree with the idea to MOVE THE LIMIT UNCHANGED to the
>>> HNSW-specific code.
>>>
>>> This way, someone can write alternative codec with vectors using some
>>> other completely different approach that incorporates a different more
>>> appropriate limit (maybe lower, maybe higher) depending upon their
>>> tradeoffs. We should encourage this as I think it is the "only true fix" to
>>> the scalability issues: use a scalable algorithm! Also, alternative codecs
>>> don't force the project into many years of index backwards compatibility,
>>> which is really my penultimate concern. We can lock ourselves into a truly
>>> bad place and become irrelevant (especially with scalar code implementing
>>> all this vector stuff, it is really senseless). In the meantime I suggest
>>> we try to reduce pain for the default codec with the current implementation
>>> if possible. If it is not possible, we need a new codec that performs.
>>>
>>> On Tue, May 16, 2023 at 8:53 PM Robert Muir  wrote:
>>>
>>>> Gus, I think I explained myself multiple times on issues and in this
>>>> thread. The performance is unacceptable, everyone knows it, but nobody is
>>>> talking about it.
>>>> I don't need to explain myself time and time again here.
>>>> You don't seem to understand the technical issues (at least you sure as
>>>> fuck don't know how service loading works or you wouldn't have opened
>>>> https://github.com/apache/lucene/issues/12300 )
>>>>
>>>> I'm just the only one here completely unconstrained by any of silicon
>>>> valley's influences to speak my true mind, without any repercussions, so I
>>>> do it. Don't give any fucks about ChatGPT.
>>>>
>>>> I'm standing by my technical veto. If you bypass it, I'll revert the
>>>> offending commit.
>>>>
>>>> As far as fixing the technical performance, I just opened an issue with
>>>> some ideas to at least improve cpu usage by a factor of N. It does not help
>>>> with the crazy heap memory usage or other issues of KNN implementation
>>>> causing shit like OOM on merge. But it is one step:
>>>> https://github.com/apache/lucene/issues

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread David Smiley
Robert, I have not heard from you (or anyone) an argument against System
property based configurability (as I described in Option 4 via a System
property).  Uwe notes wisely some care must be taken to ensure it actually
works.  Sure, of course.  What concerns do you have with this?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, May 16, 2023 at 9:50 PM Robert Muir  wrote:

> By the way, I agree with the idea to MOVE THE LIMIT UNCHANGED to the
> HNSW-specific code.
>
> This way, someone can write alternative codec with vectors using some
> other completely different approach that incorporates a different more
> appropriate limit (maybe lower, maybe higher) depending upon their
> tradeoffs. We should encourage this as I think it is the "only true fix" to
> the scalability issues: use a scalable algorithm! Also, alternative codecs
> don't force the project into many years of index backwards compatibility,
> which is really my penultimate concern. We can lock ourselves into a truly
> bad place and become irrelevant (especially with scalar code implementing
> all this vector stuff, it is really senseless). In the meantime I suggest
> we try to reduce pain for the default codec with the current implementation
> if possible. If it is not possible, we need a new codec that performs.
>
> On Tue, May 16, 2023 at 8:53 PM Robert Muir  wrote:
>
>> Gus, I think I explained myself multiple times on issues and in this
>> thread. The performance is unacceptable, everyone knows it, but nobody is
>> talking about it.
>> I don't need to explain myself time and time again here.
>> You don't seem to understand the technical issues (at least you sure as
>> fuck don't know how service loading works or you wouldn't have opened
>> https://github.com/apache/lucene/issues/12300 )
>>
>> I'm just the only one here completely unconstrained by any of silicon
>> valley's influences to speak my true mind, without any repercussions, so I
>> do it. Don't give any fucks about ChatGPT.
>>
>> I'm standing by my technical veto. If you bypass it, I'll revert the
>> offending commit.
>>
>> As far as fixing the technical performance, I just opened an issue with
>> some ideas to at least improve cpu usage by a factor of N. It does not help
>> with the crazy heap memory usage or other issues of KNN implementation
>> causing shit like OOM on merge. But it is one step:
>> https://github.com/apache/lucene/issues/12302
>>
>>
>>
>> On Tue, May 16, 2023 at 7:45 AM Gus Heck  wrote:
>>
>>> Robert,
>>>
>>> Can you explain in clear technical terms the standard that must be met
>>> for performance? A benchmark that must run in X time on Y hardware for
>>> example (and why that test is suitable)? Or some other reproducible
>>> criteria? So far I've heard you give an *opinion* that it's unusable, but
>>> that's not a technical criterion; others may have a different concept of
>>> what is usable to them.
>>>
>>> Forgive me if I misunderstand, but the essence of your argument has
>>> seemed to be
>>>
>>> "Performance isn't good enough, therefore we should force anyone who
>>> wants to experiment with something bigger to fork the code base to do it"
>>>
>>> Thus, it is necessary to have a clear unambiguous standard that anyone
>>> can verify for "good enough". A clear standard would also focus efforts at
>>> improvement.
>>>
>>> Where are the goal posts?
>>>
>>> FWIW I'm +1 on any of 2-4 since I believe the existence of a hard limit
>>> is fundamentally counterproductive in an open source setting, as it will
>>> lead to *fewer people* pushing the limits. Extremely few people are
>>> going to get into the nitty-gritty of optimizing things unless they are
>>> staring at code that they can prove does something interesting, but doesn't
>>> run fast enough for their purposes. If people hit a hard limit, more of
>>> them give up and never develop the code that will motivate them to look for
>>> optimizations.
>>>
>>> -Gus
>>>
>>> On Tue, May 16, 2023 at 6:04 AM Robert Muir  wrote:
>>>
>>>> i still feel -1 (veto) on increasing this limit. sending more emails
>>>> does not change the technical facts or make the veto go away.
>>>>
>>>> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti <
>>>> a.benede...@sease.io> wrote:
>>>>
>>>>> Hi all,
>>>>> we have finalized all the options proposed by

Re: Dimensions Limit for KNN vectors - Next Steps

2023-05-12 Thread David Smiley
Both what Alessandro said and what Bruno said: make it configurable and
move it.  Both are good and not mutually exclusive and could happen in any
order.

Marcus, are you against configurability?  In particular, I propose a
simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024). Your response
suggests trying to somehow perfect what the _default_ limit should be, but
I've not seen an argument _against_ configurability.  Especially in this
way -- a toggle that doesn't bind Lucene's APIs in any way.
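For illustration, here is a minimal Java sketch of the configurability idea above. It assumes the limit is read through `Integer.getInteger` using the property name proposed in this message. `lucene.hnsw.maxDimensions` is only a proposal here, not an existing Lucene setting, and `HnswLimits` / `checkDimensions` are hypothetical names. `Integer.getInteger` returns the supplied default when the property is absent or cannot be parsed, so Lucene's public API is unaffected.

```java
/**
 * Hypothetical holder for an HNSW-specific dimension limit that users can override
 * with -Dlucene.hnsw.maxDimensions=N (the property name proposed in this thread).
 */
final class HnswLimits {
  /** Fallback used when the property is unset or not a valid integer. */
  static final int DEFAULT_MAX_DIMENSIONS = 1024;

  private HnswLimits() {}

  /** Integer.getInteger parses the system property and returns the default if it is missing or malformed. */
  static int maxDimensions() {
    return Integer.getInteger("lucene.hnsw.maxDimensions", DEFAULT_MAX_DIMENSIONS);
  }

  /** Rejects a vector field whose dimension count is outside 1..maxDimensions(). */
  static void checkDimensions(String field, int dims) {
    int max = maxDimensions();
    if (dims <= 0 || dims > max) {
      throw new IllegalArgumentException(
          "field \"" + field + "\" has " + dims + " dimensions; allowed range is 1.." + max);
    }
  }
}
```

A user would then opt in with something like `java -Dlucene.hnsw.maxDimensions=2048 ...`. This sketch re-reads the property on every call so that tests can change it. Caching the value in a `static final` field instead would fix it when the class is first initialized.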

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, May 11, 2023 at 7:59 AM Uwe Schindler  wrote:

> That's actually a good idea.
>
> +1
> On 10.05.2023 at 09:22, Bruno Roustant wrote:
>
> *Proposed option:* Move the max dimension limit down to a lower level, into an
> HNSW-specific implementation. Once there, this limit would not bind any other
> potential vector engine alternative/evolution.
>
> *Motivation:* There seem to be contradictory performance interpretations
> about the current HNSW implementation. Some consider its performance ok,
> some not, and it depends on the target data set and use-case. Increasing
> the max dimension limit where it is currently (in top level
> FloatVectorValues) would not allow potential alternatives (e.g. for other
> use-cases) to be based on a lower limit.
>
> Bruno
>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
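A minimal Java sketch of Bruno's proposal above, under the assumption that each vector format declares its own limit and the generic indexing path only asks the chosen format. `VectorFormat`, `HnswVectorFormat`, `AlternativeVectorFormat`, `VectorFieldValidator`, and the limits 1024 and 256 are hypothetical illustrations, not Lucene classes or values.

```java
/** Hypothetical codec-level abstraction: each vector engine reports its own dimension limit. */
interface VectorFormat {
  String name();

  int maxDimensions();
}

/** Hypothetical HNSW-backed format: the limit lives here, so it can be raised or configured locally. */
final class HnswVectorFormat implements VectorFormat {
  private final int maxDimensions;

  HnswVectorFormat(int maxDimensions) {
    this.maxDimensions = maxDimensions;
  }

  @Override
  public String name() {
    return "hnsw";
  }

  @Override
  public int maxDimensions() {
    return maxDimensions;
  }
}

/** Hypothetical alternative engine that picks a lower limit for its own use case. */
final class AlternativeVectorFormat implements VectorFormat {
  @Override
  public String name() {
    return "alternative";
  }

  @Override
  public int maxDimensions() {
    return 256;
  }
}

/** Generic indexing-side check: it hard-codes no limit and defers to the chosen format. */
final class VectorFieldValidator {
  private VectorFieldValidator() {}

  static void validate(VectorFormat format, float[] vector) {
    if (vector.length == 0 || vector.length > format.maxDimensions()) {
      throw new IllegalArgumentException(
          format.name() + " supports 1.." + format.maxDimensions()
              + " dimensions, got " + vector.length);
    }
  }
}
```

Because the limit belongs to the format rather than to the top-level vector values, the HNSW limit could be raised (or made configurable) without imposing the same ceiling on a different engine.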


Re: Lucene PMC Chair Greg Miller

2023-03-06 Thread David Smiley
Thank you Bruno!  Glad to see Greg take the chair!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 6, 2023 at 12:15 PM Bruno Roustant  wrote:

> Hello Lucene developers,
>
> The Lucene Project Management Committee has elected a new chair, Greg Miller,
> and the Board has approved.
>
> Greg, thank you for stepping up, and congratulations!
>
>
> - Bruno
>


Re: [DISCUSS:] Reproducible Builds

2023-01-21 Thread David Smiley
The goals / purpose of "Reproducible Builds" make sense to me.

However, I wish the output that is the subject of reproducibility could be
the JAR *exclusive* of its MANIFEST.MF.  There is some interesting metadata
in there -- not essential but a shame to throw away in the name of
reproducibility.
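One way to read this suggestion: compare the JAR's entries by content and skip `META-INF/MANIFEST.MF`. The Java sketch below assumes that looser definition of "reproducible" (entry names plus a SHA-256 of each entry's bytes, ignoring entry order and timestamps). `JarContentDigest` is a hypothetical helper, not part of the Lucene build. The reproducible-builds.org definition asks for bit-for-bit identical artifacts, which is stricter than this.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: fingerprints a JAR's content while ignoring its manifest. */
final class JarContentDigest {
  private JarContentDigest() {}

  /** Maps entry name to the SHA-256 of its bytes, skipping directories and the manifest. */
  static Map<String, String> digest(Path jar) throws IOException {
    Map<String, String> result = new TreeMap<>(); // sorted, so entry order in the JAR is irrelevant
    try (ZipFile zip = new ZipFile(jar.toFile())) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        if (entry.isDirectory() || entry.getName().equals("META-INF/MANIFEST.MF")) {
          continue; // the manifest may carry build metadata such as timestamps
        }
        try (InputStream in = zip.getInputStream(entry)) {
          result.put(entry.getName(), sha256(in.readAllBytes()));
        }
      }
    }
    return result;
  }

  /** True if both JARs have the same entries with the same bytes, manifest excluded. */
  static boolean sameContentIgnoringManifest(Path a, Path b) throws IOException {
    return digest(a).equals(digest(b));
  }

  private static String sha256(byte[] data) {
    try {
      return Base64.getEncoder().encodeToString(MessageDigest.getInstance("SHA-256").digest(data));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required on every Java platform", e);
    }
  }
}
```

Under this definition the manifest can keep its "interesting metadata" while the rest of the JAR is still checked for reproducibility.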

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Jan 21, 2023 at 4:27 PM Gus Heck  wrote:

> Some discussion on https://github.com/apache/lucene/pull/12096 led to
> the question of whether or not reproducible builds (
> https://reproducible-builds.org/) are something we would like to work
> towards. I'm a fan, though unlikely to have time to work on it soon.
>
> What I can do is monitor this thread and if the consensus seems to be
> there, make a ticket that a volunteer can work on in the future (or maybe
> me in the far future, likely after I have some more direct experience from
> implementing it for Uno-Jar and JesterJ).
>
> -Gus
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


Re: Trying to understand cross-collection-join routing/hashing choices and behavior

2022-12-05 Thread David Smiley
Hello Zachariah,

You have sent this to the wrong list.  This is the Lucene dev list.  Your
message should go to d...@solr.apache.org
The Lucene & Solr projects have split; not long ago, this would have been
the right list.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Dec 1, 2022 at 12:56 AM Zachariah Kendall <
zachariahkend...@gmail.com> wrote:

> I'm trying to understand the cross-collection JOIN
> <https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#cross-collection-join>
>  documentation,
> behavior, choices, and viability.
>
> *# Terminology language choice*
>
> """routerField - If the documents are routed to shards using the
> CompositeID router by the join field, then that field name should be
> specified in the configuration here. This will allow the parser to optimize
> the resulting HashRange query."""
>
> """routed - If true, the cross collection join query will use each shard’s
> hash range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but
> it depends on the local collection being routed by the to field. If this
> parameter is not specified, the cross collection join query will try to
> determine the correct value automatically."""
>
> *Question 1*: Why overload terminology like "route" when these parameters
> do NOT route AFAICT. Based on my reading of the code all they do is add a
> hash_range fq parameter to the remote join query request. Filtering results
> is not routing, so this fosters confusion. Is there reasoning behind this
> or just happenstance?
>
> *# Implied vs Actual behavior*
>
> My reading of the code base is this: the hash_range parameter is always
> populated with the "fromField" value. The routerField is only used to check
> against the "toField" for equality to enable the hash_range parameter
> usage; this is only done as a fallback if "routed" is not set.
>
> It's a little strange to me that "routerField" is not used as a router
> field, or even as a hash field. It is only used as a flag for "if a query
> is joining to this field then use hash_range filter on the fromField" (or
> at least that's how I read the code).
>
> *Question 2:* Is my reading of the code correct? Can we try to update the
> documentation to be more explicit about this?
>
>
> *# Routing *
>
> *Question 3:* Is there a reason why actual routing was not used? I'm not
> familiar with the Solr code base, but it seems like it'd be nicer to
> instead use existing routing behavior in this context instead of querying
> all and filtering results. This seems like it would need 2 things: First,
> the _route_ value from the current "local" request, and second, either the
> local client (like how solrj does) or the remote "/export" handler would
> need to recognize and handle this parameter. Is that obviously doable or
> not doable? Trying to understand why that approach wasn't taken originally.
>
>
> *# Hashing*
>
> Here is the behavior touted in the docs for HashRangeQueryParser
> <https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#hash-range-query-parser>
> .
> """In the cross collection join case, the hash range query parser is used
> to ensure that each shard only gets the set of join keys that would end up
> on that shard. This query parser uses the MurmurHash3_x86_32. This is the
> same as the default hashing for the default composite ID router in Solr."""
>
> The documentation mentions "CompositeID router", which we know is based on
> prefixes (split on "!") being hashed and routed with the first/top 16 bits
> of info (with the later 16 bits provided by the rest of the doc "id" on
> inserts).
>
> The CrossCollectionJoinQuery uses 16 bits from the current/local shard
> range, which seems fine and good. However, the HashRangeQuery appears to hash
> the entire field
> <https://github.com/apache/solr/blob/26195c82493422cb9d6d4bdf9d4452046e7b3f67/solr/core/src/java/org/apache/solr/search/join/HashRangeQuery.java#L116-L117>.
> So I'm struggling to understand how this would work, especially since the
> join field and the "route" field are sourced from the same value. Either
> the join field is a compositeId in which case the HashRangeQuery code
> appears to be invalid, as it would not hash "A!B" the same as the actual
> router would hash "A", or the join field is not a compositeId in which case
> for it to work it would have to be the exa
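To make the hashing question concrete, here is a minimal Java sketch contrasting the two behaviors described above: a compositeId-style hash of `A!B` (top 16 bits from the hash of `A`, bottom 16 bits from the hash of `B`) versus hashing the whole value `A!B`. It assumes 32-bit MurmurHash3_x86_32 with seed 0 over UTF-8 bytes and a single `!` separator. `CompositeIdHashDemo` is a hypothetical class written from the public MurmurHash3 algorithm; it is not Solr's `Hash` or `CompositeIdRouter` code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of compositeId-style hashing vs. whole-value hashing. */
final class CompositeIdHashDemo {
  private CompositeIdHashDemo() {}

  /** 32-bit MurmurHash3 (x86 variant) over a byte array. */
  static int murmur3x86_32(byte[] data, int seed) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;
    int h1 = seed;
    int roundedEnd = data.length & ~3; // whole 4-byte blocks are processed first

    for (int i = 0; i < roundedEnd; i += 4) {
      // little-endian read of one 4-byte block
      int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
          | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;
      h1 ^= k1;
      h1 = Integer.rotateLeft(h1, 13);
      h1 = h1 * 5 + 0xe6546b64;
    }

    // tail: the remaining 0-3 bytes
    int k1 = 0;
    switch (data.length & 3) {
      case 3:
        k1 = (data[roundedEnd + 2] & 0xff) << 16;
        // fall through
      case 2:
        k1 |= (data[roundedEnd + 1] & 0xff) << 8;
        // fall through
      case 1:
        k1 |= data[roundedEnd] & 0xff;
        k1 *= c1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= c2;
        h1 ^= k1;
        break;
      default:
        break;
    }

    // finalization mix
    h1 ^= data.length;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  static int hash(String s) {
    return murmur3x86_32(s.getBytes(StandardCharsets.UTF_8), 0);
  }

  /** compositeId-style: top 16 bits from the prefix before '!', bottom 16 bits from the rest. */
  static int compositeHash(String id) {
    int sep = id.indexOf('!');
    if (sep < 0) {
      return hash(id); // no prefix: the whole id is hashed
    }
    int prefixHash = hash(id.substring(0, sep));
    int restHash = hash(id.substring(sep + 1));
    return (prefixHash & 0xFFFF0000) | (restHash & 0x0000FFFF);
  }

  /** Whole-value hashing: every byte of "A!B", separator included, feeds one hash. */
  static int wholeValueHash(String value) {
    return hash(value);
  }
}
```

Only the composite form guarantees that every `A!...` key shares its top 16 bits with the hash of the prefix `A`. The whole-value hash gives no such guarantee, which is the mismatch the question points at.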

Re: Welcome Luca Cavanna as Lucene committer

2022-10-05 Thread David Smiley
Welcome Luca!

On Wed, Oct 5, 2022 at 12:04 PM Adrien Grand  wrote:

> I'm pleased to announce that Luca Cavanna has accepted the PMC's
> invitation to become a committer.
>
> Luca, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
>
> --
> Adrien
>


Re: IMPORTANT: Please update your gradle.properties file in your Lucene checkout!

2022-09-27 Thread David Smiley
> If you do not want Gradle to auto-provision the Java 19 for compilation
> of those Preview classes, pass environment variable
> JAVA19_HOME=/path/to/jdk19 to your build!

That seems inverted; maybe I misunderstand?  If, say, we're working locally
without Java 19 and don't want to bother with it during dev, we should still
have an env variable pointing to it?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Sep 26, 2022 at 9:57 AM Uwe Schindler  wrote:

> Hi,
>
> By deleting the file, I meant the "gradle.properties" in the lucene
> checkout.
>
> Uwe
> On 26.09.2022 at 15:44, Uwe Schindler wrote:
>
> Hey,
>
> after the merge of Java 19 support to main, 9.x, and the to-be-released 9.4,
> there is a small change needed in your gradle.properties file. In earlier
> versions we disabled auto-provisioning of JDK releases for compilation, but now it
> is required.
>
> If your build hangs at :lucene:core:compileMain19Java saying that there's
> no release of Java 19 available, please change your gradle.properties in
> your home folder to enable this feature:
>
> org.gradle.java.installations.auto-download=true
>
> If you delete the file and let the build system regenerate it, all will
> work out of box. So you have the choice: Delete the file to regenerate
> defaults or modify above property!
>
> Please also note that depending on your build system, the classes in
> lucene/core/src/java19 may not compile (e.g. in Eclipse). I will work on
> this in the following weeks. For now just ignore the compilation unit or
> delete it from your IDE config. I may do something automatically using our
> IDE autoconfiguration.
>
> If you do not want Gradle to auto-provision the Java 19 for compilation of
> those Preview classes, pass environment variable JAVA19_HOME=/path/to/jdk19
> to your build!
>
> To actually test the new code: Build the Lucene JAR and run the test suite
> with RUNTIME_JAVA_HOME=/path/to/jdk19; alternatively compile your
> application and pass "--enable-preview" to the Java command line!
>
> Thanks,
>
> Uwe
>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>


Re: Welcome Vigya Sharma as Lucene committer

2022-07-29 Thread David Smiley
Congrats Vigya!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Jul 28, 2022 at 3:34 AM Adrien Grand  wrote:

> I'm pleased to announce that Vigya Sharma has accepted the PMC's
> invitation to become a committer.
>
> Vigya, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
> --
> Adrien
>


Re: [jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-07-27 Thread David Smiley
FYI I had filed https://issues.apache.org/jira/browse/INFRA-23503

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jul 26, 2022 at 3:54 PM Michael Sokolov  wrote:

> searching JIRA for "slkjfdf" I found a few issues in other projects,
> but none seems to be getting the same degree of spam love
>
> On Tue, Jul 26, 2022 at 3:50 PM Mike Sokolov (Jira) 
> wrote:
> >
> >
> > [
> https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571588#comment-17571588
> ]
> >
> > Mike Sokolov commented on LUCENE-10054:
> > ---
> >
> > what is it with this issue that spammers love so much!? I wonder if we
> > could somehow lock it as read-only ...
> >
> >
> >
> > > Handle hierarchy in HNSW graph
> > > --
> > >
> > > Key: LUCENE-10054
> > > URL:
> https://issues.apache.org/jira/browse/LUCENE-10054
> > > Project: Lucene - Core
> > >  Issue Type: Task
> > >Reporter: Mayya Sharipova
> > >Priority: Major
> > >  Labels: vector-based-search
> > > Fix For: 9.1
> > >
> > >  Time Spent: 20h 20m
> > >  Remaining Estimate: 0h
> > >
> > > Currently HNSW graph is represented as a single layer graph.
> > >  We would like to extend it to handle hierarchy as per [discussion|
> https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216
> ].
> > >
> > > TODO tasks:
> > > - add multiple layers in the HnswGraph class
> > >  - modify the format in  Lucene90HnswVectorsWriter and
> Lucene90HnswVectorsReader to handle multiple layers
> > > - modify graph construction and search algorithm to handle hierarchy
> > >  - run benchmarks
> >
> >
> >
> >
>
>
>


Re: [DISCUSS] Read-only Jira after the GitHub issues migration?

2022-07-18 Thread David Smiley
I suppose someone bent on not using GitHub could also email the patch to
the dev list, starting a thread around it.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jul 17, 2022 at 9:14 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hi Team,
>
> Thanks to Tomoko's amazing hard work (
> https://github.com/apache/lucene-jira-archive), we are getting close to
> having strong tooling and a solid plan to migrate all past Jira issues to
> GitHub issues!
>
> But one contentious point is whether to leave Jira read-only or read-write
> after the migration.  So let's DISCUSS and maybe VOTE to reach consensus?
>
> My opinion: I think it'd be crazy to leave Jira read/write.  We would
> effectively have two issue trackers.  New users who find Jira through
> Google, or through links we have in old blog posts, etc., might
> accidentally open new Jira issues or comment on old ones and we may not
> even notice.  I think that would harm our community.
>
> I would prefer that we make a nearly atomic switch -- up until time X we
> use Jira, then it goes read-only and at time X + t (t being how long the
> migration takes, likely a day or two?), GitHub issues opens for business.
> This way we clearly have only one issue tracker at (nearly) all times.  This
> would make a clean migration, and reduce risk of trapping users.
>
> Other opinions?
>
> Thanks,
>
> Mike
> --
> Mike McCandless
>
> http://blog.mikemccandless.com
>


Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-15 Thread David Smiley
I'm not a fan of the automated copying of any issues into GitHub, which
will create a divergence / duplication of an issue's identity.  It will only
be a relatively temporary annoyance to have two systems to "work" on an
issue.  Eventually, JIRA will only be historical; let's say Lucene 11.  At
that point if there's an older issue of resumed interest, which would be
getting increasingly rare, someone could manually copy the original
description and title into GitHub plus a historical reference back.  Again
-- rare by then.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Jun 15, 2022 at 4:18 PM Tomoko Uchida 
wrote:

> It looks like we talked about two or three things at the same time -
> and I'm afraid the discussion will quickly turn into a disordered
> state and I won't be able to track it.
>
> Let me decide one thing: Let's NOT try to move histories to GitHub.
> Closed issues will remain in Jira forever and we can refer to them
> anytime from anywhere. I think I said that before several times.
>
> I would like to focus on the future here - can we make a decision on
> how to handle active (unresolved) issues and issues that will be
> opened in the future.
>
> Thank you,
> Tomoko
>
> On Thu, Jun 16, 2022 at 4:18 Dawid Weiss wrote:
>
> >
> >
>> Totally agree. The history of closed issues answers “when did this
> change and why?”. Migrate them all. Computers can do that. It avoids asking
> humans to think about where stuff is.
> >
> >
> > We do have different views of that. To me, the history is preserved
> perfectly well in Jira, it's not being phased out. Moving to github as the
> issue tracking system is fine but different to me than code transitions
> (cvs->svn->git). With code, you do have an existing state and history you
> build from. With issue tickets - not so much. And even if you want to
> create a ticket in the new system, you can easily link to the previous one.
> It's the "web" of hyperlinks, right?
> >
> > I'm a bit afraid that moving hundreds of jira issues to github will have
> the reverse effect - duplicate the same information but with quality
> degraded, for example automatic links that work in Jira will no longer work
> or point at the ported github issues ("this is related to LUCENE-xyz or
> SOLR-abc, blah, blah blah.")?
> >
> > I don't want to stand in the way of progress but we've gone through a
> similar transition at our company and I never had a problem using both
> systems at the same time; jira just gradually atrophied into a read-only
> state once issues in there got stale or resolved.
> >
> > Dawid
>
>
>


Re: Welcome Greg Miller to the Lucene PMC

2022-06-07 Thread David Smiley
Welcome Greg!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jun 7, 2022 at 2:44 AM Adrien Grand  wrote:

> I'm pleased to announce that Greg Miller has accepted an invitation to
> join the Lucene PMC!
>
> Congratulations Greg, and welcome aboard!
>
> --
> Adrien
>


Re: Bugfix release Lucene/Solr 8.11.2

2022-06-06 Thread David Smiley
I merged SOLR-16227 to main, 9, 8.11 some minutes ago.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jun 3, 2022 at 1:59 PM Mike Drob  wrote:

> Yes, please fix and backport SOLR-16227, it looks almost ready from the
> conversation on the PR. I will plan to do the first RC on Monday if we can
> get the backport completed today.
>
> On Thu, Jun 2, 2022 at 9:18 AM kiran chitturi 
> wrote:
>
>> Hi Mike,
>>
>> I found a new issue in Solr SQL (SOLR-16227) and have a fix for it (
>> https://github.com/apache/solr/pull/887). Can you wait on the release
>> for 8.11.2 till I can backport this?
>>
>> Thank you,
>> Kiran.
>>
>> On Tue, May 31, 2022 at 9:21 AM Mike Drob  wrote:
>>
>>> Howdy folks, now that Lucene 9.2 has wrapped and we're past the holiday
>>> weekend in the United States, I'd like to take a look at getting this
>>> rolling by the end of the week. I see an open PR for
>>> backporting LUCENE-10236 but it doesn't look like anything else would
>>> really be waiting at the moment.
>>>
>>> I will plan to build a release candidate on Thursday (sooner if
>>> LUCENE-10236 is committed, later if somebody else shouts that they have
>>> other issues).
>>>
>>> Thanks!
>>>
>>> On Tue, May 24, 2022 at 3:48 PM Jan Høydahl 
>>> wrote:
>>>
>>>> I bumped Jackson in https://issues.apache.org/jira/browse/SOLR-16213 and
>>>> also backported to 8_11. Wdyt?
>>>>
>>>> Jan
>>>>
>>>> On 18 May 2022 at 15:22, Gus Heck wrote:
>>>>
>>>> SOLR-16194 is in and ported to 8.11.2
>>>>
>>>> On Wed, May 18, 2022 at 7:12 AM Jan Høydahl 
>>>> wrote:
>>>>
>>>>> I was pinged on https://issues.apache.org/jira/browse/SOLR-16019 because
>>>>> I have an in-flight PR with a backport. I'll complete and merge that PR.
>>>>>
>>>>> Jan
>>>>>
>>>>>
>>>>> On 13 May 2022 at 01:03, Mike Drob wrote:
>>>>>
>>>>> To: dev@lucene, dev@solr
>>>>>
>>>>> NOTICE:
>>>>>
>>>>> I am planning on preparing a bugfix release from branch branch_8_11
>>>>> (likely mid next week)
>>>>>
>>>>> Please observe the normal rules for committing to this branch:
>>>>>
>>>>> * Before committing to the branch, reply to this thread and argue
>>>>>   why the fix needs backporting and how long it will take.
>>>>> ** If you're backporting stuff this week still or over the weekend,
>>>>>    then skip the bit about how long it will take.
>>>>> * All issues accepted for backporting should be marked with 8.11.2
>>>>>   in JIRA, and issues that should delay the release must be marked as
>>>>>   Blocker.
>>>>> * All patches that are intended for the branch should first be
>>>>>   committed to the unstable branch, merged into the stable branch, and
>>>>>   then into the current release branch.
>>>>> * Only Jira issues with Fix version 8.11.2 and priority "Blocker" will
>>>>>   delay a release candidate build.
>>>>>
>>>>> Also, please observe that since 9.0 already exists, there cannot be
>>>>> any index format breaking changes. It really should only be bug fixes that
>>>>> have already been verified on the 9x branch.
>>>>>
>>>>> Thanks,
>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>>
>>>>


Re: Welcome Lu Xugang as Lucene committer

2022-06-01 Thread David Smiley
Welcome Lu!


Re: Welcome Chris Hegarty as Lucene committer

2022-06-01 Thread David Smiley
Welcome Chris!


Re: [VOTE] Migration to GitHub issue from Jira (LUCENE-10557)

2022-05-30 Thread David Smiley
+1 Approve (PMC)

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, May 30, 2022 at 11:40 AM Tomoko Uchida 
wrote:

> Hi everyone!
>
> Following the previous discussion thread [1], I propose migrating from
> Jira to GitHub issues.
> It'd be technically possible (see [2] for details) and I think it'd be
> good for the project - not only for welcoming new developers who are not
> familiar with Jira, but also for improving the experiences of long-term
> committers/contributors by consolidating the conversation platform.
>
> You can see a short summary of the discussion, some stats on current Jira
> issues, and a draft migration plan in [2].
> Please review [2] if you haven't seen it and vote for this proposal.
>
> The vote will be open until 2022-06-06 16:00 UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1
>
> *IMPORTANT NOTE*
> I set a local protocol for this vote.
> There are 95 committers on this project [3]; the vote will be effective
> only if at least 15 committers (more than 15% of them, including PMC
> members) vote. This means that although only PMC member votes
> are counted for the final result, the votes from all committers are
> important to make the vote result effective.
>
> If there are fewer than 15 votes at 2022-06-06 16:00 UTC, I will extend the
> term to 2022-06-13 16:00 UTC. If this fails to get sufficient voters after
> the extended time limit, I'll cancel this vote regardless of the result.
> But why do I set such an extra bar? My fear is that if such things are
> decided by the opinions of a few members, the result wouldn't yield a good
> outcome for the future. It isn't my goal to just pass the vote [4].
>
> [1] https://lists.apache.org/thread/78wj0vll73sct065m5jjm4z8gqb5yffk
> [2] https://issues.apache.org/jira/browse/LUCENE-10557
> [3] https://projects.apache.org/committee.html?lucene
> [4] I'm sorry for being overly cautious, but I have never met in person or
> virtually any of the committers (with a very few exceptions), therefore
> cannot assess if the vote result is reliable or not unless there is certain
> explicit feedback.
>
> Tomoko
>


Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-05 Thread David Smiley
Is anyone familiar with using GitHub (or maybe GitLab I suppose) for
tracking metadata about the issue -- something JIRA excels at?  For example
the version of our project that a given PR is released in -- aka the JIRA
"Fix Version"?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, May 4, 2022 at 10:24 PM Tomoko Uchida 
wrote:

> Hello everyone!
>
> Recently, we relaxed the requirement for creating a Jira issue when
> opening a pull request (LUCENE-10545
> <https://issues.apache.org/jira/browse/LUCENE-10545>).
>
> As the next and bigger (perhaps ambitious) step, I opened a rough proposal
> for migration from Jira to GitHub issues.
> https://issues.apache.org/jira/browse/LUCENE-10557
>
> According to the INFRA issue for the RocketMQ project (Michael McCandless
> gave the pointer in a comment on the issue; thanks!), a PMC agreement or
> Vote result is needed for the decision.
> https://issues.apache.org/jira/browse/INFRA-15702
>
> Eventually, we'd need a formal vote, but before that, I'd like to hear
> general opinions/thoughts (or feelings) on this topic from developers.
>
> In brief, I think it'd be technically possible and also be good for the
> project - not only for welcoming new developers who are not familiar with
> Jira, but also for improving the experiences of long-term contributors by
> consolidating the conversation platform.
> It'll need some administrative work though, I'm willing to work for it if
> we reach an agreement.
>
> Please note that:
> * This is not a VOTE. Simple vote-style feedback (+/- 1) is welcome, but
> we don't aim to reach a conclusion in this thread.
> * Let's not discuss "how to migrate existing Jira issues" for now. Once we
> decide the migration will be good for us, then we can try to figure out a
> reasonable solution for technical/administrative matters.
>
> I may be too optimistic about it; but - a bit of stupidness will be needed
> to start such a move, and I'm serious about this proposal :)
>
> Thanks and regards,
> Tomoko
>


Re: Lucene PMC Chair Bruno Roustant

2022-03-23 Thread David Smiley
Thanks for your service Michael and congrats Bruno!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Mar 23, 2022 at 9:03 AM Michael Sokolov  wrote:

> Hello, Lucene developers. The Lucene Project Management Committee has
> elected a new chair, Bruno Roustant, and the Board has approved.
> Bruno, thank you for stepping up, and congratulations!
>
> -Mike
>
>
>


Re: Welcome Guo Feng as Lucene committer

2022-02-06 Thread David Smiley
Congrats and welcome!

On Tue, Jan 25, 2022 at 4:09 AM Adrien Grand  wrote:

> I'm pleased to announce that Guo Feng has accepted the PMC's
> invitation to become a committer.
>
> Feng, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
> --
> Adrien
>


Re: Mirroring the later 8.x release tags in the "new" split repositories

2022-01-04 Thread David Smiley
+1 to Houston's proposal.  Given all the release tags seen here:
https://github.com/apache/solr/tags it makes sense that it would include
the tag for 8.11 and the others we're missing.  I think this is a really
easy decision as it's weird/inconsistent that these particular versions are
omitted yet the many older ones exist.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jan 4, 2022 at 4:01 PM Houston Putman  wrote:

> Dawid,
>
> I did mean that we should be pushing the tags as well as their associated
> commits. I was unaware that you could push the tags without the commits,
> sorry if I caused confusion there.
>
> Jan,
>
> Looking in the diff between the "history/branches/lucene-solr/branch_8x"
> tag in apache/solr and the current "branch_8_11" in apache/lucene-solr,
> there are around 12 MB of commits to add. This is a rough estimate, but it
> should be close enough.
>
> The best approximation I have of the apache solr repository is that its
> size is around 400 MB. So adding these tags/refs would cause a 3% increase
> in the size of the repo. The lucene repo is a little larger currently, but
> the new tag sizes should be identical.
>
> - Houston
>
> On Tue, Jan 4, 2022 at 3:36 PM Jan Høydahl  wrote:
>
>> We have edit history ever since the earliest svn commits, we just lack a
>> year's worth of commits from the latest 8.x versions, so from a traceability
>> view it makes sense, instead of having to look in two repos. Do you know
>> how much weight it will add to a full clone?
>>
>> Jan Høydahl
>>
>> > On 4 Jan 2022 at 21:01, Dawid Weiss wrote:
>> >
>> > 
>> >>
>> >> You can push a tag to a repo that doesn't already have that commit (or
>> history of commits)
>> > in an existing branch, without issue.
>> >
>> > But why do it? These are refs - if they point to non-existing commits
>> > then I honestly don't see any value in having them. It would
>> > confuse the hell out of me.
>> >
>> >> They are separate projects, but with a shared history. I'd like to be
>> able to go to the apache/solr github
>> > and be able to go through the history of a file in different release
>> > versions, even if that specific release happened
>> > under apache/lucene-solr.
>> >
>> > This is a different requirement, actually. If Solr (or Lucene) would
>> > like to keep such a history then I think it should just fetch those
>> > release refs and all the commits leading to them. Since these projects
>> > share a common root, there is nothing to prevent this from happening.
>> > Then tags point at actual revisions and everything makes sense.
>> >
>> > This does not change the fact that I don't really see much value in
>> > doing all this.
>> >
>> > Dawid
>> >
>> >> On Tue, Jan 4, 2022 at 8:30 PM Houston Putman 
>> wrote:
>> >>
>> >> They don't have those commits, but they also don't have the commits
>> for the
>> >> previous release tags in the repo. You can go to any of the release
>> tags, choose
>> >> a commit to view and you will get a message saying:
>> >>
>> >>>
>> >>> This commit does not belong to any branch on this repository,
>> >>> and may belong to a fork outside of the repository.
>> >>
>> >>
>> >> You can push a tag to a repo that doesn't already have that commit (or
>> history of commits)
>> >> in an existing branch, without issue.
>> >>
>> >> They are separate projects, but with a shared history. I'd like to be
>> able to go to the apache/solr github
>> >> and be able to go through the history of a file in different release
>> versions, even if that specific release happened
>> >> under apache/lucene-solr.
>> >>
>> >> - Houston
>> >>
>> >>> On Tue, Jan 4, 2022 at 2:02 PM Dawid Weiss 
>> wrote:
>> >>>
>> >>>> As mentioned in SOLR-15874, we are not hosting the tags for the
>> latest 8.x releases in the split apache/solr and apache/lucene
>> repositories. All release tags made prior to the repository split exist in
>> the new repos, so I see no reason that the newer 8.x tags cannot exist in
>> the new repos as well.
>> >>>
>> >>> I'm not sure I understand - to create a tag you'd need that particular
>> >>> commit - the "new" repositories for each project don't have those
>> >>> commits (and arguably shouldn't have since they're, well, separate
>> >>> projects now).
>> >>>
>> >>> Dawid
>> >>>
>> >>>
>> >
>> >
>>
>>
>>


Re: [Heads up] Test framework package rename

2021-12-21 Thread David Smiley
I wouldn't have thought to do such a change in a minor release, but I
suppose for tests it's fine.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Dec 21, 2021 at 9:10 AM Uwe Schindler  wrote:

> Hi,
> TortoiseGit's diff viewer also helps. But I just quickly skimmed
> through it and only stopped at classes where different changes to imports
> were done.
>
> Thanks,
> Uwe
>
> On December 21, 2021 14:01:15 UTC, Robert Muir wrote:
>>
>> On Tue, Dec 21, 2021 at 8:20 AM Mark Jens  wrote:
>>
>>>
>>>  You can use 
>>> https://patch-diff.githubusercontent.com/raw/apache/lucene/pull/551.diff to 
>>> render plain text and keep the browser responsive.
>>>  Another option is 
>>> https://patch-diff.githubusercontent.com/raw/apache/lucene/pull/551.patch 
>>> to see each commit separately.
>>>
>>>
>> Thank you Mark.
>>
>> This does work easily for a large diff, and I get colored diff if I
>> load it in vim (syntax on/set background=dark).
>> But personally, especially for such a large diff, I get a much better
>> color scheme for reviewing via 'git diff'.
>> Seems I need to tweak the vim colors better for "downloaded diff" to
>> be more useful...
>>
>> In any case, fetching Dawid's remote branch and testing it out myself
>> was worth the trouble. It found a bug
>> (https://issues.apache.org/jira/browse/LUCENE-10331)
>>
>> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
>


Re: Log4j < 2.15.0 may still be vulnerable even if -Dlog4j2.formatMsgNoLookups=true is set

2021-12-19 Thread David Smiley
I like the idea of using our Wiki more as you describe. Not so much
*new* news entries, though, because I think the searchability of these CVEs
is fine if they are added to an existing entry.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Dec 18, 2021 at 4:39 PM Gus Heck  wrote:

> Thinking about it some more, maybe the problem with my suggestion is
> the table on that page is organized by the library version and, if
> unmitigated, the version of the library is still a problem. Maybe another
> way to be clearer about it and avoid rewriting things that people have
> already read would be to add independent entries to the security news page
> for the newer CVEs.
>
> On Sat, Dec 18, 2021 at 12:20 PM Gus Heck  wrote:
>
>> I think perhaps in the shock of such a deep and surprising vulnerability
>> with such high visibility, we've begun to break with how we normally handle
>> CVE's that don't apply to our usage of the library. Previously, they just
>> got added to the list of known false positives
>> <https://cwiki.apache.org/confluence/display/SOLR/SolrSecurity#SolrSecurity-SolrandVulnerabilityScanningTools>.
>> Normally we wouldn't even mention them on the security news page, but
>> because of the high visibility we should simply have a line mentioning that
>> these two CVE's are on our false positives page and explain details there.
>> The wiki would provide revision history automatically.
>>
>> On Sat, Dec 18, 2021 at 11:25 AM Jan Høydahl 
>> wrote:
>>
>>> We make edits to the log4j advisory almost daily, see
>>> https://github.com/apache/solr-site/commits/e10a6a9fe0eed8dcba3ad1a076c8208e014e76ff/content/solr/security/2021-12-10-cve-2021-44228.md
>>> I wonder if we should include a "Revision history" paragraph in the
>>> advisory for transparency?
>>>
>>> Jan
>>>
>>> On 15 Dec 2021, at 19:09, Uwe Schindler wrote:
>>>
>>> Hi all, I prepared a PR about the followup CVE-2021-45046:
>>> https://github.com/apache/solr-site/pull/59
>>>
>>> Please verify and make suggestions. I will merge this into
>>> main/production later.
>>>
>>> Uwe
>>>
>>> -
>>> Uwe Schindler
>>> Achterdiek 19, D-28357 Bremen
>>> https://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>> *From:* Uwe Schindler 
>>> *Sent:* Wednesday, December 15, 2021 3:31 PM
>>> *To:* 'dev@lucene.apache.org' 
>>> *Subject:* RE: Log4j < 2.15.0 may still be vulnerable even if
>>> -Dlog4j2.formatMsgNoLookups=true is set
>>>
>>> We should add this to the webpage. Someone else asked about it on the
>>> security mailing list.
>>>
>>> Uwe
>>>
>>> -
>>> Uwe Schindler
>>> Achterdiek 19, D-28357 Bremen
>>> https://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>> *From:* Gus Heck 
>>> *Sent:* Wednesday, December 15, 2021 12:39 AM
>>> *To:* dev 
>>> *Subject:* Re: Log4j < 2.15.0 may still be vulnerable even if
>>> -Dlog4j2.formatMsgNoLookups=true is set
>>>
>>> Perhaps we could tweak it to say that the system property fix is
>>> sufficient *for Solr* (i.e. not imply that it is a valid work around for
>>> all cases)
>>>
>>> On Tue, Dec 14, 2021 at 6:20 PM Uwe Schindler  wrote:
>>>
>>> The other attack vectors are also not possible with Solr:
>>>
>>> - Logger.printf("%s", userInput) is not used
>>> - custom message factory is not used
>>>
>>> Uwe
>>> On December 14, 2021 22:59:26 UTC, Uwe Schindler wrote:
>>>
>>> It is still a valid mitigation.
>>>
>>> Mike Drob and I explained it. MDC is the other attack vector and that's
>>> not an issue with Solr.
>>>
>>> Please accept this: just because the log4j documentation changed,
>>> there's no additional risk. We may update the mitigation to mention that in
>>> Solr's case the system property is fine.
>>>
>>> Uwe
>>> On December 14, 2021 22:52:29 UTC, solr wrote:
>>>
>>> Ok.
>>>
>>> But FTR - apache/log4j has discredited just setting the system property as 
>>> a mitigation measure, so I still think the SOLR security-page should be 
>>> changed to not list this as a valid mitigation:
>>>
>>> https://logging.apache.org/log4j/2.x/security.html
>>> "Older (discredited) mitigation measures
>>>
>>> This page previously menti

Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread David Smiley
Congratulations Haoyu!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Dec 19, 2021 at 4:12 AM Dawid Weiss  wrote:

> Hello everyone!
>
> Please welcome Haoyu Zhai as the latest Lucene committer. You may also
> know Haoyu as Patrick - this is perhaps his kind gesture to those of
> us whose tongues are less flexible in pronouncing difficult first
> names. :)
>
> It's a tradition to briefly introduce yourself to the group, Patrick.
> Welcome and thank you!
>
> Dawid
>
>
>


Re: [VOTE] Release Lucene/Solr 8.11.1 RC1

2021-12-15 Thread David Smiley
+1

SUCCESS! [1:27:33.004868]


~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Dec 15, 2021 at 11:17 AM Jan Høydahl  wrote:

> I think the ASF allows an exception to the 72h voting rule for urgent
> fixes. The current vote result is 7 "+1" and no "-1". So if we figure out
> how to trigger that exception we could push it e.g. tomorrow instead of Friday?
>
> Jan
>
> > On 15 Dec 2021, at 15:29, Uwe Schindler wrote:
> >
> > Hi,
> >
> > Policeman Jenkins tested the release with Smoketester:
> >
> > SUCCESS! [1:28:23.237262]
> > Finished: SUCCESS
> >
> > https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Release-Tester/38/console
> >
> > I did not do further checks, I just want to get the release out soon!
> > Thanks to Jan for doing the release so fast.
> >
> > In the release notes of Lucene we should just mention that log4j was
> > updated (Luke and possibly Replicator). A changes entry was forgotten,
> > but that's not urgent.
> >
> > So here's my +1
> > Uwe
> >
> > -
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >> -Original Message-
> >> From: Jan Høydahl 
> >> Sent: Tuesday, December 14, 2021 3:36 PM
> >> To: Lucene Dev 
> >> Subject: [VOTE] Release Lucene/Solr 8.11.1 RC1
> >>
> >> Please vote for release candidate 1 for Lucene/Solr 8.11.1
> >>
> >> The artifacts can be downloaded from:
> >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.1-RC1-rev0b002b11819df70783e83ef36b42ed1223c14b50
> >>
> >> You can run the smoke tester directly (from a fresh branch_8_11 checkout),
> >> with this command:
> >>
> >> python3 -u dev-tools/scripts/smokeTestRelease.py \
> >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.1-RC1-rev0b002b11819df70783e83ef36b42ed1223c14b50
> >>
> >> The vote will be open for at least 72 hours, i.e. until 2021-12-17 15:00 UTC.
> >>
> >> [ ] +1  approve
> >> [ ] +0  no opinion
> >> [ ] -1  disapprove (and reason why)
> >>
> >> Here is my +1
> >>
> >> SUCCESS! [0:54:56.979538]
> >>
> >> NOTE: You must run the smoke tester from the latest commit on branch_8_11,
> >> since my surname contains a Unicode character, which needed a fix in the
> >> gpg command run by the smoke tester.
> >
> >
> >
>
>
>
>


Re: Lucene/Solr 8.11.1 release

2021-12-13 Thread David Smiley
Looks good; thanks Jan!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Dec 13, 2021 at 9:34 AM Jan Høydahl  wrote:

> Including Lucene dev as well.
>
> As I see no Lucene level bug fixes for 8.11.1, I have prepared an "empty"
> release announcement:
> https://cwiki.apache.org/confluence/display/LUCENE/ReleaseNote8_11_1
> Please edit as you see fit.
>
> The Solr announcement is slightly updated, proof-read welcome
> https://cwiki.apache.org/confluence/display/SOLR/ReleaseNote8_11_1
>
> There are now 18 CHANGES entries for Solr:
> https://github.com/apache/lucene-solr/blob/branch_8_11/solr/CHANGES.txt
>
> Jan
>
> > On 13 Dec 2021, at 02:24, Jan Høydahl wrote:
> >
> > There seem to be no open blockers for 8.11.1, so I'll proceed with RC1
> > soon.
> > Shout out if you want me to wait for a specific important bugfix.
> >
> > Please also review the Release Notes at
> > https://cwiki.apache.org/confluence/display/SOLR/ReleaseNote8_11_1
> >
> > Jan
> >
> >> On 8 Dec 2021, at 02:48, Timothy Potter wrote:
> >>
> >> agreed! thanks for stepping up to be the RM Jan ;-)
> >>
> >> On Tue, Dec 7, 2021 at 6:05 PM Jan Høydahl wrote:
> >>>
> >>> Hi,
> >>>
> >>> Solr has 13 bug fixes lined up in branch_8_11 already. Lucene has no
> >>> changes. Now that Lucene 9.0 is out the door (congrats!), let's do the
> >>> 8.11.1 release.
> >>>
> >>> I volunteer as RM and propose a first RC on Monday 13th. That should
> >>> allow some time to merge any last bug fixes both for Lucene and Solr.
> >>> Please feel free to backport bug fixes with your own judgement (no new
> >>> features please). If you are uncertain whether a backport is "safe",
> >>> please raise a question here.
> >>>
> >>> I'll re-enable Jenkins for branch_8_11 now. Commits that fix or
> >>> @BadApple unstable tests are highly appreciated.
> >>>
> >>> Jan
> >>>
> >>>
> >>>
> >>
> >>
> >
>
>
>
>


Re: Closing GitHub PRs

2021-12-09 Thread David Smiley
I prefer to not auto-close anything.  An issue that's open forever doesn't
seem to be harmful.  That said, I don't feel strongly enough to veto
whatever the consensus is.  I love the bulk-comment proposal!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Dec 8, 2021 at 11:40 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> I think the script is already proving helpful, finding PRs whose
> corresponding issues were closed.  I guess it is possible that some of
> those PRs might still be relevant, but likely most of them should be
> closed?  This seems helpful.  I spot checked a couple of these.  One of
> them indeed looked like it was merged
> <https://github.com/apache/lucene-solr/pull/1064>, so I closed it with a
> comment.  But the second one I checked
> <https://github.com/apache/lucene-solr/pull/906/files> looked like the src
> changes were merged but maybe the unit test in the PR failed to be merged
> <https://github.com/apache/lucene/commit/49631ace9f1ee110d52a207377e4926baef74929>
> ?
>
> And the script can be used to bulk-add comments.  I'm still +1 on that.
>
> But I really don't want to bulk-close all of the PRs.  That just makes
> these artifacts harder to find in the future.  Some of them are still
> relevant.  I just poked around a bit and found this still-open PR from
> Simon <https://github.com/apache/lucene-solr/pull/1925> which is/was a
> nice cleanup, from ~ one year ago now, of how DocumentsWriterPerThread
> tracks its (tricky!) lifecycle.  There are important changes in these
> still-open PRs, so I really don't think we should close them.  Maybe Simon
> or Nhat or myself comes back and cracks the rust off of this PR.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Dec 8, 2021 at 8:57 AM Robert Muir  wrote:
>
>> I'm also now even -1 against bulk-comment. You guys are trying to be
>> too sneaky/passive-aggressive/bypass consensus. I'm stopping this shit
>> right now in its tracks
>>
>> On Wed, Dec 8, 2021 at 8:50 AM Robert Muir  wrote:
>> >
>> > I'm -1 against auto-closing issues, as I already stated on this thread.
>> >
>> > On Wed, Dec 8, 2021 at 7:53 AM Jan Høydahl wrote:
>> > >
>> > > Calm down :)
>> > >
>> > > As you can read from the last comment, we can choose whether to
>> > > * Close with comment and label
>> > > * Comment and label only
>> > > * Comment only
>> > > * Do nothing
>> > >
>> > > The lucene-solr repo is not dead, it will still be used for
>> > > back-porting bugfixes to branch_8_11 for probably another 12 months.
>> > > But several branches are dead/archived, and it really makes no sense
>> > > to keep PRs for those alive either.
>> > >
>> > > This is a proposal for a one-time action, introducing a stale-bot for
>> > > the project, which I can see is more controversial and annoying for sure.
>> > >
>> > > Jan
>> > >
>> > > > On 8 Dec 2021, at 13:04, Robert Muir wrote:
>> > > >
>> > > > i mean you dont even have anything close to fucking consensus about
>> > > > "bulk close" on this thread. most are against it. why be so fucking
>> > > > sneaky about it? I don't get it!
>> > > >
>> > > > On Wed, Dec 8, 2021 at 7:03 AM Robert Muir wrote:
>> > > >>
>> > > >> On Wed, Dec 8, 2021 at 7:01 AM Robert Muir wrote:
>> > > >>>
>> > > >>> I added my vote against bulk close functionality.
>> > > >>> it is pretty clear from this thread that several of us are
>> > > >>> opposed to bulk close.
>> > > >>>
>> > > >>> somehow the discussion jumped from bulk commenting to bulk close.
>> fuck that!
>> > > >>>
>> > > >>> On Wed, Dec 8, 2021 at 5:39 AM Jan Høydahl wrote:
>> > > >>>>
>> > > >>>> I gave it a shot, and it works, so with this change to the
>> > > >>>> githubPRs.py script: https://github.com/apache/lucene-solr/pull/2625
>> > > >>>> we can close all open PRs with a comment and label with a single
>> > > >>>> command. The script can also easily be adapted to other use cases.
>> > > >>>>
>> > > >>>> Jan
>> > > >>>>
>> > > >>>>> On 8 Dec 2021, at 01:33, Jan Høydahl wrote:
>> > > >>>>>
>

Re: Welcome Julie Tibshirani to the Lucene PMC

2021-12-02 Thread David Smiley
Welcome Julie!
--
Sent from Gmail Mobile


Re: What should we do of branch_8x?

2021-11-20 Thread David Smiley
+1 to “leave the door open” despite it seeming an awkward endeavor.



On Sat, Nov 20, 2021 at 1:15 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Since there is no concrete design available for that as of today, that's
> why I mentioned "keeping the door open" for 8.12. I'm not proposing an
> 8.12 today, nor am I saying 8.12 is needed. But, in case we need one, we
> should have the ability to release it. Anyway, this discussion should
> rather happen on the Solr list.
>
> On Sat, 20 Nov, 2021, 10:10 pm Timothy Potter, 
> wrote:
>
>> A Solr 8.12 with Lucene 8.11? Not sure of the details on that but
>> sounds like a giant mess waiting to happen (at the very least, would
>> require a bunch of complicated changes to the release process). We
>> need to stop adding features to 8x and focus on 9. I can foresee an
>> 8.11.2 with bug fixes only (8.11.1 is already planned to drop
>> soon'ish). Why would Solr need an 8.12? I suspect it's related to
>> upgrading plugins, but that's been an open issue for a long while and
>> seems to keep getting pushed out. We can't just keep planning new
>> feature releases because of this plugin upgrade problem. If we really
>> need an 8.12, then we need to see a concrete design on how the upgrade
>> process will work in an 8.12. Perhaps there's a better approach that
>> only relies on code changes to the 9x line? Tough to say, we have no
>> designs or descriptions of the upgrade problem at this point.
>>
>> As of today, I'd be strongly against a Solr 8.12 release.
>>
>> Tim
>>
>> On Sat, Nov 20, 2021 at 8:32 AM Ishan Chattopadhyaya
>>  wrote:
>> >
>> > I think we should keep the door open for an 8.12 release of Solr (based
>> > on Lucene 8.11). This might mean some split in the codebase, and this can
>> > either happen in the lucene-solr repo or the solr repo (I'm okay with
>> > either).
>> >
>> > On Sat, Nov 20, 2021 at 7:59 PM Adrien Grand wrote:
>> >>
>> >> Uwe brought up the question on the vote thread: we are not going to
>> >> do an 8.12 release, so what should we do with branch_8x?
>>
>>
>> --
Sent from Gmail Mobile


Re: [VOTE] Release Lucene/Solr 8.11.0 RC1

2021-11-11 Thread David Smiley
+1
SUCCESS! [0:57:23.948714]


Re: Bump minimum Java version to 17 on main (10.0)

2021-11-04 Thread David Smiley
I prefer that we require JDK 17 for build/test but allow our artifacts
(except lucene-test-framework maybe) to be run on JDK 11 (or 14?) via
setting the "target".  This allows us some time to appreciate some of the
benefits of Java/JDK 17 without insisting that our users switch.  This
approach doesn't prevent us from fully-committing to JDK 17 for Lucene 10
if we want.  When we consider that Lucene is a library and not a full app,
we should be somewhat conservative here.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Nov 4, 2021 at 6:10 AM Uwe Schindler  wrote:

> Hi,
>
>
>
> I agree with this plan: let's go to JDK 17 in Lucene 10 (main), but whenever
> a new Java version comes out, update Gradle and the --release=XX switch.
> Plain and simple! The stable branch has a defined Java version (currently
> “11”); “main” should always be on the latest. I don’t think this is a problem,
> because the Java release cycles have changed and old-style users
> are still on 8 (so they would be stuck with Lucene 8). Some larger companies
> stick with officially “Oracle supported LTS” versions, but those
> people won’t upgrade soon. Nowadays with Docker and Kubernetes, it is so
> easy to start Solr or Elasticsearch with any Java version (and you don’t
> care, you just take what’s shipped with your image), so being bleeding
> edge on main is perfectly fine. When we release a new major version, we
> take what’s latest at that time (based on main branch, hopefully with
> Panama).
>
>
>
> Based on my previous statement, JDK 17 is not the final goal for Lucene
> 10, and neither is 18: JDK 18 won’t contain Panama (there is a second
> incubator of Total-Panama), so it is likely to become part of the “java.base”
> module in JDK 19 (still requiring some extra enabling command-line parameter).
>
>
>
> About 17: What I like most are the multiline strings and the new switch
> statement. In addition to Robert’s comment: I like it not only because of
> the break-hell, but more because it is not a simple statement but an
> expression (having a return value). So the anti-pattern of declaring a
> variable and then writing a switch statement that assigns a value to this
> variable in each case is finally obsolete. You then have
> “variable = switch (...)”. And finally we will get a switch on instanceof
> a bit later (hopefully at the same time when Panama comes out).
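A minimal sketch of the pattern Uwe describes, assuming Java 14 or later (where switch expressions became a standard feature). The `Tier` enum and both methods are hypothetical illustrations, not Lucene code: the first method uses the variable-plus-assignment-per-case style, the second returns the value of a switch expression directly.

```java
// Hypothetical example (Java 14+): names and values are illustrative only.
final class SwitchExpressionDemo {
    enum Tier { SMALL, MEDIUM, LARGE }

    // Old style: declare a variable, assign it in every case,
    // and remember every break ("break-hell").
    static int bufferSizeOld(Tier tier) {
        int size;
        switch (tier) {
            case SMALL:
                size = 1024;
                break;
            case MEDIUM:
                size = 8192;
                break;
            case LARGE:
                size = 65536;
                break;
            default:
                throw new IllegalArgumentException("unknown tier: " + tier);
        }
        return size;
    }

    // Switch expression: it yields a value, has no fall-through, and the
    // compiler checks that every enum constant is covered.
    static int bufferSizeNew(Tier tier) {
        return switch (tier) {
            case SMALL -> 1024;
            case MEDIUM -> 8192;
            case LARGE -> 65536;
        };
    }

    public static void main(String[] args) {
        for (Tier t : Tier.values()) {
            System.out.println(t + " -> " + bufferSizeNew(t));
        }
    }
}
```

Because the switch expression covers every `Tier` constant, no `default` branch is needed; adding a new constant without a matching case would make `bufferSizeNew` fail to compile instead of silently falling through.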
>
>
>
> Records are bullshit, sorry. They’re only useful for the
> Hibernate/Spring/Foobar-like Entities-For-Everything business logic. They may
> become useful at some point when they are no longer instances on the heap
> but just data wrappers, but as long as they are based on classes I see no
> reason to use them for Lucene.
>
>
>
> Uwe
>
>
>
> -
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>
> *From:* Dawid Weiss 
> *Sent:* Thursday, November 4, 2021 8:27 AM
> *To:* Lucene Dev 
> *Subject:* Re: Bump minimum Java version to 17 on main (10.0)
>
>
>
>
>
> Now you're talking.
>
> +1.
>
>
>
>
>
> On Thu, Nov 4, 2021 at 1:49 AM Robert Muir  wrote:
>
> On Wed, Nov 3, 2021 at 1:36 PM Dawid Weiss  wrote:
> >
> > I principally agree with you - we should leverage new Java features and
> I'm all for it. I just don't see much difference between
> > Java 11 and 17 in the context of Lucene... Upgrading for the sake of
> upgrading doesn't justify the move to
> > me. But if you can point at a feature of Java 17 and say - here, this is
> great and was not there before, it's worth using, then I'm all in.
> >
> > D.
>
> absolute-bulk-get methods on Byte/Short/Int/Long/Float/DoubleBuffers?
>
> I think we should investigate it for MMapDirectory and
> ByteBuffersDirectory at least? Maybe it can create new opportunities,
> e.g. reduce overhead vs position()+get().  Or maybe expand our
> random-access API to include it, and perhaps bit-unpacking can be
> simplified or sped up (e.g. DirectReader). Especially now that we have
> varhandles it seems to make more things possible. Or maybe there's no
> performance win for us and it only simplifies existing code in the
> short-term.
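To make Robert's suggestion concrete, here is a hedged sketch contrasting the relative `position()` + `get()` idiom with the absolute bulk `get(int index, long[] dst, int offset, int length)` that the JDK added to the buffer classes in Java 13. The class and helper names are hypothetical and this is not `MMapDirectory` code; it only shows the shape of the API, not whether there is any performance win.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.Arrays;

// Hypothetical helpers for illustration (JDK 13+); not Lucene code.
final class AbsoluteBulkGetDemo {

    // Relative style: move the position, then bulk-get. This mutates the
    // position, so a shared buffer first needs duplicate() for a private cursor.
    static long[] readRelative(LongBuffer buf, int index, int count) {
        long[] dst = new long[count];
        LongBuffer view = buf.duplicate();
        view.position(index);
        view.get(dst, 0, count);
        return dst;
    }

    // Absolute style: copy `count` values starting at `index`
    // without touching the buffer's position at all.
    static long[] readAbsolute(LongBuffer buf, int index, int count) {
        long[] dst = new long[count];
        buf.get(index, dst, 0, count);
        return dst;
    }

    public static void main(String[] args) {
        LongBuffer longs = ByteBuffer.allocate(8 * 16)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asLongBuffer();
        for (int i = 0; i < longs.capacity(); i++) {
            longs.put(i, i * 10L); // absolute single-value put
        }
        System.out.println(Arrays.toString(readRelative(longs, 4, 3)));
        System.out.println(Arrays.toString(readAbsolute(longs, 4, 3)));
        System.out.println("position: " + longs.position());
    }
}
```

The absolute variant leaves the shared buffer's position untouched, which is what removes the need for the `duplicate()` call in the relative version.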
>
> I like the new PRNGs; maybe we should replace our hand-rolled
> xorshift128 stuff that is used for segment IDs (see StringHelper). The
> new API has a nice set of algorithms:
>
> https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html
>
> Good to look at for the HNSW vector stuff, too. Maybe we should
> switch over unit tests eventually too.
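A sketch of what the JDK 17 `java.util.random` API looks like when building a 128-bit identifier, assuming Java 17. `RandomIdDemo` and `newId` are hypothetical names, the choice of `Xoroshiro128PlusPlus` is arbitrary among the algorithms the JDK ships, and this does not reproduce `StringHelper` or the way Lucene seeds its generator.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HexFormat;
import java.util.random.RandomGenerator;
import java.util.random.RandomGeneratorFactory;

// Hypothetical sketch (Java 17+): NOT Lucene's StringHelper.
final class RandomIdDemo {
    static final int ID_LENGTH = 16; // a 128-bit id

    // Builds a 16-byte id from two 64-bit values drawn from the generator.
    static byte[] newId(RandomGenerator rng) {
        return ByteBuffer.allocate(ID_LENGTH)
                .putLong(rng.nextLong())
                .putLong(rng.nextLong())
                .array();
    }

    public static void main(String[] args) {
        // Look up a JDK-provided algorithm by name.
        RandomGenerator rng = RandomGenerator.of("Xoroshiro128PlusPlus");
        System.out.println(HexFormat.of().formatHex(newId(rng)));

        // Seeded instances from the same factory produce the same sequence.
        RandomGeneratorFactory<RandomGenerator> factory =
                RandomGeneratorFactory.of("Xoroshiro128PlusPlus");
        byte[] a = newId(factory.create(42L));
        byte[] b = newId(factory.create(42L));
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Creating generators through `RandomGeneratorFactory` with a fixed seed gives reproducible sequences, which is one property that would matter if unit tests were switched over as suggested.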
>
> The JFR runtime streaming api looks interesting, maybe we could
> improve tests.profile to use it, or mike's benchma

Re: Lucene 9.0 release

2021-10-29 Thread David Smiley
I think build & distribution oriented efforts shouldn't be held up by a
feature-freeze.  FF is more for getting stability on runtime code.  But
this is Adrien's call.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Oct 29, 2021 at 3:09 PM Adrien Grand  wrote:

> This sounds good to me Dawid. Please update this thread when you are done
> and I will proceed with branching.
>
> On Fri, Oct 29, 2021 at 20:13, Dawid Weiss wrote:
>
>>
>> Hi Adrien,
>>
>> Can you hold for LUCENE-10200? There is a patch under LUCENE-10192 that
>> restructures the binary distribution - nobody said anything so I assume
>> everyone liked it?... :)
>>
>> I'll try to polish the remaining little issues over the weekend so it
>> should be good for Tuesday but I'd give it a few eyes before we create the
>> branch? Alternatively, we can make a branch and just cherry-pick the
>> necessary changes there. I think it's easier if they land on main though.
>>
>> Dawid
>>
>> On Fri, Oct 29, 2021 at 6:00 PM Adrien Grand  wrote:
>>
>>> Hearing no objections, I will be moving forward with the plan I outlined
>>> above. Next Monday is a holiday in France so I'll actually be cutting
>>> branch_8_11, branch_9x and branch_9_0 on Tuesday.
>>>
>>> On Sun, Oct 17, 2021 at 3:40 PM Michael Sokolov 
>>> wrote:
>>>
>>>> > Mike, your previous email suggests that you would like someone else
>>>> > to step up. If that's correct I'm happy to be the release manager for
>>>> > both 8.11 and 9.0.
>>>>
>>>> Thanks, that would be very welcome!
>>>>
>>>> On Fri, Oct 15, 2021 at 1:33 PM Timothy Potter 
>>>> wrote:
>>>> >
>>>> > Sounds like a good plan Adrien, thanks for nailing down some concrete
>>>> > milestones and dates :-)
>>>> >
>>>> > Cheers,
>>>> > Tim
>>>> >
>>>> > On Fri, Oct 15, 2021 at 7:04 AM David Smiley wrote:
>>>> > >
>>>> > > +1 Adrien.  Thanks for moving things along.
>>>> > >
>>>> > > ~ David Smiley
>>>> > > Apache Lucene/Solr Search Developer
>>>> > > http://www.linkedin.com/in/davidwsmiley
>>>> > >
>>>> > >
>>>> > > On Fri, Oct 15, 2021 at 3:30 AM Adrien Grand wrote:
>>>> > >>
>>>> > >> For visibility, I recently opened a new issue about a case of
>>>> > >> index corruption which is a blocker for 9.0. Nhat is looking into it.
>>>> > >>
>>>> > >> We've been discussing releasing 9.0 for a long time now and I
>>>> > >> think that everybody agrees with moving forward, there's even some
>>>> > >> good momentum around making the build and release tooling ready. So
>>>> > >> I'd like to propose the following timeline for the 9.0 release to
>>>> > >> get some feedback:
>>>> > >>
>>>> > >> 2021-11-01: Feature freeze:
>>>> > >>  - branch_9x gets created from main
>>>> > >>  - branch_8_11 gets created from branch_8x
>>>> > >> This gives us ~2 weeks to do some last-minute work. The reasoning
>>>> > >> for doing 8.11 as well is that we have some enhancements merged to
>>>> > >> branch_8x that I suspect some users would like to see released in
>>>> > >> 8.x. Important note: 8.11 will be the last minor release of major
>>>> > >> version 8. There might be new patch releases in the future such as
>>>> > >> 8.11.1 or 8.11.2, but there won't be an 8.12 or an 8.13.
>>>> > >>
>>>> > >> 2021-11-04: First RC for 8.11
>>>> > >> Since we had 8.10 not long ago, hopefully the release process will
>>>> > >> go smoothly.
>>>> > >>
>>>> > >> ~2021-11-10: First RC for 9.0
>>>> > >> The date is indicative, the plan would be to move forward with the
>>>> > >> first 9.0 RC as soon as the following conditions are met:
>>>> > >>  - 8.11 is out
>>>> > >>  - all 9.0 blockers have been addressed
>>>> > >>
>>>> > >> Mike, your previous email suggests that you would like someone
>>>> > >> else to step up. If that's correct I'm happy to be the release
>>>> > >> manager for both 8.11 and 9.0.
>>>> > >>

Re: Lucene 9.0 release

2021-10-15 Thread David Smiley
+1 Adrien.  Thanks for moving things along.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Oct 15, 2021 at 3:30 AM Adrien Grand  wrote:

> For visibility, I recently opened a new issue about a case of index
> corruption <https://issues.apache.org/jira/browse/LUCENE-10159> which is
> a blocker for 9.0. Nhat is looking into it.
>
> We've been discussing releasing 9.0 for a long time now and I think that
> everybody agrees with moving forward, there's even some good momentum
> around making the build and release tooling ready. So I'd like to propose
> the following timeline for the 9.0 release to get some feedback:
>
> 2021-11-01: Feature freeze:
>  - branch_9x gets created from main
>  - branch_8_11 gets created from branch_8x
> This gives us ~2 weeks to do some last-minute work. The reasoning for
> doing 8.11 as well is that we have some enhancements merged to branch_8x
> that I suspect some users would like to see released in 8.x. Important
> note: 8.11 will be the last minor release of major version 8. There might
> be new patch releases in the future such as 8.11.1 or 8.11.2, but there
> won't be an 8.12 or an 8.13.
>
> 2021-11-04: First RC for 8.11
> Since we had 8.10 not long ago, hopefully the release process will go
> smoothly.
>
> ~2021-11-10: First RC for 9.0
> The date is indicative, the plan would be to move forward with the first
> 9.0 RC as soon as the following conditions are met:
>  - 8.11 is out
>  - all 9.0 blockers have been addressed
>
> Mike, your previous email suggests that you would like someone else to
> step up. If that's correct I'm happy to be the release manager for both
> 8.11 and 9.0.
>
>
> On Sat, Oct 2, 2021 at 11:54 PM Michael Sokolov 
> wrote:
>
>> Yes! I'm curious to give it a go, but getting pulled in many different
>> directions. If nobody else steps up, I will be free to shepherd the
>> release along in a couple of weeks, assuming the current firestorm
>> subsides...
>>
>> On Thu, Sep 30, 2021 at 9:54 AM Jan Høydahl 
>> wrote:
>> >
>> > +1
>> >
>> > Blockers seem to be done with. So I guess we just need an RM to get the
>> > ball rolling? :)
>> >
>> > I know that the Release Wizard in the new Lucene repo needs some updates:
>> > https://issues.apache.org/jira/browse/LUCENE-9809 - I may help some with
>> > that...
>> >
>> > Cross-ref other 9.0 release mail-threads:
>> > - "Now that 8.10 is out ... let's get rolling on 9!"
>> > https://lists.apache.org/thread.html/r868028d42a19ae02d5bbe2e3329da26869045002b9bb4760b8056c56%40%3Cdev.lucene.apache.org%3E
>> > - "9.0 release":
>> > https://lists.apache.org/thread.html/r7bef0af668860fdbfedb4b58261efd01d9fb26dc280915284c121065%40%3Cdev.lucene.apache.org%3E
>> >
>> > Jan
>> >
>> > On 17 Aug 2021, at 11:13, Adrien Grand wrote:
>> >
>> > +1 to your suggestions
>> >
>> > I just commented on LUCENE-9959 to suggest reverting since the changes
>> > are currently half baked and I don't think that they should block 9.0.
>> > There are no other blockers left to my knowledge.
>> >
>> > On Sat, Aug 14, 2021 at 6:24 PM Michael Sokolov wrote:
>> >>
>> >> It's been two years since our last release, we had lots of +1 when we
>> >> raised this last December, and IMO we are close to baked at this
>> >> point.
>> >>
>> >> I checked JIRA and found two remaining Blockers
>> >>
>> >> 1. https://issues.apache.org/jira/browse/LUCENE-10016
>> >> VectorReader.search needs rethought, o.a.l.search integration?
>> >> 2. https://issues.apache.org/jira/browse/LUCENE-8638 Remove deprecated
>> >> code in main
>> >>
>> >> The first one is very close to being resolved.
>> >>
>> >> On the deprecations, the issue has lingered for 1-1/2 years now, and
>> >> some progress has been made, but more work remains. Some new
>> >> deprecations have been added since it was opened too. Maybe we make a
>> >> concerted effort to clean out as much as we can, and then decide if
>> >> it's enough? Anyway this seems to be the only outstanding issue, so
>> >> let's see if we can make progress there
>> >>
>> >> Q: any other blockers?
>> >>
>> >>
>> >
>> >
>> > --
>> > Adrien
>> >
>> >
>>
>>
>>
>
> --
> Adrien
>


Re: Welcome Michael Gibney as Lucene committer

2021-10-07 Thread David Smiley
Welcome Michael Gibney!  It's about time :-)

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Oct 6, 2021 at 9:34 AM Dawid Weiss  wrote:

> Hello everyone!
>
> Please welcome Michael Gibney as the latest Lucene committer. Michael
> - it's a tradition for you to introduce yourself, even if we've been
> seeing you for quite a while! :)
>
> Dawid
>
>
>


Re: Soften Jira's note when opening new issues?

2021-09-24 Thread David Smiley
Agreed Walter.  At least it's better than before.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Sep 24, 2021 at 11:09 AM Walter Underwood 
wrote:

> It seems odd to start with a statement that there is a mailing list
> without any idea why the person cares. That is why my suggestions started
> with the person’s need, not with the bare fact of the mailing list.
>
> People are likely to skip over that whole paragraph after they scan “This
> project has a use mailing list…”. The first few words are by far the most
> important. Again, I strongly suggest starting with "If you want help or
> have a feature idea…”
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Sep 24, 2021, at 12:57 AM, Adrien Grand  wrote:
>
> Infra helped me change the message
> <https://issues.apache.org/jira/browse/INFRA-22353> yesterday, thanks for
> the discussion on this thread.
>
> +1 on your PR to the project's README.
>
> The problem I saw with Jira recently - and I acknowledge that there might
> be a bias - is that users had read our HowToContribute guide already, which
> suggests opening a Jira. But then Jira told these contributors to go to the
> mailing-list first before we updated the message. I like the idea of
> linking HowToContribute from the perspective that it would be welcoming and
> encourage contributions, but it would increase the amount of text that you
> have to read when using Jira yet the anecdotal evidence I have is that
> these contributors were already familiar with the HowToContribute since it
> is the thing that led them to Jira in the first place. No strong feelings,
> I could be convinced otherwise but wanted to give this perspective.
>
> On Thu, Sep 23, 2021 at 7:41 PM Greg Miller  wrote:
>
>> Hi Adrien- that's totally fair. There are probably better places for
>> the additional content I'm proposing. A couple things along these
>> lines:
>>
>> 1. Do you think it would be worth linking this guide from the JIRA
>> message (maybe after updating it)?
>> https://cwiki.apache.org/confluence/display/lucene/HowToContribute. It
>> could be a nice hook for new users to learn more (and it's what we
>> link from our README). Maybe it would make the message too long
>> though?
>> 2. I just put up a very brief PR to add my proposed "friendly message"
>> to the README before linking off to the above-mentioned guide:
>> https://github.com/apache/lucene/pull/318.
>>
>> Back to your original proposal though, I'll add my +1 as I think it's
>> a big improvement from the current messaging. Thanks for bringing this
>> up!
>>
>> Cheers,
>> -Greg
>>
>> On Wed, Sep 22, 2021 at 9:23 AM Walter Underwood 
>> wrote:
>> >
>> > Hmm. How is this? It is a single longer sentence, but essentially a
>> string of simple ones.
>> >
>> > If you want help or have a feature idea, please ask on the mailing list
>> or IRC channel before submitting a Jira issue.
>> >
>> > wunder
>> > Walter Underwood
>> > wun...@wunderwood.org
>> > http://observer.wunderwood.org/  (my blog)
>> >
>> > On Sep 22, 2021, at 9:18 AM, Adrien Grand  wrote:
>> >
>> > Greg, I understand and agree with the intent, but I also would like to
>> keep this as short as possible since the screen to create a new issue in
>> JIRA is already quite intimidating with all its text boxes, and the current
>> version is already taking two lines even though it's short. Maybe this is
>> the sort of thing that we could try to better emphasize in our project's
>> README?
>> >
>> > On Wed, Sep 22, 2021 at 6:07 PM Walter Underwood 
>> wrote:
>> >>
>> >> Two excellent points. So it could be:
>> >>
>> >> Are you looking for support for Lucene? Have you seen unexpected
>> behavior? Have an idea for a new feature or improvement? Please ask for
>> help on the Lucene user mailing list or the IRC channel. If it is a new
>> problem or idea, then you can submit a Jira issue.
>> >>
>> >> wunder
>> >> Walter Underwood
>> >> wun...@wunderwood.org
>> >> http://observer.wunderwood.org/  (my blog)
>> >>
>> >> On Sep 22, 2021, at 6:38 AM, Greg Miller  wrote:
>> >>
>> >> Love this idea!
>> >>
>> >> I wonder if there's a way to make the messaging clear that ideas for
>> >> new features/improvements are also always welcome? When I read the
>> >>

Re: 8.10 release soon?

2021-09-07 Thread David Smiley
Yes, in theory every commit to branch_8x is releasable, at least in the
mind of whoever is doing the commit/merge.  Nevertheless, we humans are
very fallible, and a baking period of a few days allows fix-ups and such
that ultimately improve quality a little.  Of course this can delay
features a tiny bit, and it's annoying to back-port to yet another branch
when there are last-minute fixes.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Sep 7, 2021 at 11:53 AM Timothy Potter  wrote:

> That's the process I've been following, so not sure what your point is
> David? The release branch didn't get cut last week because there were
> a number of JIRAs marked as blockers for 8.10 and we needed clarity
> about how to move forward from the committers involved with those
> tickets. Also, I don't see the point in requiring an extra backport
> even if trivial if we know some features are coming in soon. From
> where I sit, any code change that hits 8.x should be releasable
> immediately or it shouldn't be committed.
>
> Tim
>
> On Tue, Sep 7, 2021 at 9:31 AM David Smiley  wrote:
> >
> > The release branch separate from cutting the RC used to be quite normal;
> not exceptional.  We should continue the practice for release stability.
> It also gives clarity for those of us like me that have something ready-ish
> to merge on which 8x version it should go into.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Tue, Sep 7, 2021 at 10:26 AM Timothy Potter 
> wrote:
> >>
> >> Thanks for the heads up on LUCENE-10088 Mike. I'd still like to cut
> >> the release branch today, but won't start the RC until I get word back
> >> from you on that issue. I know this adds another backport for you but
> >> I can do that once it lands on 8x.
> >>
> >> On Tue, Sep 7, 2021 at 7:18 AM Michael McCandless
> >>  wrote:
> >> >
> >> > Hi Timothy,
> >> >
> >> > Heads up: I'm currently digging on this issue (LUCENE-10088: "too
> many open files" in two failed builds, one in main, one in 8.x), and I'm
> worried that maybe the root cause was backported to 8.x, i.e. maybe we have
> a new file handle leak.
> >> >
> >> > If so, this might be a blocker for 8.10.0 release.
> >> >
> >> > I'll try to make progress today on getting to the root cause.
> >> >
> >> > Mike McCandless
> >> >
> >> > http://blog.mikemccandless.com
> >> >
> >> >
> >> > On Thu, Sep 2, 2021 at 1:43 PM Timothy Potter 
> wrote:
> >> >>
> >> >> Thanks for the feedback.
> >> >>
> >> >> My plan right now is to cut the release branch for 8.10 on *Tuesday,
> >> >> Sept. 7* (Monday is a US holiday).  Hopefully this gives enough time
> >> >> to get the remaining issues addressed.
> >> >>
> >> >> After the release branch is cut, there should be some discussion on
> >> >> any new changes coming into that branch before RC1 comes out (I
> >> >> usually give a day or two after creating the release branch before
> >> >> creating RC1)
> >> >>
> >> >> May I please ask for some help with the Lucene 8.10 release notes?
> >> >> I'll do the notes for Solr. Please add the notes here:
> >> >> https://cwiki.apache.org/confluence/display/LUCENE/ReleaseNote8_10
> >> >>
> >> >> Cheers,
> >> >> Tim
> >> >>
> >> >> On Thu, Sep 2, 2021 at 11:15 AM Nicholas Knize 
> wrote:
> >> >> >
> >> >> > +1 to cutting the 8.10 branch and getting the release moving. Will
> be good to get the #9981 patch out there. Tests seem happy for a while.
> Thanks for chasing these blockers.
> >> >> >
> >> >> > Nicholas Knize, Ph.D., GISP
> >> >> > Principal Engineer - Search  |  Amazon
> >> >> > Apache Lucene PMC Member and Committer
> >> >> > nkn...@apache.org
> >> >> >
> >> >> >
> >> >> > On Sat, Aug 28, 2021 at 11:18 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
> >> >> >>
> >> >> >> Yeah +1!  Lucene's first "non floating point" compliant release
> in a long time?
> >> >> >>
> >> >> >> Mike
> >> >> >>
> >> >&

Re: 8.10 release soon?

2021-09-07 Thread David Smiley
Cutting the release branch separately from the RC used to be quite normal,
not exceptional.  We should continue the practice for release stability.
It also tells those of us, like me, who have something ready-ish to merge
which 8x version it should go into.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Sep 7, 2021 at 10:26 AM Timothy Potter  wrote:

> Thanks for the heads up on LUCENE-10088 Mike. I'd still like to cut
> the release branch today, but won't start the RC until I get word back
> from you on that issue. I know this adds another backport for you but
> I can do that once it lands on 8x.
>
> On Tue, Sep 7, 2021 at 7:18 AM Michael McCandless
>  wrote:
> >
> > Hi Timothy,
> >
> > Heads up: I'm currently digging on this issue (LUCENE-10088: "too many
> open files" in two failed builds, one in main, one in 8.x), and I'm worried
> that maybe the root cause was backported to 8.x, i.e. maybe we have a new
> file handle leak.
> >
> > If so, this might be a blocker for 8.10.0 release.
> >
> > I'll try to make progress today on getting to the root cause.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Thu, Sep 2, 2021 at 1:43 PM Timothy Potter 
> wrote:
> >>
> >> Thanks for the feedback.
> >>
> >> My plan right now is to cut the release branch for 8.10 on *Tuesday,
> >> Sept. 7* (Monday is a US holiday).  Hopefully this gives enough time
> >> to get the remaining issues addressed.
> >>
> >> After the release branch is cut, there should be some discussion on
> >> any new changes coming into that branch before RC1 comes out (I
> >> usually give a day or two after creating the release branch before
> >> creating RC1)
> >>
> >> May I please ask for some help with the Lucene 8.10 release notes?
> >> I'll do the notes for Solr. Please add the notes here:
> >> https://cwiki.apache.org/confluence/display/LUCENE/ReleaseNote8_10
> >>
> >> Cheers,
> >> Tim
> >>
> >> On Thu, Sep 2, 2021 at 11:15 AM Nicholas Knize 
> wrote:
> >> >
> >> > +1 to cutting the 8.10 branch and getting the release moving. Will be
> good to get the #9981 patch out there. Tests seem happy for a while. Thanks
> for chasing these blockers.
> >> >
> >> > Nicholas Knize, Ph.D., GISP
> >> > Principal Engineer - Search  |  Amazon
> >> > Apache Lucene PMC Member and Committer
> >> > nkn...@apache.org
> >> >
> >> >
> >> > On Sat, Aug 28, 2021 at 11:18 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
> >> >>
> >> >> Yeah +1!  Lucene's first "non floating point" compliant release in a
> long time?
> >> >>
> >> >> Mike
> >> >>
> >> >> On Thu, Aug 26, 2021 at 4:09 AM Adrien Grand 
> wrote:
> >> >>>
> >> >>> +1 to an 8.10 release and cutting a branch next week
> >> >>>
> >> >>> On Tue, Aug 24, 2021 at 8:02 PM Timothy Potter <
> thelabd...@gmail.com> wrote:
> >> >>>>
> >> >>>> Hi folks,
> >> >>>>
> >> >>>> Looks like we have a number of nice enhancements and bug fixes in
> >> >>>> Lucene and Solr for 8.10.
> >> >>>>
> >> >>>>
> >> >>>> https://github.com/apache/lucene-solr/blob/branch_8x/lucene/CHANGES.txt
> >> >>>> https://github.com/apache/lucene-solr/blob/branch_8x/solr/CHANGES.txt
> >> >>>>
> >> >>>> However, there are a few open blockers marked for Solr 8.10, see:
> >> >>>> https://issues.apache.org/jira/browse/SOLR-15596?filter=12350839
> >> >>>>
> >> >>>> The blockers (SOLR-15596, SOLR-15412, SOLR-14593) are not assigned to
> >> >>>> anyone. Is anyone looking at these? If not, do they need to block the
> >> >>>> 8.10 release?
> >> >>>>
> >> >>>> I propose we should cut the release branch next week but that
> >> >>>> obviously depends on our decision around these open blockers.
> >> >>>>
> >> >>>> Cheers,
> >> >>>> Timothy Potter
> >> >>>>
> >> >>>> PS ~ I volunteer to be the Release Manager ;-)
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Adrien
> >> >>
> >> >> --
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >>
>
>


Re: [lucene] branch branch_8x created (now b186bb6)

2021-07-02 Thread David Smiley
Yes, simply delete it, I think.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jul 2, 2021 at 12:56 PM Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> oops, sorry, i mixed up the "origin" and "origin-lucene" remotes when
> backporting from lucene/main to lucene-solr/branch_8x ...
>
> is there a way to prevent such accidental branch creation?
>
> do i just go and delete it now or something else?
>
> apologies.
>
> christine
>
> From: dev@lucene.apache.org At: 07/02/21 17:51:50 UTC+1:00
> To: comm...@lucene.apache.org
> Subject: [lucene] branch branch_8x created (now b186bb6)
>
> This is an automated email from the ASF dual-hosted git repository.
>
> cpoerschke pushed a change to branch branch_8x
> in repository https://gitbox.apache.org/repos/asf/lucene.git.
>
>
> at b186bb6 remove no-longer-accurate sentence in
> TopTermsScoringBooleanQueryRewrite javadocs (#197)
>
> This branch includes the following new commits:
>
> new afce3b0 SOLR-15245: Document zk-read permission and use zk-read
> permission for content
> new d6ffe1c Disable the "mvn" build, as this fails on jenkins
> (HTTP->HTTPS). As it's no longer maintained, don't spend time into fixing
> it. We just need to make sure the POM files are generated and pass
> validtation.
> new 7686018 Fix HTTP->HTTPS migration of Maven Central aso in POM file
> dependencies check
> new c2d5f41 LUCENE-9836: Remove ant run-maven-build (won't bootstrap
> anymore), fix more places where the secured Maven Central repo is needed
> to
> execute
> new f171417 LUCENE-9791 Allow calling BytesRefHash#find concurrently (#8)
> new bd1dbd6 LUCENE-9791: add CHANGES.txt entry
> new 80e8fed LUCENE-9791: remove unused import, switch from JDK 9+
> Arrays.equals API to FutureArrays.equals
> new e0599d7 LUCENE-9836: Prevent snapshot checks on Cloudera repo
> new 03b7a6d remove accidental extra K
> new 7521cba LUCENE-9836: Fix 8.x Maven Validation and publication to work
> with Maven Central and HTTPS again; remove pure Maven build (did not work
> anymore) (#2469)
> new b61b19c LUCENE-9663: Add compression to terms dict from
> SortedSet/Sorted DocValues.
> new e66d88f SOLR-15267: fix jekyll layouts and CheckLinksAndAnchors to
> ensure link checking is applied to every page regardless of layout
> new 5d75fad SOLR-11233: Add optional JAVA8_GC_LOG_FILE_OPTS for bin/solr.
> (#2284)
> new c401dd4 SOLR-15249 Properly set ZK ACLs on /security.json
> new 2557026 SOLR-15291: ref-guide note clarifying 'safe' way to do
> De-Duplication w/SignatureUpdateProcessorFactory in SolrCloud
> new c83f321 SOLR-15273: Distributed Group Query supports rename unique
> key field name (#35)
> new bbc0804 SOLR-15191: Fix testFacetEnumSearch
> new 0814f9a SOLR-15154: Document new options for credentials (#14)
> new db44134 SOLR-15155: Let CloudHttp2SolrClient accept an external
> Http2SolrClient Builder (#15)
> new 9a9676e LUCENE-9887: fix error param use in RadixSelector
> new 34bb532 Disable login autocomplete (#11)
> new 86d2672 SOLR-13608: Ensure backup tests avoid SimpleText codec
> new 4f62826 LUCENE-9888: re-enable CheckIndex verification that indexSort
> is the same across all segments
> new 1f0cea3 SOLR-15212: fix links to Solr website + fixes Tika and PDFBox
> links
> new e812a50 LUCENE-9385: Add FacetsConfig option to control which
> drill-down terms are indexed for a FacetLabel (#2471)
> new eaba604 SOLR-15243: Update MoreLikeThis docs
> new c066944 LUCENE-9870: Fix Circle2D intersectsLine t-value (distance)
> range clamp (#41)
> new 048a9c1 LUCENE-9762,LUCENE-9744: Update CHANGES.txt
> new 1cfd0df LUCENE-9507 Custom order for leaves (#2473)
> new 0f39073 LUCENE-9877: Allow up to 7 exceptions in PForUtil (instead of
> 3). Backporting this since it's fully backwards-compatible. (#2474)
> new c91be08 SOLR-15243: fix build failure due to changed section title in
> branch_8x
> new eb25c78 SOLR-15292: An ERROR is logged if
> SignatureUpdateProcessorFactory is used in SolrCloud cluster in a way that
> is
> known to be problematic with multiple replicas
> new 3d5105b Ref Guide: fix bad link and list out of order
> new b81252f SOLR-15217: Use shardsWhitelist in ReplicationHandler.
> new b704dc6 SOLR-15288: Hardening NODEDOWN event event in PRS mode (#2479)
> new 58a1e2f SOLR-15288: precommit errors
> new babd41b SOLR-15233: Set doAs in ConfigurableInternodeAuthHadoopPlugin
> new b4e08f5 SOLR-11921: Move "cursorMark" logic from QueryComponent to
> SearchHandler so it can work with things like QueryElevationComponent that
> modify the SortSpec in prepare(), as well 

Re: Two-phase range queries?

2021-06-29 Thread David Smiley
Greg, you may find it interesting to look at some code in spatial-extras that
works similarly to what you describe. There is a so-called
CompositeSpatialStrategy that combines a grid strategy built on the terms
index (not BKD) with another that stores the vector geometry in DocValues.
The Query that does the work is here:
https://github.com/apache/lucene/blob/main/lucene/spatial-extras/src/java/org/apache/lucene/spatial/composite/IntersectsRPTVerifyQuery.java
Only shapes in grid cells along the border of the query shape require
the potentially expensive DocValues lookup.
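A minimal sketch of the grid-plus-verify idea, in Python for brevity. It is not the spatial-extras code: it assumes point data, a uniform grid with a hypothetical cell width `CELL`, and a circle query, and a plain dict stands in for the geometry stored in DocValues. Cells entirely inside the query match straight from the grid, cells entirely outside are skipped, and only docs in border cells pay for the exact check.

```python
import math
from collections import defaultdict

CELL = 1.0  # hypothetical grid cell width (assumption, not a Lucene setting)


def cell_of(x, y):
    return (math.floor(x / CELL), math.floor(y / CELL))


def build_index(points):
    """Grid 'terms index': cell -> doc ids. `points` plays the DocValues role."""
    grid = defaultdict(list)
    for doc_id, (x, y) in points.items():
        grid[cell_of(x, y)].append(doc_id)
    return grid


def classify_cell(cell, cx, cy, r):
    """Relate one grid cell to a circle query: 'inside', 'outside' or 'border'."""
    x0, y0 = cell[0] * CELL, cell[1] * CELL
    x1, y1 = x0 + CELL, y0 + CELL
    # A circle contains a rectangle iff it contains all four corners.
    far = max(math.hypot(x - cx, y - cy) for x in (x0, x1) for y in (y0, y1))
    if far <= r:
        return "inside"
    # Closest point of the cell to the circle's center.
    nx, ny = min(max(cx, x0), x1), min(max(cy, y0), y1)
    if math.hypot(nx - cx, ny - cy) > r:
        return "outside"
    return "border"


def search(grid, points, cx, cy, r):
    """Return (matching doc ids, number of exact geometry checks performed)."""
    matches, verified = set(), 0
    for cell, docs in grid.items():
        kind = classify_cell(cell, cx, cy, r)
        if kind == "inside":
            matches.update(docs)  # decided by the grid alone
        elif kind == "border":
            for d in docs:  # only border cells pay for the lookup
                verified += 1
                x, y = points[d]
                if math.hypot(x - cx, y - cy) <= r:
                    matches.add(d)
    return matches, verified
```

The real strategy indexes arbitrary shapes into possibly many cells and reads serialized geometry per doc, so the cost asymmetry between the grid lookup and the verification step is much larger than in this toy.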

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jun 29, 2021 at 9:30 AM Greg Miller  wrote:

> Hi folks-
>
> I've been spending a little time getting familiar with the BKD-tree-based
> range query support currently implemented in Lucene, and wonder if there's
> ever been a discussion around supporting two-phase iteration in this space.
> If I'm understanding the current implementation properly (specifically
> looking at PointRangeQuery), it appears that all matches are determined
> up-front by 1) identifying segments of the tree that contain candidate
> matches (i.e., containing part of the query range), and then 2) confirming
> whether-or-not the contained points actually fall within the range. I'm
> also a little low on coffee this morning so it's entirely possible I'm
> misunderstanding the current implementation (please correct me if so).
>
> With this approach, it seems like we could potentially be doing quite a
> bit of wasted effort in some situations. I have no thoughts on how to
> actually implement this yet, but I wonder if we could support two-phase
> iteration by 1) returning all docs with points contained in candidate
> BKD-tree segments as an approximation, and then 2) only checking the points
> against the query range when confirming matches in the second phase? I
> think the idea would extend to LatLonPointDistanceQuery as well (and maybe
> others?).
>
> I did a Jira search for a related issue but came up empty. Anyone know if
> this idea has been discussed previously, or if there's some inherent flaw
> with the approach that would make it a non-starter? I don't really have any
> cycles to work on this at the moment, but can at least open a Jira issue to
> track if it seems like a reasonable thing to explore.
>
> Cheers,
> -Greg
>
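The two-phase idea Greg proposes can be sketched as a toy model. This is Python, not Lucene's `TwoPhaseIterator` or `PointRangeQuery`: the class `TwoPhaseRange`, the `(min, max, docs)` leaf tuples and `conjunction` are hypothetical, and docs are assumed single-valued. The approximation is every doc in a leaf whose value range overlaps the query; `matches()` does the exact comparison only when asked, so inside a conjunction it runs only for docs the other clause also accepts.

```python
class TwoPhaseRange:
    """Hypothetical two-phase range clause over BKD-like leaves (not Lucene's API)."""

    def __init__(self, leaves, values, lo, hi):
        self.values, self.lo, self.hi = values, lo, hi
        self.checks = 0      # exact per-doc comparisons actually performed
        self.exact = set()   # docs from leaves lying entirely inside [lo, hi]
        candidates = set()
        for leaf_min, leaf_max, docs in leaves:
            if leaf_max < lo or leaf_min > hi:
                continue  # leaf cannot match: skip it entirely
            candidates.update(docs)
            if lo <= leaf_min and leaf_max <= hi:
                self.exact.update(docs)  # every value in this leaf is in range
        # Phase 1: the cheap, coarse approximation (sorted doc ids).
        self.approximation = sorted(candidates)

    def matches(self, doc):
        """Phase 2: confirm a single approximate hit."""
        if doc in self.exact:
            return True
        self.checks += 1
        return self.lo <= self.values[doc] <= self.hi


def conjunction(other_docs, clause):
    """AND a plain doc list with a two-phase clause: intersect the
    approximation first, then confirm only the survivors."""
    other = set(other_docs)
    return [d for d in clause.approximation if d in other and clause.matches(d)]
```

Leaves fully inside the range skip the per-doc check, which, as far as I understand, mirrors how the current BKD intersection already treats fully contained cells; the saving comes from deferring the check for crossing leaves.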


Re: 9.0 release

2021-06-29 Thread David Smiley
There are also deprecations to remove:
https://issues.apache.org/jira/browse/LUCENE-8638

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jun 29, 2021 at 2:43 PM Mike Drob  wrote:

> Looks like just LUCENE-9334 remains?
>
> On Wed, Apr 14, 2021 at 10:18 PM Julie Tibshirani 
> wrote:
> >
> > Hello everyone! I will pick up LUCENE-9908.
> >
> >
> > I had marked LUCENE-9583 as a blocker but I'm on board with removing its
> blocker status given we can make changes later. I hope to come back to the
> issue soon with some ideas.
> >
> >
> > Julie
> >
> >
> > On Wed, Apr 14, 2021 at 12:42 PM Adrien Grand  wrote:
> >>
> >> I agree that we can remove the blocker status from LUCENE-9583 and take
> advantage of the fact that these new APIs are experimental to improve
> things later.
> >>
> >> For the renaming issue, maybe we could just make vectors plural for now
> for consistency and revisit other options later.
> >>
> >> On Wed, Apr 14, 2021 at 8:21 PM Michael Sokolov 
> wrote:
> >>>
> >>> Thanks Adrien; I plan to tackle LUCENE-9905.
> >>>
> >>>  I don't have ideas about how to move forward on LUCENE-9583; I spent
> >>> a significant amount of time trying various permutations on that API,
> >>> and what we have was the best compromise I could find at the time, so
> >>> I'm not sure I agree it's a Blocker, yet I'm open to improvements.
> >>> Maybe Julie will propose something?
> >>>
> >>> There is also a vector-related renaming issue Tomoko had opened, which
> >>> I thought was marked Blocker, but I guess no longer is. Previously I
> >>> had hoped to get some strong consensus, but that proved challenging.
> >>> Given that, I'm OK leaving things as-is, marking these apis
> >>> @experimental and potentially revisiting naming issues later, eg once
> >>> we have a second vector ANN implementation.
> >>>
> >>> On Wed, Apr 14, 2021 at 11:07 AM Adrien Grand 
> wrote:
> >>> >
> >>> > Hi Mike,
> >>> >
> >>> > Here's what I know about the remaining blockers:
> >>> >
> >>> > LUCENE-9908 - Move VectorValues#search to VectorReader and LeafReader
> >>> > This was discussed on the mailing list and it looks like there was
> agreement on making that change. If someone has cycles and can take it,
> please go ahead, otherwise I'll try to allocate some time to it. I'm
> expecting this change to be rather straightforward.
> >>> >
> >>> > LUCENE-9905 - Revise approach to specifying NN algorithm
> >>> > This is a change to how we've been thinking about configuring the
> ANN algorithm. I don't know if someone plans to work on it.
> >>> >
> >>> > LUCENE-9583 - How should we expose VectorValues.RandomAccess
> >>> > We'd like to get rid of this sub interface, but I'm not the best
> person to comment on how much work this needs. Maybe Mike S or Julie can
> give more info.
> >>> >
> >>> > LUCENE-9334 - Require consistency between data-structures on a
> per-field basis
> >>> > Mayya has been working on this one and it's very close.
> >>> >
> >>> > LUCENE-9047 - Directory APIs should be little endian
> >>> > Ignacio and Julie have been working on this one and it is close as
> well.
> >>> >
> >>> >
> >>> > On Tue, Apr 13, 2021 at 10:59 PM Mike Drob  wrote:
> >>> >>
> >>> >> Michael, did you get a chance to mark the issues you were thinking
> of as blockers?
> >>> >>
> >>> >> Adrien, I see that the remaining open blockers look mostly like
> your open issues. Two of them have recent activity, but LUCENE-9047 would
> need to be brought back to the lucene repo. Is this an accurate view of the
> state of things?
> >>> >>
> >>> >> Now that I'm done with 8.8.2, I would love to see how we can
> continue to make headway on 9.0!
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Mon, Mar 29, 2021 at 3:25 PM Michael Sokolov 
> wrote:
> >>> >>>
> >>> >>> There has been some discussion around a few code visibility and
> naming
> >>> >>> issues related to "VectorFormat" as it is called today. I'd like to
> >>> >>> get that sort

Re: Welcome Mayya Sharipova to the Lucene PMC

2021-06-28 Thread David Smiley
Welcome Mayya!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jun 28, 2021 at 9:17 AM Robert Muir  wrote:

> I am pleased to announce that Mayya has accepted an invitation to join
> the Lucene PMC!
>
> Congratulations, and welcome aboard!
>
>
>


Re: Propose changing the "dist" layout

2021-06-11 Thread David Smiley
We (all?) agree to do away with "contrib" :-).
I think a folder grouping the modules (the things that plug into Solr) is
useful, as there are a number of them; as such, this is a nice organization
IMO.  There's a bunch of other stuff at the top level, and I'd rather not
intermix all our modules at that level.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jun 11, 2021 at 4:41 PM Mike Drob  wrote:

> We can have modules, but why do they need to be in an additional folder
> deep? Why not just have langid next to solrj and core? Contrib to me
> connotes experimental or unsupported, which these things are decidedly not.
>
> On Fri, Jun 11, 2021 at 2:59 PM David Smiley  wrote:
>
>> The contrib folder is just a folder of modules -- optional plugins for
>> solr-core.  IMO we should simply rename "contrib" to "modules".  I think
>> the only non-module in there is the prometheus exporter which could move
>> out.  Mike, I'm not sure if you have a different notion of what "module"
>> is?  I believe most of us would be happy to move away from "contrib"
>> wording, anyway.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Fri, Jun 11, 2021 at 3:03 PM Mike Drob  wrote:
>>
>>> I think related to this, I would like to see some "contribs" moved out
>>> from the contrib folder and into proper modules. Right now the
>>> definition of contrib seems to be anything that isn't core or solrj,
>>> but maybe there is room for a backup module that has gcs and s3 and
>>> hdfs all under it. LangId is already mentioned in our ref guide but we
>>> pretend like it is always present and don't think of it as a contrib.
>>> We kind of think of contrib as optional extra stuff, so maybe we call
>>> the things what they are - plugins and extensions? Then we don't have
>>> to think as hard about why certain things are showing up in which lib
>>> folders.
>>>
>>> Also, minor benefit, I would then be able to type c instead of
>>> having to type cor to disambiguate from con in my terminal.
>>>
>>> On Fri, Jun 11, 2021 at 8:09 AM David Smiley  wrote:
>>> >
>>> > I believe we can do a fair amount of re-organization pertaining to
>>> Jetty without losing the Jetty configuration that I think is valuable to
>>> users who want to tweak something.
>>> >
>>> > ~ David Smiley
>>> > Apache Lucene/Solr Search Developer
>>> > http://www.linkedin.com/in/davidwsmiley
>>> >
>>> >
>>> > On Fri, Jun 11, 2021 at 8:01 AM Jan Høydahl 
>>> wrote:
>>> >>
>>> >> +1 to a cleanup here for 9.0. As clean and neat organization as
>>> possible. Perhaps rename "dist" -> "lib"?
>>> >>
>>> >> I wish we could get rid of the server (jetty) folder altogether, and
>>> move everything from server/solr-webapp/webapp/WEB-INF/lib to "lib/deps/".
>>> But that ties into custom boot-class, getting rid of web.xml and building
>>> Jetty context in Java code.. I'm willing to help here if others also want
>>> to go this direction. This would further hide Jetty as an impl detail and
>>> let us organize stuff more freely.
>>> >>
>>> >> Jan
>>> >>
>>> >> On 11 Jun 2021, at 13:29, David Smiley wrote:
>>> >>
>>> >> Bumping this conversation up, based on recent communication.  I have
>>> yet to take action but really any of us can.
>>> >>
>>> >> ~ David Smiley
>>> >> Apache Lucene/Solr Search Developer
>>> >> http://www.linkedin.com/in/davidwsmiley
>>> >>
>>> >>
>>> >> On Mon, Nov 23, 2020 at 8:48 AM David Smiley 
>>> wrote:
>>> >>>
>>> >>> I'll proceed on this with lazy consensus.  I suspect most of us
>>> don't care, unsurprisingly since I doubt anyone has any fondness for the
>>> "dist" folder.
>>> >>>
>>> >>> ~ David Smiley
>>> >>> Apache Lucene/Solr Search Developer
>>> >>> http://www.linkedin.com/in/davidwsmiley
>>> >>>
>>> >>>
>>> >>> On Sun, Nov 15, 2020 at 7:31 AM Erick Erickson <
>>> erickerick...@gmail.com> wrote:
>>> >>>>
>>> >>>> Well, Solr has grow

Re: Propose changing the "dist" layout

2021-06-11 Thread David Smiley
The contrib folder is just a folder of modules -- optional plugins for
solr-core.  IMO we should simply rename "contrib" to "modules".  I think
the only non-module in there is the prometheus exporter which could move
out.  Mike, I'm not sure if you have a different notion of what "module"
is?  I believe most of us would be happy to move away from "contrib"
wording, anyway.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jun 11, 2021 at 3:03 PM Mike Drob  wrote:

> I think related to this, I would like to see some "contribs" moved out
> from the contrib folder and into proper modules. Right now the
> definition of contrib seems to be anything that isn't core or solrj,
> but maybe there is room for a backup module that has gcs and s3 and
> hdfs all under it. LangId is already mentioned in our ref guide but we
> pretend like it is always present and don't think of it as a contrib.
> We kind of think of contrib as optional extra stuff, so maybe we call
> the things what they are - plugins and extensions? Then we don't have
> to think as hard about why certain things are showing up in which lib
> folders.
>
> Also, minor benefit, I would then be able to type c instead of
> having to type cor to disambiguate from con in my terminal.
>
> On Fri, Jun 11, 2021 at 8:09 AM David Smiley  wrote:
> >
> > I believe we can do a fair amount of re-organization pertaining to Jetty
> without losing the Jetty configuration that I think is valuable to users
> who want to tweak something.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Fri, Jun 11, 2021 at 8:01 AM Jan Høydahl 
> wrote:
> >>
> >> +1 to a cleanup here for 9.0. As clean and neat organization as
> possible. Perhaps rename "dist" -> "lib"?
> >>
> >> I wish we could get rid of the server (jetty) folder altogether, and
> move everything from server/solr-webapp/webapp/WEB-INF/lib to "lib/deps/".
> But that ties into custom boot-class, getting rid of web.xml and building
> Jetty context in Java code.. I'm willing to help here if others also want
> to go this direction. This would further hide Jetty as an impl detail and
> let us organize stuff more freely.
> >>
> >> Jan
> >>
> >> On 11 Jun 2021, at 13:29, David Smiley wrote:
> >>
> >> Bumping this conversation up, based on recent communication.  I have
> yet to take action but really any of us can.
> >>
> >> ~ David Smiley
> >> Apache Lucene/Solr Search Developer
> >> http://www.linkedin.com/in/davidwsmiley
> >>
> >>
> >> On Mon, Nov 23, 2020 at 8:48 AM David Smiley 
> wrote:
> >>>
> >>> I'll proceed on this with lazy consensus.  I suspect most of us don't
> care, unsurprisingly since I doubt anyone has any fondness for the "dist"
> folder.
> >>>
> >>> ~ David Smiley
> >>> Apache Lucene/Solr Search Developer
> >>> http://www.linkedin.com/in/davidwsmiley
> >>>
> >>>
> >>> On Sun, Nov 15, 2020 at 7:31 AM Erick Erickson <
> erickerick...@gmail.com> wrote:
> >>>>
> >>>> Well, Solr has grown “organically” so some things just _are_, like
> sunrises and plagues ;)
> >>>>
> >>>> On a serious note, AFAIC rearrange as you see fit. I wonder how much
> of this is left over from the war days? Anything that’s lasted through all
> the transformations Solr has is bound to need cleaning up betimes.
> >>>>
> >>>> How would it relate to splitting Solr off into its own TLP? On the
> surface, I’d guess the two efforts would be orthogonal, I mention it just
> in case rearranging the layout would make that task easier or harder...
> >>>>
> >>>> > On Nov 15, 2020, at 12:18 AM, David Smiley 
> wrote:
> >>>> >
> >>>> > I've been doing a bit of dependency work in one of our contribs,
> and observing more closely than usual exactly what we produce in the
> distribution layout (result of gradlew assemble).  There are some tricks
> Dawid did in gradle/solr/packaging.gradle to pull off this stunt to keep
> things as they have been for many years.  The distribution layout is
> awkward, I think.  We produce this "dist" folder at the top level that has
> every JAR this project produces, *even contribs*.  But why?  I think
> contribs should keep to themselves.  It's ridiculous that /contribs/ltr/ is
> empty except for a README.

Re: Propose changing the "dist" layout

2021-06-11 Thread David Smiley
I believe we can do a fair amount of re-organization pertaining to Jetty
without losing the Jetty configuration that I think is valuable to users
who want to tweak something.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jun 11, 2021 at 8:01 AM Jan Høydahl  wrote:

> +1 to a cleanup here for 9.0. As clean and neat organization as possible.
> Perhaps rename "dist" -> "lib"?
>
> I wish we could get rid of the server (jetty) folder altogether, and move
> everything from server/solr-webapp/webapp/WEB-INF/lib to "lib/deps/". But
> that ties into custom boot-class, getting rid of web.xml and building Jetty
> context in Java code. I'm willing to help here if others also want to go
> this direction. This would further hide Jetty as an impl detail and let us
> organize stuff more freely.
>
> Jan
>
> On 11 Jun 2021, at 13:29, David Smiley wrote:
>
> Bumping this conversation up, based on recent communication.  I have yet
> to take action but really any of us can.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Nov 23, 2020 at 8:48 AM David Smiley  wrote:
>
>> I'll proceed on this with lazy consensus.  I suspect most of us don't
>> care, unsurprisingly since I doubt anyone has any fondness for the "dist"
>> folder.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Sun, Nov 15, 2020 at 7:31 AM Erick Erickson 
>> wrote:
>>
>>> Well, Solr has grown “organically” so some things just _are_, like
>>> sunrises and plagues ;)
>>>
>>> On a serious note, AFAIC rearrange as you see fit. I wonder how much of
>>> this is left over from the war days? Anything that’s lasted through all the
>>> transformations Solr has is bound to need cleaning up betimes.
>>>
>>> How would it relate to splitting Solr off into its own TLP? On the
>>> surface, I’d guess the two efforts would be orthogonal, I mention it just
>>> in case rearranging the layout would make that task easier or harder...
>>>
>>> > On Nov 15, 2020, at 12:18 AM, David Smiley  wrote:
>>> >
>>> > I've been doing a bit of dependency work in one of our contribs, and
>>> observing more closely than usual exactly what we produce in the
>>> distribution layout (result of gradlew assemble).  There are some tricks
>>> Dawid did in gradle/solr/packaging.gradle to pull off this stunt to keep
>>> things as they have been for many years.  The distribution layout is
>>> awkward, I think.  We produce this "dist" folder at the top level that has
>>> every JAR this project produces, *even contribs*.  But why?  I think
>>> contribs should keep to themselves.  It's ridiculous that /contribs/ltr/ is
>>> empty except for a README.txt... IMO it ought to have the JAR in a "lib"
>>> subdirectory there mixed with its dependencies (LTR has none but others
>>> sure do).  Today, each contrib's JAR is in "/dist".  And what about SolrJ?
>>> I think SolrJ is important enough that it deserves its very own top-level
>>> directory "solrj", and like the contribs, with a "lib" alongside it.  Maybe
>>> Solrj's optional dependencies could be in a lib-optional dir next to it or
>>> lib/opt/ (beneath it).  Then... we don't need "dist" at all.  It contains
>>> the solr-core JAR but this is redundant.  Furthermore, the server webapp
>>> could be configured to add the SolrJ libs so that we don't need to
>>> redundantly put any of them in the distribution.  There might be some
>>> duplicated jars overall, but not many.  Logging libs might be explicitly
>>> excluded so that they are only in one spot.  (Logging in Java is a mess)
>>> >
>>> > WDYT?
>>> >
>>> > ~ David Smiley
>>> > Apache Lucene/Solr Search Developer
>>> > http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>>
>>>
>


Re: Propose changing the "dist" layout

2021-06-11 Thread David Smiley
Bumping this conversation up, based on recent communication.  I have yet to
take action but really any of us can.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Nov 23, 2020 at 8:48 AM David Smiley  wrote:

> I'll proceed on this with lazy consensus.  I suspect most of us don't
> care, unsurprisingly since I doubt anyone has any fondness for the "dist"
> folder.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Nov 15, 2020 at 7:31 AM Erick Erickson 
> wrote:
>
>> Well, Solr has grown “organically” so some things just _are_, like
>> sunrises and plagues ;)
>>
>> On a serious note, AFAIC rearrange as you see fit. I wonder how much of
>> this is left over from the war days? Anything that’s lasted through all the
>> transformations Solr has is bound to need cleaning up betimes.
>>
>> How would it relate to splitting Solr off into its own TLP? On the
>> surface, I’d guess the two efforts would be orthogonal, I mention it just
>> in case rearranging the layout would make that task easier or harder...
>>
>> > On Nov 15, 2020, at 12:18 AM, David Smiley  wrote:
>> >
>> > I've been doing a bit of dependency work in one of our contribs, and
>> observing more closely than usual exactly what we produce in the
>> distribution layout (result of gradlew assemble).  There are some tricks
>> Dawid did in gradle/solr/packaging.gradle to pull off this stunt to keep
>> things as they have been for many years.  The distribution layout is
>> awkward, I think.  We produce this "dist" folder at the top level that has
>> every JAR this project produces, *even contribs*.  But why?  I think
>> contribs should keep to themselves.  It's ridiculous that /contribs/ltr/ is
>> empty except for a README.txt... IMO it ought to have the JAR in a "lib"
>> subdirectory there mixed with its dependencies (LTR has none but others
>> sure do).  Today, each contrib's JAR is in "/dist".  And what about SolrJ?
>> I think SolrJ is important enough that it deserves its very own top-level
>> directory "solrj", and like the contribs, with a "lib" alongside it.  Maybe
>> Solrj's optional dependencies could be in a lib-optional dir next to it or
>> lib/opt/ (beneath it).  Then... we don't need "dist" at all.  It contains
>> the solr-core JAR but this is redundant.  Furthermore, the server webapp
>> could be configured to add the SolrJ libs so that we don't need to
>> redundantly put any of them in the distribution.  There might be some
>> duplicated jars overall, but not many.  Logging libs might be explicitly
>> excluded so that they are only in one spot.  (Logging in Java is a mess)
>> >
>> > WDYT?
>> >
>> > ~ David Smiley
>> > Apache Lucene/Solr Search Developer
>> > http://www.linkedin.com/in/davidwsmiley
>>
>>
>>
>>


Re: Branch branch_8_9 has been cut and versions updated to 8.10 on stable branch

2021-06-08 Thread David Smiley
I reviewed and merged it to the relevant branches.  Thanks for your
patience Mayya!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jun 8, 2021 at 9:44 AM Thomas Wöckinger 
wrote:

> I already fixed it, I hope David can review it as soon as possible. As it
> is a behaviour change, I think we should get this one in 8.9
>
> On Tue, Jun 8, 2021 at 3:13 PM Mayya Sharipova
>  wrote:
>
>> Hi Thomas,
>> thanks for letting us know about this issue. I see that you opened
>> SOLR-15457  <https://issues.apache.org/jira/browse/SOLR-15457>. Is this
>> a blocker for 8.9? Should we wait till it gets fixed?
>>
>> On Tue, Jun 8, 2021 at 1:45 AM Thomas Wöckinger <
>> thomas.woeckin...@gmail.com> wrote:
>>
>>> Hi Mayya,
>>>
>>> I tested the branch_8_9 yesterday and found a behavioral change when
>>> using Faceting with Enum fields, the returned 'val' in the buckets has
>>> changed from EnumFieldValue to a String of the corresponding ordinal.
>>> I think it is related to
>>> https://issues.apache.org/jira/browse/SOLR-15191, I will open a new
>>> issue for it.
>>>
>>> On Fri, Jun 4, 2021 at 8:24 PM Mayya Sharipova
>>>  wrote:
>>>
>>>> Please observe the normal rules:
>>>>
>>>> * No new features may be committed to the branch.
>>>> * Documentation patches, build patches and serious bug fixes may be
>>>>   committed to the branch. However, you should submit all patches you
>>>>   want to commit to Jira first to give others the chance to review
>>>>   and possibly vote against the patch. Keep in mind that it is our
>>>>   main intention to keep the branch as stable as possible.
>>>> * All patches that are intended for the branch should first be committed
>>>>   to the unstable branch, merged into the stable branch, and then into
>>>>   the current release branch.
>>>> * Normal unstable and stable branch development may continue as usual.
>>>>   However, if you plan to commit a big change to the unstable branch
>>>>   while the branch feature freeze is in effect, think twice: can't the
>>>>   addition wait a couple more days? Merges of bug fixes into the branch
>>>>   may become more difficult.
>>>> * Only Jira issues with Fix version 8.9 and priority "Blocker" will delay
>>>>   a release candidate build.
>>>>
>>>


Re: Release Lucene/Solr 8.9.0 should we have it soon

2021-06-01 Thread David Smiley
+1 to Jan's comment; no need to hold up the release.

I also think we should be open to more releases in the future for 8.x.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jun 1, 2021 at 4:55 PM Jan Høydahl  wrote:

> Let's not hold up the release due to this incomplete PR. It obviously
> needs more time for completion and there is always a new train to catch.
> As far as I understand, Circuit breakers are pluggable, so anyone can
> configure their own implementation in the meantime?
>
> Jan
>
> On 1 Jun 2021, at 22:13, Atri Sharma wrote:
>
> I appreciate you fixing this and adding the new circuit breaker and look
> forward to having it in the hands of our users soon.
>
> However, the current state of PR, with significant API churn for a single
> change and overlapping code is not yet ready.
>
> If this is too much of a rework, I am happy to take the existing PR and do
> the changes, post which I believe the PR should be close to completion.
>
> Let me know if you need me to help, but unfortunately, the two objections
> I raised are blockers, at least until we establish that they cannot be done
> away with.
>
>
> On Wed, 2 Jun 2021, 01:37 Walter Underwood,  wrote:
>
>> I would appreciate a second opinion on the pull request. Substantive
>> issues have been resolved. At this point, the discussion is about code
>> style and coding standards. I don’t have detailed knowledge about the Solr
>> coding style, so I’d appreciate another set of eyes.
>>
>> The current behavior is buggy, and we are not able to use it at Chegg.
>> The patch fixes those bugs.
>>
>> https://github.com/apache/solr/pull/96
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> On Jun 1, 2021, at 12:27 PM, Walter Underwood 
>> wrote:
>>
>> I answered the comments. I don’t see those answers on github, oddly.
>>
>> I’ll re-answer them. Most of your questions are already answered in the
>> discussion on Jira.
>>
>> The central issue is that load average is not always a CPU measure. In
>> some systems, it includes threads in iowait. So it is potentially
>> misleading to label it as CPU and document it as CPU. The updated
>> documentation makes that clear, so that should have already answered your
>> comment. That is why it is important to rename the existing circuit breaker.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
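Walter's point, that load average is not necessarily a CPU measure, is visible in the JVM's own management API, which reports the two as separate metrics. The sketch below is a minimal illustration, not Solr's circuit breaker code: `HostLoadProbe` and its `shouldTrip` rule are hypothetical, and `cpuUtilization()` assumes a HotSpot-based JDK that provides the `com.sun.management` extension.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for illustration only; this is not Solr's circuit breaker implementation.
final class HostLoadProbe {
    private HostLoadProbe() {}

    /** One-minute system load average, or a negative value if the platform does not report it. */
    static double loadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    /**
     * Load average divided by the processor count. On some systems the load average
     * also counts tasks blocked on I/O, so this is not a pure CPU measure.
     */
    static double loadAveragePerProcessor() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();
        return load < 0 ? load : load / os.getAvailableProcessors();
    }

    /**
     * Recent whole-system CPU utilization in [0.0, 1.0], or a negative value if unavailable.
     * Assumes the com.sun.management extension (present in HotSpot-based JDKs).
     */
    @SuppressWarnings("deprecation") // getSystemCpuLoad() is deprecated since JDK 14 in favor of getCpuLoad()
    static double cpuUtilization() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getSystemCpuLoad();
        }
        return -1.0;
    }

    /** Hypothetical trip rule: a metric the platform cannot report (negative) never trips. */
    static boolean shouldTrip(double metric, double threshold) {
        return metric >= 0 && metric > threshold;
    }
}
```

Treating a negative reading as "unavailable, never trip" is only one possible design choice; a breaker could instead refuse to start when its metric is unsupported on the platform.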
>> On Jun 1, 2021, at 12:20 PM, Atri Sharma  wrote:
>>
>> I took a look at the PR and gave comments for SOLR-15056, and the last I
>> checked, my comments were not addressed?
>>
>> On Wed, 2 Jun 2021, 00:31 Walter Underwood, 
>> wrote:
>>
>>> Could someone else please take a look at SOLR-15056? This is a small
>>> blast radius change that improves the circuit breakers. It includes unit
>>> tests and documentation and has been ready since January.
>>>
>>> https://github.com/apache/solr/pull/96/files
>>> https://issues.apache.org/jira/browse/SOLR-15056
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>> On Jun 1, 2021, at 11:53 AM, Mayya Sharipova <
>>> mayya.sharip...@elastic.co.INVALID> wrote:
>>>
>>> Thank you for the update, Houston.
>>>
>>> I've started the release process, the branch 8.9 is now cut.
>>>
>>> On Tue, Jun 1, 2021 at 11:21 AM Houston Putman 
>>> wrote:
>>>
>>>> Mayya, SOLR-14978 is now in 8.x. So no longer a blocker.
>>>>
>>>> - Houston
>>>>
>>>> On Thu, May 27, 2021 at 11:42 PM David Smiley 
>>>> wrote:
>>>>
>>>>> SOLR-15412 is rather serious as the title suggests.  I haven't been
>>>>> tracking the progress so if it's already resolved, that's unknown to me and
>>>>> isn't reflected in JIRA.
>>>>>
>>>>> ~ David Smiley
>>>>> Apache Lucene/Solr Search Developer
>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>
>>>>>
>>>>> On Thu, May 27, 2021 at 5:24 PM Mayya Sharipova <
>>>>> mayya.sharip...@elastic.co.invalid> wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>> I wonder if everyone is ok for May 31st (Monday) as the 

Re: Welcome Greg Miller as Lucene committer

2021-05-30 Thread David Smiley
Uh; I mean, "Congratulations and Welcome Greg"   LOL sorry

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, May 30, 2021 at 5:24 PM David Smiley  wrote:

> Congrats and welcome Mike!
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sat, May 29, 2021 at 3:47 PM Adrien Grand  wrote:
>
>> I'm pleased to announce that Greg Miller has accepted the PMC's
>> invitation to become a committer.
>>
>> Greg, the tradition is that new committers introduce themselves with a
>> brief bio.
>>
>> Congratulations and welcome!
>>
>> --
>> Adrien
>>
>


Re: Welcome Greg Miller as Lucene committer

2021-05-30 Thread David Smiley
Congrats and welcome Mike!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, May 29, 2021 at 3:47 PM Adrien Grand  wrote:

> I'm pleased to announce that Greg Miller has accepted the PMC's invitation
> to become a committer.
>
> Greg, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
> --
> Adrien
>


Re: Release Lucene/Solr 8.9.0 should we have it soon

2021-05-27 Thread David Smiley
SOLR-15412 is rather serious as the title suggests.  I haven't been
tracking the progress so if it's already resolved, that's unknown to me and
isn't reflected in JIRA.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, May 27, 2021 at 5:24 PM Mayya Sharipova
 wrote:

> Hello everyone,
> I wonder if everyone is ok for May 31st (Monday) as the date for the
> feature freeze date and branch cut?
> I've noticed that `releaseWizard.py` is also asking for the length of
> feature freeze. What is the custom length to put there?
>
> Looks like Lucene
> <https://issues.apache.org/jira/projects/LUCENE/versions/12349562>
> doesn't have any unresolved issues for 8.9.
> SOLR <https://issues.apache.org/jira/projects/SOLR/versions/12349563> has:
> -  SOLR-15412  Strict validation on Replica metadata can cause complete
> outage  (Looks like it may be resolved already?)
> - SOLR-15410 GC log is directed to console when starting Solr with Java 11
> Open J9 on Windows
> - SOLR-15056  CPU circuit breaker needs to use CPU utilization, not Unix
> load average
>
> Are we ok to postpone these issues to later releases if they are not
> resolved and merged before feature freeze?
>
> Thank you.
>
>
>
>
>
>
> On Tue, May 25, 2021 at 12:41 PM Colvin Cowie 
> wrote:
>
>> Hello,
>> Eric was going to have a look at the PR.
>> But if it isn't done in time then I don't think it needs to block the
>> release
>>
>> Thanks
>>
>> On Tue, 25 May 2021 at 15:50, Mayya Sharipova
>>  wrote:
>>
>>> Hello Colvin,
>>> I am wondering if you still want to merge SOLR-15410 for the Lucene/Solr
>>> 8.9 release?
>>> Should we have a deadline for feature freeze? Say May 30th (Sunday)?
>>>
>>> Thank you.
>>>
>>> On Tue, May 18, 2021 at 8:49 AM Noble Paul  wrote:
>>>
>>>> +1
>>>>
>>>>
>>>> On Tue, May 18, 2021 at 9:30 PM Colvin Cowie <
>>>> colvin.cowie@gmail.com> wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > I raised SOLR-15410 yesterday with a PR to fix an issue with GC
>>>> logging when using new versions of OpenJ9. It's small, so if somebody could
>>>> have a look at it in time for 8.9 that would be great
>>>> >
>>>> > Thanks,
>>>> > Colvin
>>>> >
>>>> > On Thu, 13 May 2021 at 17:52, Nhat Nguyen 
>>>> > 
>>>> wrote:
>>>> >>
>>>> >> Hi Mayya,
>>>> >>
>>>> >> I would like to backport LUCENE-9935, which enables bulk-merge for
>>>> stored fields with index sort, to 8.x this weekend. The patch is ready, but
>>>> we prefer to give CI some cycles before backporting. Please let me know if
>>>> it's okay with the release plan.
>>>> >>
>>>> >> Thanks,
>>>> >> Nhat
>>>> >>
>>>> >> On Thu, May 13, 2021 at 12:44 PM Gus Heck 
>>>> wrote:
>>>> >>>
>>>> >>> Perhaps https://issues.apache.org/jira/browse/SOLR-15378 should be
>>>> investigated before 8.9, maybe make it a blocker?
>>>> >>>
>>>> >>> On Thu, May 13, 2021 at 1:35 AM Robert Muir 
>>>> wrote:
>>>> >>>>
>>>> >>>> Mayya, I created backport for Adrien's issue here, to try to help out:
>>>> >>>> https://github.com/apache/lucene-solr/pull/2495
>>>> >>>>
>>>> >>>> Personally, I felt that merging non-trivial changes from main branch
>>>> >>>> to 8.x has some additional risks when cherry-picking:
>>>> >>>> * structural changes in main branch making merging more difficult
>>>> >>>> (e.g. LUCENE-9705 reorganization of codec versioning, great change
>>>> >>>> moving forwards though)
>>>> >>>> * there are many style changes due to spotless in main branch which
>>>> >>>> add noise to merging against old code.
>>>> >>>> * In the specific case of LUCENE-9827, the usual additional tricky
>>>> >>>> backwards compatibility for 8.x must be added in the backport (due to
>>>> >>>> minor version bumps there) which can go wrong.
>>>> >>>>
>>>> >>>> I still think

Re: Lucene 9.0 snapshot names

2021-05-24 Thread David Smiley
Sounds like this would be a good addition to /dev-docs/...

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, May 24, 2021 at 12:56 PM Uwe Schindler  wrote:

> Thanks for this tip! It helps for Solr, too. I had given up because it
> always wanted to sign, which Jenkins can't easily do.
>
> Uwe
>
> On May 24, 2021 8:03:51 AM UTC, Alan Woodward wrote:
>>
>> Passing -x signJarsPublication skipped the signing step so I’m good to
>> go.  Thanks everyone for the help!
>>
>> On 23 May 2021, at 21:11, Dawid Weiss  wrote:
>>
>>
>> Create a temporary pgp key for use with signing and use it to sign your
>> maven artifacts? I don't know if there is a way to use an agent - perhaps
>> there is. Hoss did some work with manual artifact signing recently (and
>> this used the agent). I never had the need for this.
>>
>> Dawid
>>
>> On Sat, May 22, 2021 at 4:50 PM Alan Woodward 
>> wrote:
>>
>>> Passing -Dversion.suffix does indeed work, thanks Uwe!  The next Yak to
>>> shave is that gradle is now complaining that it can’t sign the artefacts.
>>> From my reading it seems that I have to set things up in my
>>> gradle.properties file, including my password in plain text.  This seems …
>>> wrong?  I don’t actually need these artefacts signed anyway, so does anyone
>>> with more gradle-fu than me know either a) how to skip the signing step or
>>> b) how to set things up so that they are signed correctly without having my
>>> PGP password sitting in a plain text file.
>>>
>>> Thanks!
>>>
>>> On 20 May 2021, at 14:19, Uwe Schindler  wrote:
>>>
>>> The default suffix in this system prop is "SNAPSHOT", and the timestamp
>>> then comes from Maven's internal logic; this cannot be changed.
>>>
>>> By overriding the suffix explicitly (as said before, and as done by Jenkins)
>>> you convert it to an official "release" in Maven's sense and it is no
>>> longer a snapshot. So you are free to choose the version.
>>>
>>> Uwe
>>>
>>> On May 20, 2021 1:15:12 PM UTC, Uwe Schindler wrote:
>>>>
>>>> Jenkins does this already:
>>>> https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/242/
>>>>
>>>> It uses build number!
>>>>
>>>> The system property "version.suffix" is responsible and is set by
>>>> Jenkins. See in command line: [Lucene-Artifacts-main] $
>>>> /home/jenkins/jenkins-slave/workspace/Lucene/Lucene-Artifacts-main/gradlew
>>>> -Dlucene.javadoc.url=
>>>> https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/javadoc
>>>> -Dversion.suffix=jenkins242 assemble
>>>>
>>>> Uwe
>>>>
>>>> On May 20, 2021 12:25:48 PM UTC, Michael Sokolov <
>>>> msoko...@gmail.com> wrote:
>>>>>
>>>>> In principle it makes sense, but is there any chance the build
>>>>> artifact could vary for the same SHA? We hope not, I think, but stranger
>>>>> things have happened. Probably an edge case not worth worrying about
>>>>> though, and relying on the build server's clock doesn't seem great, so +1
>>>>> from me, although I don't use these so my interest is mostly theoretical.
>>>>>
>>>>> On Thu, May 20, 2021, 8:20 AM Alan Woodward 
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I’m preparing a local lucene 9.0 snapshot build and I notice that the
>>>>>> jar files generated by `./gradlew mavenToLocalFolder` are called something
>>>>>> like `lucene-suggest-9.0.0-20210520.111833-1-javadoc.jar` - in other words,
>>>>>> they are including a timestamp.  For my setup I’d like to replace this with
>>>>>> the git SHA of the commit the snapshot is based on.  So I have two
>>>>>> questions:
>>>>>>
>>>>>> 1) Is there a simple override or gradle property that I can pass on
>>>>>> the command line that will change the output names of artefacts?
>>>>>> 2) I think in general commit SHAs are better than timestamps for
>>>>>> snapshot names - two identical snapshots taken from identical sources at
>>>>>> different times shouldn’t really have different names.  Should we look at
>>>>>> changing the existing snapshot generation code to switch to using SHAs?
>>>>>>
>>>>>> - Alan
>>>>>>
>>>>>>
>>>> --
>>>> Uwe Schindler
>>>> Achterdiek 19, 28357 Bremen
>>>> https://www.thetaphi.de
>>>
>>>
>>> --
>>> Uwe Schindler
>>> Achterdiek 19, 28357 Bremen
>>> https://www.thetaphi.de
>>>
>>>
>>>
>>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
>
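Uwe's explanation of snapshot versus explicitly suffixed names can be made concrete. The sketch below assumes (this is not confirmed in the thread) that the build publishes `<baseVersion>-<version.suffix>` with `SNAPSHOT` as the default suffix, and that Maven replaces `SNAPSHOT` with a UTC `yyyyMMdd.HHmmss` timestamp plus a build number when deploying, which is what yields names like `lucene-suggest-9.0.0-20210520.111833-1-javadoc.jar`. `SnapshotNames` is a hypothetical helper.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; assumes the published version is "<baseVersion>-<suffix>".
final class SnapshotNames {
    private static final String SNAPSHOT = "SNAPSHOT";
    // Maven's unique-snapshot timestamp: UTC date and time formatted as yyyyMMdd.HHmmss
    private static final DateTimeFormatter MAVEN_TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd.HHmmss").withZone(ZoneOffset.UTC);

    private SnapshotNames() {}

    /** A null suffix falls back to the default, "SNAPSHOT". */
    static String version(String baseVersion, String suffix) {
        return baseVersion + "-" + (suffix == null ? SNAPSHOT : suffix);
    }

    static boolean isSnapshot(String version) {
        return version.endsWith("-" + SNAPSHOT);
    }

    /** File name as deployed: a snapshot gets "<timestamp>-<buildNumber>" in place of "SNAPSHOT". */
    static String deployedFileName(String artifactId, String version, String classifier,
                                   String extension, Instant deployedAt, int buildNumber) {
        String effective = isSnapshot(version)
                ? version.substring(0, version.length() - SNAPSHOT.length())
                        + MAVEN_TIMESTAMP.format(deployedAt) + "-" + buildNumber
                : version; // any other suffix is used verbatim, so the name does not depend on build time
        String cls = (classifier == null || classifier.isEmpty()) ? "" : "-" + classifier;
        return artifactId + "-" + effective + cls + "." + extension;
    }
}
```

With a non-SNAPSHOT suffix such as `jenkins242` (or a commit SHA, as Alan suggests), the file name stays the same no matter when the build ran.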


Re: debugging query execution plan

2021-05-07 Thread David Smiley
Thanks for the clarification Greg. I've been looking into this recently and
filed https://issues.apache.org/jira/browse/LUCENE-9938 based on a hunch
that these DocIdSetIterator.all(maxDoc) iterators have a
non-negligible cost inside ConjunctionDISI.  Ultimately I closed the issue
because the TPI design seems to prohibit removing them  :-(.  Feel free to
comment there nonetheless if you have any thoughts on the matter.  For my
part, I have some benchmarking to do in Solr for a related matter that
would move certain queries that work at the collector stage to be TPIs
-- SOLR-14164.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, May 7, 2021 at 6:06 PM Greg Miller  wrote:

> Just chiming in here to answer David's question since I have some
> familiarity:
>
> In this specific case, the logic was implemented inside a Collector
> and we tried to move it into a Query abstraction using a
> TwoPhaseIterator with a high matchCost. The first-phase would match on
> all docs (essentially: DocIdSetIterator.all(reader.maxDoc())) and the
> second phase would do the costly check. The matchCost was advertised
> as reader.maxDoc(). ("reader" in this example is from the
> LeafReaderContext).
>
> Moving the logic behind a Query abstraction caused performance
> regressions. So one theory is that it was somehow leading iteration
> with an expensive "match all docs" DISI, but we don't actually know if
> that's true right now.
>
> Cheers,
> -Greg
>
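Greg describes a filter whose first phase matches every document (like `DocIdSetIterator.all(maxDoc)`) and whose second phase does the costly check. The self-contained model below shows the behavior such a conjunction is meant to have: lead with the lowest-cost approximation, and run costly checks only on documents where all approximations agree, cheapest `matchCost` first. `TwoPhaseModel` and its members are hypothetical and deliberately simplified; this is not Lucene's `ConjunctionDISI` or `TwoPhaseIterator`, and whether Lucene behaves this way in Greg's case is exactly the open question in the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Simplified model of a two-phase conjunction for illustration; NOT Lucene's ConjunctionDISI.
final class TwoPhaseModel {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** One clause: a sorted approximation plus an optional, possibly costly, confirmation check. */
    static final class Clause {
        final int[] approximation; // sorted doc IDs the clause *might* match
        final IntPredicate check;  // null means the approximation is already exact
        final float matchCost;     // relative cost of one check call
        private int pos = 0;
        int checkCalls = 0;        // how often the second phase actually ran

        Clause(int[] approximation, IntPredicate check, float matchCost) {
            this.approximation = approximation;
            this.check = check;
            this.matchCost = matchCost;
        }

        long cost() { return approximation.length; }

        /** Moves forward (never backward) to the first candidate >= target. */
        int advance(int target) {
            while (pos < approximation.length && approximation[pos] < target) pos++;
            return pos < approximation.length ? approximation[pos] : NO_MORE_DOCS;
        }

        boolean confirm(int doc) {
            if (check == null) return true;
            checkCalls++;
            return check.test(doc);
        }
    }

    /** Every doc in [0, maxDoc): the analogue of DocIdSetIterator.all(maxDoc). */
    static int[] allDocs(int maxDoc) {
        int[] docs = new int[maxDoc];
        for (int i = 0; i < maxDoc; i++) docs[i] = i;
        return docs;
    }

    static List<Integer> conjunction(List<Clause> clauses) {
        // Lead with the cheapest approximation; confirm with the cheapest checks first.
        List<Clause> byCost = new ArrayList<>(clauses);
        byCost.sort(Comparator.comparingLong(Clause::cost));
        List<Clause> byMatchCost = new ArrayList<>(clauses);
        byMatchCost.sort(Comparator.comparingDouble((Clause c) -> c.matchCost));

        List<Integer> hits = new ArrayList<>();
        Clause lead = byCost.get(0);
        int doc = lead.advance(0);
        outer:
        while (doc != NO_MORE_DOCS) {
            // Phase 1: all approximations must agree on the same doc.
            for (Clause other : byCost.subList(1, byCost.size())) {
                int d = other.advance(doc);
                if (d != doc) {
                    doc = lead.advance(d);
                    continue outer;
                }
            }
            // Phase 2: only now run the (possibly expensive) confirmation checks.
            boolean matched = true;
            for (Clause c : byMatchCost) {
                if (!c.confirm(doc)) {
                    matched = false;
                    break;
                }
            }
            if (matched) hits.add(doc);
            doc = lead.advance(doc + 1);
        }
        return hits;
    }
}
```

With a term-like clause matching docs 2, 5 and 9 and an all-docs filter over 10 docs whose check accepts odd IDs, the conjunction returns [5, 9] and runs the costly check 3 times rather than 10.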
> On Fri, May 7, 2021 at 8:41 AM David Smiley  wrote:
> >
> > Instead of a Collector, why isn't this a TwoPhaseIterator with a high
> matchCost?
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Thu, May 6, 2021 at 6:43 PM Michael Sokolov 
> wrote:
> >>
> >> Thanks Adrien, that is something like what I had in mind. If you are
> >> able to share, that could be very helpful. And -- deleted docs is not
> >> something I had considered, it's possibly a problem here. I'd have to
> >> go check - I think these "filter" Queries were implemented in the
> >> second part of the two-phase iteration.
> >>
> >> On Thu, May 6, 2021 at 4:24 PM Adrien Grand  wrote:
> >> >
> >> > We have something like that in Elasticsearch that wraps queries in
> order to be able to report cost, matchCost and the number of calls to
> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in
> the query tree.
> >> >
> >> > It's not perfect as it needs to disable some optimizations in order
> to work properly. For instance bulk scorers are disabled and conjunctions
> are not inlined, which means that clauses may run in a different order. So
> results need to be interpreted carefully as the way the query gets executed
> when observed may differ a bit from how it gets executed normally. That
> said it has still been useful in a number of cases. I don't think our
> implementation works when IndexSearcher is configured with an executor but
> we could maybe put it in sandbox and iterate from there?
> >> >
> >> > For your case, do you think it could be attributed to deleted docs?
> Deleted docs are checked before two-phase confirmation and collectors but
> after disjunctions/conjunctions of postings.
> >> >
> >> > On Thu, May 6, 2021 at 20:20, Michael Sokolov wrote:
> >> >>
> >> >> Do we have a way to understand how BooleanQuery (and other composite
> >> >> queries) are advancing their child queries? For example, a simple
> >> >> conjunction of two queries advances the more restrictive (lower
> >> >> cost()) query first, enabling the more costly query to skip over more
> >> >> documents. But we may not be making the best choice in every case, and
> >> >> I would like to know, for some query, how we are doing. For example,
> >> >> we could execute in a debugging mode, interposing something that wraps
> >> >> or observes the Scorers in some way, gathering statistics about how
> >> >> many documents are visited by each Scorer, which can be aggregated for
> >> >> later analysis.
> >> >>
> >> >> This is motivated by a use case we have in which we currently
> >> >> post-filter our query results in a custom collector using some filters
> >> >> that we know to be expensive (they must be evaluated on every
> >> >> document),

Re: debugging query execution plan

2021-05-07 Thread David Smiley
Instead of a Collector, why isn't this a TwoPhaseIterator with a high
matchCost?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, May 6, 2021 at 6:43 PM Michael Sokolov  wrote:

> Thanks Adrien, that is something like what I had in mind. If you are
> able to share, that could be very helpful. And -- deleted docs is not
> something I had considered, it's possibly a problem here. I'd have to
> go check - I think these "filter" Queries were implemented in the
> second part of the two-phase iteration.
>
> On Thu, May 6, 2021 at 4:24 PM Adrien Grand  wrote:
> >
> > We have something like that in Elasticsearch that wraps queries in order
> to be able to report cost, matchCost and the number of calls to
> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in
> the query tree.
> >
> > It's not perfect as it needs to disable some optimizations in order to
> work properly. For instance bulk scorers are disabled and conjunctions are
> not inlined, which means that clauses may run in a different order. So
> results need to be interpreted carefully as the way the query gets executed
> when observed may differ a bit from how it gets executed normally. That
> said it has still been useful in a number of cases. I don't think our
> implementation works when IndexSearcher is configured with an executor but
> we could maybe put it in sandbox and iterate from there?
> >
> > For your case, do you think it could be attributed to deleted docs?
> Deleted docs are checked before two-phase confirmation and collectors but
> after disjunctions/conjunctions of postings.
> >
> > On Thu, May 6, 2021 at 20:20, Michael Sokolov wrote:
> >>
> >> Do we have a way to understand how BooleanQuery (and other composite
> >> queries) are advancing their child queries? For example, a simple
> >> conjunction of two queries advances the more restrictive (lower
> >> cost()) query first, enabling the more costly query to skip over more
> >> documents. But we may not be making the best choice in every case, and
> >> I would like to know, for some query, how we are doing. For example,
> >> we could execute in a debugging mode, interposing something that wraps
> >> or observes the Scorers in some way, gathering statistics about how
> >> many documents are visited by each Scorer, which can be aggregated for
> >> later analysis.
> >>
> >> This is motivated by a use case we have in which we currently
> >> post-filter our query results in a custom collector using some filters
> >> that we know to be expensive (they must be evaluated on every
> >> document), but we would rather express these post-filters as Queries
> >> and have them advanced during the main Query execution. However when
> >> we tried to do that, we saw some slowdowns (in spite of marking these
> >> Queries as high-cost) and I suspect it is due to the iteration order,
> >> but I'm not sure how to debug.
> >>
> >> Suggestions welcome!
> >>
> >> -Mike
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
>
>
>
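
Mike's point that a conjunction leads with the lower-cost clause can be shown with a toy model. Adrien's idea of recording nextDoc/advance calls for every node in the query tree fits the same model. The sketch below is not Lucene code. `ArrayDocIterator` is a hypothetical, simplified stand-in for `DocIdSetIterator` over a sorted array of doc IDs, and it counts its own `nextDoc()`/`advance()` calls. `Conjunction.intersect` is a plain two-clause leapfrog intersection, not Lucene's actual conjunction implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a DocIdSetIterator over a sorted array of doc IDs. */
class ArrayDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;
    private int index = -1;
    private int doc = -1;

    // Call counters: the kind of per-node statistic a debugging wrapper would collect.
    int nextDocCalls = 0;
    int advanceCalls = 0;

    ArrayDocIterator(int... docs) {
        this.docs = docs.clone();
        Arrays.sort(this.docs); // assumes distinct doc IDs
    }

    /** Cost estimate: the number of matching docs. */
    long cost() {
        return docs.length;
    }

    int docID() {
        return doc;
    }

    int nextDoc() {
        nextDocCalls++;
        index++;
        doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
        return doc;
    }

    /** Moves to the first doc >= target (target must be > docID()). */
    int advance(int target) {
        advanceCalls++;
        int from = index + 1;
        if (from >= docs.length) {
            index = docs.length;
            doc = NO_MORE_DOCS;
            return doc;
        }
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        if (pos < 0) {
            pos = -pos - 1; // insertion point: first doc greater than target
        }
        index = pos;
        doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
        return doc;
    }

    int totalCalls() {
        return nextDocCalls + advanceCalls;
    }
}

class Conjunction {
    /** Leapfrog intersection: iterate the lead and advance the other clause to it. */
    static List<Integer> intersect(ArrayDocIterator lead, ArrayDocIterator other) {
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        while (doc != ArrayDocIterator.NO_MORE_DOCS) {
            int otherDoc = other.docID() < doc ? other.advance(doc) : other.docID();
            if (otherDoc == doc) {
                hits.add(doc);
                doc = lead.nextDoc();
            } else {
                // The other clause skipped past doc; catch the lead up to it.
                doc = lead.advance(otherDoc);
            }
        }
        return hits;
    }

    /** Picks the lower-cost clause as the lead. */
    static List<Integer> byCost(ArrayDocIterator a, ArrayDocIterator b) {
        return a.cost() <= b.cost() ? intersect(a, b) : intersect(b, a);
    }
}
```

With a rare clause {5, 15} and a common clause of ten odd doc IDs from 1 to 19, leading with the rare clause takes 5 iterator calls in total, while leading with the common clause takes 9 calls for the same two hits. Per-clause counters like these are what would expose a poor iteration order.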


Publishing snapshots to nightlies.apache.org

2021-05-06 Thread David Smiley
The Solr project is interested in consuming specific snapshots of Lucene
builds in order to pin the dependency temporarily until 9.0 is released.
By specific snapshots, I don't mean something that changes daily, I mean an
immutable versioned snapshot.  Cassandra suggested that
https://nightlies.apache.org/ might be used.  Does someone have
experience/know-how here on what would be involved in having a Jenkins job
publish artifacts in a Maven repo structure?  I don't mean to propose doing
this on a regular basis; it would be upon request with those having
permission to do so.  If this is easy then great!  If it isn't, then
no worries -- I could perform some manual steps and deploy to
http://home.apache.org/~dsmiley/ instead.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Re: Welcome Zach Chen as Lucene committer

2021-04-19 Thread David Smiley
Welcome Zach!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Apr 19, 2021 at 10:14 AM Adrien Grand  wrote:

> I'm pleased to announce that Zach Chen has accepted the PMC's invitation
> to become a committer.
>
> Zach, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
> --
> Adrien
>


Re: Welcome Peter Gromov as Lucene committer

2021-04-06 Thread David Smiley
Welcome Peter!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Apr 6, 2021 at 1:48 PM Robert Muir  wrote:

> I'm pleased to announce that Peter Gromov has accepted the PMC's
> invitation to become a committer.
>
> Peter, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
>


Re: Redirect build logs from Solr 8.x branch to bui...@solr.apache.org

2021-04-02 Thread David Smiley
+1 Certainly; shouldn't be controversial at all?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Apr 2, 2021 at 11:02 AM Michael Sokolov  wrote:

> +1 it would be nice to be able to sort these out differently with filters
>
> On Fri, Apr 2, 2021 at 3:54 AM Dawid Weiss  wrote:
> >
> >
> > Hi folks!
> >
> > I know the development repository for 8x stays in the previous location
> but can we (should we) update the mailing list address on Solr 8x build
> jobs to point at bui...@solr.apache.org?
> >
> > D.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>
>


Re: New blog post about merge-on-refresh!

2021-03-30 Thread David Smiley
Nice post -- this work item was definitely a big collaboration!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 22, 2021 at 10:40 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hi Team,
>
> I just published a new blog post about all the fun open-source
> excitement in building Lucene's new merge-on-refresh (in 8.7.0) and
> merge-on-commit features (in 8.6.0):
>
> https://twitter.com/mikemccand/status/1373602525505540099?s=20
>
>
> http://blog.mikemccandless.com/2021/03/open-source-collaboration-or-how-we.html
>
> The full story was surprisingly complex and tricky :)
>
> Thank you to everyone who helped!
>
> This feature drops our (Amazon product search) average per-shard segment
> count across our full world-wide fleet by ~25%.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>


Re: [NOTICE] Old git branches will be pruned (in lucene.git repo)

2021-03-15 Thread David Smiley
What's the point of even having a tag for "branch_8x", "branch_7x", etc.?
They existed fundamentally as places to commit code to, and they were
constantly moving forward as work happened.  They will still exist in the
lucene-solr repo, so "no history is lost" will be true as well.
Having tags for actual releases (e.g. for 8.8, 8.7, etc.) is great for
doing quick IDE comparisons to see how code changed.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 15, 2021 at 9:50 AM Jan Høydahl  wrote:

> Hi,
>
> With the new lucene.git repo up and running, we (Uwe, Dawid and I) would
> like to get rid of some clutter.
>
> We discussed on Slack and later here on the list [1] the option of pruning
> all the 112 old branches. It makes no sense to keep stale branch_x_y
> branches in the lucene.git repo, as any future 8.x or 7.x release will
> happen from lucene-solr.git, so keeping them as branches in lucene.git is
> duplication and only gives room for developer mistakes. If branch_8_8 does
> not exist in the lucene.git repo, no one will push to it; they will instead
> remember to make a patch for lucene-solr.git.
>
> So my plan is to remove all branches in the new lucene.git repository and
> leave only the "main" branch. We just did this in solr.git repo (SOLR-15253
> [3]).
>
> We'll do this by replacing each branch with a git tag, e.g. branch_8x will
> be replaced with tag history/branches/lucene-solr/branch_8x. This is the
> same procedure we did when moving from svn to git. *No history is lost!*
>
> The script I intend to run in a few days is attached on LUCENE-9835 [2].
>
> *Should you have a work-in-progress on a branch currently scheduled for
> removal, please reply here to exempt it from removal until it is merged.*
>
> After the removal you can run "git fetch --prune origin" to not see the
> remote branches in your local clone.
>
> PS: The lucene-solr.git repo, where 8.x development continues, will not be
> affected.
>
> [1]
> https://lists.apache.org/thread.html/rc5ac744aa8b081e1e0edb17281d7bb42398a04dcaf6f47421e4a6c41%40%3Cdev.lucene.apache.org%3E
> [2] https://issues.apache.org/jira/browse/LUCENE-9835
> [3] https://issues.apache.org/jira/browse/SOLR-15253
>


Re: Welcome Bruno to the Apache Lucene PMC

2021-03-11 Thread David Smiley
Welcome Bruno!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Mar 10, 2021 at 7:56 PM Mike Drob  wrote:

> I am pleased to announce that Bruno has accepted an invitation to join the
> Lucene PMC!
>
> Congratulations, and welcome aboard!
>
> Mike
>


Re: Branch cleaning/ archiving

2021-03-10 Thread David Smiley
On Wed, Mar 10, 2021 at 3:17 PM Dawid Weiss  wrote:

> ...People work on their local repos these days anyway, it's
> not like everyone pollutes the same workspace.
>

I very much concur.  When I got started with git, I treated it closer to
what I was previously more familiar with and created branches upstream in
the main repo.  Now I know better -- I use my fork.

If a branch hasn't received a commit in over 2 years, we might auto-purge it
and make this a regular practice.  Before auto-purging, we could even notify
the people who last committed to those branches.  With that notification
system in place, even more frequent purges would be good.


Re: Lucene and Solr repositories mirrored, main branch ready

2021-03-10 Thread David Smiley
Thank *you* Dawid!  You and Jan have been big heroes of this transition!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Mar 10, 2021 at 9:36 AM Dawid Weiss  wrote:

> Thank you everyone for the collective effort to clean up stale project
> references, templates, etc.
>
> D.
>
> On Wed, Mar 10, 2021 at 1:04 PM Dawid Weiss  wrote:
> >
> > First of all, apologies for the e-mail commit bomb... Things like that
> > can happen, hard to tell in advance. Thanks to infra for helping out.
> >
> > Solr and Lucene repositories have been cloned at commit 7ada403218.
> >
> > Master branch is wiped out of content on all repositories, branch_8x
> > is wiped on lucene and solr repositories to avoid confusion (8x
> > development takes place at the joint repository).
> >
> > I've removed Solr from the Lucene repository and vice versa. Things should work out of
> > the box but if something does not, please file an issue (or better -
> > try to fix it).
> >
> > There is going to be a lot of mundane cleanup work to remove cross
> > references and get the documentation going but it's all a follow-up.
> >
> > Here is a short help guide to port existing PRs:
> > https://github.com/apache/lucene-solr/blob/master/PRs.md
> >
> > Github actions should work too, as shown here:
> > https://github.com/apache/lucene/pull/2
> >
> > Builds can be enabled (perhaps slowly, at first? :).
> >
> > Solr developers: Lucene can be built and installed in your local maven
> > repositories with:
> > gradlew mavenToLocalRepo
> >
> > Dawid
>
>
>


Re: Repository fork (master) about to happen (Wednesday)

2021-03-09 Thread David Smiley
Thanks Jan!

I thought I saw in this thread something about the lucene-solr repo
becoming read-only?  I imagine it would be a good thing to continue
discussions on PRs, even if to simply add a URL pointing to follow-up PRs.
And we'll need to make commits to branch_8x and previous branches for
future releases (e.g. for a vulnerability).

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 8, 2021 at 6:58 PM Jan Høydahl  wrote:

> Made a branch in my normal repo:
> https://github.com/cominvent/lucene-solr/tree/silly-pr and spun it up as
> a PR against your solr-only test repo
> https://github.com/dsmiley/lucene-solr/pull/1
> Looks like this won't be a problem :)
>
> I imagine a documentation like this HOWTO I created in the Solr Wiki:
>
> https://cwiki.apache.org/confluence/display/SOLR/Move+your+PR+to+the+new+Solr+GitHub+repo
>
>
> Jan
>
> On Mar 8, 2021, at 20:15, David Smiley wrote:
>
> Answering my own question -- apparently Solr on master now depends on
> snapshot builds of Lucene published by ASF Nexus:
> https://issues.apache.org/jira/browse/SOLR-14759
> Cool!
>
> About PRs:  Someone should experiment here to see what's involved before
> the split.  To get this started, I pushed a "solr_main" branch to my GitHub
> Solr fork.  All I did was create a branch off of master, then remove Lucene
> and committed & pushed that.  Someone, please try to take one of your existing
> PRs and send it to my fork against solr_main to see how that goes?  This
> needs to be figured out before the split so we all have guidance on how to
> do this without all of us trying to redundantly figure it out at the same
> time.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Mar 8, 2021 at 1:43 PM David Smiley  wrote:
>
>> One of us will get there first and should share.  For my part, I intend
>> to add a new remote, pull the branches from it, then use "git worktree" to
>> add a new local work tree alongside my existing ones (for lucene_solr
>> master, lucene_solr branch_8x).  I call my current remote "apache" but I
>> might first rename it to "apache_pre9". I am not yet sure if I will use
>> another worktree for the new Lucene repo or if I'll do a new clone.
>>
>> I think there's a case to be made for the Lucene repo to rewrite history
>> to remove Solr.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Mon, Mar 8, 2021 at 1:24 PM Mike Drob  wrote:
>>
>>> Can we provide a sequence of git commands for folks to run? Or will the
>>> official guidance be to create new local clones of each repo?
>>>
>>> On Mon, Mar 8, 2021 at 12:18 PM David Smiley  wrote:
>>>
>>>> Yeah, I agree with Jan -- don't rename the GitHub repo.  It's going to
>>>> be painful no matter what and a rename doesn't seem appropriate.
>>>>
>>>> I am curious as to the status of /solr code being buildable without
>>>> /lucene.  The steps above at #2 say for each project to remove the other
>>>> side.  Is Solr ready?  Where will Solr get the Lucene binaries?
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Mon, Mar 8, 2021 at 12:53 PM Jan Høydahl 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> There are 324 open PRs. Some numbers:
>>>>> Number updated last month: 40
>>>>> Number not touched last 4 months: 249 (77%)
>>>>> With LUCENE in title: 93
>>>>> With SOLR in title: 181
>>>>>
>>>>> It would be nice to auto-migrate but sometimes you just have to face
>>>>> changes and do some extra work :)
>>>>> Given how easy it is to create a new PR in the new repo based on the
>>>>> existing PR branch, I say we just clearly document how to do it, and let
>>>>> the ~50 PRs that are actually being worked on be re-created. PR author can
>>>>> add a link to the old one to reference review comments that cannot be
>>>>> carried over.
>>>>>
>>>>> It would be misleading to just rename to either solr or lucene. Much
>>>>> better to leave old repo there with a README notice that people need to
>>>>> clone the new repo(s) or update their remotes.
>>>>>
>>>>> 

The d...@solr.apache.org list

2021-03-09 Thread David Smiley
Just a reminder that Solr development discussions have moved to
d...@solr.apache.org.  I sent a message there yesterday about tracing.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Re: Repository fork (master) about to happen (Wednesday)

2021-03-08 Thread David Smiley
Answering my own question -- apparently Solr on master now depends on
snapshot builds of Lucene published by ASF Nexus:
https://issues.apache.org/jira/browse/SOLR-14759
Cool!

About PRs:  Someone should experiment here to see what's involved before
the split.  To get this started, I pushed a "solr_main" branch to my GitHub
Solr fork.  All I did was create a branch off of master, then remove Lucene
and committed & pushed that.  Someone, please try to take one of your existing
PRs and send it to my fork against solr_main to see how that goes?  This
needs to be figured out before the split so we all have guidance on how to
do this without all of us trying to redundantly figure it out at the same
time.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 8, 2021 at 1:43 PM David Smiley  wrote:

> One of us will get there first and should share.  For my part, I intend to
> add a new remote, pull the branches from it, then use "git worktree" to add
> a new local work tree alongside my existing ones (for lucene_solr master,
> lucene_solr branch_8x).  I call my current remote "apache" but I might
> first rename it to "apache_pre9". I am not yet sure if I will use another
> worktree for the new Lucene repo or if I'll do a new clone.
>
> I think there's a case to be made for the Lucene repo to rewrite history
> to remove Solr.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Mar 8, 2021 at 1:24 PM Mike Drob  wrote:
>
>> Can we provide a sequence of git commands for folks to run? Or will the
>> official guidance be to create new local clones of each repo?
>>
>> On Mon, Mar 8, 2021 at 12:18 PM David Smiley  wrote:
>>
>>> Yeah, I agree with Jan -- don't rename the GitHub repo.  It's going to
>>> be painful no matter what and a rename doesn't seem appropriate.
>>>
>>> I am curious as to the status of /solr code being buildable without
>>> /lucene.  The steps above at #2 say for each project to remove the other
>>> side.  Is Solr ready?  Where will Solr get the Lucene binaries?
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Mon, Mar 8, 2021 at 12:53 PM Jan Høydahl 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> There are 324 open PRs. Some numbers:
>>>> Number updated last month: 40
>>>> Number not touched last 4 months: 249 (77%)
>>>> With LUCENE in title: 93
>>>> With SOLR in title: 181
>>>>
>>>> It would be nice to auto-migrate but sometimes you just have to face
>>>> changes and do some extra work :)
>>>> Given how easy it is to create a new PR in the new repo based on the
>>>> existing PR branch, I say we just clearly document how to do it, and let
>>>> the ~50 PRs that are actually being worked on be re-created. PR author can
>>>> add a link to the old one to reference review comments that cannot be
>>>> carried over.
>>>>
>>>> It would be misleading to just rename to either solr or lucene. Much
>>>> better to leave old repo there with a README notice that people need to
>>>> clone the new repo(s) or update their remotes.
>>>>
>>>> Jan
>>>>
>>>>
>>>> > On Mar 8, 2021, at 17:21, Uwe Schindler wrote:
>>>> >
>>>> > Hi again,
>>>> >
>>>> > we can maybe "improve" the situation a bit: On Github you can (with
>>>> Admin/Ownership rights) rename a project. So my suggestion:
>>>> >
>>>> > - Check pull requests and count how many affect solr and how many
>>>> affect Lucene.
>>>> > - In cooperation with Infra rename the Github project
>>>> ("apache/lucene-solr.git") to "apache/lucene.git" (if more pull requests
>>>> affect Lucene) or "apache/solr.git" (if more are Solr). The PRs will
>>>> survive the rename. Also the old GitHub URL will redirect to the renamed
>>>> one. The other project should be created as a fork - of course without PRs.
>>>> >
>>>> > We can only do this in cooperation with Apache Infra staff, because
>>>> we can't change the Github repo settings or rename them using the Github UI.
>>>> >
>>>> > Uwe
>>>> >
>>>> > -
>>>> > Uwe Schindler
>>>> > A

Re: Repository fork (master) about to happen (Wednesday)

2021-03-08 Thread David Smiley
One of us will get there first and should share.  For my part, I intend to
add a new remote, pull the branches from it, then use "git worktree" to add
a new local work tree alongside my existing ones (for lucene_solr master,
lucene_solr branch_8x).  I call my current remote "apache" but I might
first rename it to "apache_pre9". I am not yet sure if I will use another
worktree for the new Lucene repo or if I'll do a new clone.

I think there's a case to be made for the Lucene repo to rewrite history to
remove Solr.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 8, 2021 at 1:24 PM Mike Drob  wrote:

> Can we provide a sequence of git commands for folks to run? Or will the
> official guidance be to create new local clones of each repo?
>
> On Mon, Mar 8, 2021 at 12:18 PM David Smiley  wrote:
>
>> Yeah, I agree with Jan -- don't rename the GitHub repo.  It's going to be
>> painful no matter what and a rename doesn't seem appropriate.
>>
>> I am curious as to the status of /solr code being buildable without
>> /lucene.  The steps above at #2 say for each project to remove the other
>> side.  Is Solr ready?  Where will Solr get the Lucene binaries?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Mon, Mar 8, 2021 at 12:53 PM Jan Høydahl 
>> wrote:
>>
>>> Hi,
>>>
>>> There are 324 open PRs. Some numbers:
>>> Number updated last month: 40
>>> Number not touched last 4 months: 249 (77%)
>>> With LUCENE in title: 93
>>> With SOLR in title: 181
>>>
>>> It would be nice to auto-migrate but sometimes you just have to face
>>> changes and do some extra work :)
>>> Given how easy it is to create a new PR in the new repo based on the
>>> existing PR branch, I say we just clearly document how to do it, and let
>>> the ~50 PRs that are actually being worked on be re-created. PR author can
>>> add a link to the old one to reference review comments that cannot be
>>> carried over.
>>>
>>> It would be misleading to just rename to either solr or lucene. Much
>>> better to leave old repo there with a README notice that people need to
>>> clone the new repo(s) or update their remotes.
>>>
>>> Jan
>>>
>>>
>>> > On Mar 8, 2021, at 17:21, Uwe Schindler wrote:
>>> >
>>> > Hi again,
>>> >
>>> > we can maybe "improve" the situation a bit: On Github you can (with
>>> Admin/Ownership rights) rename a project. So my suggestion:
>>> >
>>> > - Check pull requests and count how many affect solr and how many
>>> affect Lucene.
>>> > - In cooperation with Infra rename the Github project
>>> ("apache/lucene-solr.git") to "apache/lucene.git" (if more pull requests
>>> affect Lucene) or "apache/solr.git" (if more are Solr). The PRs will
>>> survive the rename. Also the old GitHub URL will redirect to the renamed
>>> one. The other project should be created as a fork - of course without PRs.
>>> >
>>> > We can only do this in cooperation with Apache Infra staff, because we
>>> can't change the Github repo settings or rename them using the Github UI.
>>> >
>>> > Uwe
>>> >
>>> > -
>>> > Uwe Schindler
>>> > Achterdiek 19, D-28357 Bremen
>>> > https://www.thetaphi.de
>>> > eMail: u...@thetaphi.de
>>> >
>>> >> -Original Message-
>>> >> From: Uwe Schindler 
>>> >> Sent: Monday, March 8, 2021 5:16 PM
>>> >> To: dev@lucene.apache.org
>>> >> Subject: RE: Repository fork (master) about to happen (Wednesday)
>>> >>
>>> >> I think the problem was what happens with the PR in GitHub's user
>>> interface.
>>> >>
>>> >> This question was asked many times, answer is simple: NO you can't
>>> move over
>>> >> Pull requests to different repositories on Github! It's also
>>> impossible to export
>>> >> and reimport them. People have to recreate them.
>>> >>
>>> >> Dawid is correct: You can merge the pull request also into another
>>> local
>>> >> repository, but this is impossible with the Github UI. So basically,
>>> you have to
>>> >> read the email that comes in with the Pull Request that lists the

Re: Repository fork (master) about to happen (Wednesday)

2021-03-08 Thread David Smiley
Yeah, I agree with Jan -- don't rename the GitHub repo.  It's going to be
painful no matter what and a rename doesn't seem appropriate.

I am curious as to the status of /solr code being buildable without
/lucene.  The steps above at #2 say for each project to remove the other
side.  Is Solr ready?  Where will Solr get the Lucene binaries?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 8, 2021 at 12:53 PM Jan Høydahl  wrote:

> Hi,
>
> There are 324 open PRs. Some numbers:
> Number updated last month: 40
> Number not touched last 4 months: 249 (77%)
> With LUCENE in title: 93
> With SOLR in title: 181
>
> It would be nice to auto-migrate but sometimes you just have to face
> changes and do some extra work :)
> Given how easy it is to create a new PR in the new repo based on the
> existing PR branch, I say we just clearly document how to do it, and let
> the ~50 PRs that are actually being worked on be re-created. PR author can
> add a link to the old one to reference review comments that cannot be
> carried over.
>
> It would be misleading to just rename to either solr or lucene. Much
> better to leave old repo there with a README notice that people need to
> clone the new repo(s) or update their remotes.
>
> Jan
>
>
> > On Mar 8, 2021, at 17:21, Uwe Schindler wrote:
> >
> > Hi again,
> >
> > we can maybe "improve" the situation a bit: On Github you can (with
> Admin/Ownership rights) rename a project. So my suggestion:
> >
> > - Check pull requests and count how many affect solr and how many affect
> Lucene.
> > - In cooperation with Infra rename the Github project
> ("apache/lucene-solr.git") to "apache/lucene.git" (if more pull requests
> affect Lucene) or "apache/solr.git" (if more are Solr). The PRs will
> survive the rename. Also the old GitHub URL will redirect to the renamed
> one. The other project should be created as a fork - of course without PRs.
> >
> > We can only do this in cooperation with Apache Infra staff, because we
> can't change the Github repo settings or rename them using the Github UI.
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >> -Original Message-
> >> From: Uwe Schindler 
> >> Sent: Monday, March 8, 2021 5:16 PM
> >> To: dev@lucene.apache.org
> >> Subject: RE: Repository fork (master) about to happen (Wednesday)
> >>
> >> I think the problem was what happens with the PR in GitHub's user
> interface.
> >>
> >> This question was asked many times, answer is simple: NO you can't move
> over
> >> Pull requests to different repositories on Github! It's also impossible
> to export
> >> and reimport them. People have to recreate them.
> >>
> >> Dawid is correct: You can merge the pull request also into another
> local
> >> repository, but this is impossible with the Github UI. So basically,
> you have to
> >> read the email that comes in with the Pull Request that lists the link
> to the
> >> branch and patch. Then use git command line (or Tortoise) and copy-paste
> the
> >> URL there as "source branch" for the merge. Then you execute the merge,
> >> squash and commit/push.
> >>
> >> Uwe
> >>
> >> -
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> https://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>> -Original Message-
> >>> From: Dawid Weiss 
> >>> Sent: Monday, March 8, 2021 3:25 PM
> >>> To: Lucene Dev 
> >>> Subject: Re: Repository fork (master) about to happen (Wednesday)
> >>>
> >>>> What happens to open PRs?
> >>>
> >>> A pull request on github is essentially a diff between two commits.
> >>> Existing PRs have to be placed on top of the new development branch.
> >>> Note these repositories are (initially) identical with lucene-solr so
> >>> if somebody clones the solr repo and the lucene-solr repo, they can
> >>> create the same PR over at the new project (pointing at the new main
> >>> branch as a reference).
> >>>
> >>> Dawid
> >>>
> >>
> >>
> >
> >
> >
>
>
>
>


Re: Removal of Apache HttpComponents/HttpClient for 9.0?

2021-03-05 Thread David Smiley
I filed an issue: https://issues.apache.org/jira/browse/SOLR-15223
"Deprecate HttpSolrClient, mark httpcomponents dep as "optional" in SolrJ"

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Mar 5, 2021 at 2:45 PM Jan Høydahl  wrote:

> Auth ties into core, SolrJ, AdminUI and bin/solr. Currently we can only
> make packages for core. Should look in to extend the package spec to
> support plugging in these other parts too.
>
> But I guess we could make the core parts of the auth plug-ins into (1st-party)
> packages and just leave the HTML and bin/solr parts where they are.
>
> I vote for one package per auth type. Perhaps basic auth can remain in
> core; it does not have any jar deps?
>
> Jan Høydahl
>
> On Mar 5, 2021, at 18:59, Tomás Fernández Löbbe wrote:
>
> 
> +1 David
> > Oh I see; there are entanglements with Solr's authentication plugins
> Maybe we should move the authentication plugins to contribs (don't know if
> we'll need one or two, one for client-side and one for server-side? I
> haven't looked much at the code). Plus, we are shipping with multiple
> authentication options, while at most one will be used.
>
> > An even smaller baby step is to mark the httpclient dependency as
> "optional" in the Maven pom we generate.  This is a clue to consumers to
> move on
> > Also marking HttpSolrClient deprecated
> +1
>
> On Fri, Mar 5, 2021 at 8:18 AM David Smiley 
> wrote:
>
>> An even smaller baby step is to mark the httpclient dependency as
>> "optional" in the Maven pom we generate.  This is a clue to consumers to
>> move on.
>> Also marking HttpSolrClient deprecated.
>>
>> ~ David
>>
>> On Fri, Mar 5, 2021 at 11:06 AM David Smiley 
>> wrote:
>>
>>> Oh I see; there are entanglements with Solr's authentication plugins :-(
>>> One step in this direction is to *move* it from SolrJ to solr-core.  If
>>> someone using SolrJ wants to pass whatever security tokens in headers, they
>>> can add their own interceptors.  Also, SolrJ 8 will likely work fine with
>>> SolrJ 9, so if there are unforeseen problems after 9.0, we can address them
>>> in 9.1 and users that are affected by whatever the problem is can still use
>>> SolrJ 8 as an option.
>>>
>>> Maintaining two HTTP client code paths is a pain.  It makes for possibly
>>> duplicative work in metrics, tracing, authentication, and sheer mental
>>> overhead of what's going on.
>>>
>>> ~ David
>>>
>>>
>>> On Wed, Oct 14, 2020 at 8:55 AM Noble Paul  wrote:
>>>
>>>> +1 @David Smiley
>>>>
>>>> On Sun, Oct 11, 2020 at 4:07 AM Ishan Chattopadhyaya
>>>>  wrote:
>>>> >
>>>> > Maybe we need them for kerberos? I'm totally fine getting rid of
>>>> kerberos support from Solr core some day, but it might not be very easy to
>>>> refactor it into a package.
>>>> >
>>>> > On Sat, 10 Oct, 2020, 10:26 pm David Smiley, 
>>>> wrote:
>>>> >>
>>>> >> I think that historically, we are good at adding code but not good
>>>> at removing code.  We add new ways to do things but keep the old.  Removal
>>>> is more work often forgotten but doing nothing implicitly adds technical
>>>> debt henceforth.
>>>> >>
>>>> >> With that segue... given that our latest SolrClient implementations
>>>> are based on Jetty HttpClient (to support Http2 but should support 1.1?),
>>>> do we need the original Apache HttpComponents/HttpClient as well?  This is
>>>> an honest question... maybe there are subtle reasons they are needed and I
>>>> think it would be good as a project that we are clear on them.
>>>> >>
>>>> >> ~ David Smiley
>>>> >> Apache Lucene/Solr Search Developer
>>>> >> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>>
>>>> --
>>>> -
>>>> Noble Paul
>>>>
>>>


Re: Removal of Apache HttpComponents/HttpClient for 9.0?

2021-03-05 Thread David Smiley
An even smaller baby step is to mark the httpclient dependency as
"optional" in the Maven pom we generate.  This is a clue to consumers to
move on.
Also marking HttpSolrClient deprecated.

~ David

On Fri, Mar 5, 2021 at 11:06 AM David Smiley 
wrote:

> Oh I see; there are entanglements with Solr's authentication plugins :-(
> One step in this direction is to *move* it from SolrJ to solr-core.  If
> someone using SolrJ wants to pass whatever security tokens in headers, they
> can add their own interceptors.  Also, SolrJ 8 will likely work fine with
> SolrJ 9, so if there are unforeseen problems after 9.0, we can address them
> in 9.1 and users that are affected by whatever the problem is can still use
> SolrJ 8 as an option.
>
> Maintaining two HTTP client code paths is a pain.  It makes for possibly
> duplicative work in metrics, tracing, authentication, and sheer mental
> overhead of what's going on.
>
> ~ David
>
>
> On Wed, Oct 14, 2020 at 8:55 AM Noble Paul  wrote:
>
>> +1 @David Smiley
>>
>> On Sun, Oct 11, 2020 at 4:07 AM Ishan Chattopadhyaya
>>  wrote:
>> >
>> > Maybe we need them for kerberos? I'm totally fine getting rid of
>> kerberos support from Solr core some day, but it might not be very easy to
>> refactor it into a package.
>> >
>> > On Sat, 10 Oct, 2020, 10:26 pm David Smiley, 
>> wrote:
>> >>
>> >> I think that historically, we are good at adding code but not good at
>> removing code.  We add new ways to do things but keep the old.  Removal is
>> more work often forgotten but doing nothing implicitly adds technical debt
>> henceforth.
>> >>
>> >> With that segue... given that our latest SolrClient implementations
>> are based on Jetty HttpClient (to support Http2 but should support 1.1?),
>> do we need the original Apache HttpComponents/HttpClient as well?  This is
>> an honest question... maybe there are subtle reasons they are needed and I
>> think it would be good as a project that we are clear on them.
>> >>
>> >> ~ David Smiley
>> >> Apache Lucene/Solr Search Developer
>> >> http://www.linkedin.com/in/davidwsmiley
>>
>>
>>
>> --
>> -
>> Noble Paul
>>
>


Re: Removal of Apache HttpComponents/HttpClient for 9.0?

2021-03-05 Thread David Smiley
Oh I see; there are entanglements with Solr's authentication plugins :-(
One step in this direction is to *move* it from SolrJ to solr-core.  If
someone using SolrJ wants to pass whatever security tokens in headers, they
can add their own interceptors.  Also, SolrJ 8 will likely work fine with
Solr 9, so if there are unforeseen problems after 9.0, we can address them
in 9.1 and users that are affected by whatever the problem is can still use
SolrJ 8 as an option.

Maintaining two HTTP client code paths is a pain.  It makes for possibly
duplicative work in metrics, tracing, authentication, and sheer mental
overhead of what's going on.

~ David


On Wed, Oct 14, 2020 at 8:55 AM Noble Paul  wrote:

> +1 @David Smiley
>
> On Sun, Oct 11, 2020 at 4:07 AM Ishan Chattopadhyaya
>  wrote:
> >
> > Maybe we need them for kerberos? I'm totally fine getting rid of
> kerberos support from Solr core some day, but it might not be very easy to
> refactor it into a package.
> >
> > On Sat, 10 Oct, 2020, 10:26 pm David Smiley,  wrote:
> >>
> >> I think that historically, we are good at adding code but not good at
> removing code.  We add new ways to do things but keep the old.  Removal is
> more work often forgotten but doing nothing implicitly adds technical debt
> henceforth.
> >>
> >> With that segue... given that our latest SolrClient implementations are
> based on Jetty HttpClient (to support Http2 but should support 1.1?), do we
> need the original Apache HttpComponents/HttpClient as well?  This is an
> honest question... maybe there are subtle reasons they are needed and I
> think it would be good as a project that we are clear on them.
> >>
> >> ~ David Smiley
> >> Apache Lucene/Solr Search Developer
> >> http://www.linkedin.com/in/davidwsmiley
>
>
>
> --
> -
> Noble Paul
>


Re: [DISCUSS] Sunset the general@l.a.o mailing list?

2021-03-01 Thread David Smiley
+1 to remove.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Feb 28, 2021 at 4:03 PM Jan Høydahl  wrote:

> Hi
>
> The general@ list is hardly being used for anything. I see some
> user questions there and we announce releases. It may have had more purpose
> when there were 5 sub-projects in Lucene. Now it mostly confuses users,
> who do not get timely replies. The list has 1088 subscribers.
>
> I propose to discontinue the list, i.e. make it Read-Only and remove it
> from the web page. Anyone who would miss it?
>
> Jan Høydahl
>


Re: Review request - New Solr website

2021-03-01 Thread David Smiley
Thanks for doing this Jan!

Some quick feedback on the Lucene site:
* The news announcement that Solr has "graduated" to a separate TLP seems
off to me in its use of this word.  To me, that word suggests it was too small
or immature to warrant it previously.
* IMO with Solr gone, the Lucene-core content should not be just some
sub-page but should move into the front page.  The front page would then
have the tabs that Lucene-core has.  PyLucene could be another tab.

Solr side:
* I like the note that shows up immediately to alert the user of the
switch!  I see that it doesn't re-appear on every return (e.g. due to a
cookie)?  I imagine we will stop doing this in a year or maybe sooner.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 1, 2021 at 1:03 PM Michael Sokolov  wrote:

> I clicked around a bit; didn't do a thorough copy edit or anything,
> but it seems as if the links are working, content looks accurate. The
> notices about the new TLP seem good to me, too. Thanks for forging
> ahead, Jan
>
> -Mike
>
> On Mon, Mar 1, 2021 at 3:56 AM Jan Høydahl  wrote:
> >
> > Hi,
> >
> > I have been working on https://issues.apache.org/jira/browse/SOLR-14499
> to prepare the separate website for Solr.
> > I believe the work is practically done, and would like a broader review
> before I actually publish the changes.
> >
> > The staging site which will eventually be solr.apache.org is at
> https://lucene-solrtlp.staged.apache.org/
> > The staging site which shows the lucene site without Solr is at
> https://lucene-new.staged.apache.org/
> >
> > Any feedback is welcome, here or in the JIRA issue. I intend to publish
> the new sites in a couple of days.
> >
> > Jan


Re: [ANNOUNCE] Apache Solr 8.8.1 released

2021-02-27 Thread David Smiley
The corresponding docker image has been released as well:
https://hub.docker.com/_/solr
(credit to Tobias Kässmann for helping)

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Feb 23, 2021 at 10:39 AM Timothy Potter 
wrote:

> The Lucene PMC is pleased to announce the release of Apache Solr 8.8.1.
>
>
> Solr is the popular, blazing fast, open source NoSQL search platform from
> the Apache Lucene project. Its major features include powerful full-text
> search, hit highlighting, faceted search, dynamic clustering, database
> integration, rich document handling, and geospatial search. Solr is highly
> scalable, providing fault tolerant distributed search and indexing, and
> powers the search and navigation features of many of the world's largest
> internet sites.
>
>
> Solr 8.8.1 is available for immediate download at:
>
>
>   <https://lucene.apache.org/solr/downloads.html>
>
>
> ### Solr 8.8.1 Release Highlights:
>
>
> Fix for a SolrJ backwards compatibility issue when upgrading the server to
> 8.8.0 without upgrading SolrJ to 8.8.0.
>
>
> Please refer to the Upgrade Notes in the Solr Ref Guide for information on
> upgrading from previous Solr versions:
>
>
>   <https://lucene.apache.org/solr/guide/8_8/solr-upgrade-notes.html>
>
>
> Please read CHANGES.txt for a full list of bugfixes:
>
>
>   <https://lucene.apache.org/solr/8_8_1/changes/Changes.html>
>
>
> Solr 8.8.1 also includes bugfixes in the corresponding Apache Lucene
> release:
>
>
>   <https://lucene.apache.org/core/8_8_1/changes/Changes.html>
>
>
>
> Note: The Apache Software Foundation uses an extensive mirroring network
> for distributing releases. It is possible that the mirror you are using may
> not have replicated the release yet. If that is the case, please try another
> mirror.
>
> This also applies to Maven access.
>
> 
>


Re: Revisiting Standardized Test Names in Solr

2021-02-26 Thread David Smiley
Mark 2.0 speaks in riddles, which I'm not great at interpreting but I
think you're implying that the so-called "ref-branch" is not going to be
merged into anything, which is depressing because I now care much less
about that branch.  Marcus, Jason -- let's get on with the standardization!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Feb 26, 2021 at 7:50 AM Mark Miller  wrote:

> I hope that doesn’t sound too negative, “clinging” never sounds as
> positive as I’d like and I do negative plenty well without doing it by
> accident. Not a pessimistic statement though, I made it even better than I
> was planning or remembering I could or however that works. Resistance is
> built into the equation - this isn’t rock and roll, I’m a science bachelor.
> Though only a small few liberal arts classes made me go, so I wouldn’t
> trust the cert myself. Anyway, I learned from multiple Star Wars movies
> what to do here, you have to setup an ambush on the trench run and then
> just make the thing look like a huge black star.
>
> On Fri, Feb 26, 2021 at 4:38 AM Mark Miller  wrote:
>
>> There are already so many conflicts, you will cry and then realize there
>> are more. Even worse, some things have been changed due to their
>> cost/benefit failings, things that someone, somewhere, will cling to like a
>> life vest.
>>
>> The ref branch waits for no man, and expects the same.
>>
>> It lives on ridiculous speed and stability and throws mergability to the
>> crows.
>>
>> It could not be merged into anything and survive, but it can absorb
>> anything, as long as it behaves like a boss or can be jostled into doing
>> so. So fear not for the fearless. You can’t let a specter freeze the
>> tireless day to day shifting and shuffling of names and rules and
>> locations. I swear, enough lucky shifts and this thing can rise to meet the
>> living. I’ve seen it see dead people.
>>
>> End of the day, if the ref branch can’t survive even a large and lengthy
>> divergence, if that is the freeze in its tracks, it’s not at all what I’ve
>> said I’ve been working on and so does it even matter?
>>
>>
>> On Mon, Feb 22, 2021 at 9:39 AM Jason Gerlowski 
>> wrote:
>>
>>> I'm fine with standardization, whichever convention we choose.  I have
>>> a slight preference for FooTest, for the same reason Gus mentioned,
>>> but any standard is better than none here IMO.
>>>
>>> > prefer that we not make a sweeping change like this until after Mark's
>>> "ref branch" is reconciled
>>>
>>> Personally I disagree about the need to wait.  It'd be one thing if
>>> there was an agreed-upon plan or a timeframe for merging "ref-branch".
>>> But since that's not the case today, I don't think it makes sense to
>>> ignore concrete/mergeable improvements.  It seems like a "bird in the
>>> hand vs two in the bush" situation.  Especially when there are
>>> strategies for handling the conflicts that might arise with Mark's
>>> "ref-branch" (e.g. do the test renames on both master and ref_impl).
>>>
>>> Jason
>>>
>>> On Sun, Feb 21, 2021 at 12:44 PM David Smiley 
>>> wrote:
>>> >
>>> > I look forward to a standardization on *something* but would prefer
>>> that we not make a sweeping change like this until after Mark's "ref
>>> branch" is reconciled.  I don't want that to hang over the project
>>> indefinitely, but we can wait; we've not had this standardization yet for
>>> many years, after all.
>>> >
>>> > That said, it would be good to choose the standard name now so that
>>> there is less to change later.  Can someone dig up the statistics on Solr's
>>> name choice to see if there is a clear winner (e.g. >60%)?  I don't have a
>>> strong opinion on whatever the standard should be so long as there is a
>>> standard :-)
>>> >
>>> >
>>> > ~ David Smiley
>>> > Apache Lucene/Solr Search Developer
>>> > http://www.linkedin.com/in/davidwsmiley
>>> >
>>> >
>>> > On Sun, Feb 21, 2021 at 12:18 PM Gus Heck  wrote:
>>> >>
>>> >> FWIW, I'm not really in favor of the convention Lucene adopted. I
>>> probably lost track of the debate and failed to object which is on me, but
>>> I guess it was because that was the lower number of changes there? It's
>>> certainly much less legible in the IDE to have a w

Re: Gradle: Verifying dependencies / version locks

2021-02-22 Thread David Smiley
Thanks for the background on that.  I suspected it was a new feature.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Feb 22, 2021 at 5:02 PM Mike Drob  wrote:

> This feature was added to Gradle 6.2, which wasn't available when we first
> did the conversion from ant.
>
> This plugin doesn't do any verification of license and notice files like
> we do, so that's one thing that we will still need our custom validation
> for.
>
> We could potentially move the checksum verification to the plugin, but
> that seems like a lot of effort for I'm not sure what the payoff is.
>
> I don't trust the state of signatures in open source repositories to know
> if going down that path is worthwhile, but I also suspect not.
>
>
> Mike
>
> On Mon, Feb 22, 2021 at 3:45 PM David Smiley  wrote:
>
>> I noticed that Gradle has a built-in dependency version locking mechanism
>> that is different than the one we are using:
>> https://docs.gradle.org/current/userguide/dependency_verification.html
>> Dawid (or anyone), why are we using something different?  Is our
>> mechanism completely defined ad-hoc in Groovy in
>> gradle/validation/jar-checks.gradle or is there some related plugin for
>> this?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>
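Mike's point about checksum verification can be made concrete with a small sketch. The Python below is a hypothetical illustration, not a port of Lucene's Groovy logic in gradle/validation/jar-checks.gradle: it assumes each dependency JAR has a matching `<jar name>.sha1` file holding the expected hex digest, and the function names are invented for this example.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(jars, checksum_dir: Path):
    """Compare each JAR against <checksum_dir>/<jar name>.sha1 (hypothetical layout).

    Returns a list of human-readable problems; an empty list means every
    checksum file was present and matched.
    """
    problems = []
    for jar in jars:
        expected_file = checksum_dir / (jar.name + ".sha1")
        if not expected_file.exists():
            problems.append(f"missing checksum file for {jar.name}")
            continue
        expected = expected_file.read_text().strip().lower()
        actual = sha1_of(jar)
        if expected != actual:
            problems.append(
                f"checksum mismatch for {jar.name}: expected {expected}, got {actual}"
            )
    return problems
```

Collecting all problems instead of stopping at the first mismatch lets a build report every bad JAR in a single run; a real build would then fail if the list is non-empty.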


Gradle: Verifying dependencies / version locks

2021-02-22 Thread David Smiley
I noticed that Gradle has a built-in dependency version locking mechanism
that is different than the one we are using:
https://docs.gradle.org/current/userguide/dependency_verification.html
Dawid (or anyone), why are we using something different?  Is our mechanism
completely defined ad-hoc in Groovy in gradle/validation/jar-checks.gradle
or is there some related plugin for this?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Re: Revisiting Standardized Test Names in Solr

2021-02-21 Thread David Smiley
I look forward to a standardization on *something* but would prefer that we
not make a sweeping change like this until after Mark's "ref branch" is
reconciled.  I don't want that to hang over the project indefinitely, but
we can wait; we've not had this standardization yet for many years, after
all.

That said, it would be good to choose the standard name now so that there
is less to change later.  Can someone dig up the statistics on Solr's name
choice to see if there is a clear winner (e.g. >60%)?  I don't have a
strong opinion on whatever the standard should be so long as there is a
standard :-)


~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Feb 21, 2021 at 12:18 PM Gus Heck  wrote:

> FWIW, I'm not really in favor of the convention Lucene adopted. I probably
> lost track of the debate and failed to object which is on me, but I guess
> it was because that was the lower number of changes there? It's
> certainly much less legible in the IDE to have a wall of classes all
> starting with T. Maybe, given that the projects are splitting, Solr can stick
> with FooTest, not TestFoo? I think the *Test suffix is more common in Solr...
> (though I haven't attempted to quantify it)
>
> On Sun, Feb 21, 2021 at 12:05 PM Eric Pugh <
> ep...@opensourceconnections.com> wrote:
>
>> Makes sense to me.
>>
>>
>> On Feb 20, 2021, at 2:42 PM, Marcus Eagan  wrote:
>>
>> Hi all,
>>
>> Now that Lucene’s standardization is complete and I believe enforced,
>> should we discuss if we could bring the same consistency to Solr?
>>
>> Best,
>>
>> Marcus
>> --
>> Marcus Eagan
>>
>>
>> ___
>> *Eric Pugh **| *Founder & CEO | OpenSource Connections, LLC | 434.466.1467
>> | http://www.opensourceconnections.com | My Free/Busy
>> <http://tinyurl.com/eric-cal>
>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>> This e-mail and all contents, including attachments, is considered to be
>> Company Confidential unless explicitly stated otherwise, regardless
>> of whether attachments are marked as such.
>>
>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


Re: OverseerStatusTest recent failures

2021-02-21 Thread David Smiley
Ah; that makes total sense; thanks.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Feb 21, 2021 at 12:06 PM Ilan Ginzburg  wrote:

> Searching in my jenkins folder for failures of this test (label:jenkins
> "FAILED:  org.apache.solr.cloud.OverseerStatusTest.test") 26 emails match.
> Searching for all jenkins master builds emails since the first failure
> email found above (2 days ago), I see 40 messages.
> 26 out of 40 is not far from the expected 50% failure rate.
> I believe the ratio in the graph you sent David (currently at 5.7%) is
> averaged over a week, and includes failures from all branches (did some
> other stats on jenkins emails that tend to confirm this assumption).
>
> On Sun, Feb 21, 2021 at 10:53 AM Ilan Ginzburg  wrote:
>
>> Yes Marcus this is the commit.
>>
>> David I would have expected 50% failures, as 50% of the runs use
>> distributed updates. I’ll try to understand better as I fix the issue.
>>
>> Ilan
>>
>> On Sun 21 Feb 2021 at 06:17, David Smiley  wrote:
>>
>>> Interesting.  Do you have a guess as to why the failures there are ~5%
>>> and not 100% reproducible?
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg 
>>> wrote:
>>>
>>>> Indeed the issue is due to my changes.
>>>>
>>>> In OverseerStatusCmd I've skipped some stat collection when running in
>>>> distributed cluster state updates mode because I thought these were only
>>>> stats related to cluster state updates.
>>>> Obviously that was too aggressive and some of the stats are related to
>>>> the Collection API.
>>>>
>>>> I will make sure to skip returning only the stats that are related to
>>>> cluster state updater and restore returning collection api stats (when
>>>> running in distributed cluster updates mode, otherwise all stats are
>>>> returned).
>>>>
>>>> Tomorrow...
>>>>
>>>> Ilan
>>>>
>>>> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg 
>>>> wrote:
>>>>
>>>>> Thank you David for reporting this.
>>>>>
>>>>> Seems due to my recent changes. I can reproduce the failure locally and
>>>>> will look at this tomorrow.
>>>>>
>>>>> With the distributed cluster state updates I've introduced a
>>>>> randomization for using either Overseer-based cluster state updates or
>>>>> distributed cluster state updates in tests. This failure seems to happen in
>>>>> the distributed state update case. I suspect it is due to Overseer
>>>>> returning fewer stats than expected by the test (which is expected: Overseer
>>>>> cannot return stats about cluster state updates if it does not handle
>>>>> cluster state updates).
>>>>>
>>>>> The following line in the logs shows that the run is using distributed
>>>>> cluster state:
>>>>> 972874 INFO  (jetty-launcher-8973-thread-2) [ ]
>>>>> o.a.s.c.DistributedClusterStateUpdater Creating
>>>>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>>>>> will be using distributed cluster state updates.
>>>>>
>>>>> Ilan
>>>>>
>>>>>
>>>>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley 
>>>>> wrote:
>>>>>
>>>>>> I encountered a failure from OverseerStatusTest locally.  According
>>>>>> to our test failure trends, this guy only just recently started failing
>>>>>> ~4-5% of the time, but previously was fine.  Only master branch.
>>>>>>
>>>>>>
>>>>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>>>>
>>>>>> ~ David Smiley
>>>>>> Apache Lucene/Solr Search Developer
>>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>>
>>>>>


Removing deprecations

2021-02-21 Thread David Smiley
There are two linked issues pertaining to the removal of deprecations for
9.0:
https://issues.apache.org/jira/browse/LUCENE-8638
https://issues.apache.org/jira/browse/SOLR-13138
and a branch where this work has been done:
https://github.com/apache/lucene-solr/tree/master-deprecations

I'm just calling attention to this because I think this ideally will get
done before the source control split of the projects because there is one
(existing) branch covering both.  And it needn't prevent an 8.9 from
happening either.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Re: Simplifying source pattern checks

2021-02-21 Thread David Smiley
Makes sense.  I see you haven't commented on the issue about this; I prefer
that tactic as it gets noticed by everyone "Watching" the original issue,
even if it's old.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Feb 20, 2021 at 5:14 PM Gus Heck  wrote:

> I noticed today that SOLR-10883 added checks for patterns that didn't play
> nice with PDF generation. Now that we don't generate the PDF anymore
> perhaps we can do away with those checks? Anyone have thoughts to the
> contrary?
>
> https://issues.apache.org/jira/browse/SOLR-10883
>
> -Gus
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


Re: Random disabling of asserts in tests is not working

2021-02-20 Thread David Smiley
I agree with Rob -- let's leave this sort of thing for Jenkins.  It's
really an edge case and I'd prefer consistently knowing the asserts always
work locally.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Feb 20, 2021 at 1:14 PM Dawid Weiss  wrote:

>
> cool thanks for the pointer. I really like this list 
>>
>
> The list is sort of internal detail... what I really wanted to have is a
> list of options and current "values" computed for a particular run of
> options and seed - this is the "testOpts" task that you can run for any
> project. Compare the output of these (note the flags - "C" for computed
> value, "!" for non-default value, etc.):
>
> gradlew -p lucene/core testOpts
> gradlew -p lucene/solr testOpts
> gradlew -p lucene/core testOpts -Ptests.asserts=false
>
> This can be improved even more... and I have written a plugin that cleans
> up management of such options, but haven't had the time to port Lucene's
> build yet, eh.
>
> Dawid
>
>
>
>
>> So I am not sure whether we ever had randomization of the security
>> manager and/or asserts. I assume Policeman Jenkins never had that.
>>
>> I have the feeling that Elastic did this on their build servers, but
>> that’s also not proven. I suggested that change in one of my talks at
>> Berlin Buzzwords, but may never have implemented it.
>>
>>
>>
>> Anyway: I am open to adding randomization on Jenkins; it’s just 2 lines of
>> code in the randomize-java-groovy file on Policeman Jenkins. Maybe disable
>> asserts and/or the SecurityManager in 1/5th of all cases.
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> Achterdiek 19, D-28357 Bremen
>>
>> https://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Dawid Weiss 
>> *Sent:* Friday, February 19, 2021 7:52 PM
>> *To:* Lucene Dev 
>> *Subject:* Re: Random disabling of asserts in tests is not working
>>
>>
>>
>>
>>
>> Hi Uwe,
>>
>>
>>
>> No, it's not randomized - always runs with the security manager enabled.
>> All the options are here:
>>
>>
>>
>>
>> https://github.com/apache/lucene-solr/blob/master/gradle/testing/randomization.gradle#L68-L103
>>
>>
>>
>> When the value says "random" we pick the random value at runtime (so that
>> it also works within IDEs). We could pick security manager at build-time
>> (derive from project seed). This is a no-brainer to do. As Robert said -
>> perhaps we should keep some things more strict for developers and just
>> shuffle on the CI-only. This requires passing -Ptests.*=... flags but is
>> simple, I think.
>>
>>
>>
>> Dawid
>>
>>
>>
>> On Fri, Feb 19, 2021 at 7:45 PM Uwe Schindler  wrote:
>>
>> Hi,
>>
>> I don’t fully remember what the setup previously was, but at least for
>> master and 8.x it does not automatically enable/disable asserts. We can of
>> course do this together with the other settings like GC or compressed OOPs,
>> it’s just a few more lines in the Groovy file.
>>
>>
>>
>> I was also thinking that we have Security Manager enabled/disabled from
>> time to time. But recently, I see no randomization for this on Jenkins,
>> unless it’s part of the Gradle build.
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> Achterdiek 19, D-28357 Bremen
>>
>> https://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Robert Muir 
>> *Sent:* Friday, February 19, 2021 3:13 PM
>> *To:* dev@lucene.apache.org
>> *Subject:* Re: Random disabling of asserts in tests is not working
>>
>>
>>
>> I don't think it is enabled (at least in Policeman Jenkins). Perhaps it
>> didn't work correctly when the build was cutover to gradle. Take a look at
>> any old build such as
>> https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-master-Linux/29491/
>> . You can see the variables it randomizes right there.
>>
>>
>>
>> You can confirm by clicking console->full log and it prints exact gradle
>> command that it runs:
>> https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-master-Linux/29491/consoleFull
>>
>>
>>
>> Let's look into it, in a couple weeks or so?
>>
>>
>>
>> On Fri, Feb 19, 2021 at 8:32 AM Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>> On Fri, Feb 19, 2021 at 8:07 AM Robert Muir  wrote:
>>
>>
>>
>> I think it has a downside: having a bug in an assert is really more of a
>> corner case. This is the kind of thing jenkins is for?
>>
>>
>>
>> Ahh, that is indeed a really good point.  I would want/expect asserts to
>> always work correctly when running local tests ... if we randomly disabled
>> them in our checkouts it can cause a false sense of security, too soon.
>>
>>
>>
>> OK, I agree, let's leave it as randomization in Jenkins!  How do we know
>> that Jenkins job/s are still randomizing assertions?  Who tests the tester?
>>
>>
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
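Dawid's idea of deriving such choices from the project seed, combined with Uwe's suggestion of disabling asserts in about 1/5 of runs, can be sketched as follows. This is a hypothetical Python illustration, not the actual Groovy in gradle/testing/randomization.gradle or the Policeman Jenkins script; the function names and the 0.2 default are assumptions. The property that matters is determinism: the same seed always yields the same flag, so a failure can be reproduced from its seed.

```python
import random


def pick_asserts_enabled(seed: str, disable_probability: float = 0.2) -> bool:
    """Deterministically decide whether a run keeps Java assertions enabled.

    Hypothetical helper: the hex seed (e.g. "3B16FA15699FBCA8") seeds a
    private PRNG, and assertions are disabled for roughly
    `disable_probability` of all seeds.
    """
    rng = random.Random(int(seed, 16))
    return rng.random() >= disable_probability


def jvm_assert_flag(seed: str) -> str:
    # "-ea" enables Java assertions; "-da" disables them.
    return "-ea" if pick_asserts_enabled(seed) else "-da"
```

For example, `jvm_assert_flag("3B16FA15699FBCA8")` returns the same flag every time it is called with that seed, while across many seeds roughly one run in five gets `-da`. Using a private `random.Random` instance keeps the choice independent of any other randomness in the build.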


Re: OverseerStatusTest recent failures

2021-02-20 Thread David Smiley
Interesting.  Do you have a guess as to why the failures there are ~5% and
not 100% reproducible?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg  wrote:

> Indeed the issue is due to my changes.
>
> In OverseerStatusCmd I've skipped some stat collection when running in
> distributed cluster state updates mode because I thought these were only
> stats related to cluster state updates.
> Obviously that was too aggressive and some of the stats are related to the
> Collection API.
>
> I will make sure to skip returning only the stats that are related to
> cluster state updater and restore returning collection api stats (when
> running in distributed cluster updates mode, otherwise all stats are
> returned).
>
> Tomorrow...
>
> Ilan
>
> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg  wrote:
>
>> Thank you David for reporting this.
>>
>> Seems due to my recent changes. I can reproduce the failure locally and will
>> look at this tomorrow.
>>
>> With the distributed cluster state updates I've introduced a
>> randomization for using either Overseer-based cluster state updates or
>> distributed cluster state updates in tests. This failure seems to happen in
>> the distributed state update case. I suspect it is due to Overseer
>> returning fewer stats than expected by the test (which is expected: Overseer
>> cannot return stats about cluster state updates if it does not handle
>> cluster state updates).
>>
>> The following line in the logs shows that the run is using distributed
>> cluster state:
>> 972874 INFO  (jetty-launcher-8973-thread-2) [ ]
>> o.a.s.c.DistributedClusterStateUpdater Creating
>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>> will be using distributed cluster state updates.
>>
>> Ilan
>>
>>
>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley  wrote:
>>
>>> I encountered a failure from OverseerStatusTest locally.  According to
>>> our test failure trends, this guy only just recently started failing ~4-5%
>>> of the time, but previously was fine.  Only master branch.
>>>
>>>
>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>


OverseerStatusTest recent failures

2021-02-20 Thread David Smiley
I encountered a failure from OverseerStatusTest locally.  According to our
test failure trends, this guy only just recently started failing ~4-5% of
the time, but previously was fine.  Only master branch.

http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Re: [VOTE] Release Lucene/Solr 8.8.1 RC2

2021-02-18 Thread David Smiley
And it failed for me again; this time with some other test:

   [junit4] Tests with failures [seed: EDBAB91A9E12EDEA]:

   [junit4]   -
org.apache.solr.metrics.reporters.solr.SolrCloudReportersTest.testExplicitConfiguration

These failures all appear to be related to timeouts, AFAICT.  The computer I'm
doing this on is somewhat old.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Feb 19, 2021 at 12:39 AM David Smiley  wrote:

> This morning, my attempt turned up 3 errors, none of which reproduced for
> their seeds:
>
>[junit4] Tests with failures [seed: 3B16FA15699FBCA8]:
>
>[junit4]   -
> org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory
>
>[junit4]   -
> org.apache.solr.cloud.LeaderTragicEventTest.testLeaderFailsOver
>
>[junit4]   - org.apache.solr.cloud.ZkSolrClientTest.testReconnect
>
>[junit4]   - org.apache.solr.cloud.ZkSolrClientTest (suite)
>
> I'll try again tonight.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Thu, Feb 18, 2021 at 11:18 PM Anshum Gupta 
> wrote:
>
>> +1 (binding)
>> SUCCESS! [1:04:04.738835]
>>
>> Tested with a sample app, basic indexing, search, and went through the
>> UI. Looks good.
>>
>> Thanks for the effort, Tim!
>>
>>
>>
>> On Tue, Feb 16, 2021 at 6:42 PM Timothy Potter 
>> wrote:
>>
>>> Please vote for release candidate 2 for Lucene/Solr 8.8.1
>>>
>>> The artifacts can be downloaded from:
>>>
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>>>
>>> You can run the smoke tester directly with this command:
>>> python3 -u dev-tools/scripts/smokeTestRelease.py
>>>
>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>>>
>>> The vote will be open for at least 72 hours i.e. until 2021-02-20 03:00
>>> UTC.
>>>
>>> [ ] +1  approve
>>> [ ] +0  no opinion
>>> [ ] -1  disapprove (and reason why)
>>>
>>> Here is my +1 SUCCESS! [0:50:07.947952]
>>>
>>> Also, as with RC1, in addition to the smoke test, I built a Docker
>>> image from the RC locally and verified:
>>>
>>> a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC
>>> completes successfully w/o any NPEs or weirdness with leader election
>>> / recoveries.
>>> b. The base_url property is stored in replica state after the upgrade
>>> c. A basic client application built with SolrJ 8.7.0 can load cluster
>>> state info directly from ZK and query the 8.8.1 RC2 servers.
>>> d. Same client app built with SolrJ 8.8.0 works as well.
>>>
>>> As this bug-fix release is primarily needed to address a SolrJ
>>> back-compat break (SOLR-15145) and unfortunately our smoke tester
>>> framework does not test for backcompat of older SolrJ against the RC,
>>> I ask others to please test rolling upgrades of servers (ideally
>>> multi-node clusters) running pre-8.8.0 to this RC if possible. Also,
>>> please try client applications that are using an older SolrJ, esp.
>>> those that load cluster state directly from ZK.
>>>
>>> Best regards,
>>> Tim
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>>
>> --
>> Anshum Gupta
>>
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-18 Thread David Smiley
Congratulations Jan!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta  wrote:

> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta
>


Re: JIRA issues to close?

2021-02-18 Thread David Smiley
On Thu, Feb 18, 2021 at 4:28 PM Eric Pugh 
wrote:

> Good question. I don’t have a great sense of exactly what the lifecycle
> is for a JIRA issue. Do “Resolved” and “Closed” mean the same thing?
>
> I was assuming that “Resolved” meant it had been fixed, but maybe wasn’t
> in a released version of Solr. And “Closed” would mean it had been in a
> shipped version of Solr?
>

Precisely.

The distinction is not a big deal to me but it's the workflow we have.


Re: [VOTE] Release Lucene/Solr 8.8.1 RC2

2021-02-18 Thread David Smiley
This morning, my attempt turned up 3 errors, none of which reproduced for
their seeds:

   [junit4] Tests with failures [seed: 3B16FA15699FBCA8]:

   [junit4]   - org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory

   [junit4]   - org.apache.solr.cloud.LeaderTragicEventTest.testLeaderFailsOver

   [junit4]   - org.apache.solr.cloud.ZkSolrClientTest.testReconnect

   [junit4]   - org.apache.solr.cloud.ZkSolrClientTest (suite)

I'll try again tonight.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Feb 18, 2021 at 11:18 PM Anshum Gupta 
wrote:

> +1 (binding)
> SUCCESS! [1:04:04.738835]
>
> Tested with a sample app, basic indexing, search, and went through the UI.
> Looks good.
>
> Thanks for the effort, Tim!
>
>
>
> On Tue, Feb 16, 2021 at 6:42 PM Timothy Potter 
> wrote:
>
>> Please vote for release candidate 2 for Lucene/Solr 8.8.1
>>
>> The artifacts can be downloaded from:
>>
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>>
>> You can run the smoke tester directly with this command:
>> python3 -u dev-tools/scripts/smokeTestRelease.py https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>>
>> The vote will be open for at least 72 hours i.e. until 2021-02-20 03:00
>> UTC.
>>
>> [ ] +1  approve
>> [ ] +0  no opinion
>> [ ] -1  disapprove (and reason why)
>>
>> Here is my +1 SUCCESS! [0:50:07.947952]
>>
>> Also, as with RC1, in addition to the smoke test, I built a Docker
>> image from the RC locally and verified:
>>
>> a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC
>> completes successfully w/o any NPEs or weirdness with leader election
>> / recoveries.
>> b. The base_url property is stored in replica state after the upgrade
>> c. A basic client application built with SolrJ 8.7.0 can load cluster
>> state info directly from ZK and query the 8.8.1 RC2 servers.
>> d. Same client app built with SolrJ 8.8.0 works as well.
>>
>> As this bug-fix release is primarily needed to address a SolrJ
>> back-compat break (SOLR-15145) and unfortunately our smoke tester
>> framework does not test for backcompat of older SolrJ against the RC,
>> I ask others to please test rolling upgrades of servers (ideally
>> multi-node clusters) running pre-8.8.0 to this RC if possible. Also,
>> please try client applications that are using an older SolrJ, esp.
>> those that load cluster state directly from ZK.
>>
>> Best regards,
>> Tim
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> --
> Anshum Gupta
>


Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-17 Thread David Smiley
Congratulations Mike!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Feb 17, 2021 at 4:32 PM Anshum Gupta  wrote:

> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
> President position.
>
> This year we nominated and elected Michael Sokolov as the Chair, a
> decision that the board approved in its February 2021 meeting.
>
> Congratulations, Mike!
>
> --
> Anshum Gupta
>


ZkTestServer Watch limit violations

2021-02-17 Thread David Smiley
I've noticed that it's quite common for a SolrCloud-based test to conclude
with warnings about "Watch limit violations".  I don't know how to
interpret these violations, yet it seems normal to get them. Can someone offer
insights into what this is about and what we ought to do about it?

63605 WARN  (ZkTestServer Run Thread) [ ] o.a.s.c.ZkTestServer Watch limit violations:
Maximum concurrent create/delete watches above limit:

4 /solr/aliases.json
4 /solr/clusterprops.json
3 /solr/packages.json
3 /solr/security.json
2 /solr/collections/ping_test/terms/shard2
2 /solr/collections/ping_test/terms/shard1
2 /solr/configs/conf

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Re: [DISCUSS] ConfigSet ZK to file system fallback

2021-02-16 Thread David Smiley
On Thu, Feb 4, 2021 at 12:23 PM Tomás Fernández Löbbe 
wrote:

> The point I was trying to make is that having a single configset loading
> from both local and ZK may be confusing for the user and cause issues that
> may be difficult to track: Which file is Solr really reading right now? Is
> it the local one or the remote one? Is there a local one on a node or not?
> Is it being correctly overridden? How do I ensure that I always have a
> local version of a file to override the remote?
>

Fair point -- it is less clear than today.  I suppose anything we come up
with will be :-)


> So, I'm thinking that if we want to support this feature, a cleaner
> approach could be to just have a type of configset that's defined as
> "local", and then it belongs to the local filesystem. We can just prevent a
> node from starting if it's supposed to have a configset that it doesn't have.
> It's 100% clear where a config file is being read from, etc. Maybe the
> "configOverlay.json" is an exception and should live in ZooKeeper (and
> never locally) for the config API to work, but having just "default to
> local when a file is not in ZooKeeper" just confuses things IMO.
>

Hmmm, okay.  While I agree configOverlay.json & params.json would always
belong in ZK... for the rest, it's debatable. Can we get the schema there
too if it's a "managed schema"?  What about resource files (e.g.
synonyms)?  Whatever the answers are there, it would solve my primary
motivation -- an easier upgrade path, at least where I work.

I spoke with Ilan a couple weeks ago about this and he proposed an
interesting idea:  Put a simple version number on the configSet, and let
them live in either ZK or local.  The greater version number chooses which
wins; the other is ignored.  This is somewhat similar to your idea.
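A minimal sketch of the version-number rule, assuming each copy of a configSet (ZK and node-local) carries an integer version. The record, class, and method names below are hypothetical illustrations, not existing Solr APIs, and breaking ties in favor of ZK is an assumption of this sketch.

```java
import java.util.Optional;

// Hypothetical: one copy of a configSet, tagged with where it came from.
record VersionedConfigSet(String name, int version, String source) {}

// Hypothetical resolver for the "greater version wins" idea.
final class ConfigSetResolver {
    private ConfigSetResolver() {}

    /**
     * Picks exactly one copy; the other is ignored entirely (no file-by-file merging).
     * If only one copy exists, it is used. On equal versions ZK wins (an assumption).
     */
    static Optional<VersionedConfigSet> resolve(Optional<VersionedConfigSet> zk,
                                                Optional<VersionedConfigSet> local) {
        if (zk.isEmpty()) return local;
        if (local.isEmpty()) return zk;
        return local.get().version() > zk.get().version() ? local : zk;
    }
}
```

Because a single whole copy always wins, this keeps the "which file is Solr reading?" question answerable, which addresses part of the concern quoted above.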

Still... I'd prefer some way to establish defaults for specific
configuration elements that live on the node, while letting the other
aspects continue to reside in ZK (or have the option of resolving local as
well).  In my mind, this is just about making Solr's existing defaults in
the code become configurable.  It's a different way of looking at things
than asking where this or that file lives.  For example, imagine a
node-resident default configSet that is effectively the default that all
configSets are overlaid on top of.  Field types, analyzers, merge
policies, request handlers -- it could define whatever it deems necessary.
Then the ZK part is what is specific to a configSet for a given search app,
and it doesn't need to specify the organization-wide settings.  My original
proposal doesn't quite do this directly because I thought of a cheap hack
in concert with some other Solr features that'd suffice for my aims.  But
maybe I should propose more explicitly a node-local default configSet,
designed to make setting defaults simple/easy in one place and specific to
a node.  One might call this configSet inheritance.  I think it would lead
to configSets that are simpler to read/maintain because they would only
contain what an app needs, and not the organization-wide needs and/or Solr
defaults.  WDYT?
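To make "configSet inheritance" concrete, here is a deliberately simplified sketch. It assumes config elements can be modeled as flat key/value pairs, which real solrconfig.xml and schema files cannot; the class and method names are hypothetical. The node-resident defaults are applied first and the app-specific configSet from ZK overrides or adds on top.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a node-local default configSet that
// app-specific configSets (stored in ZK) are overlaid on top of.
final class ConfigSetInheritance {
    private ConfigSetInheritance() {}

    static Map<String, String> effective(Map<String, String> nodeDefaults,
                                         Map<String, String> zkConfigSet) {
        // Start from the organization-wide defaults that live on the node...
        Map<String, String> merged = new LinkedHashMap<>(nodeDefaults);
        // ...then let the app's own settings add new elements or override defaults.
        merged.putAll(zkConfigSet);
        return merged;
    }
}
```

The appeal is that the ZK configSet would only list what the search app needs; everything else would come from the node defaults.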


Re: Circuit Breakers interaction with Shards

2021-02-16 Thread David Smiley
Walter, it sounds like you were doing rate limiting, just in a different
way that is more dynamic than a simple (yet fiddly) constant?
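For contrast with a fixed requests-per-second constant, here is a rough sketch of limiting by in-flight work, in the spirit of the Netflix connections-per-server approach in the quoted message. It assumes the caller already knows whether a request is an internal shard sub-request. The class and the isInternal flag are hypothetical, not Solr APIs.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical concurrency limiter: caps the number of external requests
// running at once instead of capping a request rate.
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Runs the request, or returns null (think HTTP 503) if the node is already full. */
    <T> T handle(boolean isInternal, Supplier<T> request) {
        if (isInternal) {
            // Internal shard sub-requests bypass the limit so a distributed
            // query that was admitted is not half-rejected downstream.
            return request.get();
        }
        if (!permits.tryAcquire()) {
            return null; // reject rather than queue more work on a busy node
        }
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }
}
```

If each query gets more expensive, it simply holds its permit longer, so admitted throughput drops on its own without retuning the cap.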

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Feb 14, 2021 at 2:54 PM Walter Underwood 
wrote:

> Rate limiting is a good idea. It requires a lot of ongoing engineering to
> adjust the rates to the current cluster behavior. It doesn’t help with some
> kinds of overload. The ROI just doesn’t work out. It is too much work for
> not enough benefit.
>
> Rate limiting works if the collection size doesn’t change and the queries
> don’t change.
>
> At Netflix, we limited traffic based on number of connections to each
> server. This is basically the length of the queue of requests for that
> server. This is similar to limiting by load average, which is also the work
> waiting to be done. It has the same weaknesses as the load average circuit
> breaker, but it did not need to be changed when average CPU usage per query
> increased. It was “set and forget”. Rate limiters require constant
> adjustment.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Feb 14, 2021, at 11:44 AM, Atri Sharma  wrote:
>
> This is a debate better suited for a different forum -- but I would
> disagree with your assertion that rate limiting is a bad idea.
>
> Solr allows you to specify node-level request quotas which also follow the
> principle of not limiting internal requests. I find that to be pretty
> useful in three ways: 1. Use it in conjunction with a global request limit
> which is typically 0.75 of my total load capacity given my average query
> resource consumption. 2. Allow per node request limits to ensure fairness
> and dedicated capacity for different types of requests. 3. Allow circuit
> breakers to handle cases where a couple of rogue queries can take down
> nodes.
>
> We digress -- as I said, it should be fairly simple to have a circuit
> breaker which rejects only external requests, but it should be clearly
> documented with its downsides.
>
> On Mon, 15 Feb 2021, 00:33 Walter Underwood, 
> wrote:
>
>> We’ve looked at and rejected rate limiters as high-maintenance and not
>> sufficient protection.
>>
>> We would have run nginx on each node, sent external traffic to nginx on a
>> different port and let internal traffic stay on the default Solr port. This
>> has other advantages (monitoring), but the rate limiting part is way too
>> fiddly.
>>
>> Rates depend on how much CPU is used per query and on the size of the
>> cluster (if they are not on each node). Some examples from our largest
>> cluster which would need a change in rate limits. Some of these could be
>> set by doing offline load benchmarks, some not.
>>
>> * Experiment cell that uses 2.5X more CPU for each query (running now in
>> prod)
>> * Increasing traffic allocated to that cell (did this last week)
>> * Increase in index size (number of docs and CPU requirements increase
>> about 5% every month)
>> * Website slowdown that shifts most traffic to mobile, where queries use
>> 2X as much CPU
>> * Horizontal scaling from 24 to 48 nodes
>> * Vertical scaling from c5.8xlarge to c5.18xlarge
>>
>> And so on. Rate limiting would require almost weekly load benchmarks and
>> it still wouldn’t catch the outage-causing problems.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> On Feb 14, 2021, at 10:25 AM, Atri Sharma  wrote:
>>
>> The way I look at it is that for cluster level stability, rate limiters
>> should be used which allow rate limiting of only external requests. They
>> are "circuit breakers" in the sense of defending against cluster level
>> instability, which is what you describe.
>>
>> Circuit breakers, in Solr world, are targeted to be the last resort
>> defense of a node.
>>
>> As I said earlier, it is possible to write a circuit breaker which
>> rejects only external requests, but I personally do not see the benefit in
>> presence of rate limiters.
>>
>> On Sun, 14 Feb 2021, 23:50 Walter Underwood, 
>> wrote:
>>
>>> Ideally, it would only affect a few queries. In reality, with a sharded
>>> system, the impact will be large.
>>>
>>> I disagree that the goal is to protect a node. The goal is to make the
>>> entire cluster avoid congestion failure when overloaded, while providing
>>> good service for the load that it can handle.
>>>
>>> I have had Solr clusters take do
