question about impacts use case

2023-04-01 Thread Michael Sokolov
Hi, I've been working on seeing whether we can make use of impacts in Amazon search and I have some questions. To date, we haven't used Lucene's scoring APIs at all; all of our queries are constant score, we early terminate based on a sorted index rank and then re-rank using custom non-Lucene ranki

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-01 Thread Michael Sokolov
I'm also in favor of raising this limit. We do see some datasets with higher than 1024 dims. I also think we need to keep a limit. For example we currently need to keep all the vectors in RAM while indexing and we want to be able to support reasonable numbers of vectors in an index segment. Also we

Re: [GitHub] [lucene] david-sitsky commented on issue #12185: Using DirectIODirectory results in BufferOverflowException

2023-03-22 Thread Michael Sokolov
Using directio with NFS makes no sense at all to me; I think that is the problem in a nutshell. Directio tries to bypass the operating system's buffers, but that's not going to play nicely with NFS. On Wed, Mar 22, 2023, 4:38 PM david-sitsky (via GitHub) wrote: > > david-sitsky commented on issue

Re: Welcome Ben Trent as Lucene committer

2023-01-27 Thread Michael Sokolov
Welcome, Ben! Congratulations On Fri, Jan 27, 2023 at 4:52 PM Anshum Gupta wrote: > > Congratulations and welcome, Ben! > > On Fri, Jan 27, 2023 at 7:18 AM Adrien Grand wrote: >> >> I'm pleased to announce that Ben Trent has accepted the PMC's >> invitation to become a committer. >> >> Ben, the

Re: Is there a way to customize segment names?

2022-12-16 Thread Michael Sokolov
+1 trying to coordinate multiple writers running independently will not work. My 2c for availability: you can have a single primary active writer with a backup one waiting, receiving all the segments from the primary. Then if the primary goes down, the secondary one has the most recent commit repli

Re: [VOTE] Release Lucene 9.4.2 RC1

2022-11-18 Thread Michael Sokolov
ted signature, but it seems > like it's due to "can't connect to the agent: IPC connect call failed" > actually, which suggests an issue with the GPG agent? > > On Fri, Nov 18, 2022 at 3:00 PM Michael Sokolov wrote: >> >> I got this message when in

Re: [GitHub] [lucene] rmuir commented on pull request #11946: add similarity threshold for hnsw

2022-11-18 Thread Michael Sokolov
What I have in mind would be to implement entirely in the KnnVectorQuery. Since results are sorted by score, they can easily be post-filtered there: no need to implement anything at the codec layer I think. On Thu, Nov 17, 2022 at 10:10 AM GitBox wrote: > > > rmuir commented on PR #11946: > URL:
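The post-filtering idea in the message above can be sketched roughly as follows. This is a minimal illustration only, assuming the top-k scores arrive in descending order; `ThresholdPostFilter` and `cutoff` are hypothetical names, not part of Lucene or of KnnVectorQuery.

```java
public class ThresholdPostFilter {
    /**
     * Returns how many leading entries of sortedScores are >= minScore.
     * Assumes sortedScores is in descending order, as top-k results would be.
     * Hypothetical helper for illustration; not a Lucene API.
     */
    static int cutoff(float[] sortedScores, float minScore) {
        int n = 0;
        // Because the scores are sorted, the first score below the threshold
        // means every later score is below it too, so we can stop there.
        while (n < sortedScores.length && sortedScores[n] >= minScore) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        float[] scores = {0.92f, 0.81f, 0.40f};
        // Number of hits that survive a threshold of 0.5
        System.out.println(cutoff(scores, 0.5f));
    }
}
```

Since only the sorted result list is inspected, a filter of this shape would not need anything from the codec layer, which seems to be the point made in the message.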

Re: [VOTE] Release Lucene 9.4.2 RC1

2022-11-18 Thread Michael Sokolov
I got this message when initially downloading the artifacts: Downloading https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.2-RC1-rev-858d9b437047a577fa9457089afff43eefa461db/lucene/lucene-9.4.2-src.tgz.asc File: /tmp/smoke_lucene_9.4.2_858d9b437047a577fa9457089afff43eefa461db/lucene.lucen

Re: HNSW search with threshold

2022-11-11 Thread Michael Sokolov
> it would be hard to predict whether a given radius would actually match a >>> small set of vectors. Should the query still require a `k` value in >>> addition to the radius to make sure it doesn't go wild? >>> >>> On Tue, Nov 8, 2022 at 7:26 AM Alexey Go

Re: Release Lucene 9.4.2

2022-11-11 Thread Michael Sokolov
+1 makes sense. I do think given this is the second similar-flavored bug we've found that we should be thorough and try to get them all rather than having a 9.4.3 ... On Wed, Nov 9, 2022 at 10:25 AM Julie Tibshirani wrote: > > +1 from me for a bugfix release once we've solidified testing. Thanks

Re: HNSW search with threshold

2022-11-07 Thread Michael Sokolov
+1 to adding a scoring threshold. I think it could be another parameter to KnnVectorQuery. Do you want to have a try at adding this? If so, please feel free to open a PR and I will be happy to guide you. On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko wrote: > > Hi! > > There are some use cases wh

Re: Dense union of doc IDs

2022-11-04 Thread Michael Sokolov
It sounds like a lot of complexity to handle an unusual edge case, but ... I guess this actually happened? Can you give any sense of the end-user behavior that caused it? On Fri, Nov 4, 2022 at 2:26 AM Patrick Zhai wrote: > > Hi Froh, > > The idea sounds reasonable to me, altho I wonder whether u

Re: HNSW and Multi-segments

2022-11-03 Thread Michael Sokolov
The way I think of this is that segmenting the graph will generally lead to higher recall and higher costs (at query time) for a given set of HNSW parameters. Indexing costs will tend to be lower for multiple segmented graphs. I don't think that increased irrelevant docs should be a concern since a

Re: Expressions greedy advanceExact implementation

2022-10-26 Thread Michael Sokolov
exity, especially when it > would only improve the ternary "if" feature in such cases. > > On Wed, Oct 26, 2022 at 10:23 AM Michael Sokolov wrote: > > > > see https://github.com/apache/lucene/pull/11878 ... it doesn't do what > > I initially asked for (sti

Re: Expressions greedy advanceExact implementation

2022-10-26 Thread Michael Sokolov
see https://github.com/apache/lucene/pull/11878 ... it doesn't do what I initially asked for (still advances all of the operands), but it delays until doubleValue() is called, which is safe and could have some impact On Wed, Oct 26, 2022 at 9:58 AM Michael Sokolov wrote: > > Hi, yes,
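The "delay until doubleValue() is called" behavior described above might look something like the sketch below. It is not the code from the pull request: the `Values` interface, `CountingValues`, and `DeferredValues` are hypothetical stand-ins, and the sketch assumes every document has a value so that `advanceExact` can report true without touching the delegate.

```java
public class DeferredAdvance {
    /** Hypothetical per-document value source (advance, then read). Not a Lucene type. */
    interface Values {
        boolean advanceExact(int doc);
        double doubleValue();
    }

    /** Test double that counts how often it is really advanced. */
    static final class CountingValues implements Values {
        int advances = 0;
        private int doc = -1;

        public boolean advanceExact(int d) {
            advances++;
            doc = d;
            return true;
        }

        public double doubleValue() {
            return doc * 2.0;
        }
    }

    /** Remembers the target doc and only advances the delegate when a value is requested. */
    static final class DeferredValues implements Values {
        private final Values delegate;
        private int pendingDoc = -1;
        private boolean positioned = false;

        DeferredValues(Values delegate) {
            this.delegate = delegate;
        }

        public boolean advanceExact(int d) {
            // Assumption: a value exists for every doc, so reporting true is safe.
            pendingDoc = d;
            positioned = false;
            return true;
        }

        public double doubleValue() {
            if (!positioned) {
                delegate.advanceExact(pendingDoc);
                positioned = true;
            }
            return delegate.doubleValue();
        }
    }

    /** Advances over `docs` documents but reads a value only for the first one. */
    static int advancesWhenReadingOnlyFirst(int docs) {
        CountingValues counting = new CountingValues();
        Values deferred = new DeferredValues(counting);
        for (int doc = 0; doc < docs; doc++) {
            deferred.advanceExact(doc);
            if (doc == 0) {
                deferred.doubleValue();
            }
        }
        return counting.advances;
    }

    public static void main(String[] args) {
        System.out.println(advancesWhenReadingOnlyFirst(3));
    }
}
```

With a wrapper like this, an operand that is never read (for example the untaken branch of a ternary) would never be advanced at all, at the cost of one extra field and a branch per read.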

Re: Expressions greedy advanceExact implementation

2022-10-26 Thread Michael Sokolov
s, and actually advancing on doubleValue() only. > > On Tue, Oct 25, 2022 at 8:13 PM Michael Sokolov wrote: >> >> ExpressionFunctionValueSource lazily evaluates in doubleValues: an >> expression like >> >>condition ? f1 : f2 >> >> will only eva

Expressions greedy advanceExact implementation

2022-10-25 Thread Michael Sokolov
ExpressionFunctionValueSource lazily evaluates in doubleValues: an expression like condition ? f1 : f2 will only evaluate one of f1 or f2. At the same time, the advanceExact() call is greedy -- when you advance that expression it will also advance both f1 and f2. But here's the thing: it alwa

Re: [VOTE] Release Lucene 9.4.1 RC1

2022-10-21 Thread Michael Sokolov
SUCCESS! [0:49:28.580122] +1 On Fri, Oct 21, 2022 at 5:57 AM Robert Muir wrote: > > I change my vote to +1 based on Julie's test. It fails for me with > 9.4.0 and passes for me with 9.4.1 > > :lucene:core:test (SUCCESS): 1 test(s) > > > Task :lucene:core:wipeTaskTemp > The slowest tests (exceedi

Re: call for 9.4.1 release (bug in vectors format)

2022-10-18 Thread Michael Sokolov
Oh no! Very sorry -- thank you for volunteering to fix (hangs head in shame). I guess I'll see where the bug is soon ... On Tue, Oct 18, 2022 at 2:50 PM Michael Wechner wrote: > > +1 :-) > > Thanks > > Michael > > On 18.10.22 at 19:52, Julie Tibshirani wrote: > > Hi everyone, > > > > We recentl

Re: Welcome Luca Cavanna as Lucene committer

2022-10-06 Thread Michael Sokolov
Welcome Luca! On Thu, Oct 6, 2022, 1:05 AM 陆徐刚 wrote: > Welcome! > > Xugang > > https://github.com/LuXugang > > On Oct 6, 2022, at 13:59, Mikhail Khludnev wrote: > >  > Welcome, Luca. > > On Wed, Oct 5, 2022 at 8:04 PM Adrien Grand wrote: > >> I'm pleased to announce that Luca Cavanna has acc

[ANNOUNCE] Apache Lucene 9.4.0 released

2022-09-30 Thread Michael Sokolov
The Lucene PMC is pleased to announce the release of Apache Lucene 9.4.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest-n

[RESULT] [VOTE] Release Lucene 9.4.0 RC3

2022-09-30 Thread Michael Sokolov
It's been >72h since the vote was initiated and the result is: +1 8 (7 binding) 0 0 -1 0 This vote has PASSED - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apac

Re: [VOTE] Release Lucene 9.4.0 RC3

2022-09-28 Thread Michael Sokolov
ndless >>>> >>>> http://blog.mikemccandless.com >>>> >>>> >>>> On Tue, Sep 27, 2022 at 3:45 PM Anshum Gupta >>>> wrote: >>>>> >>>>> +1 (binding) >>>>> >>>>> Smoketester i

[VOTE] Release Lucene 9.4.0 RC3

2022-09-27 Thread Michael Sokolov
Please vote for release candidate 3 for Lucene 9.4.0 The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.0-RC3-rev-d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 You can run the smoke tester directly with this command: python3 -u dev-tools/scripts/smokeTestRe

Re: [VOTE] Release Lucene 9.4.0 RC2

2022-09-27 Thread Michael Sokolov
sing the >>> LatLonPoint field, see https://github.com/apache/lucene/issues/11824. >>> >>> It feels like an important regression so it might be worth a respinning. >>> Sorry about that. >>> >>> >>> On Mon, Sep 26, 2022 at 10:30 PM Anshum

[VOTE] Release Lucene 9.4.0 RC2

2022-09-26 Thread Michael Sokolov
Please vote for release candidate 2 for Lucene 9.4.0 The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.0-RC2-rev-0384b4fcad7856ddc574c8b994c814a568ce6d0a You can run the smoke tester directly with this command: python3 -u dev-tools/scripts/smokeTestRe

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-26 Thread Michael Sokolov
> On 26.09.2022 at 15:51, Michael Sokolov wrote: > > Hm the build failed with this: > > > > FAILURE: Build failed with an exception. > > > > * What went wrong: > > Execution failed for task ':lucene:core:compileMain19Java'. > >> Error while ev

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-26 Thread Michael Sokolov
I need to install JDK19, or is there some problem in our build scripts? If I install will it autodetect?? On Mon, Sep 26, 2022 at 9:36 AM Michael Sokolov wrote: > > Nice! Thanks everyone, I will refresh and start building the artifacts > > On Mon, Sep 26, 2022 at 9:33 AM Uwe Schindler w

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-26 Thread Michael Sokolov
> >>> >>>>> >>> >>>>> (no vote) >>> >>>>> >>> >>>>> SUCCESS! [1:12:31.588303] >>> >>>>> >>> >>>>> >>> >>>>> On Thu, Sep 22, 2022 a

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-26 Thread Michael Sokolov
Sep 21, 2022 at 9:19 PM Michael McCandless >>>>>> wrote: >>>>>>> >>>>>>> +1 >>>>>>> >>>>>>> >>>>>>> SUCCESS! [0:27:16.514391] >>>>>>> >>>>

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-21 Thread Michael Sokolov
release. The vote is still ongoing, so we > > have all options. > > > > Uwe > > > > On 21.09.2022 at 14:05, Michael Sokolov wrote: > >> I see; I would kind of like to get the release out before ApacheCon > >> NA, which starts Oct 3. Do you think it

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-21 Thread Michael Sokolov
icsearch with JDK 19. No risk, it only activates when you enable it. > > Thoughts? > > Uwe > > On 02.09.2022 at 21:42, Michael Sokolov wrote: > > NOTICE: > > Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch. > > Please observe the norma

[VOTE] Release Lucene 9.4.0 RC1

2022-09-20 Thread Michael Sokolov
Please vote for release candidate 1 for Lucene 9.4.0 The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.0-RC1-rev-f5d0646daa5651f2192282ac85551bca667e34f9 You can run the smoke tester directly with this command: python3 -u dev-tools/scripts/smokeTestRe

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-20 Thread Michael Sokolov
and publish my local ann-benchmarks set-up so that >> it's not so fragile! >> >> In summary, with your latest fix the recall and QPS look good to me -- I >> don't detect any regression between 9.3 and 9.4. >> >> Julie >> >> On Mon, Sep

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-19 Thread Michael Sokolov
s tomorrow to > double-check there's no drop. It would also be nice to formalize the > ann-benchmarks set-up and run it regularly (like we've discussed in > https://github.com/apache/lucene/issues/10665). > > Julie > > On Mon, Sep 19, 2022 at 10:33 AM Michael So

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-19 Thread Michael Sokolov
809 695.236 > n_cands=120 0.843 948.908 0.843 525.914 > n_cands=200 0.878 671.781 0.878 351.529 > n_cands=400 0.918 392.265 0.918 207.854 > n_cands=600 0.937 282.403 0.937 144.311 > n_cands=800 0.949 214.620 0.949 116.875 > > On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov > wrote

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-18 Thread Michael Sokolov
float operations? It would be a little surprising if that were the case given the small number of branchings compared to the number of multiplies in dot-product though. On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov wrote: > > Thanks for the deep-dive Julie. I was able to reproduce t

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-18 Thread Michael Sokolov
just backported the change. > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov wrote: >> >> it looks like a small bug fix, we have had on main (and 9.x?) for a >> while now and no test failures showed up, I guess. Should be OK to >> port. I plan to cut artifacts this w

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-15 Thread Michael Sokolov
> Thanks for running more tests, Michael. >> It is encouraging that you saw a similar performance between 9.3 and 9.4. I >> will also run more tests with different parameters. >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov wrote: >>> >>> As a follow

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-13 Thread Michael Sokolov
--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov wrote: > > I ran another test. I thought I had increased the RAM buffer size to > 8G and heap to 16G. However I still see two segments in

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-13 Thread Michael Sokolov
iven value must be less that 2GB (2048MB) * * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB */ On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov wrote: > > Hi Mayya, thanks for persisting - I think we need to wrestle this to > the ground for sure. In the test I ran, RAM buffer was th

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-12 Thread Michael Sokolov
drop in QPS in 9.4. > > Thank you. > > > > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward wrote: >> >> Done. Thanks! >> >> > On 9 Sep 2022, at 16:32, Michael Sokolov wrote: >> > >> > Hi Alan - I checked out the interval quer

Lucene 9.4 release notes draft

2022-09-09 Thread Michael Sokolov
Hi all I published a draft of the release notes here: https://cwiki.apache.org/confluence/display/LUCENE/Release+Notes+9.4 Please review and feel free to make corrections/additions directly in confluence. I didn't include everything in CHANGES, so I may have missed something that deserves a mentio

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-09 Thread Michael Sokolov
a problem with interval queries. Am I OK to port this to the 9.4 branch? > > Thanks, Alan > > On 2 Sep 2022, at 20:42, Michael Sokolov wrote: > > NOTICE: > > Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch. > > Please observe the normal rules:

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-08 Thread Michael Sokolov
ble >> > branch. >> >> Then the GitHub Milestone for 9.5 also needs to be created. >> >> This time, I created Milestone 9.5.0. We should include it in the release >> process. >> https://github.com/apache/lucene/milestone/4 >> >> >> 2

Re: release notes question

2022-09-03 Thread Michael Sokolov
> > > On Fri, Sep 2, 2022 at 3:46 PM Michael Sokolov wrote: > > > > Hi Lucene devs, I'm going through the release manager script, and > > coming to the point where it talks about writing release notes. It > > suggests starting from a previous release note on the

release notes question

2022-09-02 Thread Michael Sokolov
Hi Lucene devs, I'm going through the release manager script, and coming to the point where it talks about writing release notes. It suggests starting from a previous release note on the confluence wiki, but it seems we haven't been using that for 9.x releases. Can previous release managers give so

Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-02 Thread Michael Sokolov
NOTICE: Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch. Please observe the normal rules: * No new features may be committed to the branch. * Documentation patches, build patches and serious bug fixes may be committed to the branch. However, you should submit all pa

Re: Lucene 9.4.0 release

2022-09-01 Thread Michael Sokolov
> > > On Thu, Sep 1, 2022 at 11:04 AM Michael Sokolov wrote: >> >> Thanks Tomoko - I appreciate the offer to review the changes needed. I >> will take care of updating the release script/template. >> >> I think I managed to get a GPG key registered and signe

Re: Lucene 9.4.0 release

2022-09-01 Thread Michael Sokolov
't think that is >> > true for GitHub ... >> >> You do not need any special permissions to make new Milestones on GitHub. >> Every committer already has permission to create/close/delete Milestones, >> you can test it here. >> https://github.com/apache/lucen

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.x - Build # 302 - Unstable!

2022-09-01 Thread Michael Sokolov
This was a bug in the test; I fixed on 9.x here: https://github.com/apache/lucene/pull/11732, will also cherry-pick to main On Wed, Aug 31, 2022 at 11:09 PM Apache Jenkins Server wrote: > > Build: > https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.x/302/ > > 1 tests failed. > FA

Re: Lucene 9.4.0 release

2022-08-31 Thread Michael Sokolov
that may be > built on Jira. > I hope other people help to interpret Jira-related things on the way into > the language of GitHub issues. > > Tomoko > > On Thu, Sep 1, 2022 at 3:40, Michael Sokolov wrote: > >> Thanks for the links, Tomoko. I thought it would be helpful to ask on

Re: [lucene] branch main updated: SimpleText knn vectors; fix searchExhaustively and suppress a byte format test case (#11725)

2022-08-31 Thread Michael Sokolov
following commit(s) were added to refs/heads/main by this push: > > new 61ef031f7fa SimpleText knn vectors; fix searchExhaustively and > > suppress a byte format test case (#11725) > > 61ef031f7fa is described below > > > > commit 61ef031f7fa3abdd7c8c2f36db71ad2289b66

Re: Lucene 9.4.0 release

2022-08-31 Thread Michael Sokolov
ps://github.com/apache/lucene/blob/main/dev-docs/github-issues-howto.md >> >> Is this unclear to you? >> >> >> On Wed, Aug 31, 2022 at 23:13, Michael Sokolov wrote: >>> >>> Hi, I'd like to start the ball rolling for a 9.4.0 release. We don't >>> have

Lucene 9.4.0 release

2022-08-31 Thread Michael Sokolov
a-priority%5C%3AMajor+ to find Major issues, for example, and it seems to only find one Minor one? Does anyone have better github-search-fu? API Changes - * LUCENE-10577: Add VectorEncoding to enable byte-encoded HNSW vectors (Michael Sokolov, Julie Tibshirani) New Features -

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-18) - Build # 36650 - Unstable!

2022-08-28 Thread Michael Sokolov
America/Dawson > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -p lucene/core > > This is both on a mac and on linux. I think the multiplier or some > other option may be affecting the reproducibility? > > Dawid > > On Sun, Aug 28, 2022 at 12:08 AM Michael Sokolov wrote: > > >

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-18) - Build # 36650 - Unstable!

2022-08-27 Thread Michael Sokolov
This did not reproduce for me (on JDK17) even with -Ptests.iters=1000. Tried beasting 100 times too, who knows. Since there are 20 bytes in the actual value, but we expected 5, the 4x multiplier sure looks like confusion of floats and bytes. It's scary if some other test is somehow impacting this. Not

Re: Label vs. Milestone for version management?

2022-08-25 Thread Michael Sokolov
Tomoko - sorry to re-raise this when we thought it had been settled. Having never really used github issues, I don't think I fully understood the arguments there. On Thu, Aug 25, 2022 at 3:50 AM Tomoko Uchida wrote: > > Hi all. > > I once proposed using Milestone for version management in GitHub

Re: Label vs. Milestone for version management?

2022-08-25 Thread Michael Sokolov
o find out about these, but I think it's better if we can look them up in the issue db. On Thu, Aug 25, 2022 at 9:40 AM Robert Muir wrote: > > On Thu, Aug 25, 2022 at 6:11 AM Michael Sokolov wrote: > > > > The milestone looks appealing since it is prominent and relatively easy

Re: Label vs. Milestone for version management?

2022-08-25 Thread Michael Sokolov
The milestone looks appealing since it is prominent and relatively easy to use. The only drawback I have heard is that it is single valued. It still seems we could use it to document the first version in which something is released, although it wouldn't be possible to record other releases into whi

Re: [ANNOUNCE] Issue migration Jira to GitHub starts on Monday, August 22

2022-08-24 Thread Michael Sokolov
Thanks! It seems to be working nicely. Question about the fix-version: tagging. I wonder if going forward we want to maintain that for new issues? I happened to notice there is also this "milestone" feature in github -- does that seem like a place to put version information? On Wed, Aug 24, 2022 at 3

Re: [JENKINS] Lucene » Lucene-Coverage-main - Build # 500 - Unstable!

2022-08-19 Thread Michael Sokolov
This test asserts that we return the same documents in the same order when the index is sorted and when it's not, but failed because the scores for two documents were equal, and they end up sorting differently due to docid tiebreaking, which is *not* the same under a sorted index. Not sure what the
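The tie-breaking effect described above can be reproduced with a small stand-alone sketch. It assumes hits are ordered by descending score and then by ascending doc id; `DocIdTieBreak` and `rank` are hypothetical names and the numbers are made up, not taken from the failing test.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocIdTieBreak {
    /**
     * Ranks logical documents 0..n-1 by descending score, breaking ties by
     * ascending doc id. docIds[i] is the doc id assigned to logical document i
     * in a particular index; the result lists logical documents in rank order.
     */
    static int[] rank(int[] docIds, float[] scores) {
        Integer[] order = new Integer[docIds.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator
                .<Integer>comparingDouble(i -> -scores[i])   // higher score first
                .thenComparingInt(i -> docIds[i]));           // lower doc id wins ties
        int[] ranked = new int[order.length];
        for (int i = 0; i < order.length; i++) {
            ranked[i] = order[i];
        }
        return ranked;
    }

    public static void main(String[] args) {
        float[] scores = {1.0f, 0.5f, 0.5f};   // documents 1 and 2 tie
        int[] unsortedIndexIds = {0, 1, 2};    // doc ids in insertion order
        int[] sortedIndexIds = {2, 1, 0};      // doc ids after an index sort reorders them
        System.out.println(Arrays.toString(rank(unsortedIndexIds, scores)));
        System.out.println(Arrays.toString(rank(sortedIndexIds, scores)));
    }
}
```

The same three documents with the same scores come back in a different order once the doc ids are reassigned, which is the kind of mismatch the message attributes to the test failure.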

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 728 - Still Unstable!

2022-08-13 Thread Michael Sokolov
This didn't reproduce for me, but I can see that the error message is different in SimpleTextKnnVectorsReader, so I'll update that On Sat, Aug 13, 2022 at 1:51 AM Apache Jenkins Server wrote: > > Build: > https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/728/ > > 3 tests faile

Re: [HELP] Please spot-check the migrated Lucene GitHub issues!

2022-08-09 Thread Michael Sokolov
Yes, looks amazing! All I could find was: This one https://github.com/mocobeta/forks-migration-test-2/issues/8964 seems to be missing its attachment - not sure if this was expected with this round? EG https://raw.githubusercontent.com/apache/lucene-jira-archive/attachments/attachments/LUCENE-9004/

Re: [HELP] Please spot-check the migrated Lucene GitHub issues!

2022-07-30 Thread Michael Sokolov
now of any issues like that (except this SPAM!), so I'd be happy if we don't change anything here :) On Sat, Jul 30, 2022 at 6:12 PM Michael Sokolov wrote: > > I did some spot-checking. ooh, it looks so nice! > > I have one suggestion, totally optional/cosmetic, but I wonder

Re: [HELP] Please spot-check the migrated Lucene GitHub issues!

2022-07-30 Thread Michael Sokolov
I did some spot-checking. ooh, it looks so nice! I have one suggestion, totally optional/cosmetic, but I wonder if we could make the original comment authors' names more prominent by moving the [Legacy Jira: ${Name} (@${user}) on ${date}] to the top of each comment rather than the bottom? That wou

Re: [jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-07-28 Thread Michael Sokolov
Thanks David On Wed, Jul 27, 2022 at 5:13 PM David Smiley wrote: > > FYI I had filed https://issues.apache.org/jira/browse/INFRA-23503 > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Tue, Jul 26, 2022 at 3:5

Re: Welcome Vigya Sharma as Lucene committer

2022-07-28 Thread Michael Sokolov
Welcome Vigya! On Thu, Jul 28, 2022, 6:48 AM Michael McCandless wrote: > Welcome Vigya!! > > Mike > > On Thu, Jul 28, 2022 at 5:28 AM Lu Xugang > wrote: > >> Congratulations, and welcome Vigya! >> >> Xugang >> >> www.amazingkoala.com.cn >> >> >> >> >> On Jul 28, 2022, at 17:21, Ignacio Vera

Re: [jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-07-26 Thread Michael Sokolov
searching JIRA for "slkjfdf" I found a few issues in other projects, but none seems to be getting the same degree of spam love On Tue, Jul 26, 2022 at 3:50 PM Mike Sokolov (Jira) wrote: > > > [ > https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issueta

Re: Lucene 9.3.0 release

2022-07-21 Thread Michael Sokolov
LUCENE-10592, it's also a big >>>> change, maybe we should not even try to get it in before cutting the >>>> branch? >>>> >>>> On Tue, Jul 19, 2022 at 4:09 PM Mayya Sharipova >>>> wrote: >>>>> >>>>> Thanks for

Re: Lucene 9.3.0 release

2022-07-21 Thread Michael Sokolov
Tue, Jul 19, 2022 at 4:09 PM Mayya Sharipova >>>> wrote: >>>>> >>>>> Thanks for the reminder about the release, Ignacio! >>>>> About LUCENE-10592 I will see what progress we can make today, and will >>>>> let you know be

Re: Lucene 9.3.0 release

2022-07-21 Thread Michael Sokolov
>> On Tue, Jul 19, 2022 at 4:09 PM Mayya Sharipova >>>> wrote: >>>> >>>>> Thanks for the reminder about the release, Ignacio! >>>>> About LUCENE-10592 >>>>> <https://issues.apache.org/jira/browse/LUCENE-10592> I will see wha

Re: Lucene 9.3.0 release

2022-07-19 Thread Michael Sokolov
>> >> >> On Tue, Jul 12, 2022 at 2:50 PM Ignacio Vera wrote: >> >>> Thanks for the heads up, I am planning to cut the branch middle next >>> week, Wednesday July 20th. >>> Let me know at the beginning of next week if there is any issue from >>>

Re: [DISCUSS] Read-only Jira after the GitHub issues migration?

2022-07-17 Thread Michael Sokolov
I think we'd still have the mailing lists open for discussion. So anyone not willing or able to use GitHub would still be able to participate in a meaningful way. Having two parallel bug trackers seems much less useful to me. I'd rather have people emailing to a list that is active rather than post

Build failures

2022-07-16 Thread Michael Sokolov
Sorry for all the noise. I think it may be a botched backport of the timeout support I did yesterday. Will look at it today

Re: Lucene 9.3.0 release

2022-07-11 Thread Michael Sokolov
I would like to see if we can get https://issues.apache.org/jira/browse/LUCENE-10577 in. It is working and gives nice gains, but there is some controversy about the API. If we can't get it sorted out this week(?) it can certainly slip to the next revision. I know that https://issues.apache.org/jira

Re: How to avoid double-emails on all git issue/PR updates?

2022-07-11 Thread Michael Sokolov
Oh! thank you - this will be a big help. I just went to https://github.com/apache/lucene and then under "Watch" selected "participating and mentions" instead of "all activity" (which I had before). On Mon, Jul 11, 2022 at 5:46 AM Uwe Schindler wrote: > > Hi, > > I fully agree with Adrien, because

Re: A prototype migration tool Jira to GitHub

2022-06-26 Thread Michael Sokolov
as for this access control/script monitoring problem, I wonder whether we could import all the issues into a new github repo owned by whomever is running the script, and then transfer from there to the lucene repo? It would be an extra step involving another script (or something), but maybe(?) that

Re: A prototype migration tool Jira to GitHub

2022-06-23 Thread Michael Sokolov
d, so apologies if this is a duplicate: >>>> >>>> Did you check >>>> https://spring.io/blog/2021/01/07/spring-data-s-migration-from-jira-to-github-issues >>>> >>>> They especially write there is an api that doesn't trigger notifications. >>>>

Re: A prototype migration tool Jira to GitHub

2022-06-23 Thread Michael Sokolov
Yes thank you! You say this is not difficult, but it looks like a big job to me! Here are a bunch of things I noticed that we would ideally address (from looking at one long and complex issue, LUCENE-9004). I wouldn't be so bold as to say these should block us from proceeding if they're not address

Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-20 Thread Michael Sokolov
I think the user mapping must be inferred based on membership in the Apache "organization" https://github.com/settings/organizations On Sun, Jun 19, 2022 at 2:45 AM Dawid Weiss wrote: > > >> User id mapping is an important consideration for me. > > > Some mapping has to be present somewhere alrea

Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-15 Thread Michael Sokolov
Agree with everyone here. Also consider that if we duplicate there will be two copies of the same issue, and they will inevitably diverge... On Wed, Jun 15, 2022 at 9:28 AM Jan Høydahl wrote: > > +1 for a manual approach > > Over time the volume will gravitate to mostly GitHub issues. And JIRA wi

Re: exposing per-field storage usage

2022-06-14 Thread Michael Sokolov
2 at 11:15 AM Robert Muir wrote: >> >> On Tue, Jun 14, 2022 at 10:37 AM Michael Sokolov wrote: >> > >> > Oh, yes that's a clever idea. It seems it would take quite a while >> > (tens of minutes?) for a larger index though? Much faster than the >> >

Re: exposing per-field storage usage

2022-06-14 Thread Michael Sokolov
Oh, yes that's a clever idea. It seems it would take quite a while (tens of minutes?) for a larger index though? Much faster than the force-merge solution for sure. I guess to get faster we would have to instrument each format. I mean they generally do know how much space each field is occupying, b

exposing per-field storage usage

2022-06-13 Thread Michael Sokolov
At Amazon, we have a need to produce regular metrics on how much disk storage is consumed by each field. We manage an index with data contributed by many teams and business units and we are often asked to produce reports attributing index storage usage to these customers. The best tool we have for

Re: Welcome Lu Xugang as Lucene committer

2022-06-07 Thread Michael Sokolov
Welcome and thanks for spreading the word; your amazingkoala blog looks very active (although I can't read it :() On Thu, Jun 2, 2022 at 4:09 PM Mikhail Khludnev wrote: > > Welcome, Lu. > > On Wed, Jun 1, 2022 at 12:59 PM 陆徐刚 wrote: >> >> Thanks Adrien for the announcement and all for the welcom

Re: [VOTE] Migration to GitHub issue from Jira (LUCENE-10557)

2022-06-07 Thread Michael Sokolov
Sorry I missed the first vote I think; also +1(pmc) from me. I'd be OK with some issues (esp. closed ones) being orphaned in the old system too. On Tue, Jun 7, 2022 at 9:20 AM Dawid Weiss wrote: > > > I'm fine with either system (or both used concurrently). There is significant > research effort

Re: Welcome Greg Miller to the Lucene PMC

2022-06-07 Thread Michael Sokolov
Welcome Greg [copying from other thread, oops!] On Tue, Jun 7, 2022 at 11:41 AM Houston Putman wrote: > > Welcome Greg! > > On Tue, Jun 7, 2022 at 11:35 AM Gautam Worah wrote: >> >> Congratulations Greg! >> >> On Tue, Jun 7, 2022 at 8:04 AM Patrick Zhai wrote: >>> >>> Congrats Greg! >>> >>> Pat

Re: 30% query performance degradation for documents with small stored fields

2022-06-07 Thread Michael Sokolov
I wonder whether it would be worth trying switching from stored fields to doc values. The access patterns are different, so the change would not be trivial, but you might be able to achieve gains this way - I really am not sure whether or not you would, the storage model is completely different, bu

Re: module not found error in intellij

2022-06-03 Thread Michael Sokolov
j compilation mode. >>> It's hacky but I've done it in the past. >>> >>> When I switch to (my preferred) intellij compilation, things break. This >>> is definitely a regression in IntelliJ somewhere because it used to work >>> very recently -

Re: module not found error in intellij

2022-06-02 Thread Michael Sokolov
ssue tracker), they are just not our bugs... > > > On Fri, Jun 3, 2022 at 0:17, Michael Sokolov wrote: > > > > glad to know I'm not the only one! I think it's not OK though. Running > > tests in IDE is super useful, especially for debugging, but also for > > visualizing c

Re: module not found error in intellij

2022-06-02 Thread Michael Sokolov
> console. > ./gradlew -p lucene/core.tests/ test > > I'm not sure the exact cause of that though IDEs' java module support > looks far from perfect for now, I would recommend not to use IDE when > running modular tests... > > Tomoko > > On Thu, Jun 2, 2022 at 23:44, Michae

module not found error in intellij

2022-06-02 Thread Michael Sokolov
In IntelliJ building Lucene main branch I see this: .../workspace/lucene/lucene/core.tests/src/test/module-info.java:23: error: module not found: org.apache.lucene.core.tests.main requires org.apache.lucene.core.tests.main; ^ Am I doing it wrong? Does anyb

Re: Welcome Lu Xugang as Lucene committer

2022-06-01 Thread Michael Sokolov
Welcome! I like finally too, but it seems strange that it has nothing to do with its apparent relative, final. On Wed, Jun 1, 2022 at 4:51 PM Gus Heck wrote: > Welcome and congratulations :) > > On Wed, Jun 1, 2022 at 3:32 PM Alessandro Benedetti > wrote: > >> Welcome on board Xugang! >> --

Re: Welcome Chris Hegarty as Lucene committer

2022-06-01 Thread Michael Sokolov
Welcome Chris! I remember being part of a skeptical bunch of students in 1990 hearing about this new Java thing that was supposedly going to take over the world. Apparently it is still thriving :) -Mike On Wed, Jun 1, 2022 at 12:59 PM David Smiley wrote: > > Welcome Chris! -

Re: Adding a new PointDocValuesField

2022-05-25 Thread Michael Sokolov
Also, there should be examples from other fields. Suppose you are indexing map data and want to support a UI that shows "hot spots" on the map where there is a lot of let's say ... activity of some sort. You'd like to facet on 2-d areas. Or for log analytics -- you want to do anomaly detection and

Re: [VOTE] Release Lucene 9.2.0 RC2

2022-05-20 Thread Michael Sokolov
+1 SUCCESS! [0:49:44.832567] JDK11 only On Fri, May 20, 2022 at 4:46 PM Houston Putman wrote: > > +1 > > SUCCESS! [2:17:07.370407] (java 11 & 17) > > - Houston > > On Fri, May 20, 2022 at 8:04 AM Jan Høydahl wrote: >> >> +1 >> >> SUCCESS! [1:13:38.226868] >> >> Jan >> >> > On May 19, 2022 at 17:1

Re: [VOTE] Release Lucene 9.2.0 RC1

2022-05-18 Thread Michael Sokolov
+1 SUCCESS! [0:43:09.481661] I'm not going to get hung up on an issue with the smokeTester if Robert's not :) BTW thank you for running on a slow machine that takes many hours! On Wed, May 18, 2022 at 3:48 PM Robert Muir wrote: > > I opened issue about this. It shouldn't block the release, but it

Re: [GitHub] [lucene] msokolov commented on pull request #870: LUCENE-10502: Refactor hnswVectors format

2022-05-13 Thread Michael Sokolov
Okay sorry I was confused about these override methods - they are different because of the different access patterns in the sparse/dense cases. Maybe the loss of history was unavoidable since we moved/renamed the file, but I wish we could maintain it. On Fri, May 13, 2022 at 1:45 PM GitBox wrote:

Re: [GitHub] [lucene] jpountz commented on pull request #859: LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode

2022-05-13 Thread Michael Sokolov
+1 to back port. It will make things more consistent at least On Thu, May 12, 2022, 11:36 AM GitBox wrote: > > jpountz commented on PR #859: > URL: https://github.com/apache/lucene/pull/859#issuecomment-1125144256 > >FWIW I found about this PR because it is in the 9.2 changelog on `main` > b
