Re: [jira] [Commented] (LUCENE-9983) Stop sorting determinize powersets unnecessarily
Hi Mike, Looking at the algorithm it's the opposite - it tries hard not to iterate over all the combinations. The "power sets" come from combinations of accessible sets of states using different NFA paths. This example is pretty good: https://en.wikipedia.org/wiki/Powerset_construction#Example Dawid On Tue, Jun 1, 2021 at 1:11 AM Michael Sokolov wrote: > I haven't looked at this at all, so please disregard if I'm off base, but > it sounds as if we are materializing power sets? Would it be better to just > store an array of the members and generate the combinations iteratively on > demand? > > On Mon, May 31, 2021, 5:27 PM Haoyu Zhai (Jira) wrote: > >> >> [ >> https://issues.apache.org/jira/browse/LUCENE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354660#comment-17354660 >> ] >> >> Haoyu Zhai commented on LUCENE-9983: >> >> >> I've added a simple static counter just for the adversarial test, and >> here's the stats: >> * {{incr}} called: 106073079 >> * entry added to set: 100076079 >> * {{decr}} called: 106069079 >> * entry removed from set: 100072079 >> * {{computeHash}} called: 40057 >> * {{freeze}} called: 14056 >> >> So seems to me my guess above holds, we're doing way more put/remove >> entry operations than others >> >> > Stop sorting determinize powersets unnecessarily >> > >> > >> > Key: LUCENE-9983 >> > URL: https://issues.apache.org/jira/browse/LUCENE-9983 >> > Project: Lucene - Core >> > Issue Type: Improvement >> >Reporter: Michael McCandless >> >Priority: Major >> > Time Spent: 40m >> > Remaining Estimate: 0h >> > >> > Spinoff from LUCENE-9981. >> > Today, our {{Operations.determinize}} implementation builds powersets >> of all subsets of NFA states that "belong" in the same determinized state, >> using [this algorithm|https://en.wikipedia.org/wiki/Powerset_construction >> ]. >> > To hold each powerset, we use a malleable {{SortedIntSet}} and >> periodically freeze it to a {{FrozenIntSet}}, also sorted. We pay a high >> price to keep these growing maps of int key, int value sorted by key, e.g. >> upgrading to a {{TreeMap}} once the map is large enough (> 30 entries). >> > But I think sorting is entirely unnecessary here! Really all we need >> is the ability to add/delete keys from the map, and hashCode / equals (by >> key only – ignoring value!), and to freeze the map (a small optimization >> that we could skip initially). We only use these maps to lookup in the >> (growing) determinized automaton whether this powerset has already been >> seen. >> > Maybe we could simply poach the {{IntIntScatterMap}} implementation >> from [HPPC|https://github.com/carrotsearch/hppc]? And then change its >> {{hashCode}}/{{equals }}to only use keys (not values). >> > This change should be a big speedup for the kinds of (admittedly >> adversarial) regexps we saw on LUCENE-9981. >> > >> >> >> >> -- >> This message was sent by Atlassian Jira >> (v8.3.4#803005) >> >> - >> To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org >> For additional commands, e-mail: issues-h...@lucene.apache.org >> >>
Re: Welcome Greg Miller as Lucene committer
Congrats Greg!! On Mon, May 31, 2021 at 9:37 AM Gautam Worah wrote: > Congratulations Greg :) > > On Mon, May 31, 2021, 8:02 AM Ilan Ginzburg wrote: > >> Congrats Greg! >> >> On Sun, May 30, 2021 at 4:35 PM Greg Miller wrote: >> >>> Thanks everyone! I'm honored to have been nominated and look forward >>> to continuing to work with all of you on Lucene! I'm incredibly >>> grateful for everyone that has helped me so far. There's a lot to >>> learn in Lucene and this community has been a fantastic help ramping >>> up, providing thorough PR feedback/ideas/etc. and simply been a great >>> group of people to collaborate with. >>> >>> As far as a brief bio goes, I live in the Seattle area and work for >>> Amazon's "Product Search" team, which I joined in January of this >>> year. I'm a naturally curious person and find myself fascinated by >>> data structure / algorithm problems, so diving into Lucene has been >>> really fun! I'm also an avid runner (mostly marathons but right now >>> I'm training for my first one-mile race on a track), and love to >>> travel with my wife and daughter (although that's been on "pause" for >>> obvious reasons for the past year+). My biggest accomplishment of 2021 >>> so far has been teaching my daughter to ride a bike, but being >>> nominated as a Lucene committer is a close second :) >>> >>> Thanks again everyone and looking forward to continuing to work with all >>> of you! >>> >>> Cheers, >>> -Greg >>> >>> On Sat, May 29, 2021 at 7:59 PM Michael McCandless >>> wrote: >>> > >>> > Welcome Greg! >>> > >>> > Mike >>> > >>> > On Sat, May 29, 2021 at 3:47 PM Adrien Grand >>> wrote: >>> >> >>> >> I'm pleased to announce that Greg Miller has accepted the PMC's >>> invitation to become a committer. >>> >> >>> >> Greg, the tradition is that new committers introduce themselves with >>> a brief bio. >>> >> >>> >> Congratulations and welcome! >>> >> >>> >> >>> >> -- >>> >> Adrien >>> > >>> > -- >>> > Mike McCandless >>> > >>> > http://blog.mikemccandless.com >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>>
Re: [jira] [Commented] (LUCENE-9983) Stop sorting determinize powersets unnecessarily
I haven't looked at this at all, so please disregard if I'm off base, but it sounds as if we are materializing power sets? Would it be better to just store an array of the members and generate the combinations iteratively on demand? On Mon, May 31, 2021, 5:27 PM Haoyu Zhai (Jira) wrote: > > [ > https://issues.apache.org/jira/browse/LUCENE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354660#comment-17354660 > ] > > Haoyu Zhai commented on LUCENE-9983: > > > I've added a simple static counter just for the adversarial test, and > here's the stats: > * {{incr}} called: 106073079 > * entry added to set: 100076079 > * {{decr}} called: 106069079 > * entry removed from set: 100072079 > * {{computeHash}} called: 40057 > * {{freeze}} called: 14056 > > So seems to me my guess above holds, we're doing way more put/remove entry > operations than others > > > Stop sorting determinize powersets unnecessarily > > > > > > Key: LUCENE-9983 > > URL: https://issues.apache.org/jira/browse/LUCENE-9983 > > Project: Lucene - Core > > Issue Type: Improvement > >Reporter: Michael McCandless > >Priority: Major > > Time Spent: 40m > > Remaining Estimate: 0h > > > > Spinoff from LUCENE-9981. > > Today, our {{Operations.determinize}} implementation builds powersets of > all subsets of NFA states that "belong" in the same determinized state, > using [this algorithm|https://en.wikipedia.org/wiki/Powerset_construction > ]. > > To hold each powerset, we use a malleable {{SortedIntSet}} and > periodically freeze it to a {{FrozenIntSet}}, also sorted. We pay a high > price to keep these growing maps of int key, int value sorted by key, e.g. > upgrading to a {{TreeMap}} once the map is large enough (> 30 entries). > > But I think sorting is entirely unnecessary here! Really all we need is > the ability to add/delete keys from the map, and hashCode / equals (by key > only – ignoring value!), and to freeze the map (a small optimization that > we could skip initially). We only use these maps to lookup in the > (growing) determinized automaton whether this powerset has already been > seen. > > Maybe we could simply poach the {{IntIntScatterMap}} implementation from > [HPPC|https://github.com/carrotsearch/hppc]? And then change its > {{hashCode}}/{{equals }}to only use keys (not values). > > This change should be a big speedup for the kinds of (admittedly > adversarial) regexps we saw on LUCENE-9981. > > > > > > -- > This message was sent by Atlassian Jira > (v8.3.4#803005) > > - > To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org > For additional commands, e-mail: issues-h...@lucene.apache.org > >
Re: How to "gradlew beast" a single test case on multiple cores?
I don't think there is, I explored it as part of LUCENE-9465. https://issues.apache.org/jira/browse/LUCENE-9465?focusedCommentId=17178912&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17178912 D. On Mon, May 31, 2021 at 7:08 PM Michael McCandless < luc...@mikemccandless.com> wrote: > Hi Team, > > "gradle beast" takes -Ptests.iters=N and -Ptests.dups=M, to run many > iterations with and without new Random seed (I think), and you can pass it > a single test case e.g. "--tests TestRegexpRandom2". > > But this seems to then run all test iterations on a single test JVM, even > if the gradle properties lists a high tests.jvms? At least, that's what > seems to be happening when I watch in "top". > > Is there some simple way to beast a single test, many times, concurrently, > across many test JVMs? > > Mike McCandless > > http://blog.mikemccandless.com >
How to "gradlew beast" a single test case on multiple cores?
Hi Team, "gradle beast" takes -Ptests.iters=N and -Ptests.dups=M, to run many iterations with and without new Random seed (I think), and you can pass it a single test case e.g. "--tests TestRegexpRandom2". But this seems to then run all test iterations on a single test JVM, even if the gradle properties lists a high tests.jvms? At least, that's what seems to be happening when I watch in "top". Is there some simple way to beast a single test, many times, concurrently, across many test JVMs? Mike McCandless http://blog.mikemccandless.com
Re: Welcome Greg Miller as Lucene committer
Congratulations Greg :) On Mon, May 31, 2021, 8:02 AM Ilan Ginzburg wrote: > Congrats Greg! > > On Sun, May 30, 2021 at 4:35 PM Greg Miller wrote: > >> Thanks everyone! I'm honored to have been nominated and look forward >> to continuing to work with all of you on Lucene! I'm incredibly >> grateful for everyone that has helped me so far. There's a lot to >> learn in Lucene and this community has been a fantastic help ramping >> up, providing thorough PR feedback/ideas/etc. and simply been a great >> group of people to collaborate with. >> >> As far as a brief bio goes, I live in the Seattle area and work for >> Amazon's "Product Search" team, which I joined in January of this >> year. I'm a naturally curious person and find myself fascinated by >> data structure / algorithm problems, so diving into Lucene has been >> really fun! I'm also an avid runner (mostly marathons but right now >> I'm training for my first one-mile race on a track), and love to >> travel with my wife and daughter (although that's been on "pause" for >> obvious reasons for the past year+). My biggest accomplishment of 2021 >> so far has been teaching my daughter to ride a bike, but being >> nominated as a Lucene committer is a close second :) >> >> Thanks again everyone and looking forward to continuing to work with all >> of you! >> >> Cheers, >> -Greg >> >> On Sat, May 29, 2021 at 7:59 PM Michael McCandless >> wrote: >> > >> > Welcome Greg! >> > >> > Mike >> > >> > On Sat, May 29, 2021 at 3:47 PM Adrien Grand wrote: >> >> >> >> I'm pleased to announce that Greg Miller has accepted the PMC's >> invitation to become a committer. >> >> >> >> Greg, the tradition is that new committers introduce themselves with a >> brief bio. >> >> >> >> Congratulations and welcome! >> >> >> >> >> >> -- >> >> Adrien >> > >> > -- >> > Mike McCandless >> > >> > http://blog.mikemccandless.com >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >>
Re: Welcome Greg Miller as Lucene committer
Congrats Greg! On Sun, May 30, 2021 at 4:35 PM Greg Miller wrote: > Thanks everyone! I'm honored to have been nominated and look forward > to continuing to work with all of you on Lucene! I'm incredibly > grateful for everyone that has helped me so far. There's a lot to > learn in Lucene and this community has been a fantastic help ramping > up, providing thorough PR feedback/ideas/etc. and simply been a great > group of people to collaborate with. > > As far as a brief bio goes, I live in the Seattle area and work for > Amazon's "Product Search" team, which I joined in January of this > year. I'm a naturally curious person and find myself fascinated by > data structure / algorithm problems, so diving into Lucene has been > really fun! I'm also an avid runner (mostly marathons but right now > I'm training for my first one-mile race on a track), and love to > travel with my wife and daughter (although that's been on "pause" for > obvious reasons for the past year+). My biggest accomplishment of 2021 > so far has been teaching my daughter to ride a bike, but being > nominated as a Lucene committer is a close second :) > > Thanks again everyone and looking forward to continuing to work with all > of you! > > Cheers, > -Greg > > On Sat, May 29, 2021 at 7:59 PM Michael McCandless > wrote: > > > > Welcome Greg! > > > > Mike > > > > On Sat, May 29, 2021 at 3:47 PM Adrien Grand wrote: > >> > >> I'm pleased to announce that Greg Miller has accepted the PMC's > invitation to become a committer. > >> > >> Greg, the tradition is that new committers introduce themselves with a > brief bio. > >> > >> Congratulations and welcome! > >> > >> > >> -- > >> Adrien > > > > -- > > Mike McCandless > > > > http://blog.mikemccandless.com > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
Re: Welcome Greg Miller as Lucene committer
Welcome, Greg! On Mon, May 31, 2021 at 7:46 AM Mayya Sharipova wrote: > Congratulations and welcome, Greg! > > On Mon, May 31, 2021 at 6:50 AM Jan Høydahl wrote: > >> Welcome, Greg! >> >> Jan >> >> > 29. mai 2021 kl. 21:47 skrev Adrien Grand : >> > >> > I'm pleased to announce that Greg Miller has accepted the PMC's >> invitation to become a committer. >> > >> > Greg, the tradition is that new committers introduce themselves with a >> brief bio. >> > >> > Congratulations and welcome! >> > >> > -- >> > Adrien >> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >>
Re: Welcome Greg Miller as Lucene committer
Congratulations and welcome, Greg! On Mon, May 31, 2021 at 6:50 AM Jan Høydahl wrote: > Welcome, Greg! > > Jan > > > 29. mai 2021 kl. 21:47 skrev Adrien Grand : > > > > I'm pleased to announce that Greg Miller has accepted the PMC's > invitation to become a committer. > > > > Greg, the tradition is that new committers introduce themselves with a > brief bio. > > > > Congratulations and welcome! > > > > -- > > Adrien > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
Re: Welcome Greg Miller as Lucene committer
Welcome, Greg! Jan > 29. mai 2021 kl. 21:47 skrev Adrien Grand : > > I'm pleased to announce that Greg Miller has accepted the PMC's invitation to > become a committer. > > Greg, the tradition is that new committers introduce themselves with a brief > bio. > > Congratulations and welcome! > > -- > Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Greg Miller as Lucene committer
Welcome and congratulations, Greg! Dawid On Sat, May 29, 2021 at 9:47 PM Adrien Grand wrote: > I'm pleased to announce that Greg Miller has accepted the PMC's invitation > to become a committer. > > Greg, the tradition is that new committers introduce themselves with a > brief bio. > > Congratulations and welcome! > > -- > Adrien >
Re: Welcome Greg Miller as Lucene committer
Congrats Greg! Koji On 2021/05/30 4:47, Adrien Grand wrote: I'm pleased to announce that Greg Miller has accepted the PMC's invitation to become a committer. Greg, the tradition is that new committers introduce themselves with a brief bio. Congratulations and welcome! -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Greg Miller as Lucene committer
Congrats! On Sun, May 30, 2021 at 4:47 AM Adrien Grand wrote: > I'm pleased to announce that Greg Miller has accepted the PMC's invitation > to become a committer. > > Greg, the tradition is that new committers introduce themselves with a > brief bio. > > Congratulations and welcome! > > -- > Adrien >