[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-19 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1612#comment-1612
 ] 

Adrien Grand commented on LUCENE-8920:
--

This sounds good Mike. I'm making it a blocker for 8.3 since we haven't 
reverted from branch_8x.

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
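
To make the tradeoff concrete, here is a rough sketch of the per-node decision the
builder applies (mirroring the DIRECT_ARC_LOAD_FACTOR check quoted later in this
thread); the method and variable names are illustrative, not the actual FST.java code:

{code:java}
// Sketch only: decide per node whether the direct-addressing (array-with-holes)
// encoding is worth its space cost. Assumes arcs are sorted by label.
static boolean shouldUseDirectAddressing(int[] sortedLabels, int directArcLoadFactor) {
  int numArcs = sortedLabels.length;
  // Direct addressing reserves one slot for every label value between the first
  // and last outgoing label, including the "holes" that have no arc.
  int labelRange = sortedLabels[numArcs - 1] - sortedLabels[0] + 1;
  // Only pay for the holes when the table stays reasonably dense; with arc
  // metadata of 10-20 bytes each hole is expensive, which is where the
  // worst-case ~4x RAM increase described above comes from.
  return labelRange > 0 && labelRange < directArcLoadFactor * numArcs;
}
{code}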



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-19 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8920:
-
Priority: Blocker  (was: Major)

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Blocker
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-19 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8920:
-
Fix Version/s: 8.3

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

2019-07-19 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888722#comment-16888722
 ] 

Adrien Grand commented on LUCENE-8928:
--

I played with this idea a bit at 
https://github.com/jpountz/lucene-solr/commit/16e6594af44b753c9ac498a063eb9b9d6102e020
 and 
https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/IndexAndSearchOpenStreetMaps.java
 with shapes. It's a bit artificial since we are using shapes to index points, 
but nevertheless I got 62% slower indexing (130 seconds instead of 80) and 45% 
faster searching for box queries (63.0 QPS instead of 43.5).

> BKDWriter could make splitting decisions based on the actual range of values
> 
>
> Key: LUCENE-8928
> URL: https://issues.apache.org/jira/browse/LUCENE-8928
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on 
> values in other dimensions. While this may be ok for geo points, this is 
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
> could get better indexing by re-computing the range of values on each 
> dimension before making the choice of the split dimension?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

2019-07-19 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8928:


 Summary: BKDWriter could make splitting decisions based on the 
actual range of values
 Key: LUCENE-8928
 URL: https://issues.apache.org/jira/browse/LUCENE-8928
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


Currently BKDWriter assumes that splitting on one dimension has no effect on 
values in other dimensions. While this may be ok for geo points, this is 
usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
could get better indexing by re-computing the range of values on each dimension 
before making the choice of the split dimension?
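
A minimal sketch of that idea, under the assumption that the node's values are
available as packed byte[] values of numDims * bytesPerDim bytes; the real BKDWriter
operates on much larger on-heap/off-heap point buffers, so this only illustrates
"recompute the actual per-dimension ranges, then split on the widest one":

{code:java}
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

// Sketch only: choose the split dimension from the *actual* spread of the values
// in this node rather than the ranges inherited from the parent node.
static int pickSplitDimension(List<byte[]> packedValues, int numDims, int bytesPerDim) {
  assert packedValues.isEmpty() == false;
  int bestDim = 0;
  BigInteger widestRange = BigInteger.valueOf(-1);
  for (int dim = 0; dim < numDims; dim++) {
    byte[] min = null;
    byte[] max = null;
    for (byte[] packed : packedValues) {
      byte[] value = Arrays.copyOfRange(packed, dim * bytesPerDim, (dim + 1) * bytesPerDim);
      if (min == null || Arrays.compareUnsigned(value, min) < 0) min = value;
      if (max == null || Arrays.compareUnsigned(value, max) > 0) max = value;
    }
    BigInteger range = new BigInteger(1, max).subtract(new BigInteger(1, min));
    if (range.compareTo(widestRange) > 0) {
      widestRange = range; // split where the values actually spread out the most
      bestDim = dim;
    }
  }
  return bestDim;
}
{code}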



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 8.2.0

2019-07-19 Thread Adrien Grand
+1

On Thu, Jul 18, 2019 at 9:38 AM Ignacio Vera  wrote:
>
> Hi,
>
> As there are no blockers for the release of Lucene/Solr 8.2 and the branch is 
> stable, I am planning to build the first release candidate tomorrow (Friday). 
> Please let us know if there is any concern/issue that needs to be dealt with 
> before moving to the next step.
>
>
> On Mon, Jul 15, 2019 at 11:32 PM Michael Sokolov  wrote:
>>
>> Thanks, good catch, I'll set the current version back to 6. I haven't
>> seen any comments on the (trivial) PR, so I'll push tonight in order
>> to keep the release train rolling
>>
>> On Mon, Jul 15, 2019 at 3:28 PM David Smiley  
>> wrote:
>> >
>> > Disable or rollback; I'm good either way.  I think you should un-bump the 
>> > FST version since the feature becomes entirely experimental.
>> >
>> > ~ David Smiley
>> > Apache Lucene/Solr Search Developer
>> > http://www.linkedin.com/in/davidwsmiley
>> >
>> >
>> > On Mon, Jul 15, 2019 at 12:34 PM Ishan Chattopadhyaya 
>> >  wrote:
>> >>
>> >> +1 to rollback and having a 8.3 as soon as we nail this down (even if 
>> >> that is days or 1-2 weeks after 8.2).
>> >>
>> >> On Mon, 15 Jul, 2019, 9:22 PM Michael Sokolov,  wrote:
>> >>>
>> >>> I guess whether we roll back depends on timing. I think we are close
>> >>> to a release though, and these changes are complex and will require
>> >>> further testing, so rollback seems reasonable to me. I think from code
>> >>> management perspective it will be simplest to disable direct
>> >>> addressing for now, rather than actually reverting the various commits
>> >>> that are in place. I can post a patch doing that today.
>> >>>
>> >>> I like the ideas you have for compressing FSTs further. It was
>> >>> bothering me that we store the labels needlessly. I do think that
>> >>> before making more radical changes to Arc though, I would like to add
>> >>> some encapsulation so that we can be a bit freer without being
>> >>> concerned about the abstraction leaking (Several classes depend on the
>> >>> Arc internals today). EG I'd like to make its members private and add
>> >>> getters. I know this is a performance-sensitive area, and maybe we had
>> >>> a reason for not using them? Do we have some experience that suggests
>> >>> that would be a performance issue? My assumption is that JIT
>> >>> compilation would make that free, but I haven't tested.
>> >>>
>> >>> On Mon, Jul 15, 2019 at 11:36 AM Adrien Grand  wrote:
>> >>> >
> >>> > That would be great. I wonder whether we could also make the encoding a
>> >>> > bit more efficient. For instance I noticed that arc metadata is pretty
>> >>> > large in some cases (in the 10-20 bytes) which make gaps very costly.
>> >>> > Associating each label with a dense id and having an intermediate
>> >>> > lookup, ie. lookup label -> id and then id->arc offset instead of
>> >>> > doing label->arc directly could save a lot of space in some cases?
>> >>> > Also it seems that we are repeating the label in the arc metadata when
>> >>> > array-with-gaps is used, even though it shouldn't be necessary since
>> >>> > the label is implicit from the address?
>> >>> >
> >>> > Do you think we can have a mitigation for worst-case scenarios in 8.2
>> >>> > or should we revert from branch_8_2 to keep the release process going
>> >>> > and work on this for 8.3?
>> >>> >
>> >>> > On Mon, Jul 15, 2019 at 5:12 PM Michael Sokolov  
>> >>> > wrote:
>> >>> > >
>> >>> > > Thanks for the nice test, Adrien. Yes, the tradeoff of direct
>> >>> > > addressing is heavily data-dependent. I think we can improve the
>> >>> > > situation here by tracking, per-FST instance, the size increase we're
>> >>> > > seeing while building (or perhaps do a preliminary pass before
>> >>> > > building) in order to decide whether to apply the encoding.
>> >>> > >
>> >>> > > On Mon, Jul 15, 2019 at 9:02 AM Adrien Grand  
>> >>> > > wrote:
>> >>> > > >
>> >>> > > >

[jira] [Commented] (LUCENE-8924) Remove Fields Order Checks from CheckIndex?

2019-07-17 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887300#comment-16887300
 ] 

Adrien Grand commented on LUCENE-8924:
--

We rely on the order for merging, see "MultiFields".

> Remove Fields Order Checks from CheckIndex?
> ---
>
> Key: LUCENE-8924
> URL: https://issues.apache.org/jira/browse/LUCENE-8924
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> CheckIndex checks the order of fields read from the FieldsEnum for the 
> posting reader. We do not explicitly sort or use a sorted data 
> structure to represent keys (at least not explicitly), and no FieldsEnum depends 
> on the order apart from MultiFieldsEnum, which no longer exists.
>  
> Should we remove the check?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8909) Deprecate getFieldNames from IndexWriter

2019-07-17 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887291#comment-16887291
 ] 

Adrien Grand commented on LUCENE-8909:
--

+1

> Deprecate getFieldNames from IndexWriter
> 
>
> Key: LUCENE-8909
> URL: https://issues.apache.org/jira/browse/LUCENE-8909
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Munendra S N
>Priority: Major
> Attachments: LUCENE-8909.patch
>
>
> From SOLR-12368
> {quote}Would be nice to be able to remove IndexWriter.getFieldNames as well, 
> which was added in LUCENE-7659 only for this workaround.{quote}
> Once the Solr task is resolved, deprecate {{IndexWriter#getFieldNames}} from 8.x and 
> remove it from master



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8908) Specified default value not returned for query() when doc doesn't match

2019-07-17 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887290#comment-16887290
 ] 

Adrien Grand commented on LUCENE-8908:
--

+1

> Specified default value not returned for query() when doc doesn't match
> ---
>
> Key: LUCENE-8908
> URL: https://issues.apache.org/jira/browse/LUCENE-8908
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Bill Bell
>Priority: Major
> Attachments: LUCENE-8908.patch, SOLR-7845.patch, SOLR-7845.patch
>
>
> The 2 arg version of the "query()" was designed so that the second argument 
> would specify the value used for any document that does not match the query 
> specified by the first argument -- but the "exists" property of the resulting 
> ValueSource only takes into consideration whether or not the document matches 
> the query -- and ignores the use of the second argument.
> 
> The workaround is to ignore the 2 arg form of the query() function, and 
> instead wrap the query function in def().
> for example:  {{def(query($something), $defaultval)}} instead of 
> {{query($something, $defaultval)}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8922) Speed up retrieval of top hits of DisjunctionMaxQuery

2019-07-17 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886944#comment-16886944
 ] 

Adrien Grand commented on LUCENE-8922:
--

Here is a patch. It uses the first clause that has a score greater than or 
equal to the minimum competitive score to lead iteration of impacts and 
propagates min competitive scores when the tie break multiplier is 0.

I ran wikibigall with the wikinightly tasks where I added 4 new tasks:
 - DisMaxHighMed: same as OrHighMed but with a DisjunctionMaxQuery and a tie 
break multiplier of 0.1
 - DisMaxHighHigh: same as OrHighHigh but with a DisjunctionMaxQuery and a tie 
break multiplier of 0.1
 - DisMax0HighMed: same as OrHighMed but with a DisjunctionMaxQuery and a tie 
break multiplier of 0
 - DisMax0HighHigh: same as OrHighHigh but with a DisjunctionMaxQuery and a tie 
break multiplier of 0

{noformat}
                Task    QPS baseline (StdDev)    QPS patch (StdDev)        Pct diff
              Fuzzy1       177.71 (11.7%)           174.01 (11.2%)    -2.1% ( -22% -   23%)
        SloppyPhrase         6.26  (6.1%)             6.23  (6.2%)    -0.4% ( -12% -   12%)
            SpanNear         2.32  (3.0%)             2.32  (3.4%)    -0.0% (  -6% -    6%)
    IntervalsOrdered         0.85  (1.7%)             0.85  (1.8%)     0.0% (  -3% -    3%)
             Prefix3        47.79 (12.6%)            47.85 (12.7%)     0.1% ( -22% -   29%)
          OrHighHigh         9.87  (2.8%)             9.89  (2.8%)     0.2% (  -5% -    5%)
              Phrase        70.88  (3.2%)            71.04  (3.1%)     0.2% (  -5% -    6%)
            Wildcard       128.13  (8.6%)           128.43  (9.0%)     0.2% ( -16% -   19%)
          AndHighMed        65.61  (3.5%)            65.85  (2.9%)     0.4% (  -5% -    6%)
         AndHighHigh        36.41  (3.4%)            36.60  (3.1%)     0.5% (  -5% -    7%)
     AndHighOrMedMed        25.99  (2.0%)            26.13  (1.8%)     0.5% (  -3% -    4%)
           OrHighMed        36.42  (2.7%)            36.61  (2.6%)     0.5% (  -4% -    5%)
              Fuzzy2        92.96 (16.1%)            93.59 (13.7%)     0.7% ( -25% -   36%)
              IntNRQ       132.08 (37.3%)           133.02 (38.0%)     0.7% ( -54% -  121%)
    AndMedOrHighHigh        26.80  (2.0%)            27.07  (2.1%)     1.0% (  -3% -    5%)
                Term      1308.93  (3.6%)          1331.58  (3.7%)     1.7% (  -5% -    9%)
       DisMaxHighMed        83.40  (3.1%)           111.26  (3.0%)    33.4% (  26% -   40%)
      DisMaxHighHigh        54.28  (4.8%)            81.35  (4.1%)    49.9% (  39% -   61%)
     DisMax0HighHigh        45.39  (5.7%)           217.70 (20.1%)   379.6% ( 334% -  430%)
      DisMax0HighMed       129.09  (3.9%)           905.16 (16.5%)   601.2% ( 558% -  646%)
{noformat}

> Speed up retrieval of top hits of DisjunctionMaxQuery
> -
>
> Key: LUCENE-8922
> URL: https://issues.apache.org/jira/browse/LUCENE-8922
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is a simple optimization that we are not doing in the case that 
> tieBreakMultiplier is 0: we could propagate the min competitive score to sub 
> clauses as-is.
> Even in the general case, we currently compute the block boundary of the 
> DisjunctionMaxQuery as the minimum of the block boundaries of its sub 
> clauses. This generates blocks that have very low score upper bounds but 
> unfortunately they are also very small, which means that we might sometimes 
> not make progress quickly enough.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8923) Release procedure does not add new version in CHANGES.txt in master

2019-07-17 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886941#comment-16886941
 ] 

Adrien Grand commented on LUCENE-8923:
--

+1 Even if some changes are missing I think we'd benefit from pushing this 
rather soon so that developers don't automatically add their changes to 8.2 as 
the last minor.

> Release procedure does not add new version in CHANGES.txt in master
> ---
>
> Key: LUCENE-8923
> URL: https://issues.apache.org/jira/browse/LUCENE-8923
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Minor
> Attachments: LUCENE-8923.patch
>
>
> This issue is just to track something that may be missing in the release 
> procedure. It currently adds a new version on CHANGES.txt in the minor 
> version branch but it does not do it in master.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8922) Speed up retrieval of top hits of DisjunctionMaxQuery

2019-07-17 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8922:


 Summary: Speed up retrieval of top hits of DisjunctionMaxQuery
 Key: LUCENE-8922
 URL: https://issues.apache.org/jira/browse/LUCENE-8922
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


There is a simple optimization that we are not doing in the case that 
tieBreakMultiplier is 0: we could propagate the min competitive score to sub 
clauses as-is.

Even in the general case, we currently compute the block boundary of the 
DisjunctionMaxQuery as the minimum of the block boundaries of its sub clauses. 
This generates blocks that have very low score upper bounds but unfortunately 
they are also very small, which means that we might sometimes not make progress 
quickly enough.
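
For the tieBreakMultiplier == 0 special case mentioned above, a minimal sketch of
the propagation, assuming a sub-scorer interface that accepts a minimum competitive
score (the names here are illustrative, not the actual DisjunctionMaxScorer code):

  import java.io.IOException;
  import java.util.List;

  // Sketch only: with tieBreakMultiplier == 0, score(doc) = max(sub scores), so a
  // document is only competitive if at least one clause alone reaches the bound,
  // and the minimum competitive score can be forwarded to every clause unchanged.
  interface SubScorer {
    void setMinCompetitiveScore(float minScore) throws IOException;
  }

  static void propagateMinCompetitiveScore(float minScore, float tieBreakMultiplier,
                                           List<SubScorer> clauses) throws IOException {
    if (tieBreakMultiplier == 0f) {
      for (SubScorer clause : clauses) {
        clause.setMinCompetitiveScore(minScore);
      }
    }
    // In the general case the bound cannot be forwarded as-is, because the other
    // clauses still contribute tieBreakMultiplier * score on top of the maximum.
  }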



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-16 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886114#comment-16886114
 ] 

Adrien Grand commented on LUCENE-8883:
--

I have a slight preference for having "Optimizations" as one category.

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch, LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements" so no surprise here, the New Features category 
> has issues that ought to be listed as such.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay as empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8884) Add Directory wrapper to track per-query IO counters

2019-07-16 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886043#comment-16886043
 ] 

Adrien Grand commented on LUCENE-8884:
--

I'm not seeing any attachment on this JIRA, did you forget to attach a patch?

> Add Directory wrapper to track per-query IO counters
> 
>
> Key: LUCENE-8884
> URL: https://issues.apache.org/jira/browse/LUCENE-8884
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>
> Lucene's IO abstractions ({{Directory, IndexInput/Output}}) make it really 
> easy to track counters of how many IOPs and net bytes are read for each 
> query, which is a useful metric to track/aggregate/alarm on in production or 
> dev benchmarks.
> At my day job we use these wrappers in our nightly benchmarks to catch any 
> accidental performance regressions.
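
As a rough illustration of what such a wrapper can look like (not the wrapper
referred to above, whose code isn't attached here), a counting Directory built on
Lucene's FilterDirectory; per-query attribution would additionally need to
snapshot/reset the counters around each query:

  import java.io.IOException;
  import java.util.concurrent.atomic.LongAdder;

  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FilterDirectory;
  import org.apache.lucene.store.IOContext;
  import org.apache.lucene.store.IndexInput;

  // Sketch only: count read calls ("IOPs") and net bytes read through this Directory.
  public class CountingDirectory extends FilterDirectory {
    final LongAdder readOps = new LongAdder();
    final LongAdder bytesRead = new LongAdder();

    public CountingDirectory(Directory in) {
      super(in);
    }

    @Override
    public IndexInput openInput(String name, IOContext context) throws IOException {
      return new CountingIndexInput(in.openInput(name, context));
    }

    private class CountingIndexInput extends IndexInput {
      private final IndexInput delegate;

      CountingIndexInput(IndexInput delegate) {
        super("counting(" + delegate.toString() + ")");
        this.delegate = delegate;
      }

      @Override public byte readByte() throws IOException {
        readOps.increment();
        bytesRead.increment();
        return delegate.readByte();
      }

      @Override public void readBytes(byte[] b, int offset, int len) throws IOException {
        readOps.increment();
        bytesRead.add(len);
        delegate.readBytes(b, offset, len);
      }

      @Override public void close() throws IOException { delegate.close(); }
      @Override public long getFilePointer() { return delegate.getFilePointer(); }
      @Override public void seek(long pos) throws IOException { delegate.seek(pos); }
      @Override public long length() { return delegate.length(); }

      @Override public IndexInput slice(String desc, long offset, long length) throws IOException {
        return new CountingIndexInput(delegate.slice(desc, offset, length));
      }

      @Override public IndexInput clone() {
        return new CountingIndexInput(delegate.clone());
      }
    }
  }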



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 8.2.0

2019-07-15 Thread Adrien Grand
My guess is that the code is designed this way to avoid boilerplate
more than for performance reasons. Mike McCandless might have more
information?

+1 to disable array-with-gaps but keep the logic for now

On Mon, Jul 15, 2019 at 5:52 PM Michael Sokolov  wrote:
>
> I guess whether we roll back depends on timing. I think we are close
> to a release though, and these changes are complex and will require
> further testing, so rollback seems reasonable to me. I think from code
> management perspective it will be simplest to disable direct
> addressing for now, rather than actually reverting the various commits
> that are in place. I can post a patch doing that today.
>
> I like the ideas you have for compressing FSTs further. It was
> bothering me that we store the labels needlessly. I do think that
> before making more radical changes to Arc though, I would like to add
> some encapsulation so that we can be a bit freer without being
> concerned about the abstraction leaking (Several classes depend on the
> Arc internals today). EG I'd like to make its members private and add
> getters. I know this is a performance-sensitive area, and maybe we had
> a reason for not using them? Do we have some experience that suggests
> that would be a performance issue? My assumption is that JIT
> compilation would make that free, but I haven't tested.
>
> On Mon, Jul 15, 2019 at 11:36 AM Adrien Grand  wrote:
> >
> > That would be great. I wonder whether we could also make the encoding a
> > bit more efficient. For instance I noticed that arc metadata is pretty
> > large in some cases (in the 10-20 bytes) which make gaps very costly.
> > Associating each label with a dense id and having an intermediate
> > lookup, ie. lookup label -> id and then id->arc offset instead of
> > doing label->arc directly could save a lot of space in some cases?
> > Also it seems that we are repeating the label in the arc metadata when
> > array-with-gaps is used, even though it shouldn't be necessary since
> > the label is implicit from the address?
> >
> > Do you think we can have a mitigation for worst-case scenarios in 8.2
> > or should we revert from branch_8_2 to keep the release process going
> > and work on this for 8.3?
> >
> > On Mon, Jul 15, 2019 at 5:12 PM Michael Sokolov  wrote:
> > >
> > > Thanks for the nice test, Adrien. Yes, the tradeoff of direct
> > > addressing is heavily data-dependent. I think we can improve the
> > > situation here by tracking, per-FST instance, the size increase we're
> > > seeing while building (or perhaps do a preliminary pass before
> > > building) in order to decide whether to apply the encoding.
> > >
> > > On Mon, Jul 15, 2019 at 9:02 AM Adrien Grand  wrote:
> > > >
> > > > I dug this a bit and suspect that the issue is mostly with one field
> > > > that is not part of the data but auto-generated: the ID field. It is a
> > > > slight variant of Flake IDs, so it's not random, it includes a
> > > > timestamp and a sequence number, and I suspect that its patterns
> > > > combined with the larger alphabet than ascii makes this size increase
> > > > more likely than with the data set you tested against.
> > > >
> > > > For instance I ran the following code with direct array addressing on
> > > > and off to simulate a worst-case scenario.
> > > >
> > > >   public static void main(String[] args) throws IOException {
> > > > Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
> > > > IndexWriter w = new IndexWriter(dir, new
> > > > IndexWriterConfig().setOpenMode(OpenMode.CREATE));
> > > > byte[] b = new byte[5];
> > > > Random r = new Random(0);
> > > > for (int i = 0; i < 100; ++i) {
> > > >   r.nextBytes(b);
> > > >   for (int j = 0; j < b.length; ++j) {
> > > > b[j] &= 0xfc; // make this byte a multiple of 4
> > > >   }
> > > >   Document doc = new Document();
> > > >   StringField field = new StringField("f", new BytesRef(b), 
> > > > Store.NO);
> > > >   doc.add(field);
> > > >   w.addDocument(doc);
> > > > }
> > > > w.forceMerge(1);
> > > > IndexReader reader = DirectoryReader.open(w);
> > > > w.close();
> > > > if (reader.leaves().size() != 1) {
> > > >   throw new Error();
> > > > }
> > > > LeafReader leaf = reader.leaves().get(0

Re: Lucene/Solr 8.2.0

2019-07-15 Thread Adrien Grand
That would be great. I wonder whether we could also make the encoding a
bit more efficient. For instance I noticed that arc metadata is pretty
large in some cases (in the 10-20 bytes) which make gaps very costly.
Associating each label with a dense id and having an intermediate
lookup, ie. lookup label -> id and then id->arc offset instead of
doing label->arc directly could save a lot of space in some cases?
Also it seems that we are repeating the label in the arc metadata when
array-with-gaps is used, even though it shouldn't be necessary since
the label is implicit from the address?

Do you think we can have a mitigation for worst-case scenarios in 8.2
or should we revert from branch_8_2 to keep the release process going
and work on this for 8.3?

On Mon, Jul 15, 2019 at 5:12 PM Michael Sokolov  wrote:
>
> Thanks for the nice test, Adrien. Yes, the tradeoff of direct
> addressing is heavily data-dependent. I think we can improve the
> situation here by tracking, per-FST instance, the size increase we're
> seeing while building (or perhaps do a preliminary pass before
> building) in order to decide whether to apply the encoding.
>
> On Mon, Jul 15, 2019 at 9:02 AM Adrien Grand  wrote:
> >
> > I dug this a bit and suspect that the issue is mostly with one field
> > that is not part of the data but auto-generated: the ID field. It is a
> > slight variant of Flake IDs, so it's not random, it includes a
> > timestamp and a sequence number, and I suspect that its patterns
> > combined with the larger alphabet than ascii makes this size increase
> > more likely than with the data set you tested against.
> >
> > For instance I ran the following code with direct array addressing on
> > and off to simulate a worst-case scenario.
> >
> >   public static void main(String[] args) throws IOException {
> > Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
> > IndexWriter w = new IndexWriter(dir, new
> > IndexWriterConfig().setOpenMode(OpenMode.CREATE));
> > byte[] b = new byte[5];
> > Random r = new Random(0);
> > for (int i = 0; i < 100; ++i) {
> >   r.nextBytes(b);
> >   for (int j = 0; j < b.length; ++j) {
> > b[j] &= 0xfc; // make this byte a multiple of 4
> >   }
> >   Document doc = new Document();
> >   StringField field = new StringField("f", new BytesRef(b), Store.NO);
> >   doc.add(field);
> >   w.addDocument(doc);
> > }
> > w.forceMerge(1);
> > IndexReader reader = DirectoryReader.open(w);
> > w.close();
> > if (reader.leaves().size() != 1) {
> >   throw new Error();
> > }
> > LeafReader leaf = reader.leaves().get(0).reader();
> > System.out.println(((SegmentReader) leaf).ramBytesUsed());
> > reader.close();
> > dir.close();
> >   }
> >
> > When direct addressing is enabled (default), I get 586079. If I
> > disable direct addressing by applying the below patch, then I get
> > 156228 - about 3.75x less.
> >
> > diff --git a/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
> > b/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
> > index f308f1a..ff99cc2 100644
> > --- a/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
> > +++ b/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
> > @@ -647,7 +647,7 @@ public final class FST implements Accountable {
> >// array that may have holes in it so that we can address the
> > arcs directly by label without
> >// binary search
> >int labelRange = nodeIn.arcs[nodeIn.numArcs - 1].label -
> > nodeIn.arcs[0].label + 1;
> > -  boolean writeDirectly = labelRange > 0 && labelRange <
> > Builder.DIRECT_ARC_LOAD_FACTOR * nodeIn.numArcs;
> > +  boolean writeDirectly = false; // labelRange > 0 && labelRange
> > < Builder.DIRECT_ARC_LOAD_FACTOR * nodeIn.numArcs;
> >
> >//System.out.println("write int @pos=" + (fixedArrayStart-4) +
> > " numArcs=" + nodeIn.numArcs);
> >// create the header
> >
> > On Mon, Jul 15, 2019 at 2:33 PM Michael Sokolov  wrote:
> > >
> > > OK, both LUCENE-8781 and LUCENE-8895 were introduced in 8.2.0. I see
> > > most of the other data sets report an increase more in the 10-15%
> > > range, which is expected. I'm curious what the makeup of that http
> > > logs data set is -- I guess it's HTTP logs :) Is the data public?
> > >
> > >
> > > On Mon, Jul 15, 2019 at 7:23 AM Ignacio Vera  wrote:
> > > >
> 

[jira] [Resolved] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder

2019-07-15 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8810.
--
   Resolution: Fixed
Fix Version/s: (was: 8.1.1)
   8.2

> Flattening of nested disjunctions does not take into account number of clause 
> limitation of builder
> ---
>
> Key: LUCENE-8810
> URL: https://issues.apache.org/jira/browse/LUCENE-8810
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 8.0
>Reporter: Mickaël Sauvée
>Priority: Minor
> Fix For: 8.2
>
> Attachments: LUCENE-8810.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In org.apache.lucene.search.BooleanQuery, at the end of the function 
> rewrite(IndexReader reader), the query is rewritten to flatten nested 
> disjunctions.
> This does not take into account the limitation on the number of clauses in a 
> builder (1024).
>  In some circumstances, this limit can be reached, hence an exception is 
> thrown.
> Here is a unit test that highlights this.
> {code:java}
>   public void testFlattenInnerDisjunctionsWithMoreThan1024Terms() throws 
> IOException {
> IndexSearcher searcher = newSearcher(new MultiReader());
> BooleanQuery.Builder builder1024 = new BooleanQuery.Builder();
> for(int i = 0; i < 1024; i++) {
>   builder1024.add(new TermQuery(new Term("foo", "bar-" + i)), 
> Occur.SHOULD);
> }
> Query inner = builder1024.build();
> Query query = new BooleanQuery.Builder()
> .add(inner, Occur.SHOULD)
> .add(new TermQuery(new Term("foo", "baz")), Occur.SHOULD)
> .build();
> searcher.rewrite(query);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 8.2.0

2019-07-15 Thread Adrien Grand
I dug this a bit and suspect that the issue is mostly with one field
that is not part of the data but auto-generated: the ID field. It is a
slight variant of Flake IDs, so it's not random, it includes a
timestamp and a sequence number, and I suspect that its patterns
combined with the larger alphabet than ascii makes this size increase
more likely than with the data set you tested against.

For instance I ran the following code with direct array addressing on
and off to simulate a worst-case scenario.

  public static void main(String[] args) throws IOException {
Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
IndexWriter w = new IndexWriter(dir, new
IndexWriterConfig().setOpenMode(OpenMode.CREATE));
byte[] b = new byte[5];
Random r = new Random(0);
for (int i = 0; i < 100; ++i) {
  r.nextBytes(b);
  for (int j = 0; j < b.length; ++j) {
b[j] &= 0xfc; // make this byte a multiple of 4
  }
  Document doc = new Document();
  StringField field = new StringField("f", new BytesRef(b), Store.NO);
  doc.add(field);
  w.addDocument(doc);
}
w.forceMerge(1);
IndexReader reader = DirectoryReader.open(w);
w.close();
if (reader.leaves().size() != 1) {
  throw new Error();
}
LeafReader leaf = reader.leaves().get(0).reader();
System.out.println(((SegmentReader) leaf).ramBytesUsed());
reader.close();
dir.close();
  }

When direct addressing is enabled (default), I get 586079. If I
disable direct addressing by applying the below patch, then I get
156228 - about 3.75x less.

diff --git a/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
b/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
index f308f1a..ff99cc2 100644
--- a/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
+++ b/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
@@ -647,7 +647,7 @@ public final class FST implements Accountable {
   // array that may have holes in it so that we can address the
arcs directly by label without
   // binary search
   int labelRange = nodeIn.arcs[nodeIn.numArcs - 1].label -
nodeIn.arcs[0].label + 1;
-  boolean writeDirectly = labelRange > 0 && labelRange <
Builder.DIRECT_ARC_LOAD_FACTOR * nodeIn.numArcs;
+  boolean writeDirectly = false; // labelRange > 0 && labelRange
< Builder.DIRECT_ARC_LOAD_FACTOR * nodeIn.numArcs;

   //System.out.println("write int @pos=" + (fixedArrayStart-4) +
" numArcs=" + nodeIn.numArcs);
   // create the header

On Mon, Jul 15, 2019 at 2:33 PM Michael Sokolov  wrote:
>
> OK, both LUCENE-8781 and LUCENE-8895 were introduced in 8.2.0. I see
> most of the other data sets report an increase more in the 10-15%
> range, which is expected. I'm curious what the makeup of that http
> logs data set is -- I guess it's HTTP logs :) Is the data public?
>
>
> On Mon, Jul 15, 2019 at 7:23 AM Ignacio Vera  wrote:
> >
> > The change to Lucene 8.2.0 snapshot was done on July 10th. Previous to that 
> > the Lucene version was 8.1.0.
> >
> > On Mon, Jul 15, 2019 at 12:53 PM Michael Sokolov  wrote:
> >>
> >> Hmm that's possible, although the jump is bigger than anything I
> >> observed while testing. I assume these charts are building off of
> >> apache/master, or something close to that? If so, then the timing is
> >> off a bit. LUCENE-8781 was pushed quite a while before that, and then
> >> https://issues.apache.org/jira/browse/LUCENE-8895 which extended the
> >> encoding to be the default (not just for postings) was pushed on July
> >> 2 or so, but the chart shows a jump on July 10?
> >>
> >> On Mon, Jul 15, 2019 at 4:03 AM Ignacio Vera  wrote:
> >> >
> >> > Hi,
> >> >
> >> > We observed using a snapshot of Lucene 8.2 that there is an increase of 
> >> > around 30% on the memory usage of IndexReaders for some of the test 
> >> > datasets, for example:
> >> >
> >> > https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/default/30d
> >> >
> >> > We suspect this is due to this change: 
> >> > https://issues.apache.org/jira/browse/LUCENE-8781
> >> >
> >> > On Sun, Jul 14, 2019 at 7:10 AM David Smiley  
> >> > wrote:
> >> >>
> >> >> Since there won't be any 8.1.2 yet, some issues got fixed for 8.1.2 and 
> >> >> there is an 8.1.2 section in CHANGES.txt, those issues might not be very 
> >> >> noticeable to users that only look at the published HTML version (e.g. 
> >> >> https://lucene.apache.org/solr/8_1_1/changes/Changes.html ).  Maybe 
> >> >> 8.1.2 should be integrated into 8.2.0 in CHANGES.txt?  Despite this, I 
> >> >> see at least one of those issues got into the curated release notes / 
> >> >> highlights any way -- thanks Ignacio.
> >> >>
> >> >> ~ David Smiley
> >> >> Apache Lucene/Solr Search Developer
> >> >> http://www.linkedin.com/in/davidwsmiley
> >> >>
> >> >>
> >> >> On Fri, Jul 12, 2019 at 9:40 AM Jan Høydahl  
> >> >> wrote:
> >> >>>
> >> >>> Please use HTTPS in the links to download pages.
> >> >>>
> >> >>> Jan Høydahl
> >> >>>
> 

[jira] [Created] (LUCENE-8917) Remove the "Direct" doc-value format

2019-07-15 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8917:


 Summary: Remove the "Direct" doc-value format
 Key: LUCENE-8917
 URL: https://issues.apache.org/jira/browse/LUCENE-8917
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


This is the last user of the Legacy*DocValues APIs. Another option would be to 
move this format to doc-value iterators, but I don't think it's worth the 
effort: let's just remove it in Lucene 9?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery

2019-07-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884944#comment-16884944
 ] 

Adrien Grand commented on LUCENE-8811:
--

Thanks! I'll revert this change from 8.x and 8.2 in the meantime.

> Add maximum clause count check to IndexSearcher rather than BooleanQuery
> 
>
> Key: LUCENE-8811
> URL: https://issues.apache.org/jira/browse/LUCENE-8811
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 8.2
>
> Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, 
> LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch
>
>
> Currently we only check whether boolean queries have too many clauses. 
> However there are other ways that queries may have too many clauses, for 
> instance if you have boolean queries that have themselves inner boolean 
> queries.
> Could we use the new Query visitor API to move this check from BooleanQuery 
> to IndexSearcher in order to make this check more consistent across queries? 
> See for instance LUCENE-8810 where a rewrite rule caused the maximum clause 
> count to be hit even though the total number of leaf queries remained the 
> same.
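
A rough sketch of how the Query visitor API could enforce such a check before a
search, assuming a simple "one per leaf / one per term" counting policy (the actual
check that eventually lands in IndexSearcher may count differently):

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;

// Sketch only: count leaf clauses across the whole query tree, however deeply the
// boolean structure is nested, and fail once the total exceeds the limit.
static void checkClauseCount(Query query, int maxClauseCount) {
  int[] count = new int[1];
  query.visit(new QueryVisitor() {
    @Override
    public void visitLeaf(Query leaf) {
      bump();
    }

    @Override
    public void consumeTerms(Query leaf, Term... terms) {
      bump(); // term queries report themselves here rather than through visitLeaf
    }

    @Override
    public QueryVisitor getSubVisitor(BooleanClause.Occur occur, Query parent) {
      return this; // keep counting inside nested boolean queries
    }

    private void bump() {
      if (++count[0] > maxClauseCount) {
        throw new IllegalStateException("Query has more than " + maxClauseCount + " clauses");
      }
    }
  });
}
{code}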



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 8.2.0

2019-07-15 Thread Adrien Grand
FYI I added a note to
https://issues.apache.org/jira/browse/LUCENE-8811, I think it's too
breaking for 8.2 and should wait for 9.0. We can mitigate LUCENE-8810
in 8.x by disabling the flattening of inner disjunctions when this
would create too many clauses in a single BooleanQuery. I'll work with
Atri on getting LUCENE-8811 reverted from 8.x and 8.2 and the separate
mitigation for LUCENE-8810 in if there are no objections.

On Mon, Jul 15, 2019 at 10:03 AM Ignacio Vera  wrote:
>
> Hi,
>
> We observed using a snapshot of Lucene 8.2 that there is an increase of 
> around 30% on the memory usage of IndexReaders for some of the test datasets, 
> for example:
>
> https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/default/30d
>
> We suspect this is due to this change: 
> https://issues.apache.org/jira/browse/LUCENE-8781
>
> On Sun, Jul 14, 2019 at 7:10 AM David Smiley  wrote:
>>
>> Since there won't be any 8.1.2 yet, some issues got fixed for 8.1.2 and there 
>> is an 8.1.2 section in CHANGES.txt, those issues might not be very noticeable 
>> to users that only look at the published HTML version (e.g. 
>> https://lucene.apache.org/solr/8_1_1/changes/Changes.html ).  Maybe 8.1.2 
>> should be integrated into 8.2.0 in CHANGES.txt?  Despite this, I see at 
>> least one of those issues got into the curated release notes / highlights 
>> any way -- thanks Ignacio.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Fri, Jul 12, 2019 at 9:40 AM Jan Høydahl  wrote:
>>>
>>> Please use HTTPS in the links to download pages.
>>>
>>> Jan Høydahl
>>>
>>> 12. jul. 2019 kl. 09:04 skrev Ignacio Vera :
>>>
>>> Ishan: I had a look into the issues and I have no objections as far as they 
>>> get properly reviewed if possible. It will be good to commit the shortly so 
>>> they go through a few CI iterations in case something gets broken. I am 
>>> planning to build the first RC early next week as there are no blockers for 
>>> the release.
>>>
>>> Steve: Thank you so much, I need to work on getting the right permissions.
>>>
>>> Finally I wrote a draft for the release notes for Lucene and Solr. It would 
>>> be good if someone with more experience in Solr can review/modify my 
>>> attempt as it is difficult for me to know which are the most important 
>>> bits. Here are the links to the drafts (note they are in the wiki, let me know 
>>> if you have problems accessing them):
>>>
>>> Lucene:
>>> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=120732808=cb366dc4-c136-4505-9c37-60bde5db2550=shareui=1562914476369
>>>
>>> Solr:
>>> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=120732972=5cace703-b80b-49c4-a07f-55b891683f90=shareui=1562914529931
>>>
>>> On Thu, Jul 11, 2019 at 6:36 PM Ishan Chattopadhyaya 
>>>  wrote:

 Hi Ignacio,
 I wish to include two security bug fixes (not vulnerabilities, but feature 
 regressions due to Authorization plugin), SOLR-13472 and SOLR-13619. I can 
 commit both shortly, attempting to write a unit test for it (which is 
 proving harder to do than reproducing, fixing and testing manually). 
 Please let me know if you have any concerns.
 Regards,
 Ishan

 On Thu, 11 Jul, 2019, 9:12 PM Tomoko Uchida, 
  wrote:
>
> Hi Ignacio,
>
> LUCENE-8907 was fixed. (I have reverted a series of commits which
> cause backwards incompatibility on Lucene 8.x.)
> Thank you for waiting for that!
>
> Tomoko
>
> 2019年7月11日(木) 22:44 Uwe Schindler :
> >
> > Hi,
> >
> >
> >
> > I enabled the policeman Jenkins Jobs for 8.2 branch.
> >
> >
> >
> > Uwe
> >
> >
> >
> > -
> >
> > Uwe Schindler
> >
> > Achterdiek 19, D-28357 Bremen
> >
> > https://www.thetaphi.de
> >
> > eMail: u...@thetaphi.de
> >
> >
> >
> > From: Ignacio Vera 
> > Sent: Thursday, July 11, 2019 1:05 PM
> > To: dev@lucene.apache.org
> > Subject: Re: Lucene/Solr 8.2.0
> >
> >
> >
> > Hi,
> >
> >
> >
> > The branch has been created, As a reminder, this branch is on feature 
> > freeze and only documentation or build patches should be committed. I 
> > will be waiting for LUCENE-8907 to start building the first release 
> > candidate.
> >
> > Let me know if there is any other blocker before we can start the 
> > release process.
> >
> >
> >
> > It seems I do not have the permissions to create the Jenkins jobs for 
> > this branch, maybe Steve can help here?
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Ignacio
> >
> >
> >
> > On Thu, Jul 11, 2019 at 4:51 AM David Smiley  
> > wrote:
> >
> > BTW for 8.2.0 I updated Solr's CHANGES.txt to split out issues that 
> > seemed to be Improvements that were not really New 

[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery

2019-07-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884932#comment-16884932
 ] 

Adrien Grand commented on LUCENE-8811:
--

[~atris] If the patch you are thinking of is the one on LUCENE-8810, I was 
thinking of something even simpler that would catch the TooManyClauses 
exception when trying to flatten the query.

> Add maximum clause count check to IndexSearcher rather than BooleanQuery
> 
>
> Key: LUCENE-8811
> URL: https://issues.apache.org/jira/browse/LUCENE-8811
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 8.2
>
> Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, 
> LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch
>
>
> Currently we only check whether boolean queries have too many clauses. 
> However there are other ways that queries may have too many clauses, for 
> instance if you have boolean queries that have themselves inner boolean 
> queries.
> Could we use the new Query visitor API to move this check from BooleanQuery 
> to IndexSearcher in order to make this check more consistent across queries? 
> See for instance LUCENE-8810 where a rewrite rule caused the maximum clause 
> count to be hit even though the total number of leaf queries remained the 
> same.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery

2019-07-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884904#comment-16884904
 ] 

Adrien Grand commented on LUCENE-8811:
--

I was reviewing the changelog for 8.2; this change looks a bit too breaking for 
a minor and should probably wait for 9.0? We can separately address LUCENE-8810 
by disabling the flattening of disjunctions if the new BooleanQuery would have 
more than 1024 clauses?

> Add maximum clause count check to IndexSearcher rather than BooleanQuery
> 
>
> Key: LUCENE-8811
> URL: https://issues.apache.org/jira/browse/LUCENE-8811
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 8.2
>
> Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, 
> LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch
>
>
> Currently we only check whether boolean queries have too many clauses. 
> However there are other ways that queries may have too many clauses, for 
> instance if you have boolean queries that have themselves inner boolean 
> queries.
> Could we use the new Query visitor API to move this check from BooleanQuery 
> to IndexSearcher in order to make this check more consistent across queries? 
> See for instance LUCENE-8810 where a rewrite rule caused the maximum clause 
> count to be hit even though the total number of leaf queries remained the 
> same.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-11 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8311.
--
   Resolution: Fixed
Fix Version/s: 8.2

> Leverage impacts for phrase queries
> ---
>
> Key: LUCENE-8311
> URL: https://issues.apache.org/jira/browse/LUCENE-8311
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.2
>
> Attachments: LUCENE-8311.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for 
> each unique norm value in order to get upper bounds of the score for the 
> phrase.
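
A rough sketch of that bound, under the simplifying assumption that each term's
impacts have already been reduced to a map from norm value to the maximum term
frequency seen for that norm (the real Impacts API exposes lists of competitive
(freq, norm) pairs per level instead):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: an exact phrase cannot occur more often in a document than its
// rarest term does, and all terms of the phrase share the document's norm, so for
// each norm value the minimum of the per-term maximum frequencies is a valid
// upper bound on the phrase frequency.
static Map<Long, Integer> exactPhraseFreqUpperBounds(List<Map<Long, Integer>> maxFreqByNormPerTerm) {
  Map<Long, Integer> bounds = new HashMap<>(maxFreqByNormPerTerm.get(0));
  for (int i = 1; i < maxFreqByNormPerTerm.size(); i++) {
    Map<Long, Integer> term = maxFreqByNormPerTerm.get(i);
    // Norms that never occur for some term cannot match the phrase at all.
    bounds.keySet().retainAll(term.keySet());
    bounds.replaceAll((norm, freq) -> Math.min(freq, term.get(norm)));
  }
  return bounds;
}
{code}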



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8909) Deprecate getFieldNames from IndexWriter

2019-07-10 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882325#comment-16882325
 ] 

Adrien Grand commented on LUCENE-8909:
--

+1

> Deprecate getFieldNames from IndexWriter
> 
>
> Key: LUCENE-8909
> URL: https://issues.apache.org/jira/browse/LUCENE-8909
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Munendra S N
>Priority: Major
>
> From SOLR-12368
> {quote}Would be nice to be able to remove IndexWriter.getFieldNames as well, 
> which was added in LUCENE-7659 only for this workaround.{quote}
> Once the Solr task is resolved, deprecate {{IndexWriter#getFieldNames}} from 8.x and 
> remove it from master



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8875) Should TopScoreDocCollector Always Populate Sentinel Values?

2019-07-10 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8875.
--
   Resolution: Fixed
Fix Version/s: 8.2

> Should TopScoreDocCollector Always Populate Sentinel Values?
> 
>
> Key: LUCENE-8875
> URL: https://issues.apache.org/jira/browse/LUCENE-8875
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: 8.2
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> TopScoreDocCollector always initializes HitQueue as the PQ implementation, 
> and instruct HitQueue to populate with sentinels. While this is a great 
> safety mechanism, for very large datasets where the query's selectivity is 
> high, the sentinel population can be redundant and can become a large enough 
> bottleneck in itself. Does it make sense to introduce a new parameter in 
> TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and 
> does not populate sentinels?
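
A sketch of the heuristic floated above, assuming a boolean pre-population switch
like the one HitQueue already takes (HitQueue is package-private, and the 10k
threshold and constant name are made up for illustration):

{code:java}
// Sketch only: skip sentinel pre-population for very large requested hit counts,
// where filling the queue up front costs more than the null checks it avoids.
static final int SENTINEL_POPULATION_THRESHOLD = 10_000; // illustrative threshold

static HitQueue newHitQueue(int numHits) {
  boolean prePopulateWithSentinels = numHits <= SENTINEL_POPULATION_THRESHOLD;
  return new HitQueue(numHits, prePopulateWithSentinels);
}
{code}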



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8907) Provide backward compatibility for loading analysis factories

2019-07-10 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882074#comment-16882074
 ] 

Adrien Grand commented on LUCENE-8907:
--

I have a slight preference for reverting on 8.x and have this change only in 
9.0. My worry is that the backward compatibility layer would either need to 
introduce leniency (factories without a NAME) or stronger checks (Same NAME and 
class name), and it could end up causing as much trouble as the original change.

[~thetaphi] What do you think?

> Provide backward compatibility for loading analysis factories
> -
>
> Key: LUCENE-8907
> URL: https://issues.apache.org/jira/browse/LUCENE-8907
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Blocker
>
> The changes in LUCENE-8778 have breaking changes in the analysis factory 
> interface and  custom factories implemented by users / 3rd parties will be 
> affected. We need to keep some backwards compatibility during 8.x.
> Please see the discussion in SOLR-13593 for details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 8.1.2 bug fix release

2019-07-10 Thread Adrien Grand
Hi Đạt,

What do you think of focusing on 8.2 now, as Shalin suggested? Ignacio
had suggested cutting the 8.2 branch today initially so 8.2 could be
out pretty soon. Moreover if we want to move forward with 8.1.2, we
will likely have to delay 8.2 at this point.

On Fri, Jul 5, 2019 at 8:42 AM Adrien Grand  wrote:
>
> Agreed with Shalin, we might want to focus on 8.2 at this point.
>
> On Fri, Jul 5, 2019 at 8:38 AM Shalin Shekhar Mangar
>  wrote:
> >
> > Thanks Dat.
> >
> > I don't think we should release a broken version without a fix for 
> > SOLR-13413. A workaround for SOLR-13413 exists (forcing http1.1 for 
> > inter-node requests) but we don't test that configuration anymore in Solr 
> > so I am hesitant to suggest it.
> >
> > I think that either we agree to upgrade jetty to 9.4.19 in this point 
> > release or we scrap it altogether and focus on 8.2.
> >
> > On Thu, Jul 4, 2019 at 4:54 PM Đạt Cao Mạnh  wrote:
> >>
> >> Thanks Uwe!
> >>
> >> Hi guys, Ishan,
> >> When I tryied to build the RC1 for branch_8_1. I did see this failure on 
> >> test HttpPartitionWithTlogReplicasTest
> >>
> >> 215685 ERROR 
> >> (updateExecutor-537-thread-1-processing-x:collDoRecoveryOnRestart_shard1_replica_t1
> >>  r:core_node3 null n:127.0.0.1:55000_t_ayt%2Fs c:collDoRecoveryOnRestart 
> >> s:shard1) [n:127.0.0.1:55000_t_ayt%2Fs c:collDoRecoveryOnRestart s:shard1 
> >> r:core_node3 x:collDoRecoveryOnRestart_shard1_replica_t1] 
> >> o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling 
> >> SolrCmdDistributor$Req: cmd=add{,id=(null)}; node=StdNode: 
> >> http://127.0.0.1:54997/t_ayt/s/collDoRecoveryOnRestart_shard1_replica_t2/ 
> >> to 
> >> http://127.0.0.1:54997/t_ayt/s/collDoRecoveryOnRestart_shard1_replica_t2/
> >>   => java.io.IOException: java.net.ConnectException: Connection 
> >> refused
> >> at 
> >> org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:193)
> >> java.io.IOException: java.net.ConnectException: Connection refused
> >> at 
> >> org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:193)
> >>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
> >> at 
> >> org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:152)
> >>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
> >> at 
> >> org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:146)
> >>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
> >> at 
> >> org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:216)
> >>  ~[java/:?]
> >> at 
> >> org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:209)
> >>  ~[java/:?]
> >> at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:169) 
> >> ~[java/:?]
> >> at 
> >> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:102)
> >>  ~[java/:?]
> >> at 
> >> org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
> >>  ~[java/:?]
> >> at 
> >> org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:337)
> >>  ~[java/:?]
> >> at 
> >> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:231)
> >>  ~[java/:?]
> >> at 
> >> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:176)
> >>  ~[java/:?]
> >> at 
> >> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
> >>  ~[metrics-core-4.0.5.jar:4.0.5]
> >> at 
> >> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> >>  ~[java/:?]
> >> at 
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>  ~[?:1.8.0_191]
> >> at 
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>  ~[?:1.8.0_191]
> >> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
> >> Suppressed: java.io.IOException: java.net.Connect

[jira] [Commented] (LUCENE-8907) Provide backward compatibility for loading analysis factories

2019-07-10 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881982#comment-16881982
 ] 

Adrien Grand commented on LUCENE-8907:
--

I don't like option 2, it feels too breaking to me for a minor release. I like 
option 1 better, but then I think we should also fail if an analysis factory has 
a NAME constant that is not the same as the name that would be derived from the 
class name?
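
For illustration, a rough sketch of what such a check could look like (the suffix 
list and the Locale.ROOT lowercasing rule here are assumptions for the sake of 
the example, not the actual SPI loader code):

{noformat}
// Sketch only: derive a name from the factory class name and compare it to the
// declared NAME constant.
static String deriveName(Class<?> clazz) {
  String name = clazz.getSimpleName();
  for (String suffix : new String[] {"TokenFilterFactory", "FilterFactory",
                                     "TokenizerFactory", "CharFilterFactory"}) {
    if (name.endsWith(suffix)) {
      name = name.substring(0, name.length() - suffix.length());
      break;
    }
  }
  return name.toLowerCase(java.util.Locale.ROOT);
}

static void checkName(Class<?> clazz, String declaredName) {
  String derived = deriveName(clazz);
  if (!derived.equals(declaredName)) {
    throw new IllegalArgumentException("NAME '" + declaredName + "' of "
        + clazz.getName() + " does not match the derived name '" + derived + "'");
  }
}
{noformat}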

Another option could be to revert from 8.x. In any case we should add migration 
instructions to lucene/MIGRATE.txt on master.

bq. some warning messages would be helpful

We never log from Lucene since this is a library.

> Provide backward compatibility for loading analysis factories
> -
>
> Key: LUCENE-8907
> URL: https://issues.apache.org/jira/browse/LUCENE-8907
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Priority: Blocker
>
> The changes in LUCENE-8778 introduced breaking changes in the analysis factory 
> interface, and custom factories implemented by users / 3rd parties will be 
> affected. We need to keep some backwards compatibility during 8.x.
> Please see the discussion in SOLR-13593 for details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-10 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881967#comment-16881967
 ] 

Adrien Grand commented on LUCENE-8311:
--

This made exact phrase queries 3x faster in the nightly benchmarks 
http://people.apache.org/~mikemccand/lucenebench/Phrase.html and term queries 
about 10% slower http://people.apache.org/~mikemccand/lucenebench/Term.html.

> Leverage impacts for phrase queries
> ---
>
> Key: LUCENE-8311
> URL: https://issues.apache.org/jira/browse/LUCENE-8311
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8311.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for 
> each unique norm value in order to get upper bounds of the score for the 
> phrase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881541#comment-16881541
 ] 

Adrien Grand commented on LUCENE-8883:
--

I did {{grep "^[A-Z]" CHANGES.txt | sort | uniq -c | sort -nr}}.

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements" so no surprise here, the New Features category 
> has issues that ought to be listed as such.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8069) Allow index sorting by field length

2019-07-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881491#comment-16881491
 ] 

Adrien Grand commented on LUCENE-8069:
--

I didn't measure the indexing rate; I can do that next. Yes, I did hack a way to 
sort by the norm field. The solution that you proposed would likely yield 
similar benefits.

bq. is luceneutil assuming the search query doesn't want the number of total 
hits

Yes, like in nightly benchmarks.

bq. Yet this is not how most people use Lucene...

There are many use-cases for Lucene, but getting top full-text hits by score is 
a pretty common one and it typically doesn't require computing hit counts?
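
As a minimal sketch of what I mean (the field, term and reader are made up for 
illustration), collecting top hits without an exact hit count looks like this 
with the current API:

{noformat}
// Once totalHitsThreshold hits have been collected, the collector may skip
// non-competitive documents instead of counting every match.
IndexSearcher searcher = new IndexSearcher(reader);
Query query = new TermQuery(new Term("body", "lucene"));
TopScoreDocCollector collector = TopScoreDocCollector.create(10, 10); // numHits, totalHitsThreshold
searcher.search(query, collector);
TopDocs top = collector.topDocs();
// top.totalHits.relation may be GREATER_THAN_OR_EQUAL_TO rather than EQUAL_TO
{noformat}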

> Allow index sorting by field length
> ---
>
> Key: LUCENE-8069
> URL: https://issues.apache.org/jira/browse/LUCENE-8069
> Project: Lucene - Core
>  Issue Type: Wish
>    Reporter: Adrien Grand
>Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by 
> field length would mean we would be likely to collect best matches first. 
> Depending on the similarity implementation, this might even allow early 
> termination of collection of top documents on term queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8900) Simplify MultiSorter

2019-07-09 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8900.
--
   Resolution: Fixed
Fix Version/s: 8.2

Thanks [~danmuzi].

> Simplify MultiSorter
> 
>
> Key: LUCENE-8900
> URL: https://issues.apache.org/jira/browse/LUCENE-8900
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.2
>
> Attachments: LUCENE-8900.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880981#comment-16880981
 ] 

Adrien Grand commented on LUCENE-8883:
--

I just looked at the section names that we used at least 10 times in the 
changelog:

{noformat}
 55 API Changes
 53 Bug Fixes
 52 Optimizations
 46 Build
 41 Bug fixes
 37 New Features
 25 Documentation
 24 Other
 21 Changes in Runtime Behavior
 19 Changes in backwards compatibility policy
 15 New features
 15 Improvements
 14 Changes in runtime behavior
 10 Tests
{noformat}

Maybe your patch should rename "Other Changes" to "Other" which seems to be 
what we have used historically, and maybe also add "API Changes" and 
"Optimizations", which seem pretty popular?

Maybe we could specialize bugfix versions and only introduce a "Bug Fixes" 
section in that case?

bq. Also I didn't add "(No changes)"; seems needless / self-evident.

I think it helps clarify since it is very uncommon to release software without 
any changes. We could do it only for new bugfix releases if you think that 
helps since I think those are the only ones that we ever released without new 
changes.

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements" so no surprise here, the New Features category 
> has issues that ought to be listed as such.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-08 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880606#comment-16880606
 ] 

Adrien Grand commented on LUCENE-4312:
--

bq. the complexity of query execution would be driven by what's actually in the 
index

I don't think this is true.

For instance an exact phrase query trying to match "A B C" that is currently 
positioned on A (position=3, length=1), B (position=4, length=1), C 
(position=6, length=1) would need to advance B to the next position in case 
there is another match on position 4 that has a length of 2. And then we should 
advance C first, because maybe it also has another match on position 4 of a 
different length.

Also we can't advance positions on terms in the order we want anymore. Today we 
use the rarer term to lead the iteration of positions. If we had position 
lengths in the index we would need to advance positions in the order in which 
terms occur in the phrase query since the start position that B must have 
depends on the length of A on the current position: position starts are 
guaranteed to come in order in the index but position ends are not (at least we 
don't enforce it in token streams today).
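
To make that dependency concrete, here is a toy illustration (plain Java, not 
Lucene code) where each term's occurrences are given as parallel arrays of start 
positions and lengths; the position that B must start at depends on which 
occurrence of A we picked, which is exactly why we could no longer lead with the 
rarest term and advance each term only once:

{noformat}
// Toy sketch only: exhaustive check that "A B C" lines up contiguously when
// positions carry lengths. Real code would have to do this incrementally, which
// is where the backtracking complexity comes from.
static boolean exactPhraseMatch(int[] aPos, int[] aLen,
                                int[] bPos, int[] bLen,
                                int[] cPos, int[] cLen) {
  for (int i = 0; i < aPos.length; i++) {
    int bStart = aPos[i] + aLen[i];          // where B must start after this A
    for (int j = 0; j < bPos.length; j++) {
      if (bPos[j] != bStart) continue;
      int cStart = bStart + bLen[j];         // where C must start after this B
      for (int k = 0; k < cPos.length; k++) {
        if (cPos[k] == cStart) return true;  // A, B and C are contiguous
      }
    }
  }
  return false;
}
{noformat}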

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike McCandless said: TokenStreams are actually graphs.
> The indexer ignores PositionLengthAttribute. We need to change the index format 
> (and Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-08 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880422#comment-16880422
 ] 

Adrien Grand commented on LUCENE-4312:
--

Recording position lengths in the index is the easy part of the problem in my 
opinion. I'm concerned that this will introduce significant complexity to 
phrase queries (they will require backtracking in order to deal with the case 
that a term exists twice at the same position with different position lengths), 
and even make sloppy phrase queries and their spans/intervals counterparts 
meaningless (as terms could be very distant according to the index only because 
there is one term in-between that has a multi-term synonym indexed). 

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike McCandless said: TokenStreams are actually graphs.
> The indexer ignores PositionLengthAttribute. We need to change the index format 
> (and Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-SmokeRelease-8.x - Build # 142 - Still Failing

2019-07-08 Thread Adrien Grand
For those who haven't followed the 8.1.2 release thread, we are asking
infra for help about this issue at
https://issues.apache.org/jira/browse/INFRA-18701.

On Sun, Jul 7, 2019 at 7:24 PM Apache Jenkins Server
 wrote:
>
> Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-8.x/142/
>
> No tests ran.
>
> Build Log:
> [...truncated 24989 lines...]
> [asciidoctor:convert] asciidoctor: ERROR: about-this-guide.adoc: line 1: 
> invalid part, must have at least one section (e.g., chapter, appendix, etc.)
> [asciidoctor:convert] asciidoctor: ERROR: solr-glossary.adoc: line 1: invalid 
> part, must have at least one section (e.g., chapter, appendix, etc.)
>  [java] Processed 2587 links (2117 relative) to 3396 anchors in 259 files
>  [echo] Validated Links & Anchors via: 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/solr/build/solr-ref-guide/bare-bones-html/
>
> -dist-changes:
>  [copy] Copying 4 files to 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/solr/package/changes
>
> package:
>
> -unpack-solr-tgz:
>
> -ensure-solr-tgz-exists:
> [mkdir] Created dir: 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/solr/build/solr.tgz.unpacked
> [untar] Expanding: 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/solr/package/solr-8.2.0.tgz
>  into 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/solr/build/solr.tgz.unpacked
>
> generate-maven-artifacts:
>
> resolve:
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>
> ivy-availability-check:
> [loadresource] Do not set property disallowed.ivy.jars.list as its length is 
> 0.
>
> -ivy-fail-disallowed-ivy-version:
>
> ivy-fail:
>
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-8.x/lucene/top-level-ivy-settings.xml
>
> resolve:
>

Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-11.0.3) - Build # 24366 - Failure!

2019-07-08 Thread Adrien Grand
Does anyone know why we are getting these accessibility issues?

On Mon, Jul 8, 2019 at 10:09 AM Policeman Jenkins Server
 wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/24366/
> Java: 64bit/jdk-11.0.3 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
>
> All tests passed
>
> Build Log:
> [...truncated 2030 lines...]
>[junit4] JVM J1: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/temp/junit4-J1-20190708_063127_7027301280393236134214.syserr
>[junit4] >>> JVM J1 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J1: EOF 
>
> [...truncated 3 lines...]
>[junit4] JVM J0: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/temp/junit4-J0-20190708_063127_7024504749479836881192.syserr
>[junit4] >>> JVM J0 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J0: EOF 
>
> [...truncated 5 lines...]
>[junit4] JVM J2: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/temp/junit4-J2-20190708_063127_70212783206905665212676.syserr
>[junit4] >>> JVM J2 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J2: EOF 
>
> [...truncated 304 lines...]
>[junit4] JVM J1: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/test-framework/test/temp/junit4-J1-20190708_064432_68310171217942484809105.syserr
>[junit4] >>> JVM J1 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J1: EOF 
>
> [...truncated 3 lines...]
>[junit4] JVM J2: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/test-framework/test/temp/junit4-J2-20190708_064432_68314573784609112931613.syserr
>[junit4] >>> JVM J2 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J2: EOF 
>
>[junit4] JVM J0: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/test-framework/test/temp/junit4-J0-20190708_064432_6831859223900288849170.syserr
>[junit4] >>> JVM J0 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J0: EOF 
>
> [...truncated 1085 lines...]
>[junit4] JVM J1: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/analysis/common/test/temp/junit4-J1-20190708_064634_37214848669064955530728.syserr
>[junit4] >>> JVM J1 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J1: EOF 
>
> [...truncated 3 lines...]
>[junit4] JVM J2: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/analysis/common/test/temp/junit4-J2-20190708_064634_3728464235375514233638.syserr
>[junit4] >>> JVM J2 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J2: EOF 
>
>[junit4] JVM J0: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/analysis/common/test/temp/junit4-J0-20190708_064634_37212451067559916694570.syserr
>[junit4] >>> JVM J0 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be removed in a future release.
>[junit4] <<< JVM J0: EOF 
>
> [...truncated 236 lines...]
>[junit4] JVM J0: stderr was not empty, see: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/analysis/icu/test/temp/junit4-J0-20190708_064925_0766912284488591662497.syserr
>[junit4] >>> JVM J0 emitted unexpected output (verbatim) 
>[junit4] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was 
> deprecated in version 9.0 and will likely be 

[jira] [Commented] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes

2019-07-05 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879116#comment-16879116
 ] 

Adrien Grand commented on LUCENE-8860:
--

I made the issue about box queries, but that would actually work for polygons 
too.
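
For reference, the geometric observation from the description boils down to 
something like this (sketch only, with made-up method and parameter names, not 
the actual query code):

{noformat}
// A triangle touches all four edges of its minimum bounding rectangle, so if the
// query box fully contains any one edge of the triangle's MBR, the triangle must
// intersect the query box and we can avoid decoding it.
static boolean queryContainsAnMbrEdge(double qMinX, double qMaxX, double qMinY, double qMaxY,
                                      double tMinX, double tMaxX, double tMinY, double tMaxY) {
  boolean fullWidthInside  = qMinX <= tMinX && tMaxX <= qMaxX;
  boolean fullHeightInside = qMinY <= tMinY && tMaxY <= qMaxY;
  // west or east edge of the MBR is fully inside the query box
  boolean westOrEast = fullHeightInside
      && ((qMinX <= tMinX && tMinX <= qMaxX) || (qMinX <= tMaxX && tMaxX <= qMaxX));
  // south or north edge of the MBR is fully inside the query box
  boolean southOrNorth = fullWidthInside
      && ((qMinY <= tMinY && tMinY <= qMaxY) || (qMinY <= tMaxY && tMaxY <= qMaxY));
  return westOrEast || southOrNorth;
}
{noformat}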

> LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
> 
>
> Key: LUCENE-8860
> URL: https://issues.apache.org/jira/browse/LUCENE-8860
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
>
> Currently LatLonShapeBoundingBoxQuery with the INTERSECTS relation only 
> returns CELL_INSIDE_QUERY if the query contains ALL minimum bounding 
> rectangles of the indexed triangles.
> I think we could return CELL_INSIDE_QUERY if the box contains any of the 
> edges of all MBRs of indexed triangles, since triangles are guaranteed to 
> touch all edges of their MBR by definition. In some cases this would help 
> avoid decoding triangles and running costly point-in-triangle computations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 8.1.2 bug fix release

2019-07-05 Thread Adrien Grand
Agreed with Shalin, we might want to focus on 8.2 at this point.

On Fri, Jul 5, 2019 at 8:38 AM Shalin Shekhar Mangar
 wrote:
>
> Thanks Dat.
>
> I don't think we should release a broken version without a fix for 
> SOLR-13413. A workaround for SOLR-13413 exists (forcing http1.1 for 
> inter-node requests) but we don't test that configuration anymore in Solr so 
> I am hesitant to suggest it.
>
> I think that either we agree to upgrade jetty to 9.4.19 in this point release 
> or we scrap it altogether and focus on 8.2.
>
> On Thu, Jul 4, 2019 at 4:54 PM Đạt Cao Mạnh  wrote:
>>
>> Thanks Uwe!
>>
>> Hi guys, Ishan,
>> When I tried to build the RC1 for branch_8_1, I did see this failure on 
>> test HttpPartitionWithTlogReplicasTest
>>
>> 215685 ERROR 
>> (updateExecutor-537-thread-1-processing-x:collDoRecoveryOnRestart_shard1_replica_t1
>>  r:core_node3 null n:127.0.0.1:55000_t_ayt%2Fs c:collDoRecoveryOnRestart 
>> s:shard1) [n:127.0.0.1:55000_t_ayt%2Fs c:collDoRecoveryOnRestart s:shard1 
>> r:core_node3 x:collDoRecoveryOnRestart_shard1_replica_t1] 
>> o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling 
>> SolrCmdDistributor$Req: cmd=add{,id=(null)}; node=StdNode: 
>> http://127.0.0.1:54997/t_ayt/s/collDoRecoveryOnRestart_shard1_replica_t2/ to 
>> http://127.0.0.1:54997/t_ayt/s/collDoRecoveryOnRestart_shard1_replica_t2/
>>   => java.io.IOException: java.net.ConnectException: Connection 
>> refused
>> at 
>> org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:193)
>> java.io.IOException: java.net.ConnectException: Connection refused
>> at 
>> org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:193)
>>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
>> at 
>> org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:152)
>>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
>> at 
>> org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:146)
>>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
>> at 
>> org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:216)
>>  ~[java/:?]
>> at 
>> org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:209)
>>  ~[java/:?]
>> at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:169) 
>> ~[java/:?]
>> at 
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:102)
>>  ~[java/:?]
>> at 
>> org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
>>  ~[java/:?]
>> at 
>> org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:337)
>>  ~[java/:?]
>> at 
>> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:231)
>>  ~[java/:?]
>> at 
>> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:176)
>>  ~[java/:?]
>> at 
>> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
>>  ~[metrics-core-4.0.5.jar:4.0.5]
>> at 
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>>  ~[java/:?]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>  ~[?:1.8.0_191]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>  ~[?:1.8.0_191]
>> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
>> Suppressed: java.io.IOException: java.net.ConnectException: Connection 
>> refused
>> at 
>> org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:193)
>>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
>> at 
>> org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:152)
>>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
>> at 
>> org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:146)
>>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
>> at 
>> org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:216)
>>  ~[java/:?]
>> at 
>> org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:209)
>>  ~[java/:?]
>> at org.apache.solr.common.util.JavaBinCodec.close(JavaBinCodec.java:1261) 
>> ~[java/:?]
>> at 
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:103)
>>  ~[java/:?]
>> at 
>> org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
>>  ~[java/:?]
>> at 
>> org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:337)
>>  ~[java/:?]
>> at 
>> 

[jira] [Commented] (SOLR-12368) in-place DV updates should no longer have to jump through hoops if field does not yet exist

2019-07-04 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878866#comment-16878866
 ] 

Adrien Grand commented on SOLR-12368:
-

I'm not familiar enough with how Solr routes updates to doc values or stored 
fields, but indeed we no longer need to avoid updates on fields that don't exist 
yet. Thanks for cleaning this up! Can you mark IndexWriter#getFieldNames as 
deprecated on 8.x instead of removing it (removal is the right thing to do on 
master)?

> in-place DV updates should no longer have to jump through hoops if field does 
> not yet exist
> ---
>
> Key: SOLR-12368
> URL: https://issues.apache.org/jira/browse/SOLR-12368
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Priority: Major
> Attachments: SOLR-12368.patch, SOLR-12368.patch, SOLR-12368.patch
>
>
> When SOLR-5944 first added "in-place" DocValue updates to Solr, one of the 
> edge cases that had to be dealt with was the limitation imposed by 
> IndexWriter that docValues could only be updated if they already existed - if 
> a shard did not yet have a document w/a value in the field where the update 
> was attempted, we would get an error.
> LUCENE-8316 seems to have removed this error, which I believe means we can 
> simplify & speed up some of the checks in Solr, and support this situation as 
> well, rather than falling back on full "read stored fields & reindex" atomic 
> update.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-03 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877919#comment-16877919
 ] 

Adrien Grand commented on LUCENE-8311:
--

I opened https://github.com/apache/lucene-solr/pull/760. Performance is a bit 
better than what we had before:

{noformat}
Task                     QPS baseline  StdDev    QPS patch  StdDev    Pct diff
HighTerm                      1395.12  (5.1%)     1230.78  (4.3%)   -11.8% ( -20% -   -2%)
MedTerm                       2352.56  (4.7%)     2170.42  (3.9%)    -7.7% ( -15% -    0%)
LowSpanNear                     13.70  (7.0%)       12.67  (4.9%)    -7.5% ( -18% -    4%)
HighSpanNear                     5.69  (5.3%)        5.31  (3.2%)    -6.5% ( -14% -    2%)
MedSpanNear                     23.33  (4.2%)       21.97  (2.4%)    -5.8% ( -11% -    0%)
AndHighMed                     114.70  (2.9%)      109.40  (4.1%)    -4.6% ( -11% -    2%)
AndHighHigh                     35.08  (3.2%)       33.51  (4.1%)    -4.5% ( -11% -    2%)
LowTerm                       3014.11  (4.7%)     2893.44  (4.7%)    -4.0% ( -12% -    5%)
OrHighMed                       60.26  (2.5%)       57.96  (2.1%)    -3.8% (  -8% -    0%)
OrHighHigh                      15.45  (2.5%)       14.87  (2.3%)    -3.8% (  -8% -    1%)
LowPhrase                       25.81  (3.4%)       24.89  (2.8%)    -3.6% (  -9% -    2%)
HighSloppyPhrase                 7.44  (6.3%)        7.20  (5.7%)    -3.3% ( -14% -    9%)
MedSloppyPhrase                 12.76  (5.1%)       12.51  (4.6%)    -1.9% ( -10% -    8%)
LowSloppyPhrase                 34.24  (4.1%)       33.59  (3.8%)    -1.9% (  -9% -    6%)
HighTermMonthSort               70.86 (10.9%)       69.98 (10.7%)    -1.2% ( -20% -   22%)
Fuzzy1                         211.28  (3.5%)      208.86  (2.2%)    -1.1% (  -6% -    4%)
Fuzzy2                         180.97  (4.4%)      179.47  (2.6%)    -0.8% (  -7% -    6%)
OrHighLow                      467.25  (2.9%)      467.94  (2.0%)     0.1% (  -4% -    5%)
Prefix3                         91.35  (8.1%)       91.52  (7.2%)     0.2% ( -14% -   16%)
HighTermDayOfYearSort           62.77  (6.9%)       62.96  (7.5%)     0.3% ( -13% -   15%)
Wildcard                       129.49  (4.3%)      129.99  (2.8%)     0.4% (  -6% -    7%)
Respell                        210.68  (1.9%)      211.58  (2.4%)     0.4% (  -3% -    4%)
AndHighLow                     541.64  (3.1%)      544.44  (3.2%)     0.5% (  -5% -    7%)
IntNRQ                         148.56  (8.3%)      149.44 (10.4%)     0.6% ( -16% -   21%)
HighPhrase                      10.86  (9.0%)       13.92 (15.2%)    28.2% (   3% -   57%)
MedPhrase                       62.22  (2.1%)       97.61  (4.6%)    56.9% (  49% -   64%)
{noformat}

But there is a lot of variance across runs because it depends a lot on which 
query gets picked up. For instance on another run I got

{noformat}
LowPhrase       39.39  (1.9%)    51.21  (2.2%)    30.0% (  25% -   34%)
HighPhrase      13.09  (3.2%)   192.76 (26.8%)  1372.5% (1301% - 1448%)
{noformat}

In spite of some queries getting slightly slower, I think we should merge this, 
since we need phrases to expose good impacts if we want to give boolean queries 
a chance to speed up queries that include phrases. Term queries appear to be a 
bit slower; I'm assuming this is because the JVM cannot do as much inlining as 
before, since we are starting to use classes for phrases that were previously 
only used for term queries.

> Leverage impacts for phrase queries
> ---
>
> Key: LUCENE-8311
> URL: https://issues.apache.org/jira/browse/LUCENE-8311
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8311.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for 
> each unique norm value in order to get upper bounds of the score for the 
> phrase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8762) Lucene50PostingsReader should specialize reading docs+freqs with impacts

2019-07-03 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877916#comment-16877916
 ] 

Adrien Grand commented on LUCENE-8762:
--

I proposed to change the specialization for docs+freqs+positions as part of 
LUCENE-8311. But it doesn't add any specialization for docs+freqs, which would 
still probably be worth adding?

> Lucene50PostingsReader should specialize reading docs+freqs with impacts
> 
>
> Key: LUCENE-8762
> URL: https://issues.apache.org/jira/browse/LUCENE-8762
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
>
> Currently if you ask for impacts, we only have one implementation that is 
> able to expose everything: docs, freqs, positions and offsets. In contrast, 
> if you don't need impacts, we have specialization for docs+freqs, 
> docs+freqs+positions and docs+freqs+positions+offsets.
> Maybe we should add specialization for the docs+freqs case with impacts, 
> which should be the most common case, and remove specialization for 
> docs+freqs+positions when impacts are not requested?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-03 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877849#comment-16877849
 ] 

Adrien Grand commented on LUCENE-8311:
--

It turns out that part of the reason why the patch is making things slower is 
that it is moving phrase queries from BlockPostingsEnum, which is specialized 
to read freqs and positions only, to BlockImpactsEverythingEnum, which can read 
any of docs+freqs, docs+freqs+positions or docs+freqs+positions+offsets. Maybe 
we should remove BlockPostingsEnum and have a specialized impacts enum for 
positions instead.

The merged impacts look like they have some room for improvement as well. I'm 
looking into those issues so that we can then do better testing of LUCENE-8806.

> Leverage impacts for phrase queries
> ---
>
> Key: LUCENE-8311
> URL: https://issues.apache.org/jira/browse/LUCENE-8311
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8311.patch
>
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for 
> each unique norm value in order to get upper bounds of the score for the 
> phrase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8899) Implementation of MultiTermQuery for ORed Queries

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877185#comment-16877185
 ] 

Adrien Grand commented on LUCENE-8899:
--

This sounds very similar to what TermInSetQuery is doing, am I missing 
something?
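
For reference, a minimal sketch of what TermInSetQuery already does for a 
disjunction over many terms on one field (the field name and terms are made up):

{noformat}
List<BytesRef> terms = Arrays.asList(
    new BytesRef("red"), new BytesRef("green"), new BytesRef("blue"));
Query q = new TermInSetQuery("color", terms);
// Behaves like ORing one TermQuery per term in a BooleanQuery, but is designed
// for large numbers of terms on a single field.
{noformat}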

> Implementation of MultiTermQuery for ORed Queries
> -
>
> Key: LUCENE-8899
> URL: https://issues.apache.org/jira/browse/LUCENE-8899
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> While working on multi range queries, I realised that it would be good to 
> specialize for cases where all clauses in a query are ORed together. 
> MultiTermQuery springs to mind, when all terms are basically disjuncted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8900) Simplify MultiSorter

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877183#comment-16877183
 ] 

Adrien Grand commented on LUCENE-8900:
--

Thanks [~danmuzi], I will apply your first suggestion. However I can't apply 2 
because I merged the logic for integers and longs, which means that in some 
cases the missing value will be an Integer and in other cases it will be a Long.

> Simplify MultiSorter
> 
>
> Key: LUCENE-8900
> URL: https://issues.apache.org/jira/browse/LUCENE-8900
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8900.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8901) Load frequencies lazily for postings and impacts

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877164#comment-16877164
 ] 

Adrien Grand commented on LUCENE-8901:
--

Thanks [~mayyas]!

> Load frequencies lazily for postings and impacts
> 
>
> Key: LUCENE-8901
> URL: https://issues.apache.org/jira/browse/LUCENE-8901
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Priority: Minor
> Fix For: 8.2
>
>
> Allow frequencies blocks to be loaded lazily when they are not needed



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8901) Load frequencies lazily for postings and impacts

2019-07-02 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8901:
-
Fix Version/s: 8.2

> Load frequencies lazily for postings and impacts
> 
>
> Key: LUCENE-8901
> URL: https://issues.apache.org/jira/browse/LUCENE-8901
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Priority: Minor
> Fix For: 8.2
>
>
> Allow frequencies blocks to be loaded lazily when they are not needed



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877157#comment-16877157
 ] 

Adrien Grand commented on LUCENE-8857:
--

Double checking, have you run all Solr tests or only TestDistributedGrouping?

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, 
> LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> In LUCENE-8829, the idea of having lambdas passed in to the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877155#comment-16877155
 ] 

Adrien Grand commented on LUCENE-8857:
--

Thanks [~atris], I'll look into merging now. MIGRATE is considered quite loud 
already, plus the fact that it is there makes it pretty likely to be included 
in the release notes.

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, 
> LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> In LUCENE-8829, the idea of having lambdas passed in to the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-8069) Allow index sorting by field length

2019-07-02 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand reopened LUCENE-8069:
--

I've had this idea come back to my mind several times since I opened it. 
Sorting by norm brings the following benefits:
 - Better compression: documents with smaller doc IDs likely have tiny term 
frequencies, since most of the time the term frequency is less than or equal to 
the norm.
 - Smaller impacts: since each block of postings has only one unique norm value 
on average, it also only has one impact on average. This helps at search time 
since computing the score of that impact immediately gives us the best score of 
the block, as opposed to having to iterate over several impacts and take the 
highest score.
 - For term queries, it makes sure that among all documents that have X 
occurrences of the queried term, we visit the documents that have the lowest 
norm first, and thus the ones that trigger the best scores.
 - Boolean queries are interesting: they get the same benefit as term queries, 
but on the other hand the norm tends to correlate with the number of unique 
terms, so you might need to collect more matches before you find one that 
matches several query terms.

I hacked a quick prototype and ran luceneutil on wikibig, results are 
encouraging:
{noformat}
Task                     QPS baseline  StdDev    QPS patch  StdDev    Pct diff
HighTermDayOfYearSort           37.64  (6.4%)       33.96  (4.7%)    -9.8% ( -19% -    1%)
HighPhrase                      26.45  (2.7%)       25.24  (2.8%)    -4.6% (  -9% -    0%)
OrHighLow                      341.59  (2.8%)      327.84  (2.6%)    -4.0% (  -9% -    1%)
Fuzzy2                         153.15  (5.3%)      147.70  (5.1%)    -3.6% ( -13% -    7%)
IntNRQ                         151.43  (1.4%)      147.04  (3.4%)    -2.9% (  -7% -    1%)
HighTermMonthSort               79.28  (6.4%)       79.44  (7.6%)     0.2% ( -12% -   15%)
Respell                        229.10  (2.2%)      230.62  (1.8%)     0.7% (  -3% -    4%)
Fuzzy1                         285.25  (6.9%)      288.99  (6.8%)     1.3% ( -11% -   16%)
Prefix3                         34.60 (10.3%)       35.14 (10.6%)     1.6% ( -17% -   25%)
Wildcard                        72.36  (5.8%)       73.86  (6.3%)     2.1% (  -9% -   15%)
MedTerm                       1895.68  (4.2%)     1939.92  (4.2%)     2.3% (  -5% -   11%)
HighSpanNear                     5.25  (6.0%)        5.46  (6.0%)     3.9% (  -7% -   17%)
LowSloppyPhrase                  6.85  (6.5%)        7.13  (6.3%)     4.2% (  -8% -   18%)
LowPhrase                       46.08  (1.7%)       48.56  (1.8%)     5.4% (   1% -    9%)
LowSpanNear                     24.03  (3.7%)       25.68  (4.3%)     6.9% (  -1% -   15%)
MedSpanNear                      5.20 (13.2%)        5.63 (15.2%)     8.3% ( -17% -   42%)
MedSloppyPhrase                 11.01  (4.5%)       11.95  (4.7%)     8.6% (   0% -   18%)
MedPhrase                       23.39  (2.6%)       25.64  (2.2%)     9.6% (   4% -   14%)
HighSloppyPhrase                 3.84  (5.9%)        4.26  (5.8%)    11.0% (   0% -   24%)
AndHighLow                     401.13  (3.4%)      458.11  (3.0%)    14.2% (   7% -   21%)
LowTerm                       2294.98  (4.0%)     2863.59  (7.0%)    24.8% (  13% -   37%)
AndHighMed                      53.62  (3.8%)       71.40  (1.8%)    33.2% (  26% -   40%)
HighTerm                      1286.59  (3.9%)     1917.61  (5.7%)    49.0% (  38% -   60%)
AndHighHigh                     41.24  (3.5%)       69.17  (4.2%)    67.7% (  58% -   78%)
OrHighMed                       49.92  (2.4%)       84.95  (4.0%)    70.2% (  62% -   78%)
OrHighHigh                      43.55  (2.3%)       90.06  (4.8%)   106.8% (  97% -  116%)
{noformat}

The {{doc}} file is 12% smaller.
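
For anyone who wants to try something similar with public APIs only, here is a 
rough sketch that uses an explicit per-document length field as a stand-in for 
the norm (field and variable names are made up; my prototype hooks into norms 
directly instead):

{noformat}
IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
config.setIndexSort(new Sort(new SortField("body_length", SortField.Type.LONG)));
IndexWriter writer = new IndexWriter(directory, config);

Document doc = new Document();
doc.add(new TextField("body", bodyText, Field.Store.NO));
// store the field length explicitly so the index can be sorted by it
doc.add(new NumericDocValuesField("body_length", numTokensInBody));
writer.addDocument(doc);
{noformat}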

> Allow index sorting by field length
> ---
>
> Key: LUCENE-8069
> URL: https://issues.apache.org/jira/browse/LUCENE-8069
> Project: Lucene - Core
>  Issue Type: Wish
>    Reporter: Adrien Grand
>Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by 
> field length would mean we would be likely to collect best matches first. 
> Depending on the similarity implementation, this might even allow early 
> termination of collection of top documents on term queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8900) Simplify MultiSorter

2019-07-02 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8900:
-
Attachment: LUCENE-8900.patch
Status: Open  (was: Open)

Here is a patch; it does two things:
 - Uses advanceExact instead of advance on doc-value iterators (a quick sketch 
of the difference follows below).
 - Replaces usage of Comparable with longs, since in all cases values can be 
converted to comparable longs, which avoids issues with generics.
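
A minimal sketch of the first point (variable names are made up):

{noformat}
NumericDocValues values = DocValues.getNumeric(leafReader, "sort_field");
final long sortValue;
if (values.advanceExact(docID)) {   // tells us directly whether docID has a value
  sortValue = values.longValue();
} else {
  sortValue = missingValue;         // fall back to the configured missing value
}
{noformat}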

> Simplify MultiSorter
> 
>
> Key: LUCENE-8900
> URL: https://issues.apache.org/jira/browse/LUCENE-8900
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8900.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8900) Simplify MultiSorter

2019-07-02 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8900:


 Summary: Simplify MultiSorter
 Key: LUCENE-8900
 URL: https://issues.apache.org/jira/browse/LUCENE-8900
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8892) Missing closing parens in string representation of MultiBoolFunction

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876712#comment-16876712
 ] 

Adrien Grand commented on LUCENE-8892:
--

[~munendrasn] I resolved and gave you permission. It should work next time.

> Missing closing parens in string representation of MultiBoolFunction
> 
>
> Key: LUCENE-8892
> URL: https://issues.apache.org/jira/browse/LUCENE-8892
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Florian Diebold
>Priority: Trivial
> Fix For: 8.2
>
> Attachments: 0001-Fix-missing-parenthesis-in-MultiBoolFunction.patch, 
> LUCENE-8892.patch, SOLR-13514.patch
>
>
> The {{description}} function of {{MultiBoolFunction}} includes an open 
> parenthesis, but doesn't close it. This makes score explanations more 
> confusing than necessary sometimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8892) Missing closing parens in string representation of MultiBoolFunction

2019-07-02 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8892:
-
   Resolution: Fixed
Fix Version/s: 8.2
   Status: Resolved  (was: Patch Available)

> Missing closing parens in string representation of MultiBoolFunction
> 
>
> Key: LUCENE-8892
> URL: https://issues.apache.org/jira/browse/LUCENE-8892
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Florian Diebold
>Priority: Trivial
> Fix For: 8.2
>
> Attachments: 0001-Fix-missing-parenthesis-in-MultiBoolFunction.patch, 
> LUCENE-8892.patch, SOLR-13514.patch
>
>
> The {{description}} function of {{MultiBoolFunction}} includes an open 
> parenthesis, but doesn't close it. This makes score explanations more 
> confusing than necessary sometimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876705#comment-16876705
 ] 

Adrien Grand commented on LUCENE-8857:
--

We need to have all changes in the same pull request, otherwise there will be a 
window of time during which we will get test failures when testing Solr, which 
could break a lot of people. As you noticed, it didn't take long for Munendra to 
notice something had broken.

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, 
> LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> In LUCENE-8829, the idea of having lambdas passed in to the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876698#comment-16876698
 ] 

Adrien Grand commented on LUCENE-8857:
--

[~atris] Thanks for looking into the grouping failure. I'm not seeing changes 
to Solr, so I'm assuming we would still get the failure that [~munendrasn] 
shared if we pushed?

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, 
> LUCENE-8857.patch, LUCENE-8857.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> In LUCENE-8829, the idea of having lambdas passed in to the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876689#comment-16876689
 ] 

Adrien Grand commented on LUCENE-8757:
--

This change has been reverted from 8.x because it required changes to 
TopDocs#merge that would necessarily be breaking for our users.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-07-02 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8757:
-
Fix Version/s: (was: 8.2)

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-8.x-Linux (64bit/jdk-11.0.3) - Build # 803 - Unstable!

2019-07-02 Thread Adrien Grand
I muted this test and opened
https://issues.apache.org/jira/browse/LUCENE-8898, which I made a
blocker for 8.2.

On Tue, Jul 2, 2019 at 7:27 AM Policeman Jenkins Server
 wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Linux/803/
> Java: 64bit/jdk-11.0.3 -XX:-UseCompressedOops -XX:+UseG1GC
>
> 4 tests failed.
> FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testMap
>
> Error Message:
> expected:<25152.0> but was:<30184.0>
>
> Stack Trace:
> java.lang.AssertionError: expected:<25152.0> but was:<30184.0>
> at 
> __randomizedtesting.SeedInfo.seed([357E97B4DBB41250:15582367C14F0122]:0)
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:553)
> at org.junit.Assert.assertEquals(Assert.java:683)
> at 
> org.apache.lucene.util.TestRamUsageEstimator.testMap(TestRamUsageEstimator.java:136)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
>
> FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testMap
>
> Error Message:
> expected:<25152.0> but was:<30184.0>
>
> Stack Trace:
> java.lang.AssertionError: expected:<25152.0> but was:<30184.0>
> at 
> 

[jira] [Created] (LUCENE-8898) TestRamUsageEstimator.testMap failures

2019-07-01 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8898:


 Summary: TestRamUsageEstimator.testMap failures
 Key: LUCENE-8898
 URL: https://issues.apache.org/jira/browse/LUCENE-8898
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
 Fix For: 8.2


Here is an example failure:

{noformat}
4 tests failed.
FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testMap

Error Message:
expected:<25152.0> but was:<30184.0>

Stack Trace:
java.lang.AssertionError: expected:<25152.0> but was:<30184.0>
at 
__randomizedtesting.SeedInfo.seed([ED7055A14021EA69:CD56E1725ADAF91B]:0)
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:553)
at org.junit.Assert.assertEquals(Assert.java:683)
at 
org.apache.lucene.util.TestRamUsageEstimator.testMap(TestRamUsageEstimator.java:136)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}

This happens on both master and branch_8x, but apparently only when the JVM 
version is greater than or equal to 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-8898) TestRamUsageEstimator.testMap failures

2019-07-01 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8898:
-
Issue Type: Bug  (was: Improvement)

> TestRamUsageEstimator.testMap failures
> --
>
> Key: LUCENE-8898
> URL: https://issues.apache.org/jira/browse/LUCENE-8898
> Project: Lucene - Core
>  Issue Type: Bug
>    Reporter: Adrien Grand
>Priority: Blocker
> Fix For: 8.2
>
>
> Here is an example failure:
> {noformat}
> 4 tests failed.
> FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testMap
> Error Message:
> expected:<25152.0> but was:<30184.0>
> Stack Trace:
> java.lang.AssertionError: expected:<25152.0> but was:<30184.0>
> at 
> __randomizedtesting.SeedInfo.seed([ED7055A14021EA69:CD56E1725ADAF91B]:0)
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:553)
> at org.junit.Assert.assertEquals(Assert.java:683)
> at 
> org.apache.lucene.util.TestRamUsageEstimator.testMap(TestRamUsageEstimator.java:136)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java

[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-07-01 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876248#comment-16876248
 ] 

Adrien Grand commented on LUCENE-8857:
--

[~atris] Can you look into those failures? I had understood from your comment 
on the PR that you had run tests? FYI I'm seeing issues on the Lucene end as 
well, e.g. ant test  -Dtestcase=TestGrouping -Dtests.method=testRandom 
-Dtests.seed=1039BE5B957F7FDD -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=be -Dtests.timezone=Europe/Rome -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8.

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, 
> LUCENE-8857.patch, LUCENE-8857.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> In LUCENE-8829, the idea of having lambdas passed in to the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-07-01 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876242#comment-16876242
 ] 

Adrien Grand commented on LUCENE-8857:
--

Thanks [~munendrasn] I'm reverting now.

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, 
> LUCENE-8857.patch, LUCENE-8857.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> In LUCENE-8829, the idea of having lambdas passed in to the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-07-01 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8857.
--
   Resolution: Fixed
Fix Version/s: master (9.0)

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, 
> LUCENE-8857.patch, LUCENE-8857.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> In LUCENE-8829, the idea of having lambdas passed in to the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8882) Add State To QueryVisitor

2019-07-01 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876132#comment-16876132
 ] 

Adrien Grand commented on LUCENE-8882:
--

Can you elaborate more on how this would help replace IndexOrDocValues?

> Add State To QueryVisitor
> -
>
> Key: LUCENE-8882
> URL: https://issues.apache.org/jira/browse/LUCENE-8882
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> QueryVisitor has no state passed in either up or down recursion. This limits 
> the range of decisions that can be taken during visitation by a QueryVisitor. For 
> example, for LUCENE-8881, we need a way to specify whether the visitor is a rewriter 
> visitor.
>  
> This Jira proposes adding a property bag model to QueryVisitor, which can 
> then be referred to by the Query instance being visited by QueryVisitor.
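
As a purely illustrative sketch of the property-bag idea (the class name, the {{state}} map and its accessor are made up here and are not part of the existing QueryVisitor API):

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;

// Hypothetical visitor that carries a property bag down the recursion.
public class StatefulQueryVisitor extends QueryVisitor {
  private final Map<String, Object> state = new HashMap<>();

  public StatefulQueryVisitor() {
    // e.g. mark this visitor as a rewriting visitor, per the LUCENE-8881 use case
    state.put("isRewriter", Boolean.TRUE);
  }

  public Object getProperty(String key) {
    return state.get(key);
  }

  @Override
  public void consumeTerms(Query query, Term... terms) {
    // a visited Query could consult getProperty("isRewriter") here to adjust its behaviour
  }
}
{code}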



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8862) Collector Level Dynamic Memory Accounting

2019-07-01 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8862.
--
   Resolution: Fixed
Fix Version/s: 8.2
   master (9.0)

> Collector Level Dynamic Memory Accounting
> -
>
> Key: LUCENE-8862
> URL: https://issues.apache.org/jira/browse/LUCENE-8862
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Inspired by LUCENE-8855, I am thinking of adding a new interface which 
> tracks dynamic memory used by Collectors. This shall allow users to get an 
> accounting of the memory usage of their Collectors and better plan 
> their resource capacity. This shall also allow us to add Collector-level 
> limits for memory usage, thus allowing users finer control over their 
> resources.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8892) Missing closing parens in string representation of MultiBoolFunction

2019-06-28 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874962#comment-16874962
 ] 

Adrien Grand commented on LUCENE-8892:
--

+1

> Missing closing parens in string representation of MultiBoolFunction
> 
>
> Key: LUCENE-8892
> URL: https://issues.apache.org/jira/browse/LUCENE-8892
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Florian Diebold
>Priority: Trivial
> Attachments: 0001-Fix-missing-parenthesis-in-MultiBoolFunction.patch, 
> LUCENE-8892.patch, SOLR-13514.patch
>
>
> The {{description}} function of {{MultiBoolFunction}} includes an open 
> parenthesis, but doesn't close it. This makes score explanations more 
> confusing than necessary sometimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong

2019-06-28 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874906#comment-16874906
 ] 

Adrien Grand commented on LUCENE-8831:
--

[~ivera] This looks like a good candidate for 8.1.2?

> LatLonShapeBoundingBoxQuery hashcode is wrong 
> --
>
> Key: LUCENE-8831
> URL: https://issues.apache.org/jira/browse/LUCENE-8831
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the hashcode implementation for LatLonShapeBoundingBoxQuery always 
> returns a different value. Therefore the query cannot be cached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-28 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874810#comment-16874810
 ] 

Adrien Grand commented on LUCENE-8878:
--

[~mikemccand] ImpactsEnum is mostly about exposing maximum scores per block. I 
believe you are talking about Scorer#setMinCompetitiveScore, ie. changing the 
FieldComparator API to only track the bottom bucket as opposed to every bucket? 
If this is the case I agree that it sounds like a good idea to explore.
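
For context, a minimal sketch of the existing score-based pruning hook referred to above (Scorable#setMinCompetitiveScore). The collector below is a made-up example that only tracks the single best hit; it is not Lucene's actual TopScoreDocCollector or the proposed FieldComparator change:

{code:java}
import java.io.IOException;

import org.apache.lucene.search.Scorable;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.SimpleCollector;

// Once we know the score a hit must beat, we tell the scorer, which lets
// WAND/impacts skip whole blocks of non-competitive documents.
public class TopHitCollector extends SimpleCollector {
  private Scorable scorer;
  private float bestScore = Float.NEGATIVE_INFINITY;
  public int bestDoc = -1; // segment-local doc id, for simplicity

  @Override
  public void setScorer(Scorable scorer) throws IOException {
    this.scorer = scorer;
  }

  @Override
  public void collect(int doc) throws IOException {
    float score = scorer.score();
    if (score > bestScore) {
      bestScore = score;
      bestDoc = doc;
      // anything that cannot beat the current best hit is not competitive
      scorer.setMinCompetitiveScore(Math.nextUp(bestScore));
    }
  }

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.TOP_SCORES;
  }
}
{code}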

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are three major areas for improvement:
>  # The logic for reading values and the logic for storing them are coupled.
>  # Users need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety, so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent threads use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulates the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only one copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8890) Parallel Iteration of Lists

2019-06-28 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8890.
--
   Resolution: Fixed
Fix Version/s: 8.2
   master (9.0)

Thank you [~svamann], sorry it took so long to merge your change.

> Parallel Iteration of Lists
> ---
>
> Key: LUCENE-8890
> URL: https://issues.apache.org/jira/browse/LUCENE-8890
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Sven Amann
>Priority: Minor
>  Labels: pull-request-available
> Fix For: master (9.0), 8.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Solr/contrib/analysis-extras contains the class `BooleanWeight`, which 
> maintains two lists that are repeatedly iterated over in parallel. While both 
> lists do have the same length, this is not immediately obvious from the 
> locations that iterate them. A future change may lead to the lists getting out 
> of sync, which would break the iterations. Moreover, there is no established 
> language feature for iterating two lists, which is why the iteration is 
> implemented differently in various locations throughout the class.
> I created a patch that joins the two lists into one, which simplifies the 
> iteration, unifies the implementation in all places, and prevents the two 
> lists from getting out of sync when a change is made without awareness of the 
> parallel iterations.
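
A generic sketch of the shape of that refactoring (not the actual patch, and the names below are invented): two parallel lists are replaced by a single list of pairs, so they cannot drift out of sync and iteration needs no shared index.

{code:java}
import java.util.ArrayList;
import java.util.List;

final class Pair<A, B> {
  final A first;
  final B second;

  Pair(A first, B second) {
    this.first = first;
    this.second = second;
  }
}

final class Joined {
  // before: List<String> names; List<Integer> values; // lengths kept in sync by hand
  private final List<Pair<String, Integer>> entries = new ArrayList<>();

  void add(String name, int value) {
    entries.add(new Pair<>(name, value));
  }

  void printAll() {
    for (Pair<String, Integer> e : entries) { // a single loop, no index bookkeeping
      System.out.println(e.first + " -> " + e.second);
    }
  }
}
{code}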



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Moved] (LUCENE-8890) Parallel Iteration of Lists

2019-06-28 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand moved SOLR-12736 to LUCENE-8890:
-

Affects Version/s: (was: 8.0)
Lucene Fields: New
  Key: LUCENE-8890  (was: SOLR-12736)
  Project: Lucene - Core  (was: Solr)

> Parallel Iteration of Lists
> ---
>
> Key: LUCENE-8890
> URL: https://issues.apache.org/jira/browse/LUCENE-8890
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Sven Amann
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Solr/contrib/analysis-extras contains the class `BooleanWeight`, which 
> maintains two lists that are repeatedly iterated over in parallel. While both 
> lists do have the same length, this is not immediately obvious from the 
> locations that iterate them. A future change may lead to the lists getting out 
> of sync, which would break the iterations. Moreover, there is no established 
> language feature for iterating two lists, which is why the iteration is 
> implemented differently in various locations throughout the class.
> I created a patch that joins the two lists into one, which simplifies the 
> iteration, unifies the implementation in all places, and prevents the two 
> lists from getting out of sync when a change is made without awareness of the 
> parallel iterations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8889) Remove Dead Code From PointRangeQuery

2019-06-27 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8889.
--
   Resolution: Fixed
Fix Version/s: 8.2
   master (9.0)

Oops, I had not seen that Jim had commented here before merging. I'm still 
resolving, but happy to reopen/revert if there are concerns.

> Remove Dead Code From PointRangeQuery
> -
>
> Key: LUCENE-8889
> URL: https://issues.apache.org/jira/browse/LUCENE-8889
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Minor
> Fix For: master (9.0), 8.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> PointRangeQuery has accessors for the underlying points in the query, but those 
> are never used. We should remove them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-27 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874010#comment-16874010
 ] 

Adrien Grand commented on LUCENE-8871:
--

The visibility changes are required because our Javadocs checker verifies that 
every method/constructor that ends up in the Javadocs has a description. 
Reducing the visibility helped because then these classes don't even show up in 
javadocs anymore.

> Move Kuromoji DictionaryBuilder tool from src/tools to src/ 
> 
>
> Key: LUCENE-8871
> URL: https://issues.apache.org/jira/browse/LUCENE-8871
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently tests in tools directories are not run as part of the normal 
> testing done by {{ant test}} - you have to explicitly run {{test-tools}}, 
> which it seems people don't do (and it might not survive translation to 
> gradle, who knows), so [~rcmuir] suggested we just move the tools into the 
> main source tree (under src/java and src/test).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-27 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873980#comment-16873980
 ] 

Adrien Grand commented on LUCENE-8871:
--

[~sokolov] I tried to address the precommit failures but this required some 
changes to the visibility of classes, could you review?

> Move Kuromoji DictionaryBuilder tool from src/tools to src/ 
> 
>
> Key: LUCENE-8871
> URL: https://issues.apache.org/jira/browse/LUCENE-8871
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently tests in tools directories are not run as part of the normal 
> testing done by {{ant test}} - you have to explicitly run {{test-tools}}, 
> which it seems people don't do (and it might not survive translation to 
> gradle, who knows), so [~rcmuir] suggested we just move the tools into the 
> main source tree (under src/java and src/test).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected

2019-06-27 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873941#comment-16873941
 ] 

Adrien Grand commented on LUCENE-8886:
--

+1

> TestMutablePointsReaderUtils not doing what it is expected
> --
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8886.patch
>
>
> The TestMutablePointsReaderUtils test is actually not doing what is expected. 
> The problem is that we are constructing Point objects but not copying the 
> bytes provided, so it is always working with arrays of 0 values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8855) Add Accountable to some Query implementations

2019-06-27 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873849#comment-16873849
 ] 

Adrien Grand commented on LUCENE-8855:
--

[~ab] I'd be in favor of waiting for 8.2, which we could release right after 
8.1.2.

> Add Accountable to some Query implementations
> -
>
> Key: LUCENE-8855
> URL: https://issues.apache.org/jira/browse/LUCENE-8855
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.1.2
>
> Attachments: LUCENE-8855.patch, LUCENE-8855.patch, LUCENE-8855.patch, 
> LUCENE-8855.patch, LUCENE-8855.patch
>
>
> Query implementations should also support {{Accountable}} API in order to 
> monitor the memory consumption e.g. in caches where either keys or values are 
> {{Query}} instances.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8855) Add Accountable to some Query implementations

2019-06-26 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873426#comment-16873426
 ] 

Adrien Grand commented on LUCENE-8855:
--

[~ab] I'm surprised this went to 8.1, this isn't a bug fix?

> Add Accountable to some Query implementations
> -
>
> Key: LUCENE-8855
> URL: https://issues.apache.org/jira/browse/LUCENE-8855
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.1.2
>
> Attachments: LUCENE-8855.patch, LUCENE-8855.patch, LUCENE-8855.patch, 
> LUCENE-8855.patch, LUCENE-8855.patch
>
>
> Query implementations should also support {{Accountable}} API in order to 
> monitor the memory consumption e.g. in caches where either keys or values are 
> {{Query}} instances.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-26 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873323#comment-16873323
 ] 

Adrien Grand commented on LUCENE-8885:
--

I don't think backward compatibility should be a concern for such low-level 
APIs. I like option 1, but then we wouldn't have an API that we could use for merges 
anymore, since we wouldn't have a reliable way to know which byte[] maps to 
which docID(s)? Maybe option 2 is more practical, and we could make it look a bit 
nicer by replacing the int[] with a DocIdSetIterator, ie. {{void 
visit(DocIdSetIterator docs, byte[] packedValue) throws IOException}}?
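
A sketch of what such a default method could look like (simplified; the interface below is made up for illustration, the real one being PointValues.IntersectVisitor, and the naive loop is only the fallback that implementors would override to check the packed value once):

{code:java}
import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;

public interface BulkIntersectVisitor {

  /** Existing per-document callback. */
  void visit(int docID, byte[] packedValue) throws IOException;

  /** Bulk callback for all documents sharing the same packed value. */
  default void visit(DocIdSetIterator docs, byte[] packedValue) throws IOException {
    for (int doc = docs.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = docs.nextDoc()) {
      visit(doc, packedValue); // naive fallback: still one call per document
    }
  }
}
{code}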

> Optimise BKD reader by exploiting cardinality information stored on leaves
> --
>
> Key: LUCENE-8885
> URL: https://issues.apache.org/jira/browse/LUCENE-8885
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>
> In LUCENE-8688 a new storing strategy was introduced for leaves that contain 
> duplicated points. In that case the points are stored together with their 
> cardinality. We still call the IntersectVisitor once per document, therefore 
> we are checking the same point against the query many times. The idea is to 
> check the point once and then add all the documents.
> The API of the IntersectVisitor does not allow that, and therefore to exploit 
> that property we need to either change the API or extend it. Here are the 
> possibilities I can think of:
> 1) Modify the API by replacing the method IntersectVisitor#visit(byte[], int) 
> by the following method:
> {code:java}
>  /** Called for points in leaf cells that cross the query, to test whether the 
>  point matches the query.
>  * If it matches, the implementor must call {@link 
> IntersectVisitor#visit(int)} so that the
>  * documents associated with this point are visited */
> boolean matches(byte[] packedValue) throws IOException;
> {code}
> This will allow the BKD reader to check if a point matches the query and, if 
> true, then call the method IntersectVisitor#visit(int) for all documents 
> associated with that point.
> The drawback of this approach is backwards compatibility and the need to 
> update all classes that implement this interface.
> 2) Extend the API by adding a new default method in the IntersectVisitor 
> interface:
> {code:java}
>  /** Called for documents in a leaf cell that crosses the query.  The consumer
>  *  should scrutinize the packedValue to decide whether to accept it.  If 
> accepted it should
>  *  consider only the {@code numberDocs} documents starting at {@code 
> offset} In the 1D case,
>  *  values are visited in increasing order, and in the case of ties, in 
> increasing
>  *  docID order. */
> default void visit(int[] docID, int offset, int numberDocs, byte[] 
> packedValue) throws IOException {
>   for (int i = offset; i < offset + numberDocs; i++) {
> visit(docID[i], packedValue);
>   }
> }
> {code}
> The merit of this approach is that it is backwards compatible and it is up to 
> the implementors to override this method and get the benefits of this 
> optimisation. The biggest downside is that it assumes that the codec has doc 
> IDs available in an int[] slice, as opposed to streaming them from disk 
> directly to the IntersectVisitor, for instance, as [~jpountz] noted.
> Maybe there are more options I did not think about, so I am looking forward to 
> hearing opinions on whether we should do this change at all and, if so, how to approach 
> it. My +1 goes to 1).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8877) TopDocsCollector Should Not Depend on Priority Queue

2019-06-26 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873014#comment-16873014
 ] 

Adrien Grand commented on LUCENE-8877:
--

Abstraction increases complexity too; it feels reasonable to me that top-docs 
collectors are backed by a priority queue since this is the go-to data structure 
for top-k selection problems? If you need more flexibility, you could directly 
extend Collector as opposed to TopDocsCollector?
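
As a rough sketch of that alternative (illustrative only, not an existing Lucene class), a Collector that records every hit in a plain list and never touches a priority queue:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorable;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreMode;

public class AllHitsCollector implements Collector {
  public final List<ScoreDoc> hits = new ArrayList<>();

  @Override
  public LeafCollector getLeafCollector(LeafReaderContext context) {
    final int docBase = context.docBase;
    return new LeafCollector() {
      private Scorable scorer;

      @Override
      public void setScorer(Scorable scorer) {
        this.scorer = scorer;
      }

      @Override
      public void collect(int doc) throws IOException {
        hits.add(new ScoreDoc(docBase + doc, scorer.score())); // global doc id + score
      }
    };
  }

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.COMPLETE;
  }
}
{code}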

> TopDocsCollector Should Not Depend on Priority Queue
> 
>
> Key: LUCENE-8877
> URL: https://issues.apache.org/jira/browse/LUCENE-8877
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> TopDocsCollector is tightly coupled to the notion of priority queue, which is 
> not necessarily a good abstraction to have since the collector really just 
> needs an interface to iterate on and hold docID and score, with possibly 
> shard indexes.
>  
> We should rewrite this to a simpler interface, with a priority queue being the 
> default implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8855) Add Accountable to Query implementations

2019-06-26 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873011#comment-16873011
 ] 

Adrien Grand commented on LUCENE-8855:
--

+1

> Add Accountable to Query implementations
> 
>
> Key: LUCENE-8855
> URL: https://issues.apache.org/jira/browse/LUCENE-8855
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Attachments: LUCENE-8855.patch, LUCENE-8855.patch, LUCENE-8855.patch, 
> LUCENE-8855.patch, LUCENE-8855.patch
>
>
> Query implementations should also support {{Accountable}} API in order to 
> monitor the memory consumption e.g. in caches where either keys or values are 
> {{Query}} instances.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8870) Support numeric value in Field class

2019-06-26 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873009#comment-16873009
 ] 

Adrien Grand commented on LUCENE-8870:
--

bq. Currently NumericDocValuesField does not support the option for stored. So 
users need to add a separate StoredField.

I'm viewing this one as a feature. :) Having the same field indexed and stored 
often makes sense because the analyzer takes care of converting the text into 
the internal representation that the index cares about. But for doc values, we 
expect users to do this work of converting the data to what works for Lucene. 
For instance if you are indexing doubles, you would likely index as a byte[] in 
points, use the long bits of the double in doc values, and use the double 
directly for storing; these are 3 different representations of the same data. 
My gut feeling is that trying to fold everything into a single field would make 
things more complicated rather than simpler.
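
A small sketch of those three representations for a single double value (the field name and class are just examples):

{code:java}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoublePoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;

public class DoubleFieldExample {
  public static Document buildDoc(double price) {
    Document doc = new Document();
    doc.add(new DoublePoint("price", price));                                    // points: encoded as a byte[]
    doc.add(new NumericDocValuesField("price", Double.doubleToLongBits(price))); // doc values: the long bits of the double
    doc.add(new StoredField("price", price));                                    // stored field: the double itself
    return doc;
  }
}
{code}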

> Support numeric value in Field class
> 
>
> Key: LUCENE-8870
> URL: https://issues.apache.org/jira/browse/LUCENE-8870
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8870.patch
>
>
> I checked the following comment in Field class.
> {code:java}
> // TODO: allow direct construction of int, long, float, double value too..?
> {code}
> We already have some fields like IntPoint and StoredField, but I think it's 
> okay.
> The test cases are set in the TestField class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-26 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873003#comment-16873003
 ] 

Adrien Grand commented on LUCENE-8878:
--

bq. Is it the case today? I wonder whether the ordinals are comparable across 
segments (likely not...);

Indeed ordinals are not comparable across segments. Have a look at 
TermOrdValComparator#setBottom: it looks up the bottom term in the terms 
dictionary of the current segment to get an ordinal that may be used for 
comparison. I'm afraid the API would need to be a bit more complex than what 
you are proposing, but hopefully not as complicated as the current API.
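
For illustration, that per-segment re-mapping boils down to something like the sketch below (heavily simplified, not the actual TermOrdValComparator code): the "bottom" term carried over from other segments has to be translated into an ordinal that is only valid within the current leaf.

{code:java}
import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.util.BytesRef;

class PerSegmentOrdLookup {

  /**
   * Returns the ordinal of {@code bottomTerm} in this leaf's sorted doc values,
   * or a negative insertion point if the exact term is absent (like lookupTerm).
   * Ordinals obtained from another segment must NOT be reused here.
   */
  static int ordForBottom(LeafReaderContext leaf, String field, BytesRef bottomTerm) throws IOException {
    SortedDocValues dv = DocValues.getSorted(leaf.reader(), field);
    return dv.lookupTerm(bottomTerm);
  }
}
{code}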

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are three major areas for improvement:
>  # The logic for reading values and the logic for storing them are coupled.
>  # Users need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety, so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent threads use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulates the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only one copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872534#comment-16872534
 ] 

Adrien Grand commented on LUCENE-8878:
--

+1 to simplify, even at the cost of some performance. As long as we can keep 
comparing strings using their ordinals instead of their actual values, it 
should be good.

bq. To access the values can we somehow use the existing FunctionValues classes?

I was hoping we could soon replace FunctionValues with the new 
oal.search.LongValues/DoubleValues. :)

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are three major areas for improvement:
>  # The logic for reading values and the logic for storing them are coupled.
>  # Users need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety, so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent threads use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulates the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only one copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8855) Add Accountable to Query implementations

2019-06-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872532#comment-16872532
 ] 

Adrien Grand commented on LUCENE-8855:
--

BytesRefHash has some code commented out that looks like a leftover? One last 
minor concern I have is that sizeOf without a default size feels a bit trappy, 
since lots of objects are larger than their shallow size. Could we make it fail 
if it encounters an unknown object instead of assuming shallow size?
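
To illustrate why the shallow fallback is trappy (a toy example, not the patch attached here): an object holding a byte[] is far larger than its shallow size, so a correct Accountable implementation has to add the referenced array explicitly.

{code:java}
import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.RamUsageEstimator;

class PayloadHolder implements Accountable {
  private static final long BASE_RAM = RamUsageEstimator.shallowSizeOfInstance(PayloadHolder.class);

  private final byte[] payload;

  PayloadHolder(byte[] payload) {
    this.payload = payload;
  }

  @Override
  public long ramBytesUsed() {
    // the shallow size only counts the reference, so account for the array too
    return BASE_RAM + RamUsageEstimator.sizeOf(payload);
  }
}
{code}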

> Add Accountable to Query implementations
> 
>
> Key: LUCENE-8855
> URL: https://issues.apache.org/jira/browse/LUCENE-8855
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Attachments: LUCENE-8855.patch, LUCENE-8855.patch, LUCENE-8855.patch, 
> LUCENE-8855.patch
>
>
> Query implementations should also support {{Accountable}} API in order to 
> monitor the memory consumption e.g. in caches where either keys or values are 
> {{Query}} instances.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8806) WANDScorer should support two-phase iterator

2019-06-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872205#comment-16872205
 ] 

Adrien Grand commented on LUCENE-8806:
--

Oh, that is interesting. Are you testing on wikibigall or one of the wikimedium 
datasets that has truncated content?

> WANDScorer should support two-phase iterator
> 
>
> Key: LUCENE-8806
> URL: https://issues.apache.org/jira/browse/LUCENE-8806
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Major
> Attachments: LUCENE-8806.patch, LUCENE-8806.patch
>
>
> Following https://issues.apache.org/jira/browse/LUCENE-8770 the WANDScorer 
> should leverage two-phase iterators in order to be faster when used in 
> conjunctions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8806) WANDScorer should support two-phase iterator

2019-06-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872156#comment-16872156
 ] 

Adrien Grand commented on LUCENE-8806:
--

I'm a bit confused as to why HighPhraseHighTerm and some other queries get so 
much slower. I was thinking this could only get faster than before since we 
would now leverage two-phase iterators instead of using iterators naively. 

> WANDScorer should support two-phase iterator
> 
>
> Key: LUCENE-8806
> URL: https://issues.apache.org/jira/browse/LUCENE-8806
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Major
> Attachments: LUCENE-8806.patch, LUCENE-8806.patch
>
>
> Following https://issues.apache.org/jira/browse/LUCENE-8770 the WANDScorer 
> should leverage two-phase iterators in order to be faster when used in 
> conjunctions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8806) WANDScorer should support two-phase iterator

2019-06-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872151#comment-16872151
 ] 

Adrien Grand commented on LUCENE-8806:
--

FYI we already have an issue for phrases: LUCENE-8311.

> WANDScorer should support two-phase iterator
> 
>
> Key: LUCENE-8806
> URL: https://issues.apache.org/jira/browse/LUCENE-8806
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Major
> Attachments: LUCENE-8806.patch, LUCENE-8806.patch
>
>
> Following https://issues.apache.org/jira/browse/LUCENE-8770 the WANDScorer 
> should leverage two-phase iterators in order to be faster when used in 
> conjunctions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Propose CHANGES.txt releases begin with the categories (empty)

2019-06-25 Thread Adrien Grand
+1, it's otherwise tempting to reuse an existing category even if it
doesn't fit as well as a category that is not listed yet.

On Tue, Jun 25, 2019 at 6:40 AM David Smiley  wrote:
>
> Looking at Solr's CHANGES.txt for 8.2 I see we have some sections: "Upgrade 
> Notes", "New Features", "Bug Fixes", and "Other Changes".  There is no 
> "Improvements" section, so no surprise here, the New Features category has 
> issues that ought to be listed as Improvements.  I think the order varies as 
> well.  I propose that on new releases, the initial state of the next release 
> in CHANGES.txt have these sections.  They can easily be removed at the 
> upcoming release if they have no entries, or they could stay empty.  It seems 
> addVersion.py is the code that sets this up.  Any opinions?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley



-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8875) Should TopScoreDocCollector Always Populate Sentinel Values?

2019-06-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871732#comment-16871732
 ] 

Adrien Grand commented on LUCENE-8875:
--

Aggregates don't have this issue since they don't track top hits?

+1 to having a separate collector for large N values in sandbox.

> Should TopScoreDocCollector Always Populate Sentinel Values?
> 
>
> Key: LUCENE-8875
> URL: https://issues.apache.org/jira/browse/LUCENE-8875
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> TopScoreDocCollector always initializes HitQueue as the PQ implementation, 
> and instructs HitQueue to populate itself with sentinels. While this is a great 
> safety mechanism, for very large datasets where the query's selectivity is 
> high, the sentinel population can be redundant and can become a large enough 
> bottleneck in itself. Does it make sense to introduce a new parameter in 
> TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and 
> does not populate sentinels?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8875) Should TopScoreDocCollector Always Populate Sentinel Values?

2019-06-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871560#comment-16871560
 ] 

Adrien Grand commented on LUCENE-8875:
--

What do you mean by bucket aggregates?

> Should TopScoreDocCollector Always Populate Sentinel Values?
> 
>
> Key: LUCENE-8875
> URL: https://issues.apache.org/jira/browse/LUCENE-8875
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> TopScoreDocCollector always initializes HitQueue as the PQ implementation, 
> and instructs HitQueue to populate itself with sentinels. While this is a great 
> safety mechanism, for very large datasets where the query's selectivity is 
> high, the sentinel population can be redundant and can become a large enough 
> bottleneck in itself. Does it make sense to introduce a new parameter in 
> TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and 
> does not populate sentinels?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8875) Should TopScoreDocCollector Always Populate Sentinel Values?

2019-06-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871508#comment-16871508
 ] 

Adrien Grand commented on LUCENE-8875:
--

I like pre-populating the hit queue mostly because it makes the collector code 
simpler and likely a bit faster. As a comparison, TopFieldCollector can't 
pre-populate the hit queue, which forces it to have different code paths for the 
case where the priority queue is already full (the common path) and the case 
where it is not full yet. In general I see large numbers of hits as an abuse 
case.
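
To illustrate the simplification (a minimal sketch with hypothetical names, not 
the actual TopScoreDocCollector/HitQueue code): once the queue is pre-filled 
with sentinel entries it is always "full", so every hit takes the same single 
comparison against the current worst entry and the "queue not full yet" branch 
disappears.

{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

final class SentinelSketch {
  static final class Hit {
    int doc;
    float score;
  }

  // Pre-fill the queue with sentinel entries that lose against any real hit.
  static PriorityQueue<Hit> newQueue(int numHits) {
    PriorityQueue<Hit> pq =
        new PriorityQueue<>(numHits, Comparator.comparingDouble((Hit h) -> h.score));
    for (int i = 0; i < numHits; i++) {
      Hit sentinel = new Hit();
      sentinel.doc = Integer.MAX_VALUE;
      sentinel.score = Float.NEGATIVE_INFINITY;
      pq.offer(sentinel);
    }
    return pq;
  }

  // Collection loop: the queue is always full, so one comparison per hit.
  // (Tie-breaking by doc id is ignored to keep the sketch short.)
  static void collect(PriorityQueue<Hit> pq, int doc, float score) {
    if (score > pq.peek().score) {
      Hit h = pq.poll();   // reuse the evicted entry
      h.doc = doc;
      h.score = score;
      pq.offer(h);
    }
  }
}
{code}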

> Should TopScoreDocCollector Always Populate Sentinel Values?
> 
>
> Key: LUCENE-8875
> URL: https://issues.apache.org/jira/browse/LUCENE-8875
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> TopScoreDocCollector always initializes HitQueue as the PQ implementation, 
> and instructs HitQueue to populate itself with sentinels. While this is a great 
> safety mechanism, for very large datasets where the query's selectivity is 
> high, the sentinel population can be redundant and can become a large enough 
> bottleneck in itself. Does it make sense to introduce a new parameter in 
> TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and 
> does not populate sentinels?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8870) Support numeric value in Field class

2019-06-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871491#comment-16871491
 ] 

Adrien Grand commented on LUCENE-8870:
--

I think this new constructor would be misleading. For instance it might be 
tempting to use it if you wanted to index doubles, but the only thing you can 
index with this constructor is the string representation of the double values, 
which is unlikely to be helpful.

I wonder whether we should make this class abstract instead so that it can't be 
instantiated directly, and potentially enhance some of its subclasses to address 
use-cases that were only doable with this Field class until now, such 
as having a text field with term vectors enabled.
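
For illustration only (not part of the patch), the difference between indexing 
the string form of a number and using one of the dedicated numeric fields:

{code:java}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoublePoint;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;

final class NumericFieldSketch {
  static Document example() {
    Document doc = new Document();
    // Indexing the string form only produces the term "9.99"; any "range"
    // over it compares strings, not numbers.
    doc.add(new StringField("price_as_text", Double.toString(9.99), Field.Store.NO));
    // A dedicated numeric field indexes a point value that
    // DoublePoint.newRangeQuery can search efficiently.
    doc.add(new DoublePoint("price", 9.99));
    return doc;
  }
}
{code}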

> Support numeric value in Field class
> 
>
> Key: LUCENE-8870
> URL: https://issues.apache.org/jira/browse/LUCENE-8870
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8870.patch
>
>
> I checked the following comment in Field class.
> {code:java}
> // TODO: allow direct construction of int, long, float, double value too..?
> {code}
> We already have some fields like IntPoint and StoredField, but I think it's 
> okay.
> The test cases are set in the TestField class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Welcome Munendra S N as Lucene/Solr committer

2019-06-24 Thread Adrien Grand
Welcome Munendra!

On Fri, Jun 21, 2019 at 11:42 AM Ishan Chattopadhyaya
 wrote:
>
> Hi all,
>
> Please join me in welcoming Munendra as a Lucene/Solr committer!
>
> Munendra has been working on bug fixes and improvements in various
> parts of Solr.
>
> Congratulations and welcome! It is a tradition to introduce yourself
> with a brief bio, Munendra.
>
> Ishan
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>


-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8806) WANDScorer should support two-phase iterator

2019-06-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871436#comment-16871436
 ] 

Adrien Grand commented on LUCENE-8806:
--

+1

Can you run luceneutil on some disjunctions of phrase queries to double check 
it helps?

> WANDScorer should support two-phase iterator
> 
>
> Key: LUCENE-8806
> URL: https://issues.apache.org/jira/browse/LUCENE-8806
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Major
> Attachments: LUCENE-8806.patch, LUCENE-8806.patch
>
>
> Following https://issues.apache.org/jira/browse/LUCENE-8770 the WANDScorer 
> should leverage two-phase iterators in order to be faster when used in 
> conjunctions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8876) EnglishMinimalStemmer does not implement s-stemmer paper correctly?

2019-06-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871408#comment-16871408
 ] 

Adrien Grand commented on LUCENE-8876:
--

I agree that the use of "applicable" suggests that the THEN part has fired, but 
then doesn't it mean that exceptions of the 2nd rule are always ignored?
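
To make the two readings concrete, here is a minimal sketch (not the 
EnglishMinimalStemmer code) of the "first rule whose THEN part fires" 
interpretation, assuming the three rules as quoted from the paper in the issue 
below; under it, words like bees or employees fall through rule 2's exception 
and still lose their trailing s in rule 3.

{code:java}
final class SStemSketch {
  // Assumed rules (from the paper as quoted in the issue below):
  //   1. "ies" -> "y"  unless the word ends in "eies" or "aies"
  //   2. "es"  -> "e"  unless the word ends in "aes", "ees" or "oes"
  //   3. "s"   -> ""   unless the word ends in "us" or "ss"
  static String stem(String word) {
    if (word.endsWith("ies") && !word.endsWith("eies") && !word.endsWith("aies")) {
      return word.substring(0, word.length() - 3) + "y";   // ponies -> pony
    }
    if (word.endsWith("es")
        && !word.endsWith("aes") && !word.endsWith("ees") && !word.endsWith("oes")) {
      return word.substring(0, word.length() - 1);          // drop the "s", keep the "e"
    }
    // bees, trees and employees reach this rule because of rule 2's
    // exceptions, so the trailing "s" is still removed: bees -> bee.
    if (word.endsWith("s") && !word.endsWith("us") && !word.endsWith("ss")) {
      return word.substring(0, word.length() - 1);
    }
    return word;
  }
}
{code}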

> EnglishMinimalStemmer does not implement s-stemmer paper correctly?
> ---
>
> Key: LUCENE-8876
> URL: https://issues.apache.org/jira/browse/LUCENE-8876
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Mark Harwood
>Priority: Minor
>
> The EnglishMinimalStemmer fails to stem ees suffixes like bees, trees and 
> employees.
> The [original 
> paper|http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.9828=rep1=pdf]
>  has this table of rules:
> !https://user-images.githubusercontent.com/170925/59616454-5dc7d580-911c-11e9-80b0-c7a59458c5a7.png!
> The notes accompanying the table state :
> {quote}"the first applicable rule encountered is the only one used"
> {quote}
>  
> For the {{ees}} and {{oes}} suffixes I think EnglishMinimalStemmer 
> misinterpreted the rule logic and consequently {{bees != bee}} and {{tomatoes 
> != tomato}}. The {{oes}} and {{ees}} suffixes are left intact.
> "The first applicable rule" for {{ees}} could be interpreted as rule 2 or 3 
> in the table depending on if you take {{applicable}} to mean "the THEN part 
> of the rule has fired" or just that the suffix was referenced in the rule. 
> EnglishMinimalStemmer has assumed the latter and I think it should be the 
> former. We should fall through into rule 3 for {{ees}} and {{oes}} (remove 
> any trailing S). That's certainly the conclusion I came to independently 
> testing on real data.
> There are some additional changes I'd like to see in a plural stemmer but I 
> won't list them here - the focus should be making the code here match the 
> original paper it references.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality

2019-06-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868334#comment-16868334
 ] 

Adrien Grand commented on LUCENE-8868:
--

Is the index-time overhead noticeable? I'm thinking this change could help in 
the 1D case as well, for instance if you are storing dates as millis since 
Epoch that only have second (or day) granularity.

> New storing strategy for BKD tree leaves with low cardinality
> -
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values along with their 
> cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the 
> leaf as a low-cardinality leaf. The storage costs are estimated as follows 
> (see the sketch after this description):
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2), 
> where two is the estimated size of storing the cardinality. This is an 
> overestimation, as in some cases you will only need one byte to store the 
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not 
> taking run-length compression into account.
> 3. If the leaf has low cardinality then we set the compressed dim to -2. Note 
> that -1 is used when all values are equal.
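
A rough sketch of that comparison (variable names follow the description above, 
not necessarily the patch):

{code:java}
final class LeafEncodingHeuristic {
  // Pick the low-cardinality encoding only when its estimated storage cost
  // is not larger than the regular encoding. The "+ 2" is the overestimated
  // per-value cost of storing the cardinality.
  static boolean useLowCardinalityEncoding(
      int leafCardinality, int count, int packedBytesLength, int prefixLenSum) {
    long lowCardinalityCost = (long) leafCardinality * (packedBytesLength - prefixLenSum + 2);
    long highCardinalityCost = (long) count * (packedBytesLength - prefixLenSum);
    return lowCardinalityCost <= highCardinalityCost;
  }
}
{code}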



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8855) Add Accountable to Query implementations

2019-06-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868322#comment-16868322
 ] 

Adrien Grand commented on LUCENE-8855:
--

Thanks Andrzej, this looks like a better trade-off to me in general. Do we need 
Accountable on PointRangeQuery? This one should always be small. I think we 
should also avoid Accountable on BytesRef and IntsRef, since these objects can 
be used to represent a slice of an array. For instance I know that in some 
places we have collections of BytesRef objects that all share the same byte[], 
so counting the underlying byte[] more than once would be incorrect.

In the case of unknown queries I'm wondering whether we should return an 
arbitrary constant instead of the shallow size of the object, in order to 
overestimate memory usage instead of underestimating it? For the caching 
use-case I suspect it's better to overestimate memory usage a bit?
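
To illustrate the shared-array concern (illustration only, not code from the 
patch): two BytesRef instances can be slices of the same backing byte[], so a 
per-BytesRef ramBytesUsed that charges the full array would count it twice.

{code:java}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.RamUsageEstimator;

final class SharedBytesSketch {
  static void example() {
    byte[] backing = new byte[1024];
    BytesRef first = new BytesRef(backing, 0, 100);     // slice [0, 100)
    BytesRef second = new BytesRef(backing, 100, 100);  // slice [100, 200)
    // Charging each slice for the whole backing array would report roughly
    // twice the memory that is actually used.
    long naive = 2 * RamUsageEstimator.sizeOf(backing);
    long actual = RamUsageEstimator.sizeOf(backing);
    System.out.println("slices cover " + (first.length + second.length)
        + " bytes; naive=" + naive + ", actual=" + actual);
  }
}
{code}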

> Add Accountable to Query implementations
> 
>
> Key: LUCENE-8855
> URL: https://issues.apache.org/jira/browse/LUCENE-8855
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Attachments: LUCENE-8855.patch, LUCENE-8855.patch, LUCENE-8855.patch
>
>
> Query implementations should also support {{Accountable}} API in order to 
> monitor the memory consumption e.g. in caches where either keys or values are 
> {{Query}} instances.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8858) Migrate Lucene's Moin wiki to Confluence

2019-06-19 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867284#comment-16867284
 ] 

Adrien Grand commented on LUCENE-8858:
--

Thanks [~janhoy] and [~hossman]!

> Migrate Lucene's Moin wiki to Confluence
> 
>
> Key: LUCENE-8858
> URL: https://issues.apache.org/jira/browse/LUCENE-8858
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>
> We have a deadline end of June to migrate Moin wiki to Confluence.
> This Jira will track migration of Lucene's 
> https://wiki.apache.org/lucene-java/ over to 
> https://cwiki.apache.org/confluence/display/LUCENE
> The old Confluence space will be overwritten as it is not used.
> After migration we'll clean up and weed out what is not needed, and then 
> start moving developer-centric content into the main git repo (which will be 
> covered in other JIRAs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves

2019-06-18 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866873#comment-16866873
 ] 

Adrien Grand commented on LUCENE-8867:
--

+1 to split

> Optimise BKD tree for low cardinality leaves
> 
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values along with their 
> cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
> called n times with the same byte array but a different docID. This issue 
> proposes to add a new method to the interface that accepts an array of docs, 
> so that it can be overridden by implementors to gain search performance (a 
> rough sketch follows below).
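
For illustration, a rough sketch of what such a bulk method could look like 
(hypothetical signature, not the final API), defaulting to the existing per-doc 
call so current implementations keep working:

{code:java}
import java.io.IOException;
import org.apache.lucene.index.PointValues.IntersectVisitor;

// Hypothetical sketch of the proposed addition: visit a run of docs that all
// share the same packed value in one call.
interface BulkIntersectVisitor extends IntersectVisitor {
  default void visit(int[] docIDs, int count, byte[] packedValue) throws IOException {
    for (int i = 0; i < count; i++) {
      visit(docIDs[i], packedValue); // existing single-doc method
    }
  }
}
{code}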



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


