[GitHub] [lucene] jpountz commented on pull request #588: LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (9.1.0 Backporting)
jpountz commented on pull request #588:
URL: https://github.com/apache/lucene/pull/588#issuecomment-1006349216

Feel free to merge this if tests pass and you didn't have to make significant changes upon backporting. Do we also need to move the CHANGES entry to a different version on other branches?
[GitHub] [lucene] jpountz commented on a change in pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()
jpountz commented on a change in pull request #578:
URL: https://github.com/apache/lucene/pull/578#discussion_r779353869

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
## @@ -74,11 +74,6 @@ protected boolean useHashTable(FacetsCollector fc, TaxonomyReader taxoReader) {
     return sumTotalHits < maxDoc / 10;
   }

-  /** Increment the count for this ordinal by 1. */
-  protected void increment(int ordinal) {

Review comment: @gsmiller I wonder if we should reconsider the backward-compatibility guarantees of the faceting APIs. Except for APIs that are really meant for end users to extend, like analysis components, my understanding is that we usually consider overriding of our own classes expert usage that is not subject to backward compatibility (as opposed to direct usage of these classes).
[GitHub] [lucene] zacharymorn opened a new pull request #588: LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (Backporting)
zacharymorn opened a new pull request #588:
URL: https://github.com/apache/lucene/pull/588

This PR backports bug fix #444 to version `9.1.0`.
[GitHub] [lucene] zacharymorn opened a new pull request #587: LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (Backporting)
zacharymorn opened a new pull request #587:
URL: https://github.com/apache/lucene/pull/587

This PR backports bug fix apache/lucene#444 to version `9.0.1`.
[GitHub] [lucene-solr] zacharymorn opened a new pull request #2637: LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (Backporting)
zacharymorn opened a new pull request #2637:
URL: https://github.com/apache/lucene-solr/pull/2637

This PR backports bug fix https://github.com/apache/lucene/pull/444 to version `8.11.2`.
[GitHub] [lucene] zacharymorn commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on a change in pull request #534:
URL: https://github.com/apache/lucene/pull/534#discussion_r779299155

## File path: lucene/core/src/test/org/apache/lucene/codecs/perfield/TestPerFieldKnnVectorsFormat.java
## @@ -172,9 +171,14 @@ public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException
       KnnVectorsWriter writer = delegate.fieldsWriter(state);
       return new KnnVectorsWriter() {
         @Override
-        public void writeField(FieldInfo fieldInfo, VectorValues values) throws IOException {
+        public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
+            throws IOException {
           fieldsWritten.add(fieldInfo.name);
-          writer.writeField(fieldInfo, values);
+          // assert that knnVectorsReader#getVectorValues returns different instances upon repeated
+          // calls

Review comment: Ah right. I've moved it to `AssertingKnnVectorsReader`.
[GitHub] [lucene] gf2121 commented on a change in pull request #585: LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts
gf2121 commented on a change in pull request #585:
URL: https://github.com/apache/lucene/pull/585#discussion_r779279138

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -71,17 +71,27 @@ public FastTaxonomyFacetCounts(
   private final void count(List<MatchingDocs> matchingDocs) throws IOException {
     for (MatchingDocs hits : matchingDocs) {
-      SortedNumericDocValues dv = hits.context.reader().getSortedNumericDocValues(indexFieldName);
-      if (dv == null) {
+      SortedNumericDocValues multiValued =
+          hits.context.reader().getSortedNumericDocValues(indexFieldName);
+      if (multiValued == null) {
         continue;
       }
+      NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
+
+      DocIdSetIterator valuesIt = singleValued != null ? singleValued : multiValued;
       DocIdSetIterator it =
-          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), dv));
+          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), valuesIt));
-      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
-        for (int i = 0; i < dv.docValueCount(); i++) {
-          increment((int) dv.nextValue());
+      if (singleValued != null) {
+        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {

Review comment: Maybe simplify this a bit with `while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS)` as `doc` is not used in the loop body?

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -91,31 +101,36 @@ private final void count(List<MatchingDocs> matchingDocs) throws IOException {
   private final void countAll(IndexReader reader) throws IOException {
     for (LeafReaderContext context : reader.leaves()) {
-      SortedNumericDocValues dv = context.reader().getSortedNumericDocValues(indexFieldName);
-      if (dv == null) {
+      SortedNumericDocValues multiValued =

Review comment: +1

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -71,17 +71,27 @@ public FastTaxonomyFacetCounts(
   private final void count(List<MatchingDocs> matchingDocs) throws IOException {
     for (MatchingDocs hits : matchingDocs) {
-      SortedNumericDocValues dv = hits.context.reader().getSortedNumericDocValues(indexFieldName);
-      if (dv == null) {
+      SortedNumericDocValues multiValued =
+          hits.context.reader().getSortedNumericDocValues(indexFieldName);
+      if (multiValued == null) {
         continue;
       }
+      NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
+
+      DocIdSetIterator valuesIt = singleValued != null ? singleValued : multiValued;
       DocIdSetIterator it =
-          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), dv));
+          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), valuesIt));
-      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
-        for (int i = 0; i < dv.docValueCount(); i++) {
-          increment((int) dv.nextValue());
+      if (singleValued != null) {
+        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
+          increment((int) singleValued.longValue());
+        }
+      } else {
+        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {

Review comment: We can use `while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS)` here too.
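To illustrate the suggestion, here is a minimal sketch of the single-valued branch of `count()` written with the `while`-form; the variable names follow the diff above, but this is an illustrative fragment, not the exact committed code:

```java
// Sketch only: since `doc` is never read inside the single-valued loop body,
// the indexed for-loop can collapse into a plain while-loop.
NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
DocIdSetIterator valuesIt = singleValued != null ? singleValued : multiValued;
DocIdSetIterator it =
    ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), valuesIt));

if (singleValued != null) {
  while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    increment((int) singleValued.longValue());
  }
} else {
  while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    for (int i = 0; i < multiValued.docValueCount(); i++) {
      increment((int) multiValued.nextValue());
    }
  }
}
```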
[jira] [Closed] (LUCENE-6121) Fix CachingTokenFilter to propagate reset() the first time
[ https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley closed LUCENE-6121.
--------------------------------

> Fix CachingTokenFilter to propagate reset() the first time
> -----------------------------------------------------------
>
>                 Key: LUCENE-6121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6121
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>              Labels: random-chains
>             Fix For: 5.0, 6.0
>
>         Attachments: LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch, LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch
>
> CachingTokenFilter should have been propagating reset() _but only the first time_, and thus you would then use CachingTokenFilter in a more normal way – wrap it and call reset() then increment in a loop, etc., instead of knowing you need to reset() on what it wraps but not this token filter itself. That's weird. It's abnormal for a TokenFilter to never propagate reset, so every user of CachingTokenFilter to date has worked around this by calling reset() on the underlying input instead of the final wrapping token filter (CachingTokenFilter in this case).
[jira] [Resolved] (LUCENE-6121) Fix CachingTokenFilter to propagate reset() the first time
[ https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley resolved LUCENE-6121.
----------------------------------
    Resolution: Fixed

> Fix CachingTokenFilter to propagate reset() the first time
> -----------------------------------------------------------
>
>                 Key: LUCENE-6121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6121
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>              Labels: random-chains
>             Fix For: 6.0, 5.0
>
>         Attachments: LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch, LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch
>
> CachingTokenFilter should have been propagating reset() _but only the first time_, and thus you would then use CachingTokenFilter in a more normal way – wrap it and call reset() then increment in a loop, etc., instead of knowing you need to reset() on what it wraps but not this token filter itself. That's weird. It's abnormal for a TokenFilter to never propagate reset, so every user of CachingTokenFilter to date has worked around this by calling reset() on the underlying input instead of the final wrapping token filter (CachingTokenFilter in this case).
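For context, a minimal sketch of the consuming pattern this change enables; the analyzer choice, field name, and text below are placeholders, not taken from the patch:

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

class CachingTokenFilterDemo {
  // After LUCENE-6121, reset() on the wrapper propagates to the input the first
  // time, so callers no longer need to reset() the wrapped stream themselves.
  static void printTokens(String text) throws IOException {
    Analyzer analyzer = new StandardAnalyzer(); // placeholder analyzer choice
    try (TokenStream input = analyzer.tokenStream("body", text)) {
      CachingTokenFilter cache = new CachingTokenFilter(input);
      CharTermAttribute termAtt = cache.addAttribute(CharTermAttribute.class);
      cache.reset(); // the first reset() now reaches the wrapped input
      while (cache.incrementToken()) {
        System.out.println(termAtt.toString());
      }
      cache.end();
    }
  }
}
```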
[GitHub] [lucene] gsmiller commented on pull request #543: LUCENE-10245: Addition of MultiDoubleValues(Source) and MultiLongValues(Source) along with faceting capabilities
gsmiller commented on pull request #543:
URL: https://github.com/apache/lucene/pull/543#issuecomment-1006175062

@romseygeek I think this PR is ready for another look when you have a moment. Thanks again for your input!
[GitHub] [lucene] rmuir commented on pull request #586: LUCENE-10353: add random null injection to TestRandomChains
rmuir commented on pull request #586:
URL: https://github.com/apache/lucene/pull/586#issuecomment-1006157404

> I am fine, except the NPEs should have a message.

Why? For users that throw the stacktrace away?

> P.S.: And as said maybe require a message always!?

Maybe, we should just decide how it should look? FWIW, if you care about messages, the implicit NPEs from the JDK are superior to anything we do:
```
java.lang.NullPointerException: Cannot load from int array because "x" is null
	at npe.implicitArray(npe.java:8)
	...
java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because "x" is null
	at npe.implicit(npe.java:5)
	...
```
If we just do `Objects.requireNonNull(x)`, we get:
```
java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:208)
	at npe.objects(npe.java:11)
	...
```
If we do `Objects.requireNonNull(x, "x")`, it is only slightly better:
```
java.lang.NullPointerException: x
	at java.base/java.util.Objects.requireNonNull(Objects.java:233)
	at npe.message(npe.java:14)
	...
```
In all cases there is a stack trace; users can't expect to debug anything if they throw that away. So part of me says, don't even bother with a message. Especially, I would be against formatting fancy strings for every null check. I can go along with just putting the local variable's name in the message as a compromise (it is still an ugly hack! the "friendly" NPE feature in Java seems half-baked!), but if we want that to be the standard, let's ban the one-arg method in forbidden-apis and fix it consistently everywhere?
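For reference, a hypothetical `npe.java` that would produce the three flavours of output quoted above; the class, method, and variable names are reconstructed from the stack traces, and the exact line numbers will differ from the quoted traces:

```java
import java.util.Objects;

// Hypothetical reproduction of the NPE message styles discussed above.
public class npe {
  static int implicit(Integer x) {
    return x.intValue(); // JDK "helpful" NPE names the failed call and the null variable
  }

  static int implicitArray(int[] x) {
    return x[0]; // JDK "helpful" NPE: "Cannot load from int array because ..."
  }

  static void objects(Object x) {
    Objects.requireNonNull(x); // bare requireNonNull: no message at all
  }

  static void message(Object x) {
    Objects.requireNonNull(x, "x"); // requireNonNull with just the parameter name
  }

  public static void main(String[] args) {
    Runnable[] cases = {
      () -> implicit(null), () -> implicitArray(null),
      () -> objects(null), () -> message(null)
    };
    for (Runnable r : cases) {
      try {
        r.run();
      } catch (NullPointerException e) {
        e.printStackTrace();
      }
    }
  }
}
```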
[jira] [Commented] (LUCENE-10356) Special-case singleton doc values for general taxonomy facet counting
[ https://issues.apache.org/jira/browse/LUCENE-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469567#comment-17469567 ]

Greg Miller commented on LUCENE-10356:
--------------------------------------

[~gf2121] mind having a look at [https://github.com/apache/lucene/pull/585] when you have a free moment? This just extends a change you made recently. Curious to get your thoughts on it. Thanks!

> Special-case singleton doc values for general taxonomy facet counting
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-10356
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10356
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Greg Miller
>            Assignee: Greg Miller
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Inspired by [https://github.com/apache/lucene/pull/574], we should also special-case singleton dvs in the general count path (#573 specialized it for countAll).
[GitHub] [lucene] gsmiller commented on a change in pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()
gsmiller commented on a change in pull request #578:
URL: https://github.com/apache/lucene/pull/578#discussion_r779174169

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
## @@ -74,11 +74,6 @@ protected boolean useHashTable(FacetsCollector fc, TaxonomyReader taxoReader) {
     return sumTotalHits < maxDoc / 10;
   }

-  /** Increment the count for this ordinal by 1. */
-  protected void increment(int ordinal) {

Review comment: I think we ought to leave this in. Removing it is a backwards-compatibility concern since it's possible (likely) that users have sub-classed `IntTaxonomyFacets` and rely on this. I think it's also nice to keep in general so that sub-classes can rely on this instead of having to manage direct access to the dense/sparse structures if they choose to.

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
## @@ -32,9 +32,9 @@ public abstract class IntTaxonomyFacets extends TaxonomyFacets {
   /** Per-ordinal value. */
-  private final int[] values;
+  final int[] values;

Review comment: I'd suggest adding some javadoc to these two fields mentioning that they're exposed for sub-classes that want "expert" functionality (e.g., direct access along with the burden of knowing which one is being used). The doc could point users to `#increment` and `#getValue` for more typical use-cases that don't want the burden of directly accessing these.
[GitHub] [lucene] uschindler commented on pull request #586: LUCENE-10353: add random null injection to TestRandomChains
uschindler commented on pull request #586:
URL: https://github.com/apache/lucene/pull/586#issuecomment-1006113376

The only thing: we should pass the parameter name on NPE (the 2nd argument of requireNonNull). This makes it easier for the caller to figure out which argument was wrong. Maybe add a check in the random chains fuzzer to require a message on the exception.
[GitHub] [lucene] uschindler commented on pull request #586: LUCENE-10353: add random null injection to TestRandomChains
uschindler commented on pull request #586:
URL: https://github.com/apache/lucene/pull/586#issuecomment-1006112599

I switched to 100% null and it now still passes. We should also run one time with 50% to fuzz some cases where the first parameter is non-null and the second is null.
[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469557#comment-17469557 ]

Greg Miller commented on LUCENE-10151:
--------------------------------------

Just so I don't lose track of this thought, we'll probably also want the blocking call to {{Future#get}} to specify a timeout as well if the user has specified one (here: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L721).

> Add timeout support to IndexSearcher
> -------------------------------------
>
>                 Key: LUCENE-10151
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10151
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Greg Miller
>            Priority: Minor
>
> I'd like to explore adding optional "timeout" capabilities to {{IndexSearcher}}. This would enable users to (optionally) specify a maximum time budget for search execution. If the search "times out", partial results would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion). Thread for reference: [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>
> A couple things to watch out for with this change:
> # We want to make sure it's robust to a two-phase query evaluation scenario where the "approximate" step matches a large number of candidates but the "confirmation" step matches very few (or none). This is a particularly tricky case.
> # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
> # We want to make sure it plays nice with the {{LRUCache}} since it iterates the query to pre-populate a {{BitSet}} when caching. That step shouldn't be allowed to overrun the timeout. The proper way to handle this probably needs some thought.
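To make the thought concrete, a rough sketch of what bounding that blocking call could look like; the variable names and surrounding bookkeeping are hypothetical, not the actual `IndexSearcher` code:

```java
// Hypothetical sketch: cap the per-slice Future#get wait by the remaining time
// budget instead of blocking indefinitely.
long remainingMillis = deadlineMillis - System.currentTimeMillis(); // hypothetical budget tracking
try {
  TopDocs sliceDocs = future.get(Math.max(0L, remainingMillis), TimeUnit.MILLISECONDS);
  collectedTopDocs.add(sliceDocs);
} catch (TimeoutException e) {
  // out of budget: stop waiting and report partial results, e.g. with
  // TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
  timedOut = true;
} catch (InterruptedException | ExecutionException e) {
  throw new RuntimeException(e); // simplified error handling for this sketch
}
```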
[GitHub] [lucene] uschindler commented on a change in pull request #586: LUCENE-10353: add random null injection to TestRandomChains
uschindler commented on a change in pull request #586:
URL: https://github.com/apache/lucene/pull/586#discussion_r779141374

## File path: lucene/analysis.tests/src/test/org/apache/lucene/analysis/tests/TestRandomChains.java
## @@ -754,6 +758,7 @@ public String toString() {
     } catch (InvocationTargetException ite) {
       final Throwable cause = ite.getCause();
       if (cause instanceof IllegalArgumentException
+          || cause instanceof NullPointerException

Review comment: Have added it in https://github.com/apache/lucene/pull/586/commits/c40d99c3ac90cac567483c114f158ee74a6c6698
[GitHub] [lucene] rmuir commented on a change in pull request #586: LUCENE-10353: add random null injection to TestRandomChains
rmuir commented on a change in pull request #586:
URL: https://github.com/apache/lucene/pull/586#discussion_r779116266

## File path: lucene/analysis.tests/src/test/org/apache/lucene/analysis/tests/TestRandomChains.java
## @@ -754,6 +758,7 @@ public String toString() {
     } catch (InvocationTargetException ite) {
       final Throwable cause = ite.getCause();
       if (cause instanceof IllegalArgumentException
+          || cause instanceof NullPointerException

Review comment: If we can tighten this logic to be `(cause instanceof NullPointerException && weProvidedANullArg)`, I think it would be better. Then the test wouldn't mask bugs (internal NPEs) that happen for situations where we passed the ctor all non-null arguments.
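A sketch of what that tightened check could look like; the bookkeeping variable and the helper name are illustrative, not the actual `TestRandomChains` code:

```java
// Illustrative fragment only: remember whether the test itself injected a null
// argument, and only excuse a ctor NPE in that case, so internal NPEs still fail.
boolean injectedNull = false;
Object[] args = new Object[paramTypes.length];
for (int i = 0; i < args.length; i++) {
  args[i] = randomArgOrNull(random, paramTypes[i]); // hypothetical helper
  if (args[i] == null) {
    injectedNull = true;
  }
}
try {
  return ctor.newInstance(args);
} catch (InvocationTargetException ite) {
  final Throwable cause = ite.getCause();
  if (cause instanceof IllegalArgumentException
      || (cause instanceof NullPointerException && injectedNull)) {
    return null; // the analyzer rejected a broken/null argument up front: acceptable
  }
  throw new RuntimeException(ite);
}
```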
[jira] [Resolved] (LUCENE-10261) Preset/ custom analyzer pipelines in Luke won't work with the module system
[ https://issues.apache.org/jira/browse/LUCENE-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10261.
----------------------------------
    Resolution: Not A Problem

> Preset/ custom analyzer pipelines in Luke won't work with the module system
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-10261
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10261
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Priority: Major
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> A spinoff from LUCENE-10255
[GitHub] [lucene] rmuir opened a new pull request #586: LUCENE-10353: add random null injection to TestRandomChains
rmuir opened a new pull request #586:
URL: https://github.com/apache/lucene/pull/586

10% of the time, TestRandomChains will pass `null` to any object parameters in analyzers' ctors. We allow NPE from the ctor, so this enforces that analyzers check their arguments up front. It just means we have to run the test in a loop:
```
./gradlew :lucene:analysis.tests:beast -Dtests.dups=100 --tests TestRandomChains -Dtests.nightly=true
```
and add missing `Objects.requireNonNull()` to the bugs that it finds at runtime. Example fail:
```
  > java.lang.NullPointerException: Cannot invoke "org.apache.lucene.analysis.compound.hyphenation.HyphenationTree.hyphenate(char[], int, int, int, int)" because "this.hyphenator" is null
  >     at __randomizedtesting.SeedInfo.seed([29B8EF94FA5640A3:1459C6F5BD445D63]:0)
  >     at org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter.decompose(HyphenationCompoundWordTokenFilter.java:143)
  >     at org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase.incrementToken(CompoundWordTokenFilterBase.java:115)
```
See issue: https://issues.apache.org/jira/browse/LUCENE-10353
[jira] [Resolved] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10328.
----------------------------------
    Resolution: Fixed

> Module path for compiling and running tests is wrong
> ------------------------------------------------------
>
>                 Key: LUCENE-10328
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10328
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Dawid Weiss
>            Priority: Major
>             Fix For: 9.1
>
>         Attachments: image-2021-12-19-12-29-21-737.png, image-2022-01-04-16-04-56-563.png
>
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Uwe noticed that the module path for compiling and running tests is empty - indeed, the modular configurations we create for the test sourceset do not inherit from their main counterparts. This is not a standard thing created for a sourceset - the test-main connection link is created by gradle's java plugin. We need to do a similar thing for modular configurations.
> !image-2021-12-19-12-29-21-737.png|width=490,height=280!
[jira] [Commented] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469529#comment-17469529 ]

ASF subversion and git services commented on LUCENE-10328:
-----------------------------------------------------------

Commit b8da9f32c8d436cc39601264dbb1f039b9882b57 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b8da9f3 ]

LUCENE-10328: open up certain packages for junit and the test framework (reflective access).
[jira] [Reopened] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reopened LUCENE-10328:
----------------------------------

Reopen for manual backporting to 9x.
[jira] [Commented] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469515#comment-17469515 ]

ASF subversion and git services commented on LUCENE-10328:
-----------------------------------------------------------

Commit ff547e7bbdc78d6869b6f47d828aa6452664ce58 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ff547e7 ]

LUCENE-10328: Module path for compiling and running tests is wrong (#571)
[jira] [Resolved] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10328.
----------------------------------
    Fix Version/s: 9.1
       Resolution: Fixed
[GitHub] [lucene] dweiss merged pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
dweiss merged pull request #571:
URL: https://github.com/apache/lucene/pull/571
[GitHub] [lucene] mcimadamore edited a comment on pull request #518: Initial rewrite of MMapDirectory for JDK-18 preview (incubating) Panama APIs (>= JDK-18-ea-b26)
mcimadamore edited a comment on pull request #518:
URL: https://github.com/apache/lucene/pull/518#issuecomment-1005995125

> From what I have learned, copy operations have high overhead because:
>
> * they are not hot, so aren't optimized so fast
>
> * when not optimized, the setup cost is high (lots of class checks to get array type, decision for swapping bytes). This is especially heavy for small arrays.

Hi, I'm not sure as to why copy operations should be slower in the memory access API than with the ByteBuffer API. I would expect most of the checks to be similar (except for the liveness tests of the segment involved). I do recall that the ByteBuffer API does optimize bulk copy for very small buffers (I don't recall what the limit is, but it was very very low, like 4 elements or something). In principle, this JVM fix (as of 18) should help too: https://bugs.openjdk.java.net/browse/JDK-8269119
[GitHub] [lucene] mcimadamore commented on pull request #518: Initial rewrite of MMapDirectory for JDK-18 preview (incubating) Panama APIs (>= JDK-18-ea-b26)
mcimadamore commented on pull request #518:
URL: https://github.com/apache/lucene/pull/518#issuecomment-1005995125

> From what I have learned, copy operations have high overhead because:
>
> * they are not hot, so aren't optimized so fast
>
> * when not optimized, the setup cost is high (lots of class checks to get array type, decision for swapping bytes). This is especially heavy for small arrays.

Hi, I'm not sure as to why copy operations should be slower in the memory access API than with the ByteBuffer API. I would expect most of the checks to be similar (except for the liveness tests of the segment involved). I do recall that the ByteBuffer API does optimize bulk copy for very small buffers (I don't recall what the limit is, but it was very very low, like 4 elements or something). In principle, this JVM fix (as of 18) should help too: https://bugs.openjdk.java.net/browse/JDK-8269119
[GitHub] [lucene] gsmiller commented on a change in pull request #585: LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts
gsmiller commented on a change in pull request #585:
URL: https://github.com/apache/lucene/pull/585#discussion_r779026781

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -91,31 +101,36 @@ private final void count(List<MatchingDocs> matchingDocs) throws IOException {
   private final void countAll(IndexReader reader) throws IOException {
     for (LeafReaderContext context : reader.leaves()) {
-      SortedNumericDocValues dv = context.reader().getSortedNumericDocValues(indexFieldName);
-      if (dv == null) {
+      SortedNumericDocValues multiValued =

Review comment: I took the liberty of suggesting renamed variables here for consistency with other faceting implementations that do this, and for slightly improved readability (IMO anyway).
[GitHub] [lucene] gsmiller commented on pull request #585: LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts
gsmiller commented on pull request #585: URL: https://github.com/apache/lucene/pull/585#issuecomment-1005949811 Maybe a very small improvement with this change, but nothing particularly impactful. I wonder how often we actually trigger this case in our benchmarks? Certainly not as often as with the "count all" cases. I think it's worth making this change though for consistency (and I think there's a small improvement there anyway). ``` TaskQPS baseline StdDevQPS candidate StdDevPct diff p-value BrowseDayOfYearTaxoFacets 14.53 (10.8%) 14.08 (11.4%) -3.1% ( -22% - 21%) 0.378 BrowseDateTaxoFacets 14.48 (10.7%) 14.07 (11.2%) -2.8% ( -22% - 21%) 0.416 HighIntervalsOrdered3.00 (5.2%)2.94 (7.0%) -1.8% ( -13% - 10%) 0.359 OrNotHighHigh 1044.76 (4.5%) 1028.26 (3.0%) -1.6% ( -8% -6%) 0.188 MedIntervalsOrdered 35.82 (4.8%) 35.28 (6.1%) -1.5% ( -11% -9%) 0.384 OrNotHighMed 969.48 (2.7%) 957.92 (2.3%) -1.2% ( -5% -3%) 0.129 OrHighLow 1228.88 (2.8%) 1215.30 (2.5%) -1.1% ( -6% -4%) 0.188 BrowseMonthTaxoFacets 15.15 (10.3%) 14.99 (10.0%) -1.1% ( -19% - 21%) 0.740 OrNotHighLow 864.18 (2.4%) 856.27 (2.0%) -0.9% ( -5% -3%) 0.187 LowIntervalsOrdered 14.43 (2.5%) 14.32 (3.3%) -0.7% ( -6% -5%) 0.422 BrowseRandomLabelTaxoFacets 12.34 (9.2%) 12.26 (9.9%) -0.6% ( -18% - 20%) 0.834 OrHighNotHigh 877.19 (3.9%) 872.54 (3.8%) -0.5% ( -7% -7%) 0.662 HighPhrase 106.42 (1.7%) 106.06 (1.4%) -0.3% ( -3% -2%) 0.498 MedTerm 1930.85 (4.6%) 1924.69 (3.5%) -0.3% ( -8% -8%) 0.806 HighTerm 1383.88 (4.6%) 1380.07 (4.1%) -0.3% ( -8% -8%) 0.843 MedPhrase 744.32 (2.1%) 742.71 (2.4%) -0.2% ( -4% -4%) 0.761 LowTerm 3065.99 (3.3%) 3060.87 (3.8%) -0.2% ( -7% -7%) 0.882 LowPhrase 82.62 (1.6%) 82.51 (1.4%) -0.1% ( -3% -2%) 0.768 OrHighNotLow 1094.83 (4.7%) 1094.10 (3.1%) -0.1% ( -7% -8%) 0.958 MedSloppyPhrase 104.44 (2.5%) 104.41 (3.3%) -0.0% ( -5% -5%) 0.980 OrHighMed 193.98 (4.4%) 193.96 (4.6%) -0.0% ( -8% -9%) 0.994 Respell 65.50 (1.1%) 65.52 (1.2%)0.0% ( -2% -2%) 0.913 Fuzzy2 84.12 (1.1%) 84.16 (1.0%)0.0% ( -2% -2%) 0.901 AndHighLow 1163.35 (3.5%) 1163.95 (3.1%)0.1% ( -6% -6%) 0.961 BrowseRandomLabelSSDVFacets9.34 (1.7%)9.35 (2.9%)0.1% ( -4% -4%) 0.940 BrowseMonthSSDVFacets 12.91 (13.1%) 12.92 (13.1%)0.1% ( -23% - 30%) 0.978 OrHighNotMed 1094.06 (3.9%) 1095.46 (3.1%)0.1% ( -6% -7%) 0.908 PKLookup 171.75 (3.0%) 172.12 (3.3%)0.2% ( -5% -6%) 0.829 Prefix3 365.59 (9.9%) 366.48 (9.1%)0.2% ( -17% - 21%) 0.936 Fuzzy1 113.46 (1.2%) 113.82 (1.2%)0.3% ( -2% -2%) 0.390 Wildcard 38.65 (8.2%) 38.77 (7.8%)0.3% ( -14% - 17%) 0.898 HighSloppyPhrase 11.81 (3.3%) 11.85 (3.9%)0.3% ( -6% -7%) 0.773 OrHighHigh 12.04 (4.0%) 12.10 (4.0%)0.5% ( -7% -8%) 0.713 LowSloppyPhrase7.75 (3.6%)7.79 (4.3%)0.5% ( -7% -8%) 0.703 LowSpanNear 29.78 (2.5%) 29.96 (2.4%)0.6% ( -4% -5%) 0.420 HighTermMonthSort 126.96 (16.0%) 127.81 (14.3%)0.7% ( -25% - 36%) 0.889 HighSpanNear 12.43 (3.3%) 12.52 (3.2%)0.7% ( -5% -7%) 0.488 MedSpanNear 12.64 (2.9%) 12.73 (2.9%)0.8% ( -4% -6%) 0.402 AndHighMed 55.03
[GitHub] [lucene] jtibshirani commented on a change in pull request #583: LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields.
jtibshirani commented on a change in pull request #583:
URL: https://github.com/apache/lucene/pull/583#discussion_r779026136

## File path: lucene/core/src/java/org/apache/lucene/codecs/FieldsProducer.java
## @@ -42,6 +45,14 @@ protected FieldsProducer() {}
    */
   public abstract void checkIntegrity() throws IOException;

+  /**
+   * Get the {@link Terms} for this field. The behavior is undefined if the field doesn't have

Review comment: Got it. The latest changes make sense to me.
[GitHub] [lucene] gsmiller opened a new pull request #585: LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts
gsmiller opened a new pull request #585:
URL: https://github.com/apache/lucene/pull/585

# Description

Facet implementations have seen performance improvements by unwrapping singleton doc values for situations where the underlying field is actually single-valued. This change adds the optimization for counting in taxonomy faceting (bringing consistency with the countAll implementation along with SSDV faceting, etc.).

# Solution

Try unwrapping the `SortedNumericDocValues` as a `NumericDocValues` and use the single-valued field directly if possible.

# Tests

Existing tests cover this faceting implementation. Ran benchmarks as well. Saw very marginal improvements and no regressions.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
[jira] [Commented] (LUCENE-10354) Clarify contract of codec APIs with missing/disabled fields
[ https://issues.apache.org/jira/browse/LUCENE-10354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469457#comment-17469457 ]

ASF subversion and git services commented on LUCENE-10354:
-----------------------------------------------------------

Commit c8651afde70c62b4a4f5618b9483953bd2bc1bb8 in lucene's branch refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c8651af ]

LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields. (#583)

> Clarify contract of codec APIs with missing/disabled fields
> ------------------------------------------------------------
>
>                 Key: LUCENE-10354
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10354
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The question has come up a few times of how codec APIs should react to fields that are missing or do not have the relevant feature enabled.
> This issue proposes that we improve javadocs and AssertingCodec following the same model as doc values and norms:
> - The behavior of codec APIs on fields that are missing or don't have the feature enabled is undefined.
> - CodecReader is responsible for checking FieldInfos before delegating to codec APIs.
> - AssertingCodec ensures that we never call codec APIs on missing/disabled fields.
[GitHub] [lucene] jpountz merged pull request #583: LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields.
jpountz merged pull request #583:
URL: https://github.com/apache/lucene/pull/583
[GitHub] [lucene] uschindler edited a comment on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler edited a comment on pull request #579:
URL: https://github.com/apache/lucene/pull/579#issuecomment-1005928057

To conclude here: I was already thinking several times during the module system development that it might be a good idea to have some pattern in forbidden/errorprone/... that detects if you call a caller-sensitive method like those in `AccessController#doPrivileged() / Class#getResourceAsStream() / Class#getResource()` or reflective invokes (not MethodHandles), and do that in some public/protected method that injects one of the method call parameters directly/indirectly into the caller-sensitive method. This pattern is mostly wrong and a security leak (or it kills the functionality of your public/protected method when used under module system encapsulation).

Example of such a broken method (it is public and injects the `resource` parameter into `Class#getResourceAsStream()`, which is caller-sensitive): https://github.com/apache/lucene/blob/cc342ea7407c729a743123d8f7957aff6c6f9792/lucene/core/src/java/org/apache/lucene/util/IOUtils.java#L193-L212
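To make the pattern concrete, a hypothetical illustration (not the actual `IOUtils` code) of why such a method misbehaves under module encapsulation:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical utility, imagined to live in lucene.core. Class#getResourceAsStream
// is caller-sensitive: access to a resource in a named module is checked against
// the *caller's* module, so this helper (rather than the class that owns the
// resource) becomes the caller and may be denied access that the owning module
// itself would have.
public final class ResourceUtil {
  private ResourceUtil() {}

  public static InputStream open(Class<?> clazz, String resource) throws IOException {
    InputStream in = clazz.getResourceAsStream(resource); // caller-sensitive call
    if (in == null) {
      throw new FileNotFoundException("resource not found: " + resource);
    }
    return in;
  }
}
```

The safer shape is to let the owning class perform the caller-sensitive call itself (e.g. `MyFilterFactory.class.getResourceAsStream(...)`, where `MyFilterFactory` is a hypothetical caller) and pass the resulting `InputStream` to shared utilities.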
[GitHub] [lucene] uschindler edited a comment on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler edited a comment on pull request #579:
URL: https://github.com/apache/lucene/pull/579#issuecomment-1005928057

To conclude here: I was already thinking several times during the module system development that it might be a good idea to have some pattern in forbidden/errorprone/... that detects if you call a caller-sensitive method like those in `AccessController#doPrivileged() / Class#getResourceAsStream() / Class#getResource()` or reflective invokes (not MethodHandles), and do that in some public/protected method that injects one of the method call parameters directly/indirectly into the caller-sensitive method. This pattern is mostly wrong and a security leak (or it kills the functionality of your public/protected method when used under module system encapsulation).

Example of such a broken method: https://github.com/apache/lucene/blob/cc342ea7407c729a743123d8f7957aff6c6f9792/lucene/core/src/java/org/apache/lucene/util/IOUtils.java#L193-L212
[GitHub] [lucene] uschindler commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on pull request #579:
URL: https://github.com/apache/lucene/pull/579#issuecomment-1005928057

To conclude here: I was already thinking several times during the module system development that it might be a good idea to have some pattern in forbidden/errorprone/... that detects if you call a caller-sensitive method like those in `AccessController#doPrivileged() / Class#getResourceAsStream() / Class#getResource()` or reflective invokes (not MethodHandles), and do that in some public method that injects one of the method call parameters directly/indirectly into the caller-sensitive method. This pattern is mostly wrong and a security leak (or it kills your public method when used under module system encapsulation).
[GitHub] [lucene] dweiss commented on pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
dweiss commented on pull request #571:
URL: https://github.com/apache/lucene/pull/571#issuecomment-1005918864

> The dependencies are always modular, so lucene.core is put on module-path, even though we are running tests in classpath mode. This is what this PR mainly changes, correct? Previously it was not fully working unless you explicitly declared it.

lucene.core (and any other dependency placed in modular configurations) is correctly inserted on the module-path if this reference is from "outside" the project itself. In other words, the tests within lucene.core run with the main source set classes on the classpath (otherwise you'd have split-package errors), but anywhere else where you reference lucene.core, it will be placed on the module-path. That debug flag (-Pbuild.debug.paths=true) shows verbosely how the classpath and module path are configured for each task.
[GitHub] [lucene] dweiss commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
dweiss commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778993450 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = creat
[GitHub] [lucene] uschindler commented on a change in pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on a change in pull request #579: URL: https://github.com/apache/lucene/pull/579#discussion_r778990326 ## File path: lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java ## @@ -584,9 +585,13 @@ public static long shallowSizeOfInstance(Class clazz) { final Class target = clazz; final Field[] fields; try { -fields = +@SuppressWarnings("removal") Review comment: Maybe extract a method here in the same way. Then the extra variable would not be needed, and suppressforbidden would apply only to the call. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
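For illustration, a minimal sketch of the extract-method idea under stated assumptions (this is not the actual RamUsageEstimator code; the helper name and the exact privileged call are placeholders): the deprecated AccessController usage moves into its own small method, so the @SuppressWarnings("removal") annotation covers only that call site and no extra local variable is needed.

```java
import java.lang.reflect.Field;
import java.security.AccessController;
import java.security.PrivilegedAction;

final class ShallowSizeSupport {
  // Hypothetical helper: the deprecated AccessController call is isolated here,
  // so the suppression annotation applies only to this one call site.
  @SuppressWarnings("removal")
  private static Field[] getDeclaredFieldsPrivileged(final Class<?> clazz) {
    return AccessController.doPrivileged(
        (PrivilegedAction<Field[]>) clazz::getDeclaredFields);
  }

  // The caller side stays free of any suppression annotations.
  static int countDeclaredFieldsSketch(Class<?> clazz) {
    return getDeclaredFieldsPrivileged(clazz).length;
  }

  private ShallowSizeSupport() {}
}
```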
[GitHub] [lucene] gf2121 commented on a change in pull request #545: LUCENE-10319: make ForUtil#BLOCK_SIZE changeable
gf2121 commented on a change in pull request #545: URL: https://github.com/apache/lucene/pull/545#discussion_r778988719 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/ForUtil.java ## @@ -1051,4 +1052,76 @@ private static void decode24(DataInput in, long[] tmp, long[] longs) throws IOEx longs[longsIdx + 0] = l0; } } + Review comment: Let this code be generated from the script, so that it can change along with BLOCK_SIZE. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
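To illustrate the suggestion (a sketch only, with assumed names and values, not the generated ForUtil code): constants that would otherwise be hard-coded literals are emitted by the generator in terms of BLOCK_SIZE, so regenerating the file with a different block size keeps everything consistent.

```java
// Illustrative only: constants expressed in terms of BLOCK_SIZE rather than
// hard-coded literals, so a regenerated file with a different block size stays consistent.
final class BlockSizeConstants {
  static final int BLOCK_SIZE = 128; // assumed default; the generator would substitute the configured value
  static final int BLOCK_SIZE_LOG2 = Integer.numberOfTrailingZeros(BLOCK_SIZE);
  // Example derived constant: number of longs needed when two packed values share one long.
  static final int HALF_BLOCK_LONGS = BLOCK_SIZE / 2;

  private BlockSizeConstants() {}
}
```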
[GitHub] [lucene] uschindler edited a comment on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler edited a comment on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005906362 > > Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) > > I wonder too, I searched the mainline branches of some commonly used Java libraries (`guava`, `log4j2`) and found `AccessController` calls in each. If OpenJDK actually removes these methods anytime soon, it will probably break every Java app out there. So I'm not worried. When I met the OpenJDK committers in Brussels before COVID started and this was discussed for the first time, the statement in the well-known beer bar was "trust me, we won't remove SecurityManager and AccessController before Java 43 [fictive number]. But we will soon make all AccessController operations noops." I think the "deprecation for removal" is just to make it more prominent, so your build does not just print a warning that is always suppressed (`-Xlint -deprecation` switch). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler edited a comment on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler edited a comment on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005906362 > > Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) > > I wonder too, I searched the mainline branches of some commonly used Java libraries (`guava`, `log4j2`) and found `AccessController` calls in each. If OpenJDK actually removes these methods anytime soon, it will probably break every Java app out there. So I'm not worried. When I met the OpenJDK committers in Brussels before COVID started and this was discussed for the first time, the statement in the well-known beer bar was "trust me, we won't remove SecurityManager and AccessController before Java 43 [fictive number]. But we will soon make all AccessController operations noops." I think the "deprecation for removal" is just to make it more prominent, so your build does not just print a warning that is always suppressed ({{-Xlint -deprecation}} switch). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 commented on pull request #545: LUCENE-10319: make ForUtil#BLOCK_SIZE changeable
gf2121 commented on pull request #545: URL: https://github.com/apache/lucene/pull/545#issuecomment-1005906911 Thanks @jpountz! This is indeed making the code harder to read. I tried to make all these complex constants generated from the script, keeping `ForUtil.java` clean. How does it look now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005906362 > > Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) > > I wonder too, I searched the mainline branches of some commonly used Java libraries (`guava`, `log4j2`) and found `AccessController` calls in each. If OpenJDK actually removes these methods anytime soon, it will probably break every Java app out there. So I'm not worried. When I met the OpenJDK committers in Brussels before COVID started and this was discussed for the first time, the statement in the well-known beer bar was "trust me, we won't remove SecurityManager and AccessController before Java 43". But we will soon make all AccessController operations noops. I think the "deprecation for removal" is just to make it more prominent, so your build does not just print a warning that is always suppressed ({{-Xlint -deprecation}} switch). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4983) CommonGramsFilter assumes all input tokens have a length of 1
[ https://issues.apache.org/jira/browse/LUCENE-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4983: -- Labels: random-chains (was: ) > CommonGramsFilter assumes all input tokens have a length of 1 > - > > Key: LUCENE-4983 > URL: https://issues.apache.org/jira/browse/LUCENE-4983 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Priority: Major > Labels: random-chains > > CommonGramsFilter set posLenAttribute to 2 for bi-grams, no matter the length > of the input tokens. Here is an example seed that produces a failure: > {noformat} > [junit4:junit4] says Привет! Master seed: 3296009A5B3B7A05 > [junit4:junit4] Executing 1 suite with 1 JVM. > [junit4:junit4] > [junit4:junit4] Started J0 PID(23946@RD-38). > [junit4:junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains > [junit4:junit4] 2> TEST FAIL: useCharFilter=true text='apuqdgtr wjco mpc ' > [junit4:junit4] 2> Exception from random analyzer: > [junit4:junit4] 2> charfilters= > [junit4:junit4] 2> > org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, , > java.io.StringReader@699f982d) > [junit4:junit4] 2> > org.apache.lucene.analysis.charfilter.HTMLStripCharFilter(org.apache.lucene.analysis.pattern.PatternReplaceCharFilter@6cbfe887, > []) > [junit4:junit4] 2> tokenizer= > [junit4:junit4] 2> > org.apache.lucene.analysis.core.LetterTokenizer(LUCENE_44, > org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory@4fb3c3d9, > > org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@2b3b2ed8) > [junit4:junit4] 2> filters= > [junit4:junit4] 2> > org.apache.lucene.analysis.util.ElisionFilter(org.apache.lucene.analysis.ValidatingTokenFilter@0, > [iez]) > [junit4:junit4] 2> > org.apache.lucene.analysis.MockGraphTokenFilter(java.util.Random@3a807d14, > org.apache.lucene.analysis.ValidatingTokenFilter@20) > [junit4:junit4] 2> > org.apache.lucene.analysis.commongrams.CommonGramsFilter(LUCENE_44, > org.apache.lucene.analysis.ValidatingTokenFilter@37caea, [bbtzjxco, , > jafehvlp, kujsm, znpfw, xqfni]) > [junit4:junit4] 2> > org.apache.lucene.analysis.bg.BulgarianStemFilter(org.apache.lucene.analysis.ValidatingTokenFilter@6c1927b) > [junit4:junit4] 2> offsetsAreCorrect=true > [junit4:junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestRandomChains -Dtests.method=testRandomChains > -Dtests.seed=3296009A5B3B7A05 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=ar_YE -Dtests.timezone=Europe/London > -Dtests.file.encoding=US-ASCII > [junit4:junit4] ERROR 14.7s | TestRandomChains.testRandomChains <<< > [junit4:junit4]> Throwable #1: java.lang.IllegalStateException: stage 3: > inconsistent endOffset at pos=2: 13 vs 8; token=[㑮ٯb_ > [junit4:junit4]> at > __randomizedtesting.SeedInfo.seed([3296009A5B3B7A05:F7729FB1C2967C5]:0) > [junit4:junit4]> at > org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135) > [junit4:junit4]> at > org.apache.lucene.analysis.bg.BulgarianStemFilter.incrementToken(BulgarianStemFilter.java:48) > [junit4:junit4]> at > org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) > [junit4:junit4]> at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:635) > [junit4:junit4]> at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546) > [junit4:junit4]> at > 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:447) > [junit4:junit4]> at > org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:944) > [junit4:junit4]> at java.lang.Thread.run(Thread.java:679) > [junit4:junit4] 2> NOTE: test params are: codec=Lucene42: > {dummy=PostingsFormat(name=Direct)}, docValues:{}, sim=DefaultSimilarity, > locale=ar_YE, timezone=Europe/London > [junit4:junit4] 2> NOTE: Linux 3.5.0-27-generic amd64/Sun Microsystems Inc. > 1.6.0_27 (64-bit)/cpus=2,threads=1,free=96085824,total=223412224 > [junit4:junit4] 2> NOTE: All tests run in this JVM: [TestRandomChains] > [junit4:junit4] Completed in 16.32s, 1 test, 1 error <<< FAILURES! > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6121) Fix CachingTokenFilter to propagate reset() the first time
[ https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6121: -- Labels: random-chains (was: ) > Fix CachingTokenFilter to propagate reset() the first time > -- > > Key: LUCENE-6121 > URL: https://issues.apache.org/jira/browse/LUCENE-6121 > Project: Lucene - Core > Issue Type: Improvement >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Labels: random-chains > Fix For: 5.0, 6.0 > > Attachments: > LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch, > LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch > > > CachingTokenFilter should have been propagating reset() _but only the first > time_ and thus you would then use CachingTokenFilter in a more normal way – > wrap it and call reset() then increment in a loop, etc., instead of knowing > you need to reset() on what it wraps but not this token filter itself. That's > weird. It's ab-normal for a TokenFilter to never propagate reset, so every > user of CachingTokenFilter to date has worked around this by calling reset() > on the underlying input instead of the final wrapping token filter > (CachingTokenFilter in this case). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
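A minimal sketch of the "more normal" usage the fix enables, assuming the patched behavior described above (the analyzer and field name are placeholders, not code from the issue): reset() is called on the outer CachingTokenFilter itself, and only the first call is propagated to the wrapped stream.

{noformat}
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CachingTokenFilterUsageSketch {
  // Consumes the same token stream twice: the first pass fills the cache,
  // the second pass replays it. No direct reset() on the inner stream.
  static void consumeTwice(Analyzer analyzer, String field, String text) throws IOException {
    try (TokenStream inner = analyzer.tokenStream(field, text)) {
      CachingTokenFilter cache = new CachingTokenFilter(inner);
      CharTermAttribute term = cache.addAttribute(CharTermAttribute.class);
      for (int pass = 0; pass < 2; pass++) {
        cache.reset();                // first call propagates to the wrapped stream
        while (cache.incrementToken()) {
          System.out.println(pass + ": " + term);
        }
        cache.end();
      }
      cache.close();
    }
  }
}
{noformat}

The second reset() only rewinds the cache, so the already-exhausted inner stream is never touched again.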
[jira] [Updated] (LUCENE-8092) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-8092: -- Labels: random-chains (was: ) > TestRandomChains failure > > > Key: LUCENE-8092 > URL: https://issues.apache.org/jira/browse/LUCENE-8092 > Project: Lucene - Core > Issue Type: Bug >Reporter: Alan Woodward >Priority: Major > Labels: random-chains > > https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.2/1/ > ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains > -Dtests.seed=C006DAD2E1FC77AF -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true > -Dtests.linedocsfile=/Users/romseygeek/projects/lucene-test-data/enwiki.random.lines.txt > -Dtests.locale=tr -Dtests.timezone=Europe/Simferopol -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > Reproduces locally on 7.2 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10352: --- Labels: random-chains (was: ) > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Labels: random-chains > Fix For: 9.1, 10.0 (main) > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StringMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
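As a rough illustration of the ModuleReader-based class discovery mentioned above (a sketch under assumptions, not the code from the PR; the module path, module name and package prefix are placeholders), the .class resources of a resolved module can be listed and mapped to class names:

{noformat}
import java.io.IOException;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ModuleClassDiscoverySketch {
  // Lists the classes of one module on the given module path whose package
  // starts with the analysis prefix. Purely illustrative.
  static List<String> analysisClasses(Path modulePath, String moduleName) throws IOException {
    ModuleReference ref = ModuleFinder.of(modulePath).find(moduleName).orElseThrow();
    try (ModuleReader reader = ref.open()) {
      return reader.list()
          .filter(name -> name.endsWith(".class") && !name.endsWith("module-info.class"))
          .map(name -> name.substring(0, name.length() - ".class".length()).replace('/', '.'))
          .filter(cls -> cls.startsWith("org.apache.lucene.analysis."))
          .collect(Collectors.toList());
    }
  }
}
{noformat}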
[jira] [Updated] (LUCENE-10362) JapaneseNumberFilter messes up offsets
[ https://issues.apache.org/jira/browse/LUCENE-10362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10362: --- Labels: random-chains (was: ) > JapaneseNumberFilter messes up offsets > -- > > Key: LUCENE-10362 > URL: https://issues.apache.org/jira/browse/LUCENE-10362 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Labels: random-chains > > It is a tokenfilter, tries to change offsets, so of course TestRandomChains > finds bugs in it > {noformat} > 2> NOTE: reproduce with: gradlew test --tests > TestRandomChains.testRandomChains -Dtests.seed=CE566FFD0024BDB0 > -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=en-PG > -Dtests.timezone=CST -Dtests.asserts=true -Dtests.file.encoding=UTF-8 > {noformat} > {noformat} > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to > /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_16/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: m<[0-1] +1> mi<[0-2] +1> i<[1-2] +1> iy<[1-3] +1> y<[2-3] +1> > yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> k<[4-5] +1> kt<[4-6] +1> t<[5-6] +1> t > <[5-7] +1> <[6-7] +1> 2<[6-8] +1> 2<[7-8] +1> 26<[7-9] +1> 6<[8-9] +1> > 64<[8-10] +1> > 2> stage 1: m<[0-1] +1> mi<[0-2] +1> i<[1-2] +1> iy<[1-3] +1> y<[2-3] +1> > yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> k<[4-5] +1> kt<[4-6] +1> t<[5-6] +1> t > <[5-7] +1> <[6-7] +1> 2<[6-8] +1> 2<[7-8] +1> 26<[7-9] +1> 6<[8-9] +1> > 64<[8-10] +1> > 2> stage 2: n<[3-4] +1> nk<[3-5] +1> word<[3-5] +0> k<[4-5] +1> > word<[4-5] +0> kt<[4-6] +1> word<[4-6] +0> t<[5-6] +1> > word<[5-6] +0> t <[5-7] +1> <[6-7] +1> 2<[6-8] +1> > word<[6-8] +0> 2<[7-8] +1> word<[7-8] +0> 26<[7-9] +1> > word<[7-9] +0> 6<[8-9] +1> 64<[8-10] +1> word<[8-10] +0> > 2> last stage: yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> word<[3-5] > +0> k<[4-5] +1> word<[4-5] +0> kt<[4-6] +1> word<[4-6] > +0> t<[5-6] +1> word<[5-6] +0> t <[5-7] +1> <[6-7] +1> 2<[6-8] > +1> word<[6-8] +0> 2<[7-8] +1> word<[7-8] +0> 26<[7-9] > +1> word<[7-9] +0> 6<[8-9] +1> word<[8-10] +0> > 2> TEST FAIL: useCharFilter=false text='miynkt 264957329' > 2> Exception from random analyzer: > 2> charfilters= > 2> tokenizer= > 2> org.apache.lucene.analysis.ngram.NGramTokenizer() > 2> filters= > 2> > Conditional:org.apache.lucene.analysis.icu.ICUNormalizer2Filter(OneTimeWrapper@3b5fdc7f > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, > com.ibm.icu.impl.Norm2AllModes$ComposeNormalizer2@5ef6381c) > 2> > Conditional:org.apache.lucene.analysis.miscellaneous.TypeAsSynonymFilter(OneTimeWrapper@3e803db2 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0, > ) > 2> > Conditional:org.apache.lucene.analysis.ja.JapaneseNumberFilter(OneTimeWrapper@20de0223 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,keyword=false) >> java.lang.IllegalStateException: last stage: inconsistent endOffset > at pos=17: 9 vs 10; token=word >> at > __randomizedtesting.SeedInfo.seed([CE566FFD0024BDB0:F3B7469C4736A070]:0) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:164) >> at > 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10363) JapaneseCompletionFilter messes up offsets
[ https://issues.apache.org/jira/browse/LUCENE-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10363: --- Labels: random-chains (was: ) > JapaneseCompletionFilter messes up offsets > -- > > Key: LUCENE-10363 > URL: https://issues.apache.org/jira/browse/LUCENE-10363 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Labels: random-chains > > It is a tokenfilter, tries to change offsets, so of course TestRandomChains > finds bugs in it: > {noformat} > NOTE: reproduce with: gradlew test --tests > TestRandomChains.testRandomChainsWithLargeStrings > -Dtests.seed=E233A5FAC016E02 -Dtests.nightly=true -Dtests.slow=true > -Dtests.locale=en-TV -Dtests.timezone=Asia/Saigon -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > {noformat} > {noformat} > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to > /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_54/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: lk<[1-3] +1> p<[6-7] +1> ngtoixtmldzsjz<[10-24] +1> uoq<[25-28] > +1> HANGUL<[28-28] +1> o<[29-30] +1> HANGUL<[31-31] +1> VulliPHsZzn<[32-43] > +1> > 2> stage 1: lk<[1-3] +1> 85<[1-3] +0> p<[6-7] +1> 70<[6-7] +0> > ngtoixtmldzsjz<[10-24] +1> 653543<[10-24] +0> uoq<[25-28] +1> 05<[25-28] > +0> HANGUL<[28-28] +1> 565800<[28-28] +0> o<[29-30] +1> 00<[29-30] +0> > HANGUL<[31-31] +1> 565800<[31-31] +0> VulliPHsZzn<[32-43] +1> 787460<[32-43] > +0> > 2> stage 2: ngtoixtmldzsjz 653543<[10-24] +0> 653543<[10-24] +1> 653543 > uoq<[10-28] +0> uoq<[25-28] +1> uoq 05<[25-28] +0> 05<[25-28] +1> > 05 HANGUL<[25-28] +0> HANGUL<[28-28] +1> HANGUL 565800<[28-28] +0> > 565800<[28-28] +1> 565800 o<[28-30] +0> o<[29-30] +1> o 00<[29-30] +0> > 00<[29-30] +1> 00 HANGUL<[29-31] +0> HANGUL<[31-31] +1> HANGUL > 565800<[31-31] +0> 565800<[31-31] +1> 565800 VulliPHsZzn<[31-43] +0> > VulliPHsZzn<[32-43] +1> > 2> last stage: ngtoixtmldzsjz<[10-24] +1> ngtoixtmldzsjz 653543<[10-24] +0> > 653543<[10-24] +1> 653543 uoq<[10-28] +0> uoq<[25-28] +1> uoq 05<[25-28] > +1> 05<[25-28] +1> 05 HANGUL<[25-28] +1> HANGUL<[28-28] +1> HANGUL > 565800<[28-28] +0> 565800<[28-28] +1> 565800 o<[28-30] +0> o<[29-30] +1> o > 00<[29-30] +0> 00<[29-30] +1> 00 HANGUL<[29-31] +0> > HANGUL<[31-31] +1> HANGUL 565800<[31-31] +1> 565800<[31-31] +1> 565800 > VulliPHsZzn<[31-43] +0> > 2> TEST FAIL: useCharFilter=true text='[lk[-.p|) ngtoixtmldzsjz uoqao > aVulliPHsZzn wxsk' > 2> Exception from random analyzer: > 2> charfilters= > 2> org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, > , java.io.StringReader@5b3b54eb) > 2> tokenizer= > 2> > org.apache.lucene.analysis.classic.ClassicTokenizer(org.apache.lucene.util.AttributeFactory$1@e29311e9) > 2> filters= > 2> > org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(ValidatingTokenFilter@32a6de77 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, > true) > 2> > org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@3d044414 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, > q) > 2> > Conditional:org.apache.lucene.analysis.ja.JapaneseCompletionFilter(OneTimeWrapper@435207ec > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,reading=null,reading > (en)=null,pronunciation=null,pronunciation (en)=null, 
INDEX) >> java.lang.IllegalStateException: last stage: inconsistent endOffset > at pos=19: 31 vs 43; token=565800 VulliPHsZzn > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10361) KoreanNumberFilter messes up offsets
[ https://issues.apache.org/jira/browse/LUCENE-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10361: --- Labels: random-chains (was: ) > KoreanNumberFilter messes up offsets > > > Key: LUCENE-10361 > URL: https://issues.apache.org/jira/browse/LUCENE-10361 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Labels: random-chains > > It is a tokenfilter, tries to change offsets, so of course TestRandomChains > finds bugs in it: > {noformat} > NOTE: reproduce with: gradlew test --tests TestRandomChains.testRandomChains > -Dtests.seed=12BC606B774693E4 -Dtests.nightly=true -Dtests.slow=true > -Dtests.locale=om-Latn-ET -Dtests.timezone=Australia/Yancowinna > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 > {noformat} > {noformat} > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to > /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_16/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: 뱅<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 履<[6-7] +1> jEqyzUT<[8-15] > +1> > 2> stage 1: 00<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 00<[6-7] +1> > 154300<[8-15] +1> 454300<[8-15] +0> > 2> last stage: 0<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 00<[6-7] +1> > 454300<[8-15] +0> > 2> TEST FAIL: useCharFilter=false > text='\ubc45\u0191(\u0117\ud8ad\udf0a\uf9df jEqyzUT ' > 2> Exception from random analyzer: > 2> charfilters= > 2> > org.apache.lucene.analysis.cjk.CJKWidthCharFilter(java.io.StringReader@17af5384) > 2> > org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@33e5bdbb, > org.apache.lucene.analysis.cjk.CJKWidthCharFilter@1aafd271) > 2> tokenizer= > 2> > org.apache.lucene.analysis.icu.segmentation.ICUTokenizer(org.apache.lucene.analysis.icu.segmentation.DefaultICUTokenizerConfig@4e6f4690) > 2> filters= > 2> > Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@34215eb7 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,script=Common, > false) > 2> > org.apache.lucene.analysis.ko.KoreanNumberFilter(ValidatingTokenFilter@7b4a2a5b > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,script=Common,keyword=false) >> java.lang.IllegalStateException: last stage: inconsistent > startOffset at pos=3: 6 vs 8; token=454300 >> at > __randomizedtesting.SeedInfo.seed([12BC606B774693E4:2F5D490A30548E24]:0) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) >> at > org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChains(TestRandomChains.java:915) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: 
issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10360) BeiderMorseFilter: TestRandomChains fails with IndexOutOfBounds on empty term text
[ https://issues.apache.org/jira/browse/LUCENE-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10360: --- Labels: random-chains (was: ) > BeiderMorseFilter: TestRandomChains fails with IndexOutOfBounds on empty term > text > -- > > Key: LUCENE-10360 > URL: https://issues.apache.org/jira/browse/LUCENE-10360 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Uwe Schindler >Priority: Major > Labels: random-chains > > Error seen: > {noformat} > 2> TEST FAIL: useCharFilter=true text='Uf?F ?wlu{0
[jira] [Updated] (LUCENE-10353) Add null injection to analyzer integration tests (e.g. TestRandomChains)
[ https://issues.apache.org/jira/browse/LUCENE-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10353: --- Labels: random-chains (was: ) > Add null injection to analyzer integration tests (e.g. TestRandomChains) > > > Key: LUCENE-10353 > URL: https://issues.apache.org/jira/browse/LUCENE-10353 > Project: Lucene - Core > Issue Type: Task >Reporter: Robert Muir >Assignee: Uwe Schindler >Priority: Major > Labels: random-chains > > These tests inject random parameter values (from argumentProviders). Some > generated values may be illegal and IllegalArgumentException is "allowed" if > the constructor returns it. None of the values should cause failures at > runtime. > But for object types, we never inject null values (unless the > argumentProvider were to do it itself). We should do this some low % of the > time, and "allow" ctors to return NPE too. > I see bugs in some of the analyzers where they are just a missing null check > in the constructor. It is important to fail on invalid configuration up-front > in the ctor, rather than failing e.g. at index time. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
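As a small illustrative sketch of the kind of constructor-time validation this testing would exercise (hypothetical class and parameter names, not code from the issue), the filter rejects a null resource up front so misconfiguration fails immediately rather than at index time:

{noformat}
import java.io.IOException;
import java.util.Objects;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical filter demonstrating the fail-fast null check in the ctor.
public final class ExampleDictionaryFilter extends TokenFilter {
  private final Object dictionary; // placeholder for some required resource

  public ExampleDictionaryFilter(TokenStream input, Object dictionary) {
    super(input);
    // Throw NPE here, at configuration time, rather than much later at index time.
    this.dictionary = Objects.requireNonNull(dictionary, "dictionary must not be null");
  }

  @Override
  public boolean incrementToken() throws IOException {
    return input.incrementToken(); // pass-through; a real filter would consult the dictionary
  }
}
{noformat}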
[jira] [Updated] (LUCENE-10358) JapaneseIterationMarkCharFilter: TestRandomChains fails with incorrect offsets or causes IndexOutOfBounds
[ https://issues.apache.org/jira/browse/LUCENE-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10358: --- Labels: random-chains (was: ) > JapaneseIterationMarkCharFilter: TestRandomChains fails with incorrect > offsets or causes IndexOutOfBounds > - > > Key: LUCENE-10358 > URL: https://issues.apache.org/jira/browse/LUCENE-10358 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Uwe Schindler >Priority: Major > Labels: random-chains > > Failures seen: > {noformat} > $ gradlew :lucene:analysis:integration.tests:test --tests > TestRandomChains.testRandomChainsWithLargeStrings > -Dtests.seed=AA632771CC823702 -Dtests.slow=true -Dtests.locale=fr-MF > -Dtests.timezone=America/Panama -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to C:\Users\Uwe > Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> ppkarrpf<[7-14] +1> 1<[16-17] +1> > 5<[18-19] +1> > 2> stage 1: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] > +1> 759700<[7-14] +0> 1<[16-17] +1> 5<[18-19] +1> 00<[18-19] +0> > 2> stage 2: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] > +1> 759700<[7-14] +0> 1<[16-17] +1> 00<[18-19] +0> > 2> TEST FAIL: useCharFilter=true text='\ud801\udc96\ud801\udcaa\ud801\udc84 > ppkarpf {1,5}g?)u em mbm hbil' > 2> Exception from random analyzer: > 2> charfilters= > 2> > org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter(java.io.StringReader@105e6aa7, > true, false) > 2> tokenizer= > 2> org.apache.lucene.analysis.th.ThaiTokenizer() > 2> filters= > 2> > Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@79889b7f > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, > true) > 2> > org.apache.lucene.analysis.ja.JapaneseNumberFilter(ValidatingTokenFilter@53a9e96c > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false) > 2> > org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter(ValidatingTokenFilter@6cb4578d > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false, > > org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter$StemmerOverrideMap@51fc8124) >> java.lang.IllegalStateException: stage 2: inconsistent startOffset > at pos=3: 16 vs 18; token=00 >> at > __randomizedtesting.SeedInfo.seed([AA632771CC823702:C038986095CC17F1]:0) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138) >> at > org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.incrementToken(StemmerOverrideFilter.java:67) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:81) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) >> at > 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) >> at > org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:943) > {noformat} > and also: > {noformat} > $ gradlew :lucene:analysis:integration.tests:test --tests > TestRandomChains.testRandomChains -Dtests.seed=3A0D0E91E0CA5BFC > -Dtests.slow=true -Dtests.locale=nmg-CM -Dtests.timezone=Antarctica/Vostok > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to C:\Users\Uwe > Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test_17\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> TEST FAIL: useCharFilter=false text='' > 2> Except
[jira] [Updated] (LUCENE-10359) KoreanTokenizer: TestRandomChains fails with incorrect offsets
[ https://issues.apache.org/jira/browse/LUCENE-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10359: --- Labels: random-chains (was: ) > KoreanTokenizer: TestRandomChains fails with incorrect offsets > -- > > Key: LUCENE-10359 > URL: https://issues.apache.org/jira/browse/LUCENE-10359 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Uwe Schindler >Priority: Major > Labels: random-chains > > It looks like KoreanTokenizer is causing this (NORI), but Kuromoji may be > affected in the same way: > {noformat} > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to C:\Users\Uwe > Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: e<[2-3] +1> ek<[4-6] +1> oy<[8-10] +1> 1<[11-12] +1> > zzkuxp<[13-19] +1> > 2> stage 1: e<[2-3] +1> ek<[4-6] +1> oy<[8-10] +1> 1<[11-12] +1> > zzkuxp<[13-19] +1> > 2> stage 2: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> > oy<[8-10] +1> oy 1<[8-12] +0> 1<[11-12] +1> 1 zzkuxp<[11-19] +0> > 2> stage 3: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> > oy<[8-10] +1> oy 1<[8-12] +0> 1<[11-12] +1> 1 zzkuxp<[11-19] +0> > 2> last stage: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> > oy<[8-10] +1> oy 1<[8-12] +0> 1 zzkuxp<[11-19] +0> > 2> TEST FAIL: useCharFilter=false text='?.e|ek|]oy{1 zzkuxp ZyzzV ycuqjnv > axtpppvk \u233b\u23c8\u2314\u232e\u236e\u238d\u235e x d \"' > 2> Exception from random analyzer: > 2> charfilters= > 2> org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, > ifywufhi, java.io.StringReader@48586999) > 2> > org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@65036838, > org.apache.lucene.analysis.pattern.PatternReplaceCharFilter@11d4ba35) > 2> tokenizer= > 2> org.apache.lucene.analysis.ko.KoreanTokenizer() > 2> filters= > 2> > org.apache.lucene.analysis.en.KStemFilter(ValidatingTokenFilter@595d7938 > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false) > 2> > org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@13d08b48 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false, > u) > 2> > org.apache.lucene.analysis.util.ElisionFilter(ValidatingTokenFilter@6396b917 > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false, > [fh, hiiwwxyyd, fcpodqor, qogvhmywr, l, icad]) > 2> > Conditional:org.apache.lucene.analysis.ko.KoreanNumberFilter(OneTimeWrapper@5f0558f6 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false) >> java.lang.IllegalStateException: last stage: inconsistent > startOffset at pos=2: 8 vs 11; token=1 zzkuxp >> at > __randomizedtesting.SeedInfo.seed([E4552C7844FC2DA3:8E0E93691DB20D50]:0) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138) >> at 
> org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) >> at > org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:943) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] magibney commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue
magibney commented on pull request #380: URL: https://github.com/apache/lucene/pull/380#issuecomment-1005887111 Apologies for the delay, and thanks for bearing with me, @spyk. I'm inclined to err on the cautious side with this, since I'm not as familiar with this part of the codebase or the OpenNLP community. That said, this seems really straightforward to me, and clearly an improvement. (I considered, but decided against, suggesting to adopt `computeIfAbsent()` in place of `cached = map.get(...); if (cached==null) map.put(...)` ... the former is "newer" and semantically clearer, but the latter is more idiomatic to this part of the codebase). The only thing giving me pause now is that I notice we're changing the return type of a _public_ method. If there are third-party extensions that rely on the existing return type of this method, they will break. An easy fix, but still ... I'm additionally chastened to see that the issue introducing this code, [LUCENE-2899](https://issues.apache.org/jira/browse/LUCENE-2899), has 36 "votes" and 68 "watchers" (!) -- so the chance of this being a breaking change for some third party extension are not insignificant (FWIW I'd be surprised if third parties actually called this method in practice, but I'm not sure I have the perspective to judge :slightly_smiling_face:) I dislike the idea of maintaining backward compatibility "just because", when this seems like such a clear improvement, and when I suspect that the `public` access for these static methods may not necessarily represent an explicit design choice (?); and with the 9.0 release still quite fresh, arguably now would not be the worst time to break backcompat (esp. in such a minor way). But I'm afraid I really would like another committer (ideally, @sarowe?) to weigh in on this. Thank you for your patience! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
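For context, the two caching idioms weighed in the comment look roughly like this (an illustrative sketch with hypothetical names, not the PR's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachingIdiomsSketch {
  private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

  // Idiom already used in this part of the codebase: explicit get, then put on a miss.
  static Object lookupClassic(String key) {
    Object cached = CACHE.get(key);
    if (cached == null) {
      cached = expensiveLoad(key);
      CACHE.put(key, cached);
    }
    return cached;
  }

  // The "newer", semantically clearer alternative mentioned (and set aside) in the comment.
  static Object lookupComputeIfAbsent(String key) {
    return CACHE.computeIfAbsent(key, CachingIdiomsSketch::expensiveLoad);
  }

  private static Object expensiveLoad(String key) {
    return new Object(); // placeholder for loading a dictionary/model
  }
}
```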
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469420#comment-17469420 ] Cameron VandenBerg commented on LUCENE-10157: - Hi Adrien, it makes sense to me why you would like to move Indri code to the sandbox, but I am still hesitant about moving the smoothingScore API because that required changes outside of core. I am worried that the smoothingScore changes will be lost, which is the building block to a lot of functionality in lucene. Is there anything that I could do to help keep the smoothingScore in core? I am happy to submit a new smaller PR that simply fixes the IndriAndScorer and adds additional tests. I am open to suggestions and happy to work with you. > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on pull request #536: Don't store graph offsets for HNSW graph
msokolov commented on pull request #536: URL: https://github.com/apache/lucene/pull/536#issuecomment-1005868557 Thanks for the thorough testing, @mayya-sharipova. I think we want to minimize heap usage; the index size cost is small. Basically we are trading off on-heap for on-disk/off-heap, which is always a tradeoff we like. The search-time change seems like noise? So +1 from me. Also, glad to see the fanout numbers are sane :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test
jpountz commented on pull request #444: URL: https://github.com/apache/lucene/pull/444#issuecomment-1005865609 Correct, changes should no longer be backported to `branch_8x`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778933921 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469399#comment-17469399 ] Adrien Grand commented on LUCENE-10157: --- Hi Cameron, the sandbox is actually part of Lucene (https://github.com/apache/lucene/tree/main/lucene/sandbox), it is just a different jar from lucene-core, just like the query parser you added under lucene/queryparser would have been released in a different JAR (lucene-queryparser-\{version}.jar). I'm not suggesting we not add it to Lucene, just that we add it in a less intrusive way. And having it in the sandbox will leave us more time to think about how this could be integrated with dynamic pruning, two-phase iteration, the similarity API, etc. > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778925282 ## File path: gradle/documentation/render-javadoc.gradle ## @@ -57,7 +57,7 @@ allprojects { outputDir = project.javadoc.destinationDir } -if (project.path == ':lucene:luke' || project.path.endsWith(".tests")) { +if (project.path == ':lucene:luke' || !(project in rootProject.ext.mavenProjects)) { Review comment: Ah, it is the inverse: all projects that will land in Maven Central. Yeah, that's a better check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778931152 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778930571 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[GitHub] [lucene] mayya-sharipova commented on pull request #536: Don't store graph offsets for HNSW graph
mayya-sharipova commented on pull request #536: URL: https://github.com/apache/lucene/pull/536#issuecomment-1005844307 I've also run the comparison on a bigger dataset: deep-image-96-angular of 10M docs. M: 16; efConstruction: 500. Disk size before the change: 4.2G; after the change: 4.3G => ~2% increase. Not much effect on search performance:
| | baseline recall | baseline QPS | candidate recall | candidate QPS |
| --- | ---: | ---: | ---: | ---: |
| n_cands=10 | 0.726 | 1527.894 | 0.728 | 870.721 |
| n_cands=20 | 0.793 | 1350.206 | 0.794 | 1364.301 |
| n_cands=40 | 0.862 | 1053.906 | 0.862 | 1068.798 |
| n_cands=80 | 0.917 | 737.711 | 0.918 | 741.551 |
| n_cands=120 | 0.942 | 573.783 | 0.942 | 589.756 |
| n_cands=200 | 0.964 | 402.166 | 0.964 | 414.730 |
| n_cands=400 | 0.982 | 237.545 | 0.982 | 251.678 |
| n_cands=600 | 0.988 | 174.223 | 0.988 | 177.968 |
| n_cands=800 | 0.991 | 137.420 | 0.991 | 143.290 |
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778926381 ## File path: gradle/documentation/render-javadoc.gradle ## @@ -57,7 +57,7 @@ allprojects { outputDir = project.javadoc.destinationDir } -if (project.path == ':lucene:luke' || project.path.endsWith(".tests")) { +if (project.path == ':lucene:luke' || !(project in rootProject.ext.mavenProjects)) { Review comment: By the way, there's also the nicer operator `project !in rootProject.ext.mavenProjects`, which I have already used elsewhere. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778924772 ## File path: gradle/documentation/render-javadoc.gradle ## @@ -57,7 +57,7 @@ allprojects { outputDir = project.javadoc.destinationDir } -if (project.path == ':lucene:luke' || project.path.endsWith(".tests")) { +if (project.path == ':lucene:luke' || !(project in rootProject.ext.mavenProjects)) { Review comment: What is this `mavenProjects`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on pull request #571: URL: https://github.com/apache/lucene/pull/571#issuecomment-1005839073 I think you have to solve the conflicts caused by the change for running "gradlew beast". I leave that up to you. Maybe it works better after this branch is merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778922786 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469393#comment-17469393 ] Cameron VandenBerg commented on LUCENE-10157: - Hi [~jpountz], I would really love to be able to keep these changes in Lucene if possible. I am very happy to write more tests and make any changes you feel are necessary. I am free to work on this right now and can do quick turnarounds. I have worked a lot more with the Lucene testing framework now, and I feel that I can do a good job showing that the smoothingScore API does work. The reason I am hopeful that we can keep the smoothingScore is that it is important to our research. I am actually actively using the smoothingScore API in our research at Carnegie Mellon University for creating a new search dataset. I do have it working in my project because I have some additional functionality that I have not committed to Lucene yet because I was trying to minimize the scope of my first PR. Thank you for your time! Let me know what I can do to help keep the smoothingScore functionality in the Lucene API. > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778921454 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469392#comment-17469392 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 7572352b7927c8099847d87bb2bb468af6c15958 in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7572352 ] LUCENE-10291: Bug fix. > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
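To make the scenario described in the issue concrete, here is a minimal sketch of the kind of codec it mentions: a FilterCodec whose postings format refuses any read or write, which surfaces an unexpected postings file even when no field indexes postings. The class and format names are invented for illustration, and this is not the code that originally triggered the report; it only assumes the standard Lucene codec APIs (FilterCodec, PostingsFormat).

```java
// A minimal sketch, assuming standard Lucene codec APIs; names are illustrative.
import java.io.IOException;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

public final class NoPostingsCodec extends FilterCodec {

  public NoPostingsCodec() {
    // Delegate everything except postings to the default codec.
    super("NoPostingsCodec", Codec.getDefault());
  }

  @Override
  public PostingsFormat postingsFormat() {
    return new PostingsFormat("NoPostings") {
      @Override
      public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
        // Before LUCENE-10291 this tripped even when no field actually indexed postings.
        throw new UnsupportedOperationException("postings are not expected here");
      }

      @Override
      public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
        throw new UnsupportedOperationException("postings are not expected here");
      }
    };
  }
}
```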
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469388#comment-17469388 ] ASF subversion and git services commented on LUCENE-10291: -- Commit f9ff620ec6b368f94669eb71c5f0c92ac89e6951 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f9ff620 ] LUCENE-10291: CHANGES entry > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469391#comment-17469391 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 5920486671995f3752ae09519bb8a9e931d3056a in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5920486 ] LUCENE-10291: CHANGES entry > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10291. --- Fix Version/s: 9.1 Resolution: Fixed > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469390#comment-17469390 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 738247e78d1d5ff22f3755d0ceca8ad99d4f69f4 in lucene's branch refs/heads/branch_9x from Yannick Welsch [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=738247e ] LUCENE-10291: Only read/write postings when there is at least one indexed field (#539) > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469389#comment-17469389 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 7fdba369415a3882df5f83ce6197a2f638b37fad in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7fdba36 ] LUCENE-10291: Bug fix. > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on pull request #571: URL: https://github.com/apache/lucene/pull/571#issuecomment-1005827412
> Please take a look at this comment/chart, Uwe. https://issues.apache.org/jira/browse/LUCENE-10328?focusedCommentId=17468676&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17468676
>
> How tests are run depends on the combination of whether source and test source sets are modules.
>
> We do not have module patching. I don't think it's possible to configure gradle internal infrastructure and plugins to reasonably use it.

OK, thanks for the picture, that gives much more information. I was hoping it is like that. That module patching does not work was known to me; I had just stumbled on some support methods for it and did not understand when they are used.

I am perfectly fine with running our tests for a class in classpath mode even though the main source set is modular. Technically this won't change the test results, as unit tests should only check internal assertions. Of course, in reality it is more complicated.

What I understood, and what is missing in the image: the dependencies are always modular, so lucene.core is put on the module path even though we are running tests in classpath mode. This is what this PR mainly changes, correct? Previously it was not fully working unless you explicitly declared it.

What we should do next (after this is merged):
- Review implementation vs. api dependencies (both on the Gradle side and in module-info). With my other PR for test-random-chains I found an issue because of this: the phonetic module uses commons-codec also in its public API. Compilation of my module worked for some reason, but forbiddenapis failed, as it was not able to see the classes (when inspecting the method signatures), which is understandable. ICU likewise needs to refer to ICU in an api (Gradle) / transitive (module system) way. So we should enable the exports checks. When developing the last patch about logging in core, I would still make java.logging non-transitive, because it is unlikely that you would use it in downstream code (although there's a public signature using JUL). Because of that I added a `SuppressWarnings("exports")` on the class using it.
- Fix up the module-descriptor files to make sure everything is sane.

Uwe
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
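As a concrete illustration of the api-vs-implementation point above: the modular Gradle configurations are meant to line up with module-info directives roughly as moduleApi -> `requires transitive`, moduleImplementation -> `requires`, and moduleCompileOnly -> `requires static`. Below is a hedged sketch of such a descriptor for a hypothetical module loosely modeled on the phonetic example; it is not the actual Lucene descriptor, and the module names are only approximate.

```java
// Hypothetical module descriptor (not the actual Lucene one) illustrating how the
// Gradle configurations discussed above map onto module-info directives.
module org.example.lucene.analysis.phonetic {
  // moduleApi ('api'): commons-codec types appear in our public signatures,
  // so downstream modules need to read it too -> requires transitive.
  requires transitive org.apache.commons.codec;

  // moduleImplementation ('implementation'): used internally only -> plain requires.
  requires org.apache.lucene.core;

  // If a JDK module like java.logging shows up in one public signature but is
  // deliberately kept non-transitive, the exporting class can carry
  // @SuppressWarnings("exports"), as described above.
  requires java.logging;

  // moduleCompileOnly ('compileOnly'): compile-time only -> requires static.
  requires static org.example.annotations;

  exports org.example.lucene.analysis.phonetic;
}
```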
[GitHub] [lucene] ywelsch commented on pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
ywelsch commented on pull request #539: URL: https://github.com/apache/lucene/pull/539#issuecomment-1005809668 Thanks @jpountz! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469383#comment-17469383 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 8fa7412dec458e42f379cc856bd6ffebe8c6f8e9 in lucene's branch refs/heads/main from Yannick Welsch [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8fa7412 ] LUCENE-10291: Only read/write postings when there is at least one indexed field (#539) > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
jpountz merged pull request #539: URL: https://github.com/apache/lucene/pull/539 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469380#comment-17469380 ] Adrien Grand commented on LUCENE-10157: --- If you don't mind, I would like to move this functionality to the sandbox and undo changes to core APIs like \{{Scorable#smoothingScore}} until we have a better idea of whether we should make these things first-class citizens in Lucene's scoring APIs. > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
rmuir commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005769579 > Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) I wonder too. I searched the mainline branches of some commonly used Java libraries (`guava`, `log4j2`) and found `AccessController` calls in each. If OpenJDK actually removed these methods anytime soon, it would probably break just about every Java app out there. So I'm not worried. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
dweiss commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005762127 Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10343) Remove MyRandom in favor of test framework random
[ https://issues.apache.org/jira/browse/LUCENE-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10343. --- Fix Version/s: 9.1 Resolution: Fixed > Remove MyRandom in favor of test framework random > - > > Key: LUCENE-10343 > URL: https://issues.apache.org/jira/browse/LUCENE-10343 > Project: Lucene - Core > Issue Type: Test >Reporter: Feng Guo >Priority: Trivial > Fix For: 9.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on a change in pull request #579: URL: https://github.com/apache/lucene/pull/579#discussion_r778884007 ## File path: lucene/misc/src/java/org/apache/lucene/misc/store/HardlinkCopyDirectoryWrapper.java ## @@ -66,7 +67,7 @@ public void copyFrom(Directory from, String srcFile, String destFile, IOContext // only try hardlinks if we have permission to access the files // if not super.copyFrom() will give us the right exceptions suppressedException = -LegacySecurityManager.doPrivileged( +doPrivileged( Review comment: Same at other places -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on a change in pull request #579: URL: https://github.com/apache/lucene/pull/579#discussion_r778883566 ## File path: lucene/misc/src/java/org/apache/lucene/misc/store/HardlinkCopyDirectoryWrapper.java ## @@ -66,7 +67,7 @@ public void copyFrom(Directory from, String srcFile, String destFile, IOContext // only try hardlinks if we have permission to access the files // if not super.copyFrom() will give us the right exceptions suppressedException = -LegacySecurityManager.doPrivileged( +doPrivileged( Review comment: Now we can also remove the cast in next line. As doPrivileged is not overloaded. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10343) Remove MyRandom in favor of test framework random
[ https://issues.apache.org/jira/browse/LUCENE-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469370#comment-17469370 ] ASF subversion and git services commented on LUCENE-10343: -- Commit 76d83507beddcc421fc1906e0be4562e16531819 in lucene's branch refs/heads/branch_9x from gf2121 [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=76d8350 ] LUCENE-10343: Remove MyRandom in favor of test framework random (#573) > Remove MyRandom in favor of test framework random > - > > Key: LUCENE-10343 > URL: https://issues.apache.org/jira/browse/LUCENE-10343 > Project: Lucene - Core > Issue Type: Test >Reporter: Feng Guo >Priority: Trivial > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #529: Use CDN to download source release.
jpountz merged pull request #529: URL: https://github.com/apache/lucene/pull/529 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
rmuir commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005755345 I don't think the code is going to stop compiling; instead you will just "lose protection"? > In Java 18 and later, we will degrade other Security Manager APIs so that they remain in place but with limited or no functionality. For example, we may revise AccessController::doPrivileged simply to run the given action, or revise System::getSecurityManager always to return null. This will allow libraries that support the Security Manager and were compiled against previous Java releases to continue to work without change or even recompilation. We expect to remove the APIs once the compatibility risk of doing so declines to an acceptable level. https://openjdk.java.net/jeps/411 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #525: Modernize release announcement text.
jpountz merged pull request #525: URL: https://github.com/apache/lucene/pull/525 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on a change in pull request #579: URL: https://github.com/apache/lucene/pull/579#discussion_r778881181 ## File path: lucene/core/src/java/org/apache/lucene/util/LegacySecurityManager.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.security.AccessController; +import java.security.PrivilegedAction; + +/** + * Encapsulates access to the security manager, which is deprecated as of Java 17. + * + * @lucene.internal + */ +@SuppressWarnings("removal") +@SuppressForbidden(reason = "security manager") +public final class LegacySecurityManager { + + /** Delegates to {@link AccessController#doPrivileged(PrivilegedAction)}. */ + public static T doPrivileged(PrivilegedAction action) { +return AccessController.doPrivileged(action); + } Review comment: Yes. This breaks security. AccessController is caller sensitive. So having it as public method kills all. Better just put SuppressForbidden and SuppressWarnings everywhere. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
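For context, a minimal sketch of the call-site pattern suggested here: keep the caller-sensitive AccessController.doPrivileged call where it is actually used and suppress the warnings there, instead of routing it through a shared public helper. Class and method names are invented; this is not the actual HardlinkCopyDirectoryWrapper code, and the Lucene-internal @SuppressForbidden annotation is only referenced in a comment so the sketch compiles standalone.

```java
// Hedged sketch: suppress the removal/forbidden-API warnings at the call site,
// so the caller-sensitive privilege check still sees the real caller.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.AccessController;
import java.security.PrivilegedAction;

final class HardlinkSketch {

  private HardlinkSketch() {}

  @SuppressWarnings("removal") // AccessController is deprecated for removal since Java 17
  // @SuppressForbidden(reason = "security manager")  <- Lucene-internal annotation, shown for context
  static Exception tryCreateHardLink(Path source, Path destination) {
    return AccessController.doPrivileged(
        (PrivilegedAction<Exception>)
            () -> {
              try {
                Files.createLink(destination, source);
                return null; // success: nothing to record as a suppressed exception
              } catch (IOException | SecurityException | UnsupportedOperationException e) {
                return e; // caller would fall back to a regular copy and record this
              }
            });
  }
}
```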
[GitHub] [lucene] jpountz commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
jpountz commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r778880348 ## File path: lucene/core/src/test/org/apache/lucene/codecs/perfield/TestPerFieldKnnVectorsFormat.java ## @@ -172,9 +171,14 @@ public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException KnnVectorsWriter writer = delegate.fieldsWriter(state); return new KnnVectorsWriter() { @Override -public void writeField(FieldInfo fieldInfo, VectorValues values) throws IOException { +public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader) +throws IOException { fieldsWritten.add(fieldInfo.name); - writer.writeField(fieldInfo, values); + writer.writeField(fieldInfo, knnVectorsReader); + // assert that knnVectorsReader#getVectorValues returns different instances upon repeated + // calls Review comment: This is the sort of thing that we usually check via AssertingKnnVectorsReader. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
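For readers unfamiliar with the asserting wrappers: the check being discussed boils down to something like the following hedged sketch. The helper name is invented; the real check would live in the test framework's asserting KnnVectors classes rather than in the per-field test itself.

```java
// Minimal sketch: verify that a KnnVectorsReader hands out a fresh VectorValues
// iterator on every call, rather than a shared, possibly already-consumed one.
import java.io.IOException;
import org.apache.lucene.codecs.KnnVectorsReader;
import org.apache.lucene.index.VectorValues;

final class KnnReaderChecks {

  private KnnReaderChecks() {}

  static void assertFreshVectorValues(KnnVectorsReader reader, String field) throws IOException {
    VectorValues first = reader.getVectorValues(field);
    VectorValues second = reader.getVectorValues(field);
    assert first != second : "getVectorValues should return a new instance per call";
  }
}
```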
[jira] [Resolved] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-10352. Fix Version/s: 9.1 10.0 (main) Resolution: Fixed We opened several issues about broken analysis components. If you want to run beaster to find more bugs, you can run the following command on main or branch_9x: {{$ gradlew :lucene:analysis.tests:beast -Dtests.dups=100 --tests TestRandomChains -Dtests.nightly=true}} Thanks to all who helped. > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Fix For: 9.1, 10.0 (main) > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StringMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil
jpountz commented on pull request #541: URL: https://github.com/apache/lucene/pull/541#issuecomment-1005745531 Nice. I wonder whether we need to specialize for as many bits-per-value cases as we do for postings, or whether we should only specialize for a few bit widths that are both useful and fast, e.g. 0, 4, 8, 16, 24 and 32. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
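[Editor's sketch] To make the suggestion concrete, here is a hedged sketch of what "specialize only a few widths" could look like. The 512-value block size echoes the PR title, but the method structure, the byte-aligned big-endian layout, and all names are illustrative assumptions, not the ForUtil code from the patch.

import java.util.Arrays;

final class SelectiveUnpacker {
  static final int BLOCK_SIZE = 512; // 512 values per block, as in the PR title

  // Fast, byte-aligned paths for a handful of "round" widths; everything else
  // falls back to a generic shift/mask loop.
  static void decode(int bitsPerValue, byte[] packed, int[] out) {
    switch (bitsPerValue) {
      case 0:
        Arrays.fill(out, 0, BLOCK_SIZE, 0);
        break;
      case 8:
        for (int i = 0; i < BLOCK_SIZE; i++) {
          out[i] = packed[i] & 0xFF; // one byte load per value
        }
        break;
      case 16:
        for (int i = 0; i < BLOCK_SIZE; i++) {
          out[i] = ((packed[2 * i] & 0xFF) << 8) | (packed[2 * i + 1] & 0xFF);
        }
        break;
      case 32:
        for (int i = 0; i < BLOCK_SIZE; i++) {
          out[i] = ((packed[4 * i] & 0xFF) << 24)
              | ((packed[4 * i + 1] & 0xFF) << 16)
              | ((packed[4 * i + 2] & 0xFF) << 8)
              | (packed[4 * i + 3] & 0xFF);
        }
        break;
      default:
        decodeGeneric(bitsPerValue, packed, out);
    }
  }

  // Generic fallback for uncommon widths (1..31 bits), reading bits MSB-first.
  static void decodeGeneric(int bitsPerValue, byte[] packed, int[] out) {
    long bitPos = 0;
    for (int i = 0; i < BLOCK_SIZE; i++) {
      int value = 0;
      for (int b = bitsPerValue - 1; b >= 0; b--, bitPos++) {
        int bit = (packed[(int) (bitPos >>> 3)] >>> (7 - (bitPos & 7))) & 1;
        value |= bit << b;
      }
      out[i] = value;
    }
  }
}

The tradeoff is that uncommon widths take the slower generic loop, which is only acceptable if those widths are rare in real BKD leaf blocks.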
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469362#comment-17469362 ] ASF subversion and git services commented on LUCENE-10352: -- Commit 75259417f1b8de05eda4cf3a8b8c5e8177c7f0dd in lucene's branch refs/heads/branch_9x from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7525941 ] LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system (#582) Co-authored-by: Robert Muir -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
dweiss commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005744351 I do have the same concerns, but at the same time, if they remove the security manager entirely in, say, JDK 17+X, then the code will stop compiling/working anyway. Maybe these concerns should be left for JDK maintainers though. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469354#comment-17469354 ] ASF subversion and git services commented on LUCENE-10352: -- Commit 475fbd0bdde31c6a2ae62c59505cf9e8becd50e4 in lucene's branch refs/heads/main from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=475fbd0 ] LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system (#582) Co-authored-by: Robert Muir -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #545: LUCENE-10319: make ForUtil#BLOCK_SIZE changeable
jpountz commented on pull request #545: URL: https://github.com/apache/lucene/pull/545#issuecomment-1005739072 I'm a bit torn as this also makes the code harder to read with all these constants with complex names. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
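[Editor's sketch] Purely as a hypothetical illustration of that readability concern (not code from this PR), making the block size configurable tends to replace simple literals with a family of derived, named constants; all names below are invented.

final class ConfigurableBlockConstants {
  static final int BLOCK_SIZE = 128;
  static final int BLOCK_SIZE_LOG2 = Integer.numberOfTrailingZeros(BLOCK_SIZE);
  static final int HALF_BLOCK_SIZE = BLOCK_SIZE / 2;
  static final int QUARTER_BLOCK_SIZE = BLOCK_SIZE / 4;

  // With a fixed block size the loop bound could simply be a literal; once
  // BLOCK_SIZE is changeable, every such bound needs one of the names above.
  static void copyQuarterBlock(long[] in, long[] out) {
    for (int i = 0; i < QUARTER_BLOCK_SIZE; i++) {
      out[i] = in[i]; // placeholder body, purely illustrative
    }
  }
}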