[GitHub] [lucene] jpountz commented on pull request #588: LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (9.1.0 Backporting)
jpountz commented on pull request #588:
URL: https://github.com/apache/lucene/pull/588#issuecomment-1006349216

Feel free to merge this if tests pass and you didn't have to make significant changes upon backporting. Do we also need to move the CHANGES entry to a different version on other branches?
[GitHub] [lucene] jpountz commented on a change in pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()
jpountz commented on a change in pull request #578:
URL: https://github.com/apache/lucene/pull/578#discussion_r779353869

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
## @@ -74,11 +74,6 @@ protected boolean useHashTable(FacetsCollector fc, TaxonomyReader taxoReader) {
     return sumTotalHits < maxDoc / 10;
   }

-  /** Increment the count for this ordinal by 1. */
-  protected void increment(int ordinal) {

Review comment: @gsmiller I wonder if we should reconsider the backward-compatibility guarantees of the faceting APIs. Except for APIs that are really meant for end users to extend, like analysis components, my understanding is that we usually consider overriding of our own classes expert usage that is not subject to backward compatibility (as opposed to direct usage of these classes).
[GitHub] [lucene] zacharymorn opened a new pull request #588: LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (Backporting)
zacharymorn opened a new pull request #588:
URL: https://github.com/apache/lucene/pull/588

This PR backports bug fix #444 to version `9.1.0`.
[GitHub] [lucene] zacharymorn opened a new pull request #587: LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (Backporting)
zacharymorn opened a new pull request #587:
URL: https://github.com/apache/lucene/pull/587

This PR backports bug fix apache/lucene#444 to version `9.0.1`.
[GitHub] [lucene-solr] zacharymorn opened a new pull request #2637: LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (Backporting)
zacharymorn opened a new pull request #2637:
URL: https://github.com/apache/lucene-solr/pull/2637

This PR backports bug fix https://github.com/apache/lucene/pull/444 to version `8.11.2`.
[GitHub] [lucene] zacharymorn commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on a change in pull request #534:
URL: https://github.com/apache/lucene/pull/534#discussion_r779299155

## File path: lucene/core/src/test/org/apache/lucene/codecs/perfield/TestPerFieldKnnVectorsFormat.java
## @@ -172,9 +171,14 @@ public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException
       KnnVectorsWriter writer = delegate.fieldsWriter(state);
       return new KnnVectorsWriter() {
         @Override
-        public void writeField(FieldInfo fieldInfo, VectorValues values) throws IOException {
+        public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
+            throws IOException {
           fieldsWritten.add(fieldInfo.name);
-          writer.writeField(fieldInfo, values);
+          // assert that knnVectorsReader#getVectorValues returns different instances upon repeated
+          // calls

Review comment: Ah right. I've moved it to `AssertingKnnVectorsReader`.
[GitHub] [lucene] gf2121 commented on a change in pull request #585: LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts
gf2121 commented on a change in pull request #585:
URL: https://github.com/apache/lucene/pull/585#discussion_r779279138

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -71,17 +71,27 @@ public FastTaxonomyFacetCounts(
   private final void count(List<MatchingDocs> matchingDocs) throws IOException {
     for (MatchingDocs hits : matchingDocs) {
-      SortedNumericDocValues dv = hits.context.reader().getSortedNumericDocValues(indexFieldName);
-      if (dv == null) {
+      SortedNumericDocValues multiValued =
+          hits.context.reader().getSortedNumericDocValues(indexFieldName);
+      if (multiValued == null) {
         continue;
       }
+      NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
+
+      DocIdSetIterator valuesIt = singleValued != null ? singleValued : multiValued;
       DocIdSetIterator it =
-          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), dv));
+          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), valuesIt));
-      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
-        for (int i = 0; i < dv.docValueCount(); i++) {
-          increment((int) dv.nextValue());
+      if (singleValued != null) {
+        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {

Review comment: Maybe simplify this a bit with `while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS)` as `doc` is not used in the loop body?

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -91,31 +101,36 @@ private final void count(List<MatchingDocs> matchingDocs) throws IOException {
   private final void countAll(IndexReader reader) throws IOException {
     for (LeafReaderContext context : reader.leaves()) {
-      SortedNumericDocValues dv = context.reader().getSortedNumericDocValues(indexFieldName);
-      if (dv == null) {
+      SortedNumericDocValues multiValued =

Review comment: +1

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -71,17 +71,27 @@ public FastTaxonomyFacetCounts(
   private final void count(List<MatchingDocs> matchingDocs) throws IOException {
     for (MatchingDocs hits : matchingDocs) {
-      SortedNumericDocValues dv = hits.context.reader().getSortedNumericDocValues(indexFieldName);
-      if (dv == null) {
+      SortedNumericDocValues multiValued =
+          hits.context.reader().getSortedNumericDocValues(indexFieldName);
+      if (multiValued == null) {
         continue;
       }
+      NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
+
+      DocIdSetIterator valuesIt = singleValued != null ? singleValued : multiValued;
       DocIdSetIterator it =
-          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), dv));
+          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), valuesIt));
-      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
-        for (int i = 0; i < dv.docValueCount(); i++) {
-          increment((int) dv.nextValue());
+      if (singleValued != null) {
+        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
+          increment((int) singleValued.longValue());
+        }
+      } else {
+        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {

Review comment: We can use `while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS)` here too.
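To illustrate the suggestion, here is a minimal sketch of the single-valued branch of `count()` written with the `while`-form; the variable names follow the diff above, but this is an illustrative fragment, not the exact committed code:

```java
// Sketch only: since `doc` is never read inside the single-valued loop body,
// the indexed for-loop can collapse into a plain while-loop.
NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
DocIdSetIterator valuesIt = singleValued != null ? singleValued : multiValued;
DocIdSetIterator it =
    ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), valuesIt));

if (singleValued != null) {
  while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    increment((int) singleValued.longValue());
  }
} else {
  while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    for (int i = 0; i < multiValued.docValueCount(); i++) {
      increment((int) multiValued.nextValue());
    }
  }
}
```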
[jira] [Closed] (LUCENE-6121) Fix CachingTokenFilter to propagate reset() the first time
[ https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley closed LUCENE-6121.
--------------------------------

> Fix CachingTokenFilter to propagate reset() the first time
> -----------------------------------------------------------
>
>                 Key: LUCENE-6121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6121
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>              Labels: random-chains
>             Fix For: 5.0, 6.0
>
>         Attachments: LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch, LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch
>
> CachingTokenFilter should have been propagating reset() _but only the first time_, and thus you would then use CachingTokenFilter in a more normal way – wrap it and call reset() then increment in a loop, etc., instead of knowing you need to reset() on what it wraps but not this token filter itself. That's weird. It's abnormal for a TokenFilter to never propagate reset, so every user of CachingTokenFilter to date has worked around this by calling reset() on the underlying input instead of the final wrapping token filter (CachingTokenFilter in this case).
[jira] [Resolved] (LUCENE-6121) Fix CachingTokenFilter to propagate reset() the first time
[ https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley resolved LUCENE-6121.
----------------------------------
    Resolution: Fixed

> Fix CachingTokenFilter to propagate reset() the first time
> -----------------------------------------------------------
>
>                 Key: LUCENE-6121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6121
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>              Labels: random-chains
>             Fix For: 6.0, 5.0
>
>         Attachments: LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch, LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch
>
> CachingTokenFilter should have been propagating reset() _but only the first time_, and thus you would then use CachingTokenFilter in a more normal way – wrap it and call reset() then increment in a loop, etc., instead of knowing you need to reset() on what it wraps but not this token filter itself. That's weird. It's abnormal for a TokenFilter to never propagate reset, so every user of CachingTokenFilter to date has worked around this by calling reset() on the underlying input instead of the final wrapping token filter (CachingTokenFilter in this case).
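For context, a minimal sketch of the consuming pattern this change enables; the analyzer choice, field name, and text below are placeholders, not taken from the patch:

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

class CachingTokenFilterDemo {
  // After LUCENE-6121, reset() on the wrapper propagates to the input the first
  // time, so callers no longer need to reset() the wrapped stream themselves.
  static void printTokens(String text) throws IOException {
    Analyzer analyzer = new StandardAnalyzer(); // placeholder analyzer choice
    try (TokenStream input = analyzer.tokenStream("body", text)) {
      CachingTokenFilter cache = new CachingTokenFilter(input);
      CharTermAttribute termAtt = cache.addAttribute(CharTermAttribute.class);
      cache.reset(); // the first reset() now reaches the wrapped input
      while (cache.incrementToken()) {
        System.out.println(termAtt.toString());
      }
      cache.end();
    }
  }
}
```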
[GitHub] [lucene] gsmiller commented on pull request #543: LUCENE-10245: Addition of MultiDoubleValues(Source) and MultiLongValues(Source) along with faceting capabilities
gsmiller commented on pull request #543:
URL: https://github.com/apache/lucene/pull/543#issuecomment-1006175062

@romseygeek I think this PR is ready for another look when you have a moment. Thanks again for your input!
[GitHub] [lucene] rmuir commented on pull request #586: LUCENE-10353: add random null injection to TestRandomChains
rmuir commented on pull request #586:
URL: https://github.com/apache/lucene/pull/586#issuecomment-1006157404

> I am fine, except the NPEs should have a message.

Why? For users that throw the stacktrace away?

> P.S.: And as said maybe require a message always!?

Maybe, we should just decide how it should look? FWIW, if you care about messages, the implicit NPEs from the JDK are superior to anything we do:
```
java.lang.NullPointerException: Cannot load from int array because "x" is null
	at npe.implicitArray(npe.java:8)
	...
java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because "x" is null
	at npe.implicit(npe.java:5)
	...
```
If we just do `Objects.requireNonNull(x)`, we get:
```
java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:208)
	at npe.objects(npe.java:11)
	...
```
If we do `Objects.requireNonNull(x, "x")`, it is only slightly better:
```
java.lang.NullPointerException: x
	at java.base/java.util.Objects.requireNonNull(Objects.java:233)
	at npe.message(npe.java:14)
	...
```
In all cases there is a stack trace; users can't expect to debug anything if they throw that away. So part of me says, don't even bother with a message. Especially, I would be against formatting fancy strings for every null check. I can go along with just putting the local variable's name in the message as a compromise (it is still an ugly hack! the "friendly" NPE feature in Java seems half-baked!), but if we want that to be the standard, let's ban the one-arg method in forbidden-apis and fix it consistently everywhere?
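For reference, a hypothetical `npe.java` that would produce the three flavours of output quoted above; the class, method, and variable names are reconstructed from the stack traces, and the exact line numbers will differ from the quoted traces:

```java
import java.util.Objects;

// Hypothetical reproduction of the NPE message styles discussed above.
public class npe {
  static int implicit(Integer x) {
    return x.intValue(); // JDK "helpful" NPE names the failed call and the null variable
  }

  static int implicitArray(int[] x) {
    return x[0]; // JDK "helpful" NPE: "Cannot load from int array because ..."
  }

  static void objects(Object x) {
    Objects.requireNonNull(x); // bare requireNonNull: no message at all
  }

  static void message(Object x) {
    Objects.requireNonNull(x, "x"); // requireNonNull with just the parameter name
  }

  public static void main(String[] args) {
    Runnable[] cases = {
      () -> implicit(null), () -> implicitArray(null),
      () -> objects(null), () -> message(null)
    };
    for (Runnable r : cases) {
      try {
        r.run();
      } catch (NullPointerException e) {
        e.printStackTrace();
      }
    }
  }
}
```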
[jira] [Commented] (LUCENE-10356) Special-case singleton doc values for general taxonomy facet counting
[ https://issues.apache.org/jira/browse/LUCENE-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469567#comment-17469567 ]

Greg Miller commented on LUCENE-10356:
--------------------------------------

[~gf2121] mind having a look at [https://github.com/apache/lucene/pull/585] when you have a free moment? This just extends a change you made recently. Curious to get your thoughts on it. Thanks!

> Special-case singleton doc values for general taxonomy facet counting
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-10356
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10356
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Greg Miller
>            Assignee: Greg Miller
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Inspired by [https://github.com/apache/lucene/pull/574], we should also special-case singleton dvs in the general count path (#573 specialized it for countAll).
[GitHub] [lucene] gsmiller commented on a change in pull request #578: LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll()
gsmiller commented on a change in pull request #578:
URL: https://github.com/apache/lucene/pull/578#discussion_r779174169

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
## @@ -74,11 +74,6 @@ protected boolean useHashTable(FacetsCollector fc, TaxonomyReader taxoReader) {
     return sumTotalHits < maxDoc / 10;
   }

-  /** Increment the count for this ordinal by 1. */
-  protected void increment(int ordinal) {

Review comment: I think we ought to leave this in. Removing it is a backwards-compatibility concern since it's possible (likely) that users have sub-classed `IntTaxonomyFacets` and rely on this. I think it's also nice to keep in general so that sub-classes can rely on this instead of having to manage direct access to the dense/sparse structures if they choose to.

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
## @@ -32,9 +32,9 @@ public abstract class IntTaxonomyFacets extends TaxonomyFacets {
   /** Per-ordinal value. */
-  private final int[] values;
+  final int[] values;

Review comment: I'd suggest adding some javadoc to these two fields mentioning that they're exposed for sub-classes that want "expert" functionality (e.g., direct access along with the burden of knowing which one is being used). The doc could point users to `#increment` and `#getValue` for more typical use-cases that don't want the burden of directly accessing these.
[GitHub] [lucene] uschindler commented on pull request #586: LUCENE-10353: add random null injection to TestRandomChains
uschindler commented on pull request #586:
URL: https://github.com/apache/lucene/pull/586#issuecomment-1006113376

The only thing: we should pass the parameter name on NPE (the 2nd argument of requireNonNull). This makes it easier for the caller to figure out which argument was wrong. Maybe add a check in the random chains fuzzer to require a message on the exception.
[GitHub] [lucene] uschindler commented on pull request #586: LUCENE-10353: add random null injection to TestRandomChains
uschindler commented on pull request #586:
URL: https://github.com/apache/lucene/pull/586#issuecomment-1006112599

I switched to 100% null and it now still passes. We should also run one time with 50% to fuzz some cases where the first parameter is non-null and the second is null.
[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469557#comment-17469557 ]

Greg Miller commented on LUCENE-10151:
--------------------------------------

Just so I don't lose track of this thought, we'll probably also want the blocking call to {{Future#get}} to specify a timeout as well if the user has specified one (here: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L721).

> Add timeout support to IndexSearcher
> -------------------------------------
>
>                 Key: LUCENE-10151
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10151
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Greg Miller
>            Priority: Minor
>
> I'd like to explore adding optional "timeout" capabilities to {{IndexSearcher}}. This would enable users to (optionally) specify a maximum time budget for search execution. If the search "times out", partial results would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion). Thread for reference: [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>
> A couple things to watch out for with this change:
> # We want to make sure it's robust to a two-phase query evaluation scenario where the "approximate" step matches a large number of candidates but the "confirmation" step matches very few (or none). This is a particularly tricky case.
> # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
> # We want to make sure it plays nice with the {{LRUCache}} since it iterates the query to pre-populate a {{BitSet}} when caching. That step shouldn't be allowed to overrun the timeout. The proper way to handle this probably needs some thought.
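To make the thought concrete, a rough sketch of what bounding that blocking call could look like; the variable names and surrounding bookkeeping are hypothetical, not the actual `IndexSearcher` code:

```java
// Hypothetical sketch: cap the per-slice Future#get wait by the remaining time
// budget instead of blocking indefinitely.
long remainingMillis = deadlineMillis - System.currentTimeMillis(); // hypothetical budget tracking
try {
  TopDocs sliceDocs = future.get(Math.max(0L, remainingMillis), TimeUnit.MILLISECONDS);
  collectedTopDocs.add(sliceDocs);
} catch (TimeoutException e) {
  // out of budget: stop waiting and report partial results, e.g. with
  // TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
  timedOut = true;
} catch (InterruptedException | ExecutionException e) {
  throw new RuntimeException(e); // simplified error handling for this sketch
}
```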
[GitHub] [lucene] uschindler commented on a change in pull request #586: LUCENE-10353: add random null injection to TestRandomChains
uschindler commented on a change in pull request #586:
URL: https://github.com/apache/lucene/pull/586#discussion_r779141374

## File path: lucene/analysis.tests/src/test/org/apache/lucene/analysis/tests/TestRandomChains.java
## @@ -754,6 +758,7 @@ public String toString() {
     } catch (InvocationTargetException ite) {
       final Throwable cause = ite.getCause();
       if (cause instanceof IllegalArgumentException
+          || cause instanceof NullPointerException

Review comment: Have added it in https://github.com/apache/lucene/pull/586/commits/c40d99c3ac90cac567483c114f158ee74a6c6698
[GitHub] [lucene] rmuir commented on a change in pull request #586: LUCENE-10353: add random null injection to TestRandomChains
rmuir commented on a change in pull request #586:
URL: https://github.com/apache/lucene/pull/586#discussion_r779116266

## File path: lucene/analysis.tests/src/test/org/apache/lucene/analysis/tests/TestRandomChains.java
## @@ -754,6 +758,7 @@ public String toString() {
     } catch (InvocationTargetException ite) {
       final Throwable cause = ite.getCause();
       if (cause instanceof IllegalArgumentException
+          || cause instanceof NullPointerException

Review comment: If we can tighten this logic to be `(cause instanceof NullPointerException && weProvidedANullArg)`, I think it would be better. Then the test wouldn't mask bugs (internal NPEs) that happen for situations where we passed the ctor all non-null arguments.
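A sketch of what that tightened check could look like; the bookkeeping variable and the helper name are illustrative, not the actual `TestRandomChains` code:

```java
// Illustrative fragment only: remember whether the test itself injected a null
// argument, and only excuse a ctor NPE in that case, so internal NPEs still fail.
boolean injectedNull = false;
Object[] args = new Object[paramTypes.length];
for (int i = 0; i < args.length; i++) {
  args[i] = randomArgOrNull(random, paramTypes[i]); // hypothetical helper
  if (args[i] == null) {
    injectedNull = true;
  }
}
try {
  return ctor.newInstance(args);
} catch (InvocationTargetException ite) {
  final Throwable cause = ite.getCause();
  if (cause instanceof IllegalArgumentException
      || (cause instanceof NullPointerException && injectedNull)) {
    return null; // the analyzer rejected a broken/null argument up front: acceptable
  }
  throw new RuntimeException(ite);
}
```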
[jira] [Resolved] (LUCENE-10261) Preset/ custom analyzer pipelines in Luke won't work with the module system
[ https://issues.apache.org/jira/browse/LUCENE-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10261.
----------------------------------
    Resolution: Not A Problem

> Preset/ custom analyzer pipelines in Luke won't work with the module system
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-10261
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10261
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Priority: Major
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> A spinoff from LUCENE-10255
[GitHub] [lucene] rmuir opened a new pull request #586: LUCENE-10353: add random null injection to TestRandomChains
rmuir opened a new pull request #586:
URL: https://github.com/apache/lucene/pull/586

10% of the time, TestRandomChains will pass `null` to any object parameters in analyzers' ctors. We allow NPE from the ctor, so this enforces that analyzers check their arguments up front. It just means we have to run the test in a loop:
```
./gradlew :lucene:analysis.tests:beast -Dtests.dups=100 --tests TestRandomChains -Dtests.nightly=true
```
and add missing `Objects.requireNonNull()` to the bugs that it finds at runtime. Example fail:
```
  > java.lang.NullPointerException: Cannot invoke "org.apache.lucene.analysis.compound.hyphenation.HyphenationTree.hyphenate(char[], int, int, int, int)" because "this.hyphenator" is null
  >     at __randomizedtesting.SeedInfo.seed([29B8EF94FA5640A3:1459C6F5BD445D63]:0)
  >     at org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter.decompose(HyphenationCompoundWordTokenFilter.java:143)
  >     at org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase.incrementToken(CompoundWordTokenFilterBase.java:115)
```
See issue: https://issues.apache.org/jira/browse/LUCENE-10353
[jira] [Resolved] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10328.
----------------------------------
    Resolution: Fixed

> Module path for compiling and running tests is wrong
> ------------------------------------------------------
>
>                 Key: LUCENE-10328
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10328
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Dawid Weiss
>            Priority: Major
>             Fix For: 9.1
>
>         Attachments: image-2021-12-19-12-29-21-737.png, image-2022-01-04-16-04-56-563.png
>
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Uwe noticed that the module path for compiling and running tests is empty - indeed, the modular configurations we create for the test sourceset do not inherit from their main counterparts. This is not a standard thing created for a sourceset - the test-main connection link is created by gradle's java plugin. We need to do a similar thing for modular configurations.
> !image-2021-12-19-12-29-21-737.png|width=490,height=280!
[jira] [Commented] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469529#comment-17469529 ]

ASF subversion and git services commented on LUCENE-10328:
-----------------------------------------------------------

Commit b8da9f32c8d436cc39601264dbb1f039b9882b57 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b8da9f3 ]

LUCENE-10328: open up certain packages for junit and the test framework (reflective access).
[jira] [Reopened] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reopened LUCENE-10328:
----------------------------------

Reopen for manual backporting to 9x.
[jira] [Commented] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469515#comment-17469515 ]

ASF subversion and git services commented on LUCENE-10328:
-----------------------------------------------------------

Commit ff547e7bbdc78d6869b6f47d828aa6452664ce58 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ff547e7 ]

LUCENE-10328: Module path for compiling and running tests is wrong (#571)
[jira] [Resolved] (LUCENE-10328) Module path for compiling and running tests is wrong
[ https://issues.apache.org/jira/browse/LUCENE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10328.
----------------------------------
    Fix Version/s: 9.1
       Resolution: Fixed
[GitHub] [lucene] dweiss merged pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
dweiss merged pull request #571:
URL: https://github.com/apache/lucene/pull/571
[GitHub] [lucene] mcimadamore edited a comment on pull request #518: Initial rewrite of MMapDirectory for JDK-18 preview (incubating) Panama APIs (>= JDK-18-ea-b26)
mcimadamore edited a comment on pull request #518:
URL: https://github.com/apache/lucene/pull/518#issuecomment-1005995125

> From what I have learned, copy operations have high overhead because:
>
> * they are not hot, so aren't optimized so fast
>
> * when not optimized, the setup cost is high (lots of class checks to get array type, decision for swapping bytes). This is especially heavy for small arrays.

Hi, I'm not sure as to why copy operations should be slower in the memory access API than with the ByteBuffer API. I would expect most of the checks to be similar (except for the liveness tests of the segment involved). I do recall that the ByteBuffer API does optimize bulk copy for very small buffers (I don't recall what the limit is, but it was very very low, like 4 elements or something). In principle, this JVM fix (as of 18) should help too: https://bugs.openjdk.java.net/browse/JDK-8269119
[GitHub] [lucene] mcimadamore commented on pull request #518: Initial rewrite of MMapDirectory for JDK-18 preview (incubating) Panama APIs (>= JDK-18-ea-b26)
mcimadamore commented on pull request #518:
URL: https://github.com/apache/lucene/pull/518#issuecomment-1005995125

> From what I have learned, copy operations have high overhead because:
>
> * they are not hot, so aren't optimized so fast
>
> * when not optimized, the setup cost is high (lots of class checks to get array type, decision for swapping bytes). This is especially heavy for small arrays.

Hi, I'm not sure as to why copy operations should be slower in the memory access API than with the ByteBuffer API. I would expect most of the checks to be similar (except for the liveness tests of the segment involved). I do recall that the ByteBuffer API does optimize bulk copy for very small buffers (I don't recall what the limit is, but it was very very low, like 4 elements or something). In principle, this JVM fix (as of 18) should help too: https://bugs.openjdk.java.net/browse/JDK-8269119
[GitHub] [lucene] gsmiller commented on a change in pull request #585: LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts
gsmiller commented on a change in pull request #585:
URL: https://github.com/apache/lucene/pull/585#discussion_r779026781

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -91,31 +101,36 @@ private final void count(List<MatchingDocs> matchingDocs) throws IOException {
   private final void countAll(IndexReader reader) throws IOException {
     for (LeafReaderContext context : reader.leaves()) {
-      SortedNumericDocValues dv = context.reader().getSortedNumericDocValues(indexFieldName);
-      if (dv == null) {
+      SortedNumericDocValues multiValued =

Review comment: I took the liberty of suggesting renamed variables here for consistency with other faceting implementations that do this, and for slightly improved readability (IMO anyway).
[GitHub] [lucene] gsmiller commented on pull request #585: LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts
gsmiller commented on pull request #585: URL: https://github.com/apache/lucene/pull/585#issuecomment-1005949811 Maybe a very small improvement with this change, but nothing particularly impactful. I wonder how often we actually trigger this case in our benchmarks? Certainly not as often as with the "count all" cases. I think it's worth making this change though for consistency (and I think there's a small improvement there anyway). ``` TaskQPS baseline StdDevQPS candidate StdDevPct diff p-value BrowseDayOfYearTaxoFacets 14.53 (10.8%) 14.08 (11.4%) -3.1% ( -22% - 21%) 0.378 BrowseDateTaxoFacets 14.48 (10.7%) 14.07 (11.2%) -2.8% ( -22% - 21%) 0.416 HighIntervalsOrdered3.00 (5.2%)2.94 (7.0%) -1.8% ( -13% - 10%) 0.359 OrNotHighHigh 1044.76 (4.5%) 1028.26 (3.0%) -1.6% ( -8% -6%) 0.188 MedIntervalsOrdered 35.82 (4.8%) 35.28 (6.1%) -1.5% ( -11% -9%) 0.384 OrNotHighMed 969.48 (2.7%) 957.92 (2.3%) -1.2% ( -5% -3%) 0.129 OrHighLow 1228.88 (2.8%) 1215.30 (2.5%) -1.1% ( -6% -4%) 0.188 BrowseMonthTaxoFacets 15.15 (10.3%) 14.99 (10.0%) -1.1% ( -19% - 21%) 0.740 OrNotHighLow 864.18 (2.4%) 856.27 (2.0%) -0.9% ( -5% -3%) 0.187 LowIntervalsOrdered 14.43 (2.5%) 14.32 (3.3%) -0.7% ( -6% -5%) 0.422 BrowseRandomLabelTaxoFacets 12.34 (9.2%) 12.26 (9.9%) -0.6% ( -18% - 20%) 0.834 OrHighNotHigh 877.19 (3.9%) 872.54 (3.8%) -0.5% ( -7% -7%) 0.662 HighPhrase 106.42 (1.7%) 106.06 (1.4%) -0.3% ( -3% -2%) 0.498 MedTerm 1930.85 (4.6%) 1924.69 (3.5%) -0.3% ( -8% -8%) 0.806 HighTerm 1383.88 (4.6%) 1380.07 (4.1%) -0.3% ( -8% -8%) 0.843 MedPhrase 744.32 (2.1%) 742.71 (2.4%) -0.2% ( -4% -4%) 0.761 LowTerm 3065.99 (3.3%) 3060.87 (3.8%) -0.2% ( -7% -7%) 0.882 LowPhrase 82.62 (1.6%) 82.51 (1.4%) -0.1% ( -3% -2%) 0.768 OrHighNotLow 1094.83 (4.7%) 1094.10 (3.1%) -0.1% ( -7% -8%) 0.958 MedSloppyPhrase 104.44 (2.5%) 104.41 (3.3%) -0.0% ( -5% -5%) 0.980 OrHighMed 193.98 (4.4%) 193.96 (4.6%) -0.0% ( -8% -9%) 0.994 Respell 65.50 (1.1%) 65.52 (1.2%)0.0% ( -2% -2%) 0.913 Fuzzy2 84.12 (1.1%) 84.16 (1.0%)0.0% ( -2% -2%) 0.901 AndHighLow 1163.35 (3.5%) 1163.95 (3.1%)0.1% ( -6% -6%) 0.961 BrowseRandomLabelSSDVFacets9.34 (1.7%)9.35 (2.9%)0.1% ( -4% -4%) 0.940 BrowseMonthSSDVFacets 12.91 (13.1%) 12.92 (13.1%)0.1% ( -23% - 30%) 0.978 OrHighNotMed 1094.06 (3.9%) 1095.46 (3.1%)0.1% ( -6% -7%) 0.908 PKLookup 171.75 (3.0%) 172.12 (3.3%)0.2% ( -5% -6%) 0.829 Prefix3 365.59 (9.9%) 366.48 (9.1%)0.2% ( -17% - 21%) 0.936 Fuzzy1 113.46 (1.2%) 113.82 (1.2%)0.3% ( -2% -2%) 0.390 Wildcard 38.65 (8.2%) 38.77 (7.8%)0.3% ( -14% - 17%) 0.898 HighSloppyPhrase 11.81 (3.3%) 11.85 (3.9%)0.3% ( -6% -7%) 0.773 OrHighHigh 12.04 (4.0%) 12.10 (4.0%)0.5% ( -7% -8%) 0.713 LowSloppyPhrase7.75 (3.6%)7.79 (4.3%)0.5% ( -7% -8%) 0.703 LowSpanNear 29.78 (2.5%) 29.96 (2.4%)0.6% ( -4% -5%) 0.420 HighTermMonthSort 126.96 (16.0%) 127.81 (14.3%)0.7% ( -25% - 36%) 0.889 HighSpanNear 12.43 (3.3%) 12.52 (3.2%)0.7% ( -5% -7%) 0.488 MedSpanNear 12.64 (2.9%) 12.73 (2.9%)0.8% ( -4% -6%) 0.402 AndHighMed 55.03
[GitHub] [lucene] jtibshirani commented on a change in pull request #583: LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields.
jtibshirani commented on a change in pull request #583:
URL: https://github.com/apache/lucene/pull/583#discussion_r779026136

## File path: lucene/core/src/java/org/apache/lucene/codecs/FieldsProducer.java
## @@ -42,6 +45,14 @@ protected FieldsProducer() {}
    */
   public abstract void checkIntegrity() throws IOException;

+  /**
+   * Get the {@link Terms} for this field. The behavior is undefined if the field doesn't have

Review comment: Got it. The latest changes make sense to me.
[GitHub] [lucene] gsmiller opened a new pull request #585: LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts
gsmiller opened a new pull request #585:
URL: https://github.com/apache/lucene/pull/585

# Description

Facet implementations have seen performance improvements by unwrapping singleton doc values for situations where the underlying field is actually single-valued. This change adds the optimization for counting in taxonomy faceting (bringing consistency with the countAll implementation along with SSDV faceting, etc.).

# Solution

Try unwrapping the `SortedNumericDocValues` as a `NumericDocValues` and use the single-valued field directly if possible.

# Tests

Existing tests cover this faceting implementation. Ran benchmarks as well. Saw very marginal improvements and no regressions.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
[jira] [Commented] (LUCENE-10354) Clarify contract of codec APIs with missing/disabled fields
[ https://issues.apache.org/jira/browse/LUCENE-10354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469457#comment-17469457 ]

ASF subversion and git services commented on LUCENE-10354:
-----------------------------------------------------------

Commit c8651afde70c62b4a4f5618b9483953bd2bc1bb8 in lucene's branch refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c8651af ]

LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields. (#583)

> Clarify contract of codec APIs with missing/disabled fields
> ------------------------------------------------------------
>
>                 Key: LUCENE-10354
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10354
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The question has come up a few times of how codec APIs should react to fields that are missing or do not have the relevant feature enabled.
> This issue proposes that we improve javadocs and AssertingCodec following the same model as doc values and norms:
> - The behavior of codec APIs on fields that are missing or don't have the feature enabled is undefined.
> - CodecReader is responsible for checking FieldInfos before delegating to codec APIs.
> - AssertingCodec ensures that we never call codec APIs on missing/disabled fields.
[GitHub] [lucene] jpountz merged pull request #583: LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields.
jpountz merged pull request #583:
URL: https://github.com/apache/lucene/pull/583
[GitHub] [lucene] uschindler edited a comment on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler edited a comment on pull request #579:
URL: https://github.com/apache/lucene/pull/579#issuecomment-1005928057

To conclude here: I was already thinking several times during the module system development that it might be a good idea to have some pattern in forbidden/errorprone/... that detects if you call a caller-sensitive method like those in `AccessController#doPrivileged() / Class#getResourceAsStream() / Class#getResource()` or reflective invokes (not MethodHandles), and do that in some public/protected method that injects one of the method call parameters directly/indirectly into the caller-sensitive method. This pattern is mostly wrong and a security leak (or it kills the functionality of your public/protected method when used under module system encapsulation).

Example of such a broken method (it is public and injects the `resource` parameter into `Class#getResourceAsStream()`, which is caller-sensitive): https://github.com/apache/lucene/blob/cc342ea7407c729a743123d8f7957aff6c6f9792/lucene/core/src/java/org/apache/lucene/util/IOUtils.java#L193-L212
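To make the pattern concrete, a hypothetical illustration (not the actual `IOUtils` code) of why such a method misbehaves under module encapsulation:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical utility, imagined to live in lucene.core. Class#getResourceAsStream
// is caller-sensitive: access to a resource in a named module is checked against
// the *caller's* module, so this helper (rather than the class that owns the
// resource) becomes the caller and may be denied access that the owning module
// itself would have.
public final class ResourceUtil {
  private ResourceUtil() {}

  public static InputStream open(Class<?> clazz, String resource) throws IOException {
    InputStream in = clazz.getResourceAsStream(resource); // caller-sensitive call
    if (in == null) {
      throw new FileNotFoundException("resource not found: " + resource);
    }
    return in;
  }
}
```

The safer shape is to let the owning class perform the caller-sensitive call itself (e.g. `MyFilterFactory.class.getResourceAsStream(...)`, where `MyFilterFactory` is a hypothetical caller) and pass the resulting `InputStream` to shared utilities.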
[GitHub] [lucene] uschindler edited a comment on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler edited a comment on pull request #579:
URL: https://github.com/apache/lucene/pull/579#issuecomment-1005928057

To conclude here: I was already thinking several times during the module system development that it might be a good idea to have some pattern in forbidden/errorprone/... that detects if you call a caller-sensitive method like those in `AccessController#doPrivileged() / Class#getResourceAsStream() / Class#getResource()` or reflective invokes (not MethodHandles), and do that in some public/protected method that injects one of the method call parameters directly/indirectly into the caller-sensitive method. This pattern is mostly wrong and a security leak (or it kills the functionality of your public/protected method when used under module system encapsulation).

Example of such a broken method: https://github.com/apache/lucene/blob/cc342ea7407c729a743123d8f7957aff6c6f9792/lucene/core/src/java/org/apache/lucene/util/IOUtils.java#L193-L212
[GitHub] [lucene] uschindler commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on pull request #579:
URL: https://github.com/apache/lucene/pull/579#issuecomment-1005928057

To conclude here: I was already thinking several times during the module system development that it might be a good idea to have some pattern in forbidden/errorprone/... that detects if you call a caller-sensitive method like those in `AccessController#doPrivileged() / Class#getResourceAsStream() / Class#getResource()` or reflective invokes (not MethodHandles), and do that in some public method that injects one of the method call parameters directly/indirectly into the caller-sensitive method. This pattern is mostly wrong and a security leak (or it kills your public method when used under module system encapsulation).
[GitHub] [lucene] dweiss commented on pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
dweiss commented on pull request #571:
URL: https://github.com/apache/lucene/pull/571#issuecomment-1005918864

> The dependencies are always modular, so lucene.core is put on module-path, even though we are running tests in classpath mode. This is what this PR mainly changes, correct? Previously it was not fully working unless you explicitly declared it.

lucene.core (and any other dependency placed in modular configurations) is correctly inserted on the module-path if this reference is from "outside" the project itself. In other words, the tests within lucene.core run with the main source set classes on the classpath (otherwise you'd have split-package errors), but anywhere else where you reference lucene.core, it will be placed on the module-path. That debug flag (-Pbuild.debug.paths=true) shows verbosely how the classpath and module path are configured for each task.
[GitHub] [lucene] dweiss commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
dweiss commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778993450 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = creat
[GitHub] [lucene] uschindler commented on a change in pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on a change in pull request #579: URL: https://github.com/apache/lucene/pull/579#discussion_r778990326 ## File path: lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java ## @@ -584,9 +585,13 @@ public static long shallowSizeOfInstance(Class clazz) { final Class target = clazz; final Field[] fields; try { -fields = +@SuppressWarnings("removal") Review comment: Maybe extract a method here in the same way. Then the extra variable would not be needed, and suppressforbidden would apply only to the call. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
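For illustration, a minimal sketch of the extract-method idea under stated assumptions (this is not the actual RamUsageEstimator code; the helper name and the exact privileged call are placeholders): the deprecated AccessController usage moves into its own small method, so the @SuppressWarnings("removal") annotation covers only that call site and no extra local variable is needed.

```java
import java.lang.reflect.Field;
import java.security.AccessController;
import java.security.PrivilegedAction;

final class ShallowSizeSupport {
  // Hypothetical helper: the deprecated AccessController call is isolated here,
  // so the suppression annotation applies only to this one call site.
  @SuppressWarnings("removal")
  private static Field[] getDeclaredFieldsPrivileged(final Class<?> clazz) {
    return AccessController.doPrivileged(
        (PrivilegedAction<Field[]>) clazz::getDeclaredFields);
  }

  // The caller side stays free of any suppression annotations.
  static int countDeclaredFieldsSketch(Class<?> clazz) {
    return getDeclaredFieldsPrivileged(clazz).length;
  }

  private ShallowSizeSupport() {}
}
```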
[GitHub] [lucene] gf2121 commented on a change in pull request #545: LUCENE-10319: make ForUtil#BLOCK_SIZE changeable
gf2121 commented on a change in pull request #545: URL: https://github.com/apache/lucene/pull/545#discussion_r778988719 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/ForUtil.java ## @@ -1051,4 +1052,76 @@ private static void decode24(DataInput in, long[] tmp, long[] longs) throws IOEx longs[longsIdx + 0] = l0; } } + Review comment: Let this code be generated from the script, so that it can change along with BLOCK_SIZE. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
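To illustrate the suggestion (a sketch only, with assumed names and values, not the generated ForUtil code): constants that would otherwise be hard-coded literals are emitted by the generator in terms of BLOCK_SIZE, so regenerating the file with a different block size keeps everything consistent.

```java
// Illustrative only: constants expressed in terms of BLOCK_SIZE rather than
// hard-coded literals, so a regenerated file with a different block size stays consistent.
final class BlockSizeConstants {
  static final int BLOCK_SIZE = 128; // assumed default; the generator would substitute the configured value
  static final int BLOCK_SIZE_LOG2 = Integer.numberOfTrailingZeros(BLOCK_SIZE);
  // Example derived constant: number of longs needed when two packed values share one long.
  static final int HALF_BLOCK_LONGS = BLOCK_SIZE / 2;

  private BlockSizeConstants() {}
}
```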
[GitHub] [lucene] uschindler edited a comment on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler edited a comment on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005906362 > > Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) > > I wonder too, I searched the mainline branches of some commonly used Java libraries (`guava`, `log4j2`) and found `AccessController` calls in each. If OpenJDK actually removes these methods anytime soon, it will probably break every Java app out there. So I'm not worried. When I met the OpenJDK committers in Brussels before COVID started and this was discussed for the first time, the statement in the well-known beer bar was "trust me, we won't remove SecurityManager and AccessController before Java 43 [fictive number]. But we will soon make all AccessController operations noops." I think the "deprecation for removal" is just to make it more prominent, so your build does not just print a warning that is always suppressed (`-Xlint -deprecation` switch). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler edited a comment on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler edited a comment on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005906362 > > Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) > > I wonder too, I searched the mainline branches of some commonly used Java libraries (`guava`, `log4j2`) and found `AccessController` calls in each. If OpenJDK actually removes these methods anytime soon, it will probably break every Java app out there. So I'm not worried. When I met the OpenJDK committers in Brussels before COVID started and this was discussed for the first time, the statement in the well-known beer bar was "trust me, we won't remove SecurityManager and AccessController before Java 43 [fictive number]. But we will soon make all AccessController operations noops." I think the "deprecation for removal" is just to make it more prominent, so your build does not just print a warning that is always suppressed ({{-Xlint -deprecation}} switch). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 commented on pull request #545: LUCENE-10319: make ForUtil#BLOCK_SIZE changeable
gf2121 commented on pull request #545: URL: https://github.com/apache/lucene/pull/545#issuecomment-1005906911 Thanks @jpountz! This is indeed making the code harder to read. I tried to make all these complex constants generated from the script, keeping `ForUtil.java` clean. How does it look now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005906362 > > Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) > > I wonder too, I searched the mainline branches of some commonly used Java libraries (`guava`, `log4j2`) and found `AccessController` calls in each. If OpenJDK actually removes these methods anytime soon, it will probably break every Java app out there. So I'm not worried. When I met the OpenJDK committers in Brussels before COVID started and this was discussed for the first time, the statement in the well-known beer bar was "trust me, we won't remove SecurityManager and AccessController before Java 43". But we will soon make all AccessController operations noops. I think the "deprecation for removal" is just to make it more prominent, so your build does not just print a warning that is always suppressed ({{-Xlint -deprecation}} switch). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4983) CommonGramsFilter assumes all input tokens have a length of 1
[ https://issues.apache.org/jira/browse/LUCENE-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4983: -- Labels: random-chains (was: ) > CommonGramsFilter assumes all input tokens have a length of 1 > - > > Key: LUCENE-4983 > URL: https://issues.apache.org/jira/browse/LUCENE-4983 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Priority: Major > Labels: random-chains > > CommonGramsFilter set posLenAttribute to 2 for bi-grams, no matter the length > of the input tokens. Here is an example seed that produces a failure: > {noformat} > [junit4:junit4] says Привет! Master seed: 3296009A5B3B7A05 > [junit4:junit4] Executing 1 suite with 1 JVM. > [junit4:junit4] > [junit4:junit4] Started J0 PID(23946@RD-38). > [junit4:junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains > [junit4:junit4] 2> TEST FAIL: useCharFilter=true text='apuqdgtr wjco mpc ' > [junit4:junit4] 2> Exception from random analyzer: > [junit4:junit4] 2> charfilters= > [junit4:junit4] 2> > org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, , > java.io.StringReader@699f982d) > [junit4:junit4] 2> > org.apache.lucene.analysis.charfilter.HTMLStripCharFilter(org.apache.lucene.analysis.pattern.PatternReplaceCharFilter@6cbfe887, > []) > [junit4:junit4] 2> tokenizer= > [junit4:junit4] 2> > org.apache.lucene.analysis.core.LetterTokenizer(LUCENE_44, > org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory@4fb3c3d9, > > org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@2b3b2ed8) > [junit4:junit4] 2> filters= > [junit4:junit4] 2> > org.apache.lucene.analysis.util.ElisionFilter(org.apache.lucene.analysis.ValidatingTokenFilter@0, > [iez]) > [junit4:junit4] 2> > org.apache.lucene.analysis.MockGraphTokenFilter(java.util.Random@3a807d14, > org.apache.lucene.analysis.ValidatingTokenFilter@20) > [junit4:junit4] 2> > org.apache.lucene.analysis.commongrams.CommonGramsFilter(LUCENE_44, > org.apache.lucene.analysis.ValidatingTokenFilter@37caea, [bbtzjxco, , > jafehvlp, kujsm, znpfw, xqfni]) > [junit4:junit4] 2> > org.apache.lucene.analysis.bg.BulgarianStemFilter(org.apache.lucene.analysis.ValidatingTokenFilter@6c1927b) > [junit4:junit4] 2> offsetsAreCorrect=true > [junit4:junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestRandomChains -Dtests.method=testRandomChains > -Dtests.seed=3296009A5B3B7A05 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=ar_YE -Dtests.timezone=Europe/London > -Dtests.file.encoding=US-ASCII > [junit4:junit4] ERROR 14.7s | TestRandomChains.testRandomChains <<< > [junit4:junit4]> Throwable #1: java.lang.IllegalStateException: stage 3: > inconsistent endOffset at pos=2: 13 vs 8; token=[㑮ٯb_ > [junit4:junit4]> at > __randomizedtesting.SeedInfo.seed([3296009A5B3B7A05:F7729FB1C2967C5]:0) > [junit4:junit4]> at > org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135) > [junit4:junit4]> at > org.apache.lucene.analysis.bg.BulgarianStemFilter.incrementToken(BulgarianStemFilter.java:48) > [junit4:junit4]> at > org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) > [junit4:junit4]> at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:635) > [junit4:junit4]> at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546) > [junit4:junit4]> at > 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:447) > [junit4:junit4]> at > org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:944) > [junit4:junit4]> at java.lang.Thread.run(Thread.java:679) > [junit4:junit4] 2> NOTE: test params are: codec=Lucene42: > {dummy=PostingsFormat(name=Direct)}, docValues:{}, sim=DefaultSimilarity, > locale=ar_YE, timezone=Europe/London > [junit4:junit4] 2> NOTE: Linux 3.5.0-27-generic amd64/Sun Microsystems Inc. > 1.6.0_27 (64-bit)/cpus=2,threads=1,free=96085824,total=223412224 > [junit4:junit4] 2> NOTE: All tests run in this JVM: [TestRandomChains] > [junit4:junit4] Completed in 16.32s, 1 test, 1 error <<< FAILURES! > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6121) Fix CachingTokenFilter to propagate reset() the first time
[ https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6121: -- Labels: random-chains (was: ) > Fix CachingTokenFilter to propagate reset() the first time > -- > > Key: LUCENE-6121 > URL: https://issues.apache.org/jira/browse/LUCENE-6121 > Project: Lucene - Core > Issue Type: Improvement >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Labels: random-chains > Fix For: 5.0, 6.0 > > Attachments: > LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch, > LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch > > > CachingTokenFilter should have been propagating reset() _but only the first > time_ and thus you would then use CachingTokenFilter in a more normal way – > wrap it and call reset() then increment in a loop, etc., instead of knowing > you need to reset() on what it wraps but not this token filter itself. That's > weird. It's ab-normal for a TokenFilter to never propagate reset, so every > user of CachingTokenFilter to date has worked around this by calling reset() > on the underlying input instead of the final wrapping token filter > (CachingTokenFilter in this case). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
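A minimal sketch of the "more normal" usage the fix enables, assuming the patched behavior described above (the analyzer and field name are placeholders, not code from the issue): reset() is called on the outer CachingTokenFilter itself, and only the first call is propagated to the wrapped stream.

{noformat}
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CachingTokenFilterUsageSketch {
  // Consumes the same token stream twice: the first pass fills the cache,
  // the second pass replays it. No direct reset() on the inner stream.
  static void consumeTwice(Analyzer analyzer, String field, String text) throws IOException {
    try (TokenStream inner = analyzer.tokenStream(field, text)) {
      CachingTokenFilter cache = new CachingTokenFilter(inner);
      CharTermAttribute term = cache.addAttribute(CharTermAttribute.class);
      for (int pass = 0; pass < 2; pass++) {
        cache.reset();                // first call propagates to the wrapped stream
        while (cache.incrementToken()) {
          System.out.println(pass + ": " + term);
        }
        cache.end();
      }
      cache.close();
    }
  }
}
{noformat}

The second reset() only rewinds the cache, so the already-exhausted inner stream is never touched again.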
[jira] [Updated] (LUCENE-8092) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-8092: -- Labels: random-chains (was: ) > TestRandomChains failure > > > Key: LUCENE-8092 > URL: https://issues.apache.org/jira/browse/LUCENE-8092 > Project: Lucene - Core > Issue Type: Bug >Reporter: Alan Woodward >Priority: Major > Labels: random-chains > > https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.2/1/ > ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains > -Dtests.seed=C006DAD2E1FC77AF -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true > -Dtests.linedocsfile=/Users/romseygeek/projects/lucene-test-data/enwiki.random.lines.txt > -Dtests.locale=tr -Dtests.timezone=Europe/Simferopol -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > Reproduces locally on 7.2 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10352: --- Labels: random-chains (was: ) > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Labels: random-chains > Fix For: 9.1, 10.0 (main) > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StringMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
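As a rough illustration of the ModuleReader-based class discovery mentioned above (a sketch under assumptions, not the code from the PR; the module path, module name and package prefix are placeholders), the .class resources of a resolved module can be listed and mapped to class names:

{noformat}
import java.io.IOException;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ModuleClassDiscoverySketch {
  // Lists the classes of one module on the given module path whose package
  // starts with the analysis prefix. Purely illustrative.
  static List<String> analysisClasses(Path modulePath, String moduleName) throws IOException {
    ModuleReference ref = ModuleFinder.of(modulePath).find(moduleName).orElseThrow();
    try (ModuleReader reader = ref.open()) {
      return reader.list()
          .filter(name -> name.endsWith(".class") && !name.endsWith("module-info.class"))
          .map(name -> name.substring(0, name.length() - ".class".length()).replace('/', '.'))
          .filter(cls -> cls.startsWith("org.apache.lucene.analysis."))
          .collect(Collectors.toList());
    }
  }
}
{noformat}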
[jira] [Updated] (LUCENE-10362) JapaneseNumberFilter messes up offsets
[ https://issues.apache.org/jira/browse/LUCENE-10362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10362: --- Labels: random-chains (was: ) > JapaneseNumberFilter messes up offsets > -- > > Key: LUCENE-10362 > URL: https://issues.apache.org/jira/browse/LUCENE-10362 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Labels: random-chains > > It is a tokenfilter, tries to change offsets, so of course TestRandomChains > finds bugs in it > {noformat} > 2> NOTE: reproduce with: gradlew test --tests > TestRandomChains.testRandomChains -Dtests.seed=CE566FFD0024BDB0 > -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=en-PG > -Dtests.timezone=CST -Dtests.asserts=true -Dtests.file.encoding=UTF-8 > {noformat} > {noformat} > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to > /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_16/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: m<[0-1] +1> mi<[0-2] +1> i<[1-2] +1> iy<[1-3] +1> y<[2-3] +1> > yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> k<[4-5] +1> kt<[4-6] +1> t<[5-6] +1> t > <[5-7] +1> <[6-7] +1> 2<[6-8] +1> 2<[7-8] +1> 26<[7-9] +1> 6<[8-9] +1> > 64<[8-10] +1> > 2> stage 1: m<[0-1] +1> mi<[0-2] +1> i<[1-2] +1> iy<[1-3] +1> y<[2-3] +1> > yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> k<[4-5] +1> kt<[4-6] +1> t<[5-6] +1> t > <[5-7] +1> <[6-7] +1> 2<[6-8] +1> 2<[7-8] +1> 26<[7-9] +1> 6<[8-9] +1> > 64<[8-10] +1> > 2> stage 2: n<[3-4] +1> nk<[3-5] +1> word<[3-5] +0> k<[4-5] +1> > word<[4-5] +0> kt<[4-6] +1> word<[4-6] +0> t<[5-6] +1> > word<[5-6] +0> t <[5-7] +1> <[6-7] +1> 2<[6-8] +1> > word<[6-8] +0> 2<[7-8] +1> word<[7-8] +0> 26<[7-9] +1> > word<[7-9] +0> 6<[8-9] +1> 64<[8-10] +1> word<[8-10] +0> > 2> last stage: yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> word<[3-5] > +0> k<[4-5] +1> word<[4-5] +0> kt<[4-6] +1> word<[4-6] > +0> t<[5-6] +1> word<[5-6] +0> t <[5-7] +1> <[6-7] +1> 2<[6-8] > +1> word<[6-8] +0> 2<[7-8] +1> word<[7-8] +0> 26<[7-9] > +1> word<[7-9] +0> 6<[8-9] +1> word<[8-10] +0> > 2> TEST FAIL: useCharFilter=false text='miynkt 264957329' > 2> Exception from random analyzer: > 2> charfilters= > 2> tokenizer= > 2> org.apache.lucene.analysis.ngram.NGramTokenizer() > 2> filters= > 2> > Conditional:org.apache.lucene.analysis.icu.ICUNormalizer2Filter(OneTimeWrapper@3b5fdc7f > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, > com.ibm.icu.impl.Norm2AllModes$ComposeNormalizer2@5ef6381c) > 2> > Conditional:org.apache.lucene.analysis.miscellaneous.TypeAsSynonymFilter(OneTimeWrapper@3e803db2 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0, > ) > 2> > Conditional:org.apache.lucene.analysis.ja.JapaneseNumberFilter(OneTimeWrapper@20de0223 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,keyword=false) >> java.lang.IllegalStateException: last stage: inconsistent endOffset > at pos=17: 9 vs 10; token=word >> at > __randomizedtesting.SeedInfo.seed([CE566FFD0024BDB0:F3B7469C4736A070]:0) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:164) >> at > 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10363) JapaneseCompletionFilter messes up offsets
[ https://issues.apache.org/jira/browse/LUCENE-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10363: --- Labels: random-chains (was: ) > JapaneseCompletionFilter messes up offsets > -- > > Key: LUCENE-10363 > URL: https://issues.apache.org/jira/browse/LUCENE-10363 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Labels: random-chains > > It is a tokenfilter, tries to change offsets, so of course TestRandomChains > finds bugs in it: > {noformat} > NOTE: reproduce with: gradlew test --tests > TestRandomChains.testRandomChainsWithLargeStrings > -Dtests.seed=E233A5FAC016E02 -Dtests.nightly=true -Dtests.slow=true > -Dtests.locale=en-TV -Dtests.timezone=Asia/Saigon -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > {noformat} > {noformat} > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to > /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_54/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: lk<[1-3] +1> p<[6-7] +1> ngtoixtmldzsjz<[10-24] +1> uoq<[25-28] > +1> HANGUL<[28-28] +1> o<[29-30] +1> HANGUL<[31-31] +1> VulliPHsZzn<[32-43] > +1> > 2> stage 1: lk<[1-3] +1> 85<[1-3] +0> p<[6-7] +1> 70<[6-7] +0> > ngtoixtmldzsjz<[10-24] +1> 653543<[10-24] +0> uoq<[25-28] +1> 05<[25-28] > +0> HANGUL<[28-28] +1> 565800<[28-28] +0> o<[29-30] +1> 00<[29-30] +0> > HANGUL<[31-31] +1> 565800<[31-31] +0> VulliPHsZzn<[32-43] +1> 787460<[32-43] > +0> > 2> stage 2: ngtoixtmldzsjz 653543<[10-24] +0> 653543<[10-24] +1> 653543 > uoq<[10-28] +0> uoq<[25-28] +1> uoq 05<[25-28] +0> 05<[25-28] +1> > 05 HANGUL<[25-28] +0> HANGUL<[28-28] +1> HANGUL 565800<[28-28] +0> > 565800<[28-28] +1> 565800 o<[28-30] +0> o<[29-30] +1> o 00<[29-30] +0> > 00<[29-30] +1> 00 HANGUL<[29-31] +0> HANGUL<[31-31] +1> HANGUL > 565800<[31-31] +0> 565800<[31-31] +1> 565800 VulliPHsZzn<[31-43] +0> > VulliPHsZzn<[32-43] +1> > 2> last stage: ngtoixtmldzsjz<[10-24] +1> ngtoixtmldzsjz 653543<[10-24] +0> > 653543<[10-24] +1> 653543 uoq<[10-28] +0> uoq<[25-28] +1> uoq 05<[25-28] > +1> 05<[25-28] +1> 05 HANGUL<[25-28] +1> HANGUL<[28-28] +1> HANGUL > 565800<[28-28] +0> 565800<[28-28] +1> 565800 o<[28-30] +0> o<[29-30] +1> o > 00<[29-30] +0> 00<[29-30] +1> 00 HANGUL<[29-31] +0> > HANGUL<[31-31] +1> HANGUL 565800<[31-31] +1> 565800<[31-31] +1> 565800 > VulliPHsZzn<[31-43] +0> > 2> TEST FAIL: useCharFilter=true text='[lk[-.p|) ngtoixtmldzsjz uoqao > aVulliPHsZzn wxsk' > 2> Exception from random analyzer: > 2> charfilters= > 2> org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, > , java.io.StringReader@5b3b54eb) > 2> tokenizer= > 2> > org.apache.lucene.analysis.classic.ClassicTokenizer(org.apache.lucene.util.AttributeFactory$1@e29311e9) > 2> filters= > 2> > org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(ValidatingTokenFilter@32a6de77 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, > true) > 2> > org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@3d044414 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, > q) > 2> > Conditional:org.apache.lucene.analysis.ja.JapaneseCompletionFilter(OneTimeWrapper@435207ec > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,reading=null,reading > (en)=null,pronunciation=null,pronunciation (en)=null, 
INDEX) >> java.lang.IllegalStateException: last stage: inconsistent endOffset > at pos=19: 31 vs 43; token=565800 VulliPHsZzn > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10361) KoreanNumberFilter messes up offsets
[ https://issues.apache.org/jira/browse/LUCENE-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10361: --- Labels: random-chains (was: ) > KoreanNumberFilter messes up offsets > > > Key: LUCENE-10361 > URL: https://issues.apache.org/jira/browse/LUCENE-10361 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Labels: random-chains > > It is a tokenfilter, tries to change offsets, so of course TestRandomChains > finds bugs in it: > {noformat} > NOTE: reproduce with: gradlew test --tests TestRandomChains.testRandomChains > -Dtests.seed=12BC606B774693E4 -Dtests.nightly=true -Dtests.slow=true > -Dtests.locale=om-Latn-ET -Dtests.timezone=Australia/Yancowinna > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 > {noformat} > {noformat} > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to > /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_16/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: 뱅<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 履<[6-7] +1> jEqyzUT<[8-15] > +1> > 2> stage 1: 00<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 00<[6-7] +1> > 154300<[8-15] +1> 454300<[8-15] +0> > 2> last stage: 0<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 00<[6-7] +1> > 454300<[8-15] +0> > 2> TEST FAIL: useCharFilter=false > text='\ubc45\u0191(\u0117\ud8ad\udf0a\uf9df jEqyzUT ' > 2> Exception from random analyzer: > 2> charfilters= > 2> > org.apache.lucene.analysis.cjk.CJKWidthCharFilter(java.io.StringReader@17af5384) > 2> > org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@33e5bdbb, > org.apache.lucene.analysis.cjk.CJKWidthCharFilter@1aafd271) > 2> tokenizer= > 2> > org.apache.lucene.analysis.icu.segmentation.ICUTokenizer(org.apache.lucene.analysis.icu.segmentation.DefaultICUTokenizerConfig@4e6f4690) > 2> filters= > 2> > Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@34215eb7 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,script=Common, > false) > 2> > org.apache.lucene.analysis.ko.KoreanNumberFilter(ValidatingTokenFilter@7b4a2a5b > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,script=Common,keyword=false) >> java.lang.IllegalStateException: last stage: inconsistent > startOffset at pos=3: 6 vs 8; token=454300 >> at > __randomizedtesting.SeedInfo.seed([12BC606B774693E4:2F5D490A30548E24]:0) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) >> at > org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChains(TestRandomChains.java:915) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: 
issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10360) BeiderMorseFilter: TestRandomChains fails with IndexOutOfBounds on empty term text
[ https://issues.apache.org/jira/browse/LUCENE-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10360: --- Labels: random-chains (was: ) > BeiderMorseFilter: TestRandomChains fails with IndexOutOfBounds on empty term > text > -- > > Key: LUCENE-10360 > URL: https://issues.apache.org/jira/browse/LUCENE-10360 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Uwe Schindler >Priority: Major > Labels: random-chains > > Error seen: > {noformat} > 2> TEST FAIL: useCharFilter=true text='Uf?F ?wlu{0
[jira] [Updated] (LUCENE-10353) Add null injection to analyzer integration tests (e.g. TestRandomChains)
[ https://issues.apache.org/jira/browse/LUCENE-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10353: --- Labels: random-chains (was: ) > Add null injection to analyzer integration tests (e.g. TestRandomChains) > > > Key: LUCENE-10353 > URL: https://issues.apache.org/jira/browse/LUCENE-10353 > Project: Lucene - Core > Issue Type: Task >Reporter: Robert Muir >Assignee: Uwe Schindler >Priority: Major > Labels: random-chains > > These tests inject random parameter values (from argumentProviders). Some > generated values may be illegal and IllegalArgumentException is "allowed" if > the constructor returns it. None of the values should cause failures at > runtime. > But for object types, we never inject null values (unless the > argumentProvider were to do it itself). We should do this some low % of the > time, and "allow" ctors to return NPE too. > I see bugs in some of the analyzers where they are just a missing null check > in the constructor. It is important to fail on invalid configuration up-front > in the ctor, rather than failing e.g. at index time. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
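As a small illustrative sketch of the kind of constructor-time validation this testing would exercise (hypothetical class and parameter names, not code from the issue), the filter rejects a null resource up front so misconfiguration fails immediately rather than at index time:

{noformat}
import java.io.IOException;
import java.util.Objects;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical filter demonstrating the fail-fast null check in the ctor.
public final class ExampleDictionaryFilter extends TokenFilter {
  private final Object dictionary; // placeholder for some required resource

  public ExampleDictionaryFilter(TokenStream input, Object dictionary) {
    super(input);
    // Throw NPE here, at configuration time, rather than much later at index time.
    this.dictionary = Objects.requireNonNull(dictionary, "dictionary must not be null");
  }

  @Override
  public boolean incrementToken() throws IOException {
    return input.incrementToken(); // pass-through; a real filter would consult the dictionary
  }
}
{noformat}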
[jira] [Updated] (LUCENE-10358) JapaneseIterationMarkCharFilter: TestRandomChains fails with incorrect offsets or causes IndexOutOfBounds
[ https://issues.apache.org/jira/browse/LUCENE-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10358: --- Labels: random-chains (was: ) > JapaneseIterationMarkCharFilter: TestRandomChains fails with incorrect > offsets or causes IndexOutOfBounds > - > > Key: LUCENE-10358 > URL: https://issues.apache.org/jira/browse/LUCENE-10358 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Uwe Schindler >Priority: Major > Labels: random-chains > > Failures seen: > {noformat} > $ gradlew :lucene:analysis:integration.tests:test --tests > TestRandomChains.testRandomChainsWithLargeStrings > -Dtests.seed=AA632771CC823702 -Dtests.slow=true -Dtests.locale=fr-MF > -Dtests.timezone=America/Panama -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to C:\Users\Uwe > Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> ppkarrpf<[7-14] +1> 1<[16-17] +1> > 5<[18-19] +1> > 2> stage 1: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] > +1> 759700<[7-14] +0> 1<[16-17] +1> 5<[18-19] +1> 00<[18-19] +0> > 2> stage 2: ÉÆû<[0-2] +1> ÉÆä<[4-6] +1> 00<[4-6] +0> ppkarrpf<[7-14] > +1> 759700<[7-14] +0> 1<[16-17] +1> 00<[18-19] +0> > 2> TEST FAIL: useCharFilter=true text='\ud801\udc96\ud801\udcaa\ud801\udc84 > ppkarpf {1,5}g?)u em mbm hbil' > 2> Exception from random analyzer: > 2> charfilters= > 2> > org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter(java.io.StringReader@105e6aa7, > true, false) > 2> tokenizer= > 2> org.apache.lucene.analysis.th.ThaiTokenizer() > 2> filters= > 2> > Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@79889b7f > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, > true) > 2> > org.apache.lucene.analysis.ja.JapaneseNumberFilter(ValidatingTokenFilter@53a9e96c > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false) > 2> > org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter(ValidatingTokenFilter@6cb4578d > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false, > > org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter$StemmerOverrideMap@51fc8124) >> java.lang.IllegalStateException: stage 2: inconsistent startOffset > at pos=3: 16 vs 18; token=00 >> at > __randomizedtesting.SeedInfo.seed([AA632771CC823702:C038986095CC17F1]:0) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138) >> at > org.apache.lucene.analysis.common@10.0.0-SNAPSHOT/org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.incrementToken(StemmerOverrideFilter.java:67) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:81) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) >> at > 
org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) >> at > org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:943) > {noformat} > and also: > {noformat} > $ gradlew :lucene:analysis:integration.tests:test --tests > TestRandomChains.testRandomChains -Dtests.seed=3A0D0E91E0CA5BFC > -Dtests.slow=true -Dtests.locale=nmg-CM -Dtests.timezone=Antarctica/Vostok > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to C:\Users\Uwe > Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test_17\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> TEST FAIL: useCharFilter=false text='' > 2> Except
[jira] [Updated] (LUCENE-10359) KoreanTokenizer: TestRandomChains fails with incorrect offsets
[ https://issues.apache.org/jira/browse/LUCENE-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10359: --- Labels: random-chains (was: ) > KoreanTokenizer: TestRandomChains fails with incorrect offsets > -- > > Key: LUCENE-10359 > URL: https://issues.apache.org/jira/browse/LUCENE-10359 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Uwe Schindler >Priority: Major > Labels: random-chains > > It looks like KoreanTokenizer is causing this (NORI), but Kuromoji may be > affected in the same way: > {noformat} > org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved > to C:\Users\Uwe > Schindler\Projects\lucene\lucene\lucene\analysis\integration.tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, > copied below: > 2> stage 0: e<[2-3] +1> ek<[4-6] +1> oy<[8-10] +1> 1<[11-12] +1> > zzkuxp<[13-19] +1> > 2> stage 1: e<[2-3] +1> ek<[4-6] +1> oy<[8-10] +1> 1<[11-12] +1> > zzkuxp<[13-19] +1> > 2> stage 2: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> > oy<[8-10] +1> oy 1<[8-12] +0> 1<[11-12] +1> 1 zzkuxp<[11-19] +0> > 2> stage 3: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> > oy<[8-10] +1> oy 1<[8-12] +0> 1<[11-12] +1> 1 zzkuxp<[11-19] +0> > 2> last stage: e<[2-3] +1> e ek<[2-6] +0> ek<[4-6] +1> ek oy<[4-10] +0> > oy<[8-10] +1> oy 1<[8-12] +0> 1 zzkuxp<[11-19] +0> > 2> TEST FAIL: useCharFilter=false text='?.e|ek|]oy{1 zzkuxp ZyzzV ycuqjnv > axtpppvk \u233b\u23c8\u2314\u232e\u236e\u238d\u235e x d \"' > 2> Exception from random analyzer: > 2> charfilters= > 2> org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, > ifywufhi, java.io.StringReader@48586999) > 2> > org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@65036838, > org.apache.lucene.analysis.pattern.PatternReplaceCharFilter@11d4ba35) > 2> tokenizer= > 2> org.apache.lucene.analysis.ko.KoreanTokenizer() > 2> filters= > 2> > org.apache.lucene.analysis.en.KStemFilter(ValidatingTokenFilter@595d7938 > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false) > 2> > org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@13d08b48 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false, > u) > 2> > org.apache.lucene.analysis.util.ElisionFilter(ValidatingTokenFilter@6396b917 > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false, > [fh, hiiwwxyyd, fcpodqor, qogvhmywr, l, icad]) > 2> > Conditional:org.apache.lucene.analysis.ko.KoreanNumberFilter(OneTimeWrapper@5f0558f6 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,posType=null,leftPOS=null,rightPOS=null,morphemes=null,reading=null,keyword=false) >> java.lang.IllegalStateException: last stage: inconsistent > startOffset at pos=2: 8 vs 11; token=1 zzkuxp >> at > __randomizedtesting.SeedInfo.seed([E4552C7844FC2DA3:8E0E93691DB20D50]:0) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138) >> at 
> org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028) >> at > org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922) >> at > org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:943) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] magibney commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue
magibney commented on pull request #380: URL: https://github.com/apache/lucene/pull/380#issuecomment-1005887111 Apologies for the delay, and thanks for bearing with me, @spyk. I'm inclined to err on the cautious side with this, since I'm not as familiar with this part of the codebase or the OpenNLP community. That said, this seems really straightforward to me, and clearly an improvement. (I considered, but decided against, suggesting to adopt `computeIfAbsent()` in place of `cached = map.get(...); if (cached==null) map.put(...)` ... the former is "newer" and semantically clearer, but the latter is more idiomatic to this part of the codebase). The only thing giving me pause now is that I notice we're changing the return type of a _public_ method. If there are third-party extensions that rely on the existing return type of this method, they will break. An easy fix, but still ... I'm additionally chastened to see that the issue introducing this code, [LUCENE-2899](https://issues.apache.org/jira/browse/LUCENE-2899), has 36 "votes" and 68 "watchers" (!) -- so the chance of this being a breaking change for some third party extension are not insignificant (FWIW I'd be surprised if third parties actually called this method in practice, but I'm not sure I have the perspective to judge :slightly_smiling_face:) I dislike the idea of maintaining backward compatibility "just because", when this seems like such a clear improvement, and when I suspect that the `public` access for these static methods may not necessarily represent an explicit design choice (?); and with the 9.0 release still quite fresh, arguably now would not be the worst time to break backcompat (esp. in such a minor way). But I'm afraid I really would like another committer (ideally, @sarowe?) to weigh in on this. Thank you for your patience! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
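For context, the two caching idioms weighed in the comment look roughly like this (an illustrative sketch with hypothetical names, not the PR's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachingIdiomsSketch {
  private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

  // Idiom already used in this part of the codebase: explicit get, then put on a miss.
  static Object lookupClassic(String key) {
    Object cached = CACHE.get(key);
    if (cached == null) {
      cached = expensiveLoad(key);
      CACHE.put(key, cached);
    }
    return cached;
  }

  // The "newer", semantically clearer alternative mentioned (and set aside) in the comment.
  static Object lookupComputeIfAbsent(String key) {
    return CACHE.computeIfAbsent(key, CachingIdiomsSketch::expensiveLoad);
  }

  private static Object expensiveLoad(String key) {
    return new Object(); // placeholder for loading a dictionary/model
  }
}
```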
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469420#comment-17469420 ] Cameron VandenBerg commented on LUCENE-10157: - Hi Adrien, it makes sense to me why you would like to move Indri code to the sandbox, but I am still hesitant about moving the smoothingScore API because that required changes outside of core. I am worried that the smoothingScore changes will be lost, which is the building block to a lot of functionality in lucene. Is there anything that I could do to help keep the smoothingScore in core? I am happy to submit a new smaller PR that simply fixes the IndriAndScorer and adds additional tests. I am open to suggestions and happy to work with you. > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on pull request #536: Don't store graph offsets for HNSW graph
msokolov commented on pull request #536: URL: https://github.com/apache/lucene/pull/536#issuecomment-1005868557 Thanks for the thorough testing, @mayya-sharipova. I think we want to minimize heap usage; the index size cost is small. Basically we are trading off on-heap for on-disk/off-heap, which is always a tradeoff we like. The search-time change seems like noise? So +1 from me. Also, glad to see the fanout numbers are sane :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #444: LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test
jpountz commented on pull request #444: URL: https://github.com/apache/lucene/pull/444#issuecomment-1005865609 Correct, changes should no longer be backported to `branch_8x`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778933921 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469399#comment-17469399 ] Adrien Grand commented on LUCENE-10157: --- Hi Cameron, the sandbox is actually part of Lucene (https://github.com/apache/lucene/tree/main/lucene/sandbox), it is just a different jar from lucene-core, just like the query parser you added under lucene/queryparser would have been released in a different JAR (lucene-queryparser-\{version}.jar). I'm not suggesting we not add it to Lucene, just that we add it in a less intrusive way. And having it in the sandbox will leave us more time to think about how this could be integrated with dynamic pruning, two-phase iteration, the similarity API, etc. > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778925282 ## File path: gradle/documentation/render-javadoc.gradle ## @@ -57,7 +57,7 @@ allprojects { outputDir = project.javadoc.destinationDir } -if (project.path == ':lucene:luke' || project.path.endsWith(".tests")) { +if (project.path == ':lucene:luke' || !(project in rootProject.ext.mavenProjects)) { Review comment: Ah, it is the inverse: all projects that will land in Maven Central. Yeah, that's a better check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778931152 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778930571 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[GitHub] [lucene] mayya-sharipova commented on pull request #536: Don't store graph offsets for HNSW graph
mayya-sharipova commented on pull request #536: URL: https://github.com/apache/lucene/pull/536#issuecomment-1005844307 I've also run the comparison on a bigger dataset: deep-image-96-angular of 10M docs. M: 16; efConstruction: 500. Disk size before the change: 4.2G; after the change: 4.3G => ~2% increase. Not much effect on search performance:
| | baseline recall | baseline QPS | candidate recall | candidate QPS |
| --- | ---: | ---: | ---: | ---: |
| n_cands=10 | 0.726 | 1527.894 | 0.728 | 870.721 |
| n_cands=20 | 0.793 | 1350.206 | 0.794 | 1364.301 |
| n_cands=40 | 0.862 | 1053.906 | 0.862 | 1068.798 |
| n_cands=80 | 0.917 | 737.711 | 0.918 | 741.551 |
| n_cands=120 | 0.942 | 573.783 | 0.942 | 589.756 |
| n_cands=200 | 0.964 | 402.166 | 0.964 | 414.730 |
| n_cands=400 | 0.982 | 237.545 | 0.982 | 251.678 |
| n_cands=600 | 0.988 | 174.223 | 0.988 | 177.968 |
| n_cands=800 | 0.991 | 137.420 | 0.991 | 143.290 |
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778926381 ## File path: gradle/documentation/render-javadoc.gradle ## @@ -57,7 +57,7 @@ allprojects { outputDir = project.javadoc.destinationDir } -if (project.path == ':lucene:luke' || project.path.endsWith(".tests")) { +if (project.path == ':lucene:luke' || !(project in rootProject.ext.mavenProjects)) { Review comment: By the way, there's also the nicer operator `project !in rootProject.ext.mavenProjects`, which I have already used elsewhere. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778924772 ## File path: gradle/documentation/render-javadoc.gradle ## @@ -57,7 +57,7 @@ allprojects { outputDir = project.javadoc.destinationDir } -if (project.path == ':lucene:luke' || project.path.endsWith(".tests")) { +if (project.path == ':lucene:luke' || !(project in rootProject.ext.mavenProjects)) { Review comment: What is this `mavenProjects`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on pull request #571: URL: https://github.com/apache/lucene/pull/571#issuecomment-1005839073 I think you have to solve the conflicts caused by the change for running "gradlew beast". I leave that up to you. Maybe it works better after this branch is merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778922786 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469393#comment-17469393 ] Cameron VandenBerg commented on LUCENE-10157: - Hi [~jpountz], I would really love to be able to keep these changes in Lucene if possible. I am very happy to write more tests and make any changes you feel are necessary. I am free to work on this right now and can do quick turnarounds. I have worked a lot more with the Lucene testing framework now, and I feel that I can do a good job showing that the smoothingScore API does work. The reason I am hopeful that we can keep the smoothingScore is that it is important to our research. I am actually actively using the smoothingScore API in our research at Carnegie Mellon University for creating a new search dataset. I do have it working in my project because I have some additional functionality that I have not committed to Lucene yet because I was trying to minimize the scope of my first PR. Thank you for your time! Let me know what I can do to help keep the smoothingScore functionality in the Lucene API. > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on a change in pull request #571: URL: https://github.com/apache/lucene/pull/571#discussion_r778921454 ## File path: gradle/java/modules.gradle ## @@ -27,194 +29,167 @@ allprojects { modularity.inferModulePath.set(false) } -// Map convention configuration names to "modular" corresponding configurations. -Closure moduleConfigurationNameFor = { String configurationName -> - return "module" + configurationName.capitalize().replace("Classpath", "Path") -} - -// -// For each source set, create explicit configurations for declaring modular dependencies. -// These "modular" configurations correspond 1:1 to Gradle's conventions but have a 'module' prefix -// and a capitalized remaining part of the conventional name. For example, an 'api' configuration in -// the main source set would have a corresponding 'moduleApi' configuration for declaring modular -// dependencies. -// -// Gradle's java plugin "convention" configurations extend from their modular counterparts -// so all dependencies end up on classpath by default for backward compatibility with other -// tasks and gradle infrastructure. // -// At the same time, we also know which dependencies (and their transitive graph of dependencies!) -// should be placed on module-path only. -// -// Note that an explicit configuration of modular dependencies also opens up the possibility of automatically -// validating whether the dependency configuration for a gradle project is consistent with the information in -// the module-info descriptor because there is a (nearly?) direct correspondence between the two: -// -// moduleApi- 'requires transitive' -// moduleImplementation - 'requires' -// moduleCompileOnly- 'requires static' +// Configure modular extensions for each source set. // project.sourceSets.all { SourceSet sourceSet -> - ConfigurationContainer configurations = project.configurations - - // Create modular configurations for convention configurations. - Closure createModuleConfigurationForConvention = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate(moduleConfigurationNameFor(configurationName)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(false) -conventionConfiguration.extendsFrom(moduleConfiguration) - -project.logger.info("Created module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration moduleApi = createModuleConfigurationForConvention(sourceSet.apiConfigurationName) - Configuration moduleImplementation = createModuleConfigurationForConvention(sourceSet.implementationConfigurationName) - moduleImplementation.extendsFrom(moduleApi) - Configuration moduleRuntimeOnly = createModuleConfigurationForConvention(sourceSet.runtimeOnlyConfigurationName) - Configuration moduleCompileOnly = createModuleConfigurationForConvention(sourceSet.compileOnlyConfigurationName) - // sourceSet.compileOnlyApiConfigurationName // This seems like a very esoteric use case, leave out. - - // Set up compilation module path configuration combining corresponding convention configurations. 
- Closure createResolvableModuleConfiguration = { String configurationName -> -Configuration conventionConfiguration = configurations.maybeCreate(configurationName) -Configuration moduleConfiguration = configurations.maybeCreate( -moduleConfigurationNameFor(conventionConfiguration.name)) -moduleConfiguration.canBeConsumed(false) -moduleConfiguration.canBeResolved(true) -moduleConfiguration.attributes { - // Prefer class folders over JARs. The exception is made for tests projects which require a composition - // of classes and resources, otherwise split into two folders. - if (project.name.endsWith(".tests")) { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.JAR)) - } else { -attribute(LibraryElements.LIBRARY_ELEMENTS_ATTRIBUTE, objects.named(LibraryElements, LibraryElements.CLASSES)) - } -} - -project.logger.info("Created resolvable module configuration for '${conventionConfiguration.name}': ${moduleConfiguration.name}") -return moduleConfiguration - } - - Configuration compileModulePathConfiguration = createResolvableModuleConfiguration(sourceSet.compileClasspathConfigurationName) - compileModulePathConfiguration.extendsFrom(moduleCompileOnly, moduleImplementation) - - Configuration runtimeModulePathConfiguration = c
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469392#comment-17469392 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 7572352b7927c8099847d87bb2bb468af6c15958 in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7572352 ] LUCENE-10291: Bug fix. > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
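To make the scenario described in the issue concrete, here is a minimal sketch of the kind of codec it mentions: a FilterCodec whose postings format refuses any read or write, which surfaces an unexpected postings file even when no field indexes postings. The class and format names are invented for illustration, and this is not the code that originally triggered the report; it only assumes the standard Lucene codec APIs (FilterCodec, PostingsFormat).

```java
// A minimal sketch, assuming standard Lucene codec APIs; names are illustrative.
import java.io.IOException;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

public final class NoPostingsCodec extends FilterCodec {

  public NoPostingsCodec() {
    // Delegate everything except postings to the default codec.
    super("NoPostingsCodec", Codec.getDefault());
  }

  @Override
  public PostingsFormat postingsFormat() {
    return new PostingsFormat("NoPostings") {
      @Override
      public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
        // Before LUCENE-10291 this tripped even when no field actually indexed postings.
        throw new UnsupportedOperationException("postings are not expected here");
      }

      @Override
      public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
        throw new UnsupportedOperationException("postings are not expected here");
      }
    };
  }
}
```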
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469388#comment-17469388 ] ASF subversion and git services commented on LUCENE-10291: -- Commit f9ff620ec6b368f94669eb71c5f0c92ac89e6951 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f9ff620 ] LUCENE-10291: CHANGES entry > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469391#comment-17469391 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 5920486671995f3752ae09519bb8a9e931d3056a in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5920486 ] LUCENE-10291: CHANGES entry > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10291. --- Fix Version/s: 9.1 Resolution: Fixed > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469390#comment-17469390 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 738247e78d1d5ff22f3755d0ceca8ad99d4f69f4 in lucene's branch refs/heads/branch_9x from Yannick Welsch [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=738247e ] LUCENE-10291: Only read/write postings when there is at least one indexed field (#539) > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469389#comment-17469389 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 7fdba369415a3882df5f83ce6197a2f638b37fad in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7fdba36 ] LUCENE-10291: Bug fix. > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #571: LUCENE-10328: Module path for compiling and running tests is wrong
uschindler commented on pull request #571: URL: https://github.com/apache/lucene/pull/571#issuecomment-1005827412
> Please take a look at this comment/chart, Uwe. https://issues.apache.org/jira/browse/LUCENE-10328?focusedCommentId=17468676&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17468676
>
> How tests are run depends on the combination of whether source and test source sets are modules.
>
> We do not have module patching. I don't think it's possible to configure gradle internal infrastructure and plugins to reasonably use it.

OK, thanks for the picture, that gives much more information. I was hoping it is like that. That module patching does not work was known to me; I had just stumbled on some support methods for it and did not understand when they are used.

I am perfectly fine with running our tests for a class in classpath mode even though the main source set is modular. Technically this won't change the test results, as unit tests should only check internal assertions. Of course, in reality it is more complicated.

What I understood, and what is missing in the image: the dependencies are always modular, so lucene.core is put on the module path even though we are running tests in classpath mode. This is what this PR mainly changes, correct? Previously it was not fully working unless you explicitly declared it.

What we should do next (after this is merged):
- Review implementation vs. api dependencies (both on the Gradle side and in module-info). With my other PR for test-random-chains I found an issue because of this: the phonetic module uses commons-codec also in its public API. Compilation of my module worked for some reason, but forbiddenapis failed, as it was not able to see the classes (when inspecting the method signatures), which is understandable. ICU likewise needs to refer to ICU in an api (Gradle) / transitive (module system) way. So we should enable the exports checks. When developing the last patch about logging in core, I would still make java.logging non-transitive, because it is unlikely that you would use it in downstream code (although there's a public signature using JUL). Because of that I added a `SuppressWarnings("exports")` on the class using it.
- Fix up the module-descriptor files to make sure everything is sane.

Uwe
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
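As a concrete illustration of the api-vs-implementation point above: the modular Gradle configurations are meant to line up with module-info directives roughly as moduleApi -> `requires transitive`, moduleImplementation -> `requires`, and moduleCompileOnly -> `requires static`. Below is a hedged sketch of such a descriptor for a hypothetical module loosely modeled on the phonetic example; it is not the actual Lucene descriptor, and the module names are only approximate.

```java
// Hypothetical module descriptor (not the actual Lucene one) illustrating how the
// Gradle configurations discussed above map onto module-info directives.
module org.example.lucene.analysis.phonetic {
  // moduleApi ('api'): commons-codec types appear in our public signatures,
  // so downstream modules need to read it too -> requires transitive.
  requires transitive org.apache.commons.codec;

  // moduleImplementation ('implementation'): used internally only -> plain requires.
  requires org.apache.lucene.core;

  // If a JDK module like java.logging shows up in one public signature but is
  // deliberately kept non-transitive, the exporting class can carry
  // @SuppressWarnings("exports"), as described above.
  requires java.logging;

  // moduleCompileOnly ('compileOnly'): compile-time only -> requires static.
  requires static org.example.annotations;

  exports org.example.lucene.analysis.phonetic;
}
```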
[GitHub] [lucene] ywelsch commented on pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
ywelsch commented on pull request #539: URL: https://github.com/apache/lucene/pull/539#issuecomment-1005809668 Thanks @jpountz! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10291) Only read/write postings when there is at least one indexed field
[ https://issues.apache.org/jira/browse/LUCENE-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469383#comment-17469383 ] ASF subversion and git services commented on LUCENE-10291: -- Commit 8fa7412dec458e42f379cc856bd6ffebe8c6f8e9 in lucene's branch refs/heads/main from Yannick Welsch [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8fa7412 ] LUCENE-10291: Only read/write postings when there is at least one indexed field (#539) > Only read/write postings when there is at least one indexed field > - > > Key: LUCENE-10291 > URL: https://issues.apache.org/jira/browse/LUCENE-10291 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Unlike points, norms, term vectors or doc values which only get written to > the directory when at least one of the fields uses the data structure, > postings always get written to the directory. > While this isn't hurting much, it can be surprising at times, e.g. if you > index with SimpleText you will have a file for postings even though none of > the fields indexes postings. This inconsistency is hidden with the default > codec due to the fact that it uses PerFieldPostingsFormat, which only > delegates to any of the per-field codecs if any of the fields is actually > indexed, so you don't actually get a file if none of the fields is indexed. > We noticed this behavior by creating a codec that throws > UnsupportedOperationException for postings since it's not expected to have > postings, and it always fails writing or reading data. While it's easy to > work around this issue on top of Lucene by using a dummy postings format, it > would be better to fix Lucene to handle postings consistently with other data > structures? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
jpountz merged pull request #539: URL: https://github.com/apache/lucene/pull/539 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469380#comment-17469380 ] Adrien Grand commented on LUCENE-10157: --- If you don't mind, I would like to move this functionality to the sandbox and undo changes to core APIs like \{{Scorable#smoothingScore}} until we have a better idea of whether we should make these things first-class citizens in Lucene's scoring APIs. > Add Additional Indri Search Engine Functionality to Lucene > -- > > Key: LUCENE-10157 > URL: https://issues.apache.org/jira/browse/LUCENE-10157 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser, core/search >Reporter: Cameron VandenBerg >Priority: Major > Attachments: LUCENE-10157.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In Jira issue LUCENE-9537, basic functionality from the Indri search engine > ([http://lemurproject.org/indri.php]) was added to Lucene. With that > functionality in place, we would love to build upon that to add additional > Indri queries and an Indri query parser to Lucene to broaden the Indri > functionality within Lucene. In this patch, I have added the Indri NOT, the > INDRI OR, and the Indri WeightedSum functionality. I have also included an > IndriQueryParser for accessing this functionality. More information on these > query operators can be seen here: > [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here: > [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/.|https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/] > > I would be very excited to work with the Lucene community again to try to add > this functionality. I am open to suggestions, and I am happy to make any > changes that might be suggested. Thank you! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
rmuir commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005769579 > Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) I wonder too. I searched the mainline branches of some commonly used Java libraries (`guava`, `log4j2`) and found `AccessController` calls in each. If OpenJDK actually removed these methods anytime soon, it would probably break just about every Java app out there. So I'm not worried. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
dweiss commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005762127 Thanks for the pointer, Robert. I wonder what the "acceptable level" criteria are. ;) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10343) Remove MyRandom in favor of test framework random
[ https://issues.apache.org/jira/browse/LUCENE-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10343. --- Fix Version/s: 9.1 Resolution: Fixed > Remove MyRandom in favor of test framework random > - > > Key: LUCENE-10343 > URL: https://issues.apache.org/jira/browse/LUCENE-10343 > Project: Lucene - Core > Issue Type: Test >Reporter: Feng Guo >Priority: Trivial > Fix For: 9.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on a change in pull request #579: URL: https://github.com/apache/lucene/pull/579#discussion_r778884007 ## File path: lucene/misc/src/java/org/apache/lucene/misc/store/HardlinkCopyDirectoryWrapper.java ## @@ -66,7 +67,7 @@ public void copyFrom(Directory from, String srcFile, String destFile, IOContext // only try hardlinks if we have permission to access the files // if not super.copyFrom() will give us the right exceptions suppressedException = -LegacySecurityManager.doPrivileged( +doPrivileged( Review comment: Same at other places -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on a change in pull request #579: URL: https://github.com/apache/lucene/pull/579#discussion_r778883566 ## File path: lucene/misc/src/java/org/apache/lucene/misc/store/HardlinkCopyDirectoryWrapper.java ## @@ -66,7 +67,7 @@ public void copyFrom(Directory from, String srcFile, String destFile, IOContext // only try hardlinks if we have permission to access the files // if not super.copyFrom() will give us the right exceptions suppressedException = -LegacySecurityManager.doPrivileged( +doPrivileged( Review comment: Now we can also remove the cast in next line. As doPrivileged is not overloaded. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10343) Remove MyRandom in favor of test framework random
[ https://issues.apache.org/jira/browse/LUCENE-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469370#comment-17469370 ] ASF subversion and git services commented on LUCENE-10343: -- Commit 76d83507beddcc421fc1906e0be4562e16531819 in lucene's branch refs/heads/branch_9x from gf2121 [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=76d8350 ] LUCENE-10343: Remove MyRandom in favor of test framework random (#573) > Remove MyRandom in favor of test framework random > - > > Key: LUCENE-10343 > URL: https://issues.apache.org/jira/browse/LUCENE-10343 > Project: Lucene - Core > Issue Type: Test >Reporter: Feng Guo >Priority: Trivial > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #529: Use CDN to download source release.
jpountz merged pull request #529: URL: https://github.com/apache/lucene/pull/529 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
rmuir commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005755345 I don't think the code is going to stop compiling; instead you will just "lose protection"? > In Java 18 and later, we will degrade other Security Manager APIs so that they remain in place but with limited or no functionality. For example, we may revise AccessController::doPrivileged simply to run the given action, or revise System::getSecurityManager always to return null. This will allow libraries that support the Security Manager and were compiled against previous Java releases to continue to work without change or even recompilation. We expect to remove the APIs once the compatibility risk of doing so declines to an acceptable level. https://openjdk.java.net/jeps/411 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #525: Modernize release announcement text.
jpountz merged pull request #525: URL: https://github.com/apache/lucene/pull/525 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a change in pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
uschindler commented on a change in pull request #579: URL: https://github.com/apache/lucene/pull/579#discussion_r778881181 ## File path: lucene/core/src/java/org/apache/lucene/util/LegacySecurityManager.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.security.AccessController; +import java.security.PrivilegedAction; + +/** + * Encapsulates access to the security manager, which is deprecated as of Java 17. + * + * @lucene.internal + */ +@SuppressWarnings("removal") +@SuppressForbidden(reason = "security manager") +public final class LegacySecurityManager { + + /** Delegates to {@link AccessController#doPrivileged(PrivilegedAction)}. */ + public static T doPrivileged(PrivilegedAction action) { +return AccessController.doPrivileged(action); + } Review comment: Yes. This breaks security. AccessController is caller sensitive. So having it as public method kills all. Better just put SuppressForbidden and SuppressWarnings everywhere. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
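For context, a minimal sketch of the call-site pattern suggested here: keep the caller-sensitive AccessController.doPrivileged call where it is actually used and suppress the warnings there, instead of routing it through a shared public helper. Class and method names are invented; this is not the actual HardlinkCopyDirectoryWrapper code, and the Lucene-internal @SuppressForbidden annotation is only referenced in a comment so the sketch compiles standalone.

```java
// Hedged sketch: suppress the removal/forbidden-API warnings at the call site,
// so the caller-sensitive privilege check still sees the real caller.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.AccessController;
import java.security.PrivilegedAction;

final class HardlinkSketch {

  private HardlinkSketch() {}

  @SuppressWarnings("removal") // AccessController is deprecated for removal since Java 17
  // @SuppressForbidden(reason = "security manager")  <- Lucene-internal annotation, shown for context
  static Exception tryCreateHardLink(Path source, Path destination) {
    return AccessController.doPrivileged(
        (PrivilegedAction<Exception>)
            () -> {
              try {
                Files.createLink(destination, source);
                return null; // success: nothing to record as a suppressed exception
              } catch (IOException | SecurityException | UnsupportedOperationException e) {
                return e; // caller would fall back to a regular copy and record this
              }
            });
  }
}
```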
[GitHub] [lucene] jpountz commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
jpountz commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r778880348 ## File path: lucene/core/src/test/org/apache/lucene/codecs/perfield/TestPerFieldKnnVectorsFormat.java ## @@ -172,9 +171,14 @@ public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException KnnVectorsWriter writer = delegate.fieldsWriter(state); return new KnnVectorsWriter() { @Override -public void writeField(FieldInfo fieldInfo, VectorValues values) throws IOException { +public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader) +throws IOException { fieldsWritten.add(fieldInfo.name); - writer.writeField(fieldInfo, values); + writer.writeField(fieldInfo, knnVectorsReader); + // assert that knnVectorsReader#getVectorValues returns different instances upon repeated + // calls Review comment: This is the sort of thing that we usually check via AssertingKnnVectorsReader. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
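For readers unfamiliar with the asserting wrappers: the check being discussed boils down to something like the following hedged sketch. The helper name is invented; the real check would live in the test framework's asserting KnnVectors classes rather than in the per-field test itself.

```java
// Minimal sketch: verify that a KnnVectorsReader hands out a fresh VectorValues
// iterator on every call, rather than a shared, possibly already-consumed one.
import java.io.IOException;
import org.apache.lucene.codecs.KnnVectorsReader;
import org.apache.lucene.index.VectorValues;

final class KnnReaderChecks {

  private KnnReaderChecks() {}

  static void assertFreshVectorValues(KnnVectorsReader reader, String field) throws IOException {
    VectorValues first = reader.getVectorValues(field);
    VectorValues second = reader.getVectorValues(field);
    assert first != second : "getVectorValues should return a new instance per call";
  }
}
```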
[jira] [Resolved] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-10352. Fix Version/s: 9.1 10.0 (main) Resolution: Fixed We opened several issues about broken analysis components. If you want to run beaster to find more bugs, you can run the following command on main or branch_9x: {{$ gradlew :lucene:analysis.tests:beast -Dtests.dups=100 --tests TestRandomChains -Dtests.nightly=true}} Thanks to all who helped. > Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global > integration test and discover classes to check from module system > > > Key: LUCENE-10352 > URL: https://issues.apache.org/jira/browse/LUCENE-10352 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Fix For: 9.1, 10.0 (main) > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently TestAllAnalyzersHaveFactories and TestRandomChains only work on the > analysis-commons module, but e.g. we do not do a random chain with kuromoji > and ICU. Also both tests rely on some hacky classpath-inspection and the > tests fail if ran on a JAR file. > This issue tracks progress I am currently doing to refactor this: > - Move those 2 classes to a new gradle subproject > :lucene:analysis:integration.tests and add a module-info referring to all > other analysis packages > - Rewrite the class discovery to use ModuleReader > - Run TestAllAnalyzersHaveFactories per module (using one module reader), so > it discovers all classes and ensures that factory and stream are in same > module (there are some core vs. analysis.common discrepancies) > - RunTestRandomChains on the whole module graph. The classes are discovered > from all module readers in the graph (filtering on module name starting with > "org.apache.lucene.analysis." > - Also compare that the SPI factories returned by discovery match those we > have in the module graphs > While doing this I disovered some bad things: > - TestRandomChains depends on test-only resources. We may need to replicate > those (it is about 5 files that are fed into the ctors) > - We have 5 different StringMockResourceLoaders: Originally it was only in > analysis common, now its everywhere. I will move this class to > test-framework. This is unrelated but can be done here. The background of > this was that analysis factories and resource loaders were not part of lucene > core, so the resourceloader interface couldn't be in test-framework. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil
jpountz commented on pull request #541: URL: https://github.com/apache/lucene/pull/541#issuecomment-1005745531 Nice. I wonder whether we need to specialize for as many bits-per-value cases as we do for postings, or whether we should only specialize for a few bit widths that are both useful and fast, e.g. 0, 4, 8, 16, 24 and 32. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
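[Editor's sketch] To make the suggestion concrete, here is a hedged sketch of what "specialize only a few widths" could look like. The 512-value block size echoes the PR title, but the method structure, the byte-aligned big-endian layout, and all names are illustrative assumptions, not the ForUtil code from the patch.

import java.util.Arrays;

final class SelectiveUnpacker {
  static final int BLOCK_SIZE = 512; // 512 values per block, as in the PR title

  // Fast, byte-aligned paths for a handful of "round" widths; everything else
  // falls back to a generic shift/mask loop.
  static void decode(int bitsPerValue, byte[] packed, int[] out) {
    switch (bitsPerValue) {
      case 0:
        Arrays.fill(out, 0, BLOCK_SIZE, 0);
        break;
      case 8:
        for (int i = 0; i < BLOCK_SIZE; i++) {
          out[i] = packed[i] & 0xFF; // one byte load per value
        }
        break;
      case 16:
        for (int i = 0; i < BLOCK_SIZE; i++) {
          out[i] = ((packed[2 * i] & 0xFF) << 8) | (packed[2 * i + 1] & 0xFF);
        }
        break;
      case 32:
        for (int i = 0; i < BLOCK_SIZE; i++) {
          out[i] = ((packed[4 * i] & 0xFF) << 24)
              | ((packed[4 * i + 1] & 0xFF) << 16)
              | ((packed[4 * i + 2] & 0xFF) << 8)
              | (packed[4 * i + 3] & 0xFF);
        }
        break;
      default:
        decodeGeneric(bitsPerValue, packed, out);
    }
  }

  // Generic fallback for uncommon widths (1..31 bits), reading bits MSB-first.
  static void decodeGeneric(int bitsPerValue, byte[] packed, int[] out) {
    long bitPos = 0;
    for (int i = 0; i < BLOCK_SIZE; i++) {
      int value = 0;
      for (int b = bitsPerValue - 1; b >= 0; b--, bitPos++) {
        int bit = (packed[(int) (bitPos >>> 3)] >>> (7 - (bitPos & 7))) & 1;
        value |= bit << b;
      }
      out[i] = value;
    }
  }
}

The tradeoff is that uncommon widths take the slower generic loop, which is only acceptable if those widths are rare in real BKD leaf blocks.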
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469362#comment-17469362 ] ASF subversion and git services commented on LUCENE-10352: -- Commit 75259417f1b8de05eda4cf3a8b8c5e8177c7f0dd in lucene's branch refs/heads/branch_9x from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7525941 ] LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system (#582) Co-authored-by: Robert Muir -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #579: LUCENE-10283: Bump minimum required Java version to 17.
dweiss commented on pull request #579: URL: https://github.com/apache/lucene/pull/579#issuecomment-1005744351 I do have the same concerns, but at the same time, if they remove the security manager entirely in, say, JDK 17+X, then the code will stop compiling/working anyway. Maybe these concerns should be left for JDK maintainers though. :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10352) Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system
[ https://issues.apache.org/jira/browse/LUCENE-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469354#comment-17469354 ] ASF subversion and git services commented on LUCENE-10352: -- Commit 475fbd0bdde31c6a2ae62c59505cf9e8becd50e4 in lucene's branch refs/heads/main from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=475fbd0 ] LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system (#582) Co-authored-by: Robert Muir -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #545: LUCENE-10319: make ForUtil#BLOCK_SIZE changeable
jpountz commented on pull request #545: URL: https://github.com/apache/lucene/pull/545#issuecomment-1005739072 I'm a bit torn as this also makes the code harder to read with all these constants with complex names. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
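[Editor's sketch] Purely as a hypothetical illustration of that readability concern (not code from this PR), making the block size configurable tends to replace simple literals with a family of derived, named constants; all names below are invented.

final class ConfigurableBlockConstants {
  static final int BLOCK_SIZE = 128;
  static final int BLOCK_SIZE_LOG2 = Integer.numberOfTrailingZeros(BLOCK_SIZE);
  static final int HALF_BLOCK_SIZE = BLOCK_SIZE / 2;
  static final int QUARTER_BLOCK_SIZE = BLOCK_SIZE / 4;

  // With a fixed block size the loop bound could simply be a literal; once
  // BLOCK_SIZE is changeable, every such bound needs one of the names above.
  static void copyQuarterBlock(long[] in, long[] out) {
    for (int i = 0; i < QUARTER_BLOCK_SIZE; i++) {
      out[i] = in[i]; // placeholder body, purely illustrative
    }
  }
}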