[GitHub] [lucene] mocobeta commented on pull request #811: Add some basic tasks to help/workflow
mocobeta commented on PR #811: URL: https://github.com/apache/lucene/pull/811#issuecomment-1100574008 > I think we can still do some more to help new contributors but I have no specific action items in my head. I'll create separate JIRAs/PRs if I come up with something. Sounds great, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on pull request #813: Backport LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders…
zhaih commented on PR #813: URL: https://github.com/apache/lucene/pull/813#issuecomment-1100483369 Thank you @gautamworah96
[jira] [Commented] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
[ https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522988#comment-17522988 ] ASF subversion and git services commented on LUCENE-10482: -- Commit 5c2cbd712590fe98d101f30b528bee976af173ee in lucene's branch refs/heads/branch_9x from Gautam Worah [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5c2cbd71259 ] LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide (#762) (#813) > Allow users to create their own DirectoryTaxonomyReaders with empty > taxoArrays instead of letting the taxoEpoch decide > -- > > Key: LUCENE-10482 > URL: https://issues.apache.org/jira/browse/LUCENE-10482 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 9.1 >Reporter: Gautam Worah >Priority: Minor > Time Spent: 6h 10m > Remaining Estimate: 0h > > I was experimenting with the taxonomy index and {{DirectoryTaxonomyReaders}} > in my day job where we were trying to replace the index underneath a reader > asynchronously and then call the {{doOpenIfChanged}} call on it. > It turns out that the taxonomy index uses its own index based counter (the > {{{}taxonomyIndexEpoch{}}}) to determine if the index was opened in write > mode after the last time it was written and if not, it directly tries to > reuse the previous {{taxoArrays}} it had created. This logic fails in a > scenario where both the old and new index were opened just once but the index > itself is completely different in both the cases. > In such a case, it would be good to give the user the flexibility to inform > the DTR to recreate its {{{}taxoArrays{}}}, {{ordinalCache}} and > {{{}categoryCache{}}} (not refreshing these arrays causes it to fail in > various ways). Luckily, such a constructor already exists! But it is private > today! The idea here is to allow subclasses of DTR to use this constructor. 
> Curious to see what other folks think about this idea. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
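The failure mode described in the issue can be illustrated with a small, self-contained model. This is only a sketch: `ToyIndex`, `ToyReader` and `SwapAwareReader` are hypothetical names, not Lucene classes, and the refresh logic is a simplification of the epoch check described above. It assumes that a reader reuses its cached arrays whenever the epoch counter is unchanged, and shows how a subclass with access to a cache-resetting constructor could force a rebuild after the index has been swapped.

```java
import java.util.List;

/** Toy index, NOT Lucene: `epoch` models a counter bumped when the index is opened for writing. */
final class ToyIndex {
  final long epoch;
  final List<String> categories;

  ToyIndex(long epoch, List<String> categories) {
    this.epoch = epoch;
    this.categories = categories;
  }
}

/** Toy reader, NOT DirectoryTaxonomyReader: caches an array derived from the index. */
class ToyReader {
  final ToyIndex index;
  final String[] cachedCategories;

  ToyReader(ToyIndex index) {
    this(index, null);
  }

  /** Hypothetical analogue of the constructor the issue wants to expose; null means "rebuild". */
  protected ToyReader(ToyIndex index, String[] reusedCache) {
    this.index = index;
    this.cachedCategories =
        reusedCache != null ? reusedCache : index.categories.toArray(new String[0]);
  }

  /** Epoch-based refresh: the old cache is reused when the epoch did not change. */
  ToyReader refresh(ToyIndex current) {
    if (current.epoch == index.epoch) {
      // Stale if `current` is a completely different index with the same epoch.
      return new ToyReader(current, cachedCategories);
    }
    return new ToyReader(current, null);
  }
}

/** Subclass that knows the index may have been replaced and always rebuilds its cache. */
final class SwapAwareReader extends ToyReader {
  SwapAwareReader(ToyIndex index) {
    super(index, null);
  }

  @Override
  ToyReader refresh(ToyIndex current) {
    return new SwapAwareReader(current);
  }
}
```

With two different toy indexes that share the same epoch, `new ToyReader(a).refresh(b)` keeps the cache built from `a`, while `new SwapAwareReader(a).refresh(b)` rebuilds it from `b`.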
[GitHub] [lucene] zhaih merged pull request #813: Backport LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders…
zhaih merged PR #813: URL: https://github.com/apache/lucene/pull/813
[jira] [Closed] (LUCENE-7863) Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc
[ https://issues.apache.org/jira/browse/LUCENE-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev closed LUCENE-7863. Lucene Fields: (was: New) > Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc > > > Key: LUCENE-7863 > URL: https://issues.apache.org/jira/browse/LUCENE-7863 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Mikhail Khludnev >Priority: Major > Attachments: LUCENE-7863.hazard, LUCENE-7863.patch, > LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, > LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, > LUCENE-7863.patch, bench-byte-array-long.out, bench-byte-array2.out, > benchmark-1m.out, byterefshash-bench.txt > > > h2. Context > \*suffix and \*infix\* searches on large indexes. > h2. Problem > Obviously applying {{ReversedWildcardFilter}} doubles an index size, and I'm > shuddering to think about EdgeNGrams... > h2. Proposal > _DR_-Y- postings
[jira] [Resolved] (LUCENE-7863) Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc
[ https://issues.apache.org/jira/browse/LUCENE-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev resolved LUCENE-7863. -- Resolution: Won't Fix > Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc > > > Key: LUCENE-7863 > URL: https://issues.apache.org/jira/browse/LUCENE-7863 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Mikhail Khludnev >Priority: Major > Attachments: LUCENE-7863.hazard, LUCENE-7863.patch, > LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, > LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, > LUCENE-7863.patch, bench-byte-array-long.out, bench-byte-array2.out, > benchmark-1m.out, byterefshash-bench.txt > > > h2. Context > \*suffix and \*infix\* searches on large indexes. > h2. Problem > Obviously applying {{ReversedWildcardFilter}} doubles an index size, and I'm > shuddering to think about EdgeNGrams... > h2. Proposal > _DR_-Y- postings
[GitHub] [lucene] gautamworah96 opened a new pull request, #813: Backport LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders…
gautamworah96 opened a new pull request, #813: URL: https://github.com/apache/lucene/pull/813 Backport of PR: https://github.com/apache/lucene/pull/762, commit: 10ebc099c846c7d96f4ff5f9b7853df850fa8442 for branch_9x. Changes entry is under 9.2 in both the earlier PR and this PR
[GitHub] [lucene] gautamworah96 commented on pull request #811: Add some basic tasks to help/workflow
gautamworah96 commented on PR #811: URL: https://github.com/apache/lucene/pull/811#issuecomment-1100391643 Hmm. I had not looked at the `help/tests.txt` file. +1 on not adding all the comprehensive options to this workflow file. LGTM overall. I think we can still do some more to help new contributors but I have no specific action items in my head. I'll create separate JIRAs/PRs if I come up with something.
[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1100389761 @LuXugang Thanks a lot for your work. I was thinking that maybe a better way to present these changes is to leave all format changes to a later PR, and for this PR just make the changes on `Lucene91HnswVectorsWriter` and `Lucene91HnswVectorsReader` directly; otherwise it is difficult to see what exactly has been changed. WDYT?
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova commented on code in PR #792:
URL: https://github.com/apache/lucene/pull/792#discussion_r851509289

## lucene/core/src/java/org/apache/lucene/codecs/lucene92/Lucene92HnswVectorsWriter.java: ##
@@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs.lucene92;
+
+import static org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsFormat.DIRECT_MONOTONIC_BLOCK_SHIFT;
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.Arrays;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.codecs.KnnVectorsWriter;
+import org.apache.lucene.codecs.lucene90.IndexedDISI;
+import org.apache.lucene.index.DocsWithFieldSet;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.RandomAccessVectorValuesProducer;
+import org.apache.lucene.index.SegmentWriteState;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.hnsw.HnswGraph.NodesIterator;
+import org.apache.lucene.util.hnsw.HnswGraphBuilder;
+import org.apache.lucene.util.hnsw.NeighborArray;
+import org.apache.lucene.util.hnsw.OnHeapHnswGraph;
+import org.apache.lucene.util.packed.DirectMonotonicWriter;
+
+/**
+ * Writes vector values and knn graphs to index segments.
+ *
+ * @lucene.experimental
+ */
+public final class Lucene92HnswVectorsWriter extends KnnVectorsWriter {
+
+  private final SegmentWriteState segmentWriteState;
+  private final IndexOutput meta, vectorData, vectorIndex;
+  private final int maxDoc;
+
+  private final int maxConn;
+  private final int beamWidth;
+  private boolean finished;
+
+  Lucene92HnswVectorsWriter(SegmentWriteState state, int maxConn, int beamWidth)
+      throws IOException {
+    this.maxConn = maxConn;
+    this.beamWidth = beamWidth;
+
+    assert state.fieldInfos.hasVectorValues();
+    segmentWriteState = state;
+
+    String metaFileName =
+        IndexFileNames.segmentFileName(
+            state.segmentInfo.name, state.segmentSuffix, Lucene92HnswVectorsFormat.META_EXTENSION);
+
+    String vectorDataFileName =
+        IndexFileNames.segmentFileName(
+            state.segmentInfo.name,
+            state.segmentSuffix,
+            Lucene92HnswVectorsFormat.VECTOR_DATA_EXTENSION);
+
+    String indexDataFileName =
+        IndexFileNames.segmentFileName(
+            state.segmentInfo.name,
+            state.segmentSuffix,
+            Lucene92HnswVectorsFormat.VECTOR_INDEX_EXTENSION);
+
+    boolean success = false;
+    try {
+      meta = state.directory.createOutput(metaFileName, state.context);
+      vectorData = state.directory.createOutput(vectorDataFileName, state.context);
+      vectorIndex = state.directory.createOutput(indexDataFileName, state.context);
+
+      CodecUtil.writeIndexHeader(
+          meta,
+          Lucene92HnswVectorsFormat.META_CODEC_NAME,
+          Lucene92HnswVectorsFormat.VERSION_CURRENT,
+          state.segmentInfo.getId(),
+          state.segmentSuffix);
+      CodecUtil.writeIndexHeader(
+          vectorData,
+          Lucene92HnswVectorsFormat.VECTOR_DATA_CODEC_NAME,
+          Lucene92HnswVectorsFormat.VERSION_CURRENT,
+          state.segmentInfo.getId(),
+          state.segmentSuffix);
+      CodecUtil.writeIndexHeader(
+          vectorIndex,
+          Lucene92HnswVectorsFormat.VECTOR_INDEX_CODEC_NAME,
+          Lucene92HnswVectorsFormat.VERSION_CURRENT,
+          state.segmentInfo.getId(),
+          state.segmentSuffix);
+      maxDoc = state.segmentInfo.maxDoc();
+      success = true;
+    } finally {
+      if (success == false) {
+        IOUtils.closeWhileHandlingException(this);
+      }
+    }
+  }
+
+  @Override
+  public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
+      throws IOException {
+
[jira] [Commented] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
[ https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522911#comment-17522911 ] ASF subversion and git services commented on LUCENE-10482: -- Commit 10ebc099c846c7d96f4ff5f9b7853df850fa8442 in lucene's branch refs/heads/main from Gautam Worah [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=10ebc099c84 ] LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide (#762) > Allow users to create their own DirectoryTaxonomyReaders with empty > taxoArrays instead of letting the taxoEpoch decide > -- > > Key: LUCENE-10482 > URL: https://issues.apache.org/jira/browse/LUCENE-10482 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 9.1 >Reporter: Gautam Worah >Priority: Minor > Time Spent: 5h 40m > Remaining Estimate: 0h > > I was experimenting with the taxonomy index and {{DirectoryTaxonomyReaders}} > in my day job where we were trying to replace the index underneath a reader > asynchronously and then call the {{doOpenIfChanged}} call on it. > It turns out that the taxonomy index uses its own index based counter (the > {{{}taxonomyIndexEpoch{}}}) to determine if the index was opened in write > mode after the last time it was written and if not, it directly tries to > reuse the previous {{taxoArrays}} it had created. This logic fails in a > scenario where both the old and new index were opened just once but the index > itself is completely different in both the cases. > In such a case, it would be good to give the user the flexibility to inform > the DTR to recreate its {{{}taxoArrays{}}}, {{ordinalCache}} and > {{{}categoryCache{}}} (not refreshing these arrays causes it to fail in > various ways). Luckily, such a constructor already exists! But it is private > today! The idea here is to allow subclasses of DTR to use this constructor. 
> Curious to see what other folks think about this idea.
[GitHub] [lucene] zhaih commented on pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
zhaih commented on PR #762: URL: https://github.com/apache/lucene/pull/762#issuecomment-1100258275 Pushed, could you also open a backport PR? @gautamworah96
[GitHub] [lucene] zhaih merged pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
zhaih merged PR #762: URL: https://github.com/apache/lucene/pull/762
[GitHub] [lucene] mocobeta commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori
mocobeta commented on PR #805: URL: https://github.com/apache/lucene/pull/805#issuecomment-1100187804 Hi Robert and Mike, thank you for your response. I think this can be kept open for a sufficient time period - large conflicts between these changes and the main branch are unlikely. I reviewed it several times while making this patch and I believe it does not change the tokenizers' behavior (although I can't fully guarantee that). I'd be glad if you could give feedback about the overall design, and ideas to make the interfaces safer for further refactoring or development.
[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types
ChrisHegarty commented on code in PR #812: URL: https://github.com/apache/lucene/pull/812#discussion_r851301618 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java: ## @@ -126,23 +126,41 @@ private void countAll(IndexReader reader) throws IOException { NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued); if (singleValued != null) { -for (int doc = singleValued.nextDoc(); -doc != DocIdSetIterator.NO_MORE_DOCS; -doc = singleValued.nextDoc()) { - if (liveDocs != null && liveDocs.get(doc) == false) { -continue; +if (liveDocs == null) { Review Comment: This change just hoists the null check outside of the (potentially hot) loop. In our analysis we don't observe that C2 can consistently automatically do this.
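The hoisting described in this review comment can be sketched in isolation. The classes below are hypothetical stand-ins (`IntDocIterator` is not a Lucene type, and `java.util.BitSet` stands in for the live-docs bits); the sketch only shows the shape of the transformation, not the actual `FastTaxonomyFacetCounts` code, and makes no claim about the resulting speedup.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a doc-values iterator; not a Lucene class. */
final class IntDocIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs;
  private int pos = -1;

  IntDocIterator(int... docs) {
    this.docs = docs;
  }

  /** Returns the next doc id, or NO_MORE_DOCS once the array is exhausted. */
  int nextDoc() {
    pos++;
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }
}

final class HoistSketch {
  /** Before: the null check on liveDocs is evaluated on every iteration. */
  static int countChecked(IntDocIterator it, BitSet liveDocs) {
    int count = 0;
    for (int doc = it.nextDoc(); doc != IntDocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      if (liveDocs != null && liveDocs.get(doc) == false) {
        continue;
      }
      count++;
    }
    return count;
  }

  /** After: the null check is hoisted, leaving two simpler loops. */
  static int countHoisted(IntDocIterator it, BitSet liveDocs) {
    int count = 0;
    if (liveDocs == null) {
      // No deletions: every doc is counted, no per-doc test needed.
      for (int doc = it.nextDoc(); doc != IntDocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
        count++;
      }
    } else {
      // Deletions present: only the bit lookup remains inside the loop.
      for (int doc = it.nextDoc(); doc != IntDocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
        if (liveDocs.get(doc)) {
          count++;
        }
      }
    }
    return count;
  }
}
```

Both methods return the same count for the same input; the hoisted variant trades some code duplication for a loop body without the null test.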
[jira] [Updated] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Hegarty updated LUCENE-10517: --- Description: While analysing various profiles, [@grcevski|https://github.com/grcevski] and I came across this potential improvement. SortedSetDV faceting (and friends), can improve performance within tight loops by using invokevirtual (rather than invokeinterface). The C2 JIT compiler can produce slightly more optimal code in this case, and since these loops are very hot, the impact can be significant (in the order of 10-30%). This issue is in some ways similar to, and builds upon, prior optimisations in this area, like say LUCENE-5300 or more recently LUCENE-5309 was: SortedSetDV faceting (and friends), can improve performance within tight loops by using _invokevirtual_ (rather than _invokeinterface_). The C2 JIT compiler can produce slightly more optimal code in this case, and since these loops are very hot, the impact can be significant (in the order of 10-20%). The code change amounts to using `SortedDocValues` or `SortedSetDocValues` class types, rather than the `DocIdSetIterator` interface type, in loops (specifically for invocation of `nextDoc()`), when the iterator type is known and not wrapped. This issue is in some ways similar, and builds upon, prior optimisations in this area, like say LUCENE-5300. > Improve performance of SortedSetDV faceting by iterating on class types > --- > > Key: LUCENE-10517 > URL: https://issues.apache.org/jira/browse/LUCENE-10517 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 9.1 >Reporter: Chris Hegarty >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > While analysing various profiles, [@grcevski|https://github.com/grcevski] and > I came across this potential improvement. > SortedSetDV faceting (and friends), can improve performance within tight > loops by using invokevirtual (rather than invokeinterface). The C2 JIT > compiler can produce slightly more optimal code in this case, and since these > loops are very hot, the impact can be significant (in the order of 10-30%). > This issue is in some ways similar to, and builds upon, prior optimisations > in this area, like say LUCENE-5300 or more recently LUCENE-5309
[jira] [Updated] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Hegarty updated LUCENE-10517: --- Issue Type: Improvement (was: Bug) > Improve performance of SortedSetDV faceting by iterating on class types > --- > > Key: LUCENE-10517 > URL: https://issues.apache.org/jira/browse/LUCENE-10517 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 9.1 >Reporter: Chris Hegarty >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > SortedSetDV faceting (and friends), can improve performance within tight > loops by using _invokevirtual_ (rather than _invokeinterface_). The C2 JIT > compiler can produce slightly more optimal code in this case, and since these > loops are very hot, the impact can be significant (in the order of 10-20%). > The code change amounts to using `SortedDocValues` or `SortedSetDocValues` > class types, rather than the `DocIdSetIterator` interface type, in loops > (specifically for invocation of `nextDoc()`), when the iterator type is known > and not wrapped. > This issue is in some ways similar, and builds upon, prior optimisations in > this area, like say LUCENE-5300.
[jira] [Comment Edited] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522838#comment-17522838 ]

Chris Hegarty edited comment on LUCENE-10517 at 4/15/22 2:21 PM:

On my M1 I get the following luceneutil benchmark results.

Hardware Overview:
Chip: Apple M1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB

{code:java}
Task                      QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  (range)           p-value
LowPhrase                       148.35  (2.1%)                   143.66  (2.6%)     -3.2%  (  -7% -   1%)    0.000
MedIntervalsOrdered             197.27  (3.7%)                   191.24  (5.7%)     -3.1%  ( -12% -   6%)    0.044
HighIntervalsOrdered             11.55  (2.6%)                    11.33  (3.5%)     -1.9%  (  -7% -   4%)    0.055
AndHighMed                      447.74  (2.1%)                   441.26  (2.4%)     -1.4%  (  -5% -   3%)    0.042
HighTerm                       2397.60  (4.0%)                  2367.10  (2.4%)     -1.3%  (  -7% -   5%)    0.223
LowTerm                        3939.37  (2.7%)                  3890.14  (2.3%)     -1.2%  (  -6% -   3%)    0.111
OrHighNotHigh                  1917.21  (2.8%)                  1893.94  (3.2%)     -1.2%  (  -6% -   4%)    0.198
HighPhrase                       32.93  (1.9%)                    32.55  (1.1%)     -1.2%  (  -4% -   1%)    0.022
PKLookup                        340.11  (4.5%)                   336.69  (4.3%)     -1.0%  (  -9% -   8%)    0.471
TermDTSort                      145.39  (4.1%)                   144.09  (2.3%)     -0.9%  (  -7% -   5%)    0.394
HighSpanNear                     10.38  (3.7%)                    10.32  (1.9%)     -0.6%  (  -5% -   5%)    0.531
MedSpanNear                     206.69  (2.8%)                   205.70  (1.5%)     -0.5%  (  -4% -   3%)    0.500
Fuzzy2                           91.75  (2.5%)                    91.41  (1.4%)     -0.4%  (  -4% -   3%)    0.562
OrHighNotMed                   1975.22  (3.5%)                  1968.91  (2.7%)     -0.3%  (  -6% -   6%)    0.744
OrHighMed                        66.62  (3.9%)                    66.45  (4.8%)     -0.3%  (  -8% -   8%)    0.850
LowSloppyPhrase                  62.60  (2.1%)                    62.44  (2.5%)     -0.3%  (  -4% -   4%)    0.726
OrHighNotLow                   1876.16  (2.5%)                  1871.56  (2.4%)     -0.2%  (  -5% -   4%)    0.756
OrHighHigh                       55.70  (3.9%)                    55.64  (4.9%)     -0.1%  (  -8% -   9%)    0.940
Fuzzy1                          100.97  (2.2%)                   100.88  (2.1%)     -0.1%  (  -4% -   4%)    0.898
LowIntervalsOrdered              42.24  (0.7%)                    42.21  (1.0%)     -0.1%  (  -1% -   1%)    0.766
MedPhrase                       923.85  (1.3%)                   923.14  (1.6%)     -0.1%  (  -2% -   2%)    0.867
OrNotHighMed                   1427.45  (2.0%)                  1428.11  (2.5%)      0.0%  (  -4% -   4%)    0.949
Respell                          82.74  (2.6%)                    82.81  (1.9%)      0.1%  (  -4% -   4%)    0.903
LowSpanNear                     373.63  (2.6%)                   373.97  (1.6%)      0.1%  (  -4% -   4%)    0.893
HighTermDayOfYearSort           199.64  (1.7%)                   199.83  (2.5%)      0.1%  (  -4% -   4%)    0.887
OrNotHighHigh                  1523.02  (2.2%)                  1526.12  (2.0%)      0.2%  (  -3% -   4%)    0.759
AndHighMedDayTaxoFacets         185.23  (0.9%)                   185.79  (1.4%)      0.3%  (  -1% -   2%)    0.416
MedTerm                        3016.98  (3.4%)                  3026.53  (3.2%)      0.3%  (  -6% -   7%)    0.761
OrNotHighLow                   1867.65  (2.5%)                  1876.63  (2.4%)      0.5%  (  -4% -   5%)    0.535
AndHighLow                     1571.61  (3.1%)                  1579.86  (2.6%)      0.5%  (  -5% -   6%)    0.564
OrHighLow                      1485.93  (3.7%)                  1494.56  (2.5%)      0.6%  (  -5% -   7%)    0.559
AndHighHigh                      80.42  (2.8%)                    81.06  (1.7%)      0.8%  (  -3% -   5%)    0.273
HighSloppyPhrase                 50.68  (4.0%)                    51.14  (4.7%)      0.9%  (  -7% -   9%)    0.506
MedSloppyPhrase                  40.76  (2.6%)                    41.13  (3.6%)      0.9%  (  -5% -   7%)    0.356
Wildcard                        123.13  (7.3%)                   124.34  (6.5%)      1.0%  ( -11% -  15%)    0.654
AndHighHighDayTaxoFacets         17.77  (2.8%)                    17.95  (2.7%)      1.0%  (  -4% -   6%)    0.256
MedTermDayTaxoFacets             46.83  (2.6%)                    47.38  (1.8%)      1.2%  (  -3% -   5%)    0.097
HighTermMonthSort               193.35  (1.5%)                   195.77  (5.4%)      1.2%  (  -5% -   8%)    0.320
IntNRQ                           69.13 (17.2%)                    70.81 (16.2%)      2.4%  ( -26% -  43%)    0.646
High
[GitHub] [lucene] mikemccand commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori
mikemccand commented on PR #805: URL: https://github.com/apache/lucene/pull/805#issuecomment-1100135300 Whoa, this sounds awesome! I will try to review soon. Thanks @mocobeta.
[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types
ChrisHegarty commented on PR #812: URL: https://github.com/apache/lucene/pull/812#issuecomment-1100132616 The perf improvements come from changing the target type of the `nextDoc` invocations - which results in an invokevirtual rather than an invokeinterface. The changes in this PR propose adding variants of countXX where the aforementioned is possible; alternatively, branches could be added to the existing code (rather than adding new methods), which achieves similar perf results.
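The difference between the two call shapes can be sketched with a self-contained example. `DocIterator`, `ArrayDocIterator` and `DispatchSketch` are hypothetical names used only for illustration and are not part of Lucene. The sketch relies on a general property of javac: a call through an interface-typed reference compiles to `invokeinterface`, while the same call through a class-typed reference compiles to `invokevirtual`. Whether that yields a measurable gain depends on the JIT and the workload, so the numbers reported in the issue should not be read into this code.

```java
/** Hypothetical interface type, standing in for an iterator abstraction. */
interface DocIterator {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int nextDoc();
}

/** Hypothetical concrete iterator; final, so the receiver type is fully known. */
final class ArrayDocIterator implements DocIterator {
  private final int[] docs;
  private int pos = -1;

  ArrayDocIterator(int... docs) {
    this.docs = docs;
  }

  @Override
  public int nextDoc() {
    pos++;
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }
}

final class DispatchSketch {
  /** Receiver is interface-typed: nextDoc() is compiled as invokeinterface. */
  static int countViaInterface(DocIterator it) {
    int count = 0;
    for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      count++;
    }
    return count;
  }

  /** Receiver is class-typed: nextDoc() is compiled as invokevirtual. */
  static int countViaClass(ArrayDocIterator it) {
    int count = 0;
    for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      count++;
    }
    return count;
  }

  /** Branch once, outside the loop, and use the class-typed variant when the type is known. */
  static int count(DocIterator it) {
    if (it instanceof ArrayDocIterator) {
      return countViaClass((ArrayDocIterator) it);
    }
    return countViaInterface(it);
  }
}
```

Running `javap -c DispatchSketch` on the compiled class shows the two different invocation instructions in the two loop bodies. The `count` method corresponds to the "add variants" approach mentioned above; the alternative of branching inside an existing method has the same shape with the two loops inlined.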
[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types
ChrisHegarty commented on PR #812: URL: https://github.com/apache/lucene/pull/812#issuecomment-1100130595 I added some luceneutil benchmark output in the JIRA issue. The results are positive, but someone more familiar with running these benchmarks should verify them in their own environment.
[GitHub] [lucene] ChrisHegarty opened a new pull request, #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types
ChrisHegarty opened a new pull request, #812: URL: https://github.com/apache/lucene/pull/812 # Description SortedSetDV faceting (and friends), can improve performance within tight loops by using invokevirtual (rather than invokeinterface). The C2 JIT compiler can produce slightly more optimal code in this case, and since these loops are very hot, the impact can be significant (in the order of 10-20%). This issue is in some ways similar, and builds upon, prior optimisations in this area, like say [LUCENE-5300](https://issues.apache.org/jira/browse/LUCENE-5300). # Solution The code change amounts to using `SortedDocValues` or `SortedSetDocValues` class types, rather than the `DocIdSetIterator` interface type, in loops (specifically for invocation of `nextDoc()`, when the iterator type is known and not wrapped. # Tests No new tests. Existing tests all pass successfully. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [x] I have run `./gradlew check`. - [ ] I have added tests for my changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
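The change described in the pull request can be illustrated with a minimal, self-contained sketch. It does not use Lucene's real classes: `DocIterator`, `OrdinalValues`, `ArrayOrdinalValues` and `ClassTypeIterationSketch` are hypothetical stand-ins invented here, with `DocIterator` playing the role the description assigns to `DocIdSetIterator` and `OrdinalValues` the role of `SortedSetDocValues`. The only difference between the two loops is the static type of the variable on which `nextDoc()` is called, which is what decides whether javac emits invokeinterface or invokevirtual at that call site. Any speed-up would have to be measured; the sketch only shows the shape of the edit.

```java
import java.util.Arrays;

// Hypothetical stand-in for the iterator type named in the PR description.
interface DocIterator {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int nextDoc();
}

// Hypothetical stand-in for a doc-values class type.
abstract class OrdinalValues implements DocIterator {
  @Override
  public abstract int nextDoc();

  abstract long ordValue();
}

// Toy implementation backed by parallel arrays of doc ids and ordinals.
final class ArrayOrdinalValues extends OrdinalValues {
  private final int[] docs;
  private final long[] ords;
  private int pos = -1;

  ArrayOrdinalValues(int[] docs, long[] ords) {
    this.docs = docs;
    this.ords = ords;
  }

  @Override
  public int nextDoc() {
    pos++;
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }

  @Override
  long ordValue() {
    return ords[pos];
  }
}

class ClassTypeIterationSketch {
  // "Before": the loop variable has the interface type, so the nextDoc()
  // call site is compiled as invokeinterface.
  static int[] countViaInterface(OrdinalValues values, int numOrds) {
    DocIterator it = values;
    int[] counts = new int[numOrds];
    for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      counts[(int) values.ordValue()]++;
    }
    return counts;
  }

  // "After": the loop calls nextDoc() on the class type directly, so the
  // call site is compiled as invokevirtual.
  static int[] countViaClass(OrdinalValues values, int numOrds) {
    int[] counts = new int[numOrds];
    for (int doc = values.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = values.nextDoc()) {
      counts[(int) values.ordValue()]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] docs = {0, 2, 5};
    long[] ords = {1, 0, 1};
    // Both loops compute the same per-ordinal counts.
    System.out.println(Arrays.toString(countViaInterface(new ArrayOrdinalValues(docs, ords), 2)));
    System.out.println(Arrays.toString(countViaClass(new ArrayOrdinalValues(docs, ords), 2)));
  }
}
```

Both methods return identical counts; the bytecode difference can be inspected with `javap -c ClassTypeIterationSketch`.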
[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522842#comment-17522842 ]

Chris Hegarty commented on LUCENE-10517:

While the two sets of results above show significant improvements (~30%) on some benchmarks, we see some variation, and somewhat smaller improvements, on those benchmarks from run to run. Nevertheless, the change is always positive.

> Improve performance of SortedSetDV faceting by iterating on class types
> ---
>
> Key: LUCENE-10517
> URL: https://issues.apache.org/jira/browse/LUCENE-10517
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 9.1
> Reporter: Chris Hegarty
> Priority: Minor
>
> SortedSetDV faceting (and friends), can improve performance within tight
> loops by using _invokevirtual_ (rather than _invokeinterface_). The C2 JIT
> compiler can produce slightly more optimal code in this case, and since these
> loops are very hot, the impact can be significant (in the order of 10-20%).
> The code change amounts to using `SortedDocValues` or `SortedSetDocValues`
> class types, rather than the `DocIdSetIterator` interface type, in loops
> (specifically for invocation of `nextDoc()`), when the iterator type is known
> and not wrapped.
> This issue is in some ways similar, and builds upon, prior optimisations in
> this area, like say LUCENE-5300.

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522839#comment-17522839 ]

Chris Hegarty commented on LUCENE-10517:

[~grcevski] observes the following on his standalone Linux x64 machine:

{code:java}
Task                      QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff             p-value
IntNRQ                    113.25        (13.0%)  109.09                   (16.2%)  -3.7% ( -29% - 29%)  0.428
MedIntervalsOrdered       219.98        (8.5%)   212.53                   (7.4%)   -3.4% ( -17% - 13%)  0.177
HighIntervalsOrdered      13.46         (9.2%)   13.07                    (8.0%)   -2.9% ( -18% - 15%)  0.291
LowIntervalsOrdered       47.61         (4.7%)   46.68                    (5.5%)   -2.0% ( -11% -  8%)  0.226
Fuzzy2                    89.86         (2.0%)   89.12                    (1.7%)   -0.8% (  -4% -  2%)  0.161
OrHighMed                 82.18         (3.9%)   81.69                    (4.1%)   -0.6% (  -8% -  7%)  0.633
Fuzzy1                    103.27        (1.7%)   102.66                   (1.7%)   -0.6% (  -3% -  2%)  0.282
OrHighLow                 1473.95       (2.1%)   1468.06                  (2.7%)   -0.4% (  -5% -  4%)  0.603
Respell                   77.25         (1.7%)   77.03                    (2.0%)   -0.3% (  -3% -  3%)  0.619
PKLookup                  278.02        (1.6%)   277.37                   (2.5%)   -0.2% (  -4% -  3%)  0.721
HighTermDayOfYearSort     208.62        (13.4%)  208.28                   (8.9%)   -0.2% ( -19% - 25%)  0.964
AndHighMed                478.44        (3.4%)   477.87                   (3.4%)   -0.1% (  -6% -  6%)  0.912
HighSpanNear              11.46         (3.8%)   11.47                    (2.9%)    0.0% (  -6% -  7%)  0.973
LowTerm                   3361.20       (6.6%)   3362.81                  (4.4%)    0.0% ( -10% - 11%)  0.978
MedSpanNear               175.70        (3.5%)   175.99                   (3.2%)    0.2% (  -6% -  7%)  0.877
OrHighHigh                65.85         (3.6%)   65.95                    (3.9%)    0.2% (  -7% -  8%)  0.890
HighTermTitleBDVSort      198.74        (14.7%)  199.10                   (11.1%)   0.2% ( -22% - 30%)  0.965
HighSloppyPhrase          60.81         (3.0%)   60.95                    (3.4%)    0.2% (  -5% -  6%)  0.812
MedPhrase                 923.07        (3.2%)   925.41                   (2.8%)    0.3% (  -5% -  6%)  0.793
AndHighMedDayTaxoFacets   173.34        (1.9%)   174.54                   (1.9%)    0.7% (  -3% -  4%)  0.249
HighPhrase                35.97         (2.8%)   36.29                    (2.9%)    0.9% (  -4% -  6%)  0.315
AndHighHighDayTaxoFacets  19.91         (3.2%)   20.10                    (2.5%)    1.0% (  -4% -  6%)  0.287
AndHighHigh               89.95         (3.7%)   90.90                    (3.2%)    1.1% (  -5% -  8%)  0.339
MedSloppyPhrase           46.68         (3.9%)   47.22                    (4.4%)    1.2% (  -6% -  9%)  0.380
OrNotHighHigh             1510.87       (3.7%)   1530.06                  (4.2%)    1.3% (  -6% -  9%)  0.310
Wildcard                  117.63        (3.1%)   119.21                   (4.0%)    1.3% (  -5% -  8%)  0.236
OrNotHighLow              1702.07       (2.4%)   1725.22                  (2.9%)    1.4% (  -3% -  6%)  0.101
LowSpanNear               377.84        (3.5%)   383.09                   (2.6%)    1.4% (  -4% -  7%)  0.157
LowPhrase                 132.15        (2.7%)   134.16                   (2.9%)    1.5% (  -3% -  7%)  0.086
HighTermMonthSort         198.05        (14.9%)  201.09                   (13.1%)   1.5% ( -23% - 34%)  0.730
LowSloppyPhrase           62.78         (1.8%)   63.78                    (2.1%)    1.6% (  -2% -  5%)  0.010
OrHighNotMed              1951.27       (4.7%)   1983.85                  (5.4%)    1.7% (  -8% - 12%)  0.296
OrHighNotHigh             1934.75       (3.8%)   1974.20                  (4.1%)    2.0% (  -5% - 10%)  0.102
OrHighNotLow              1850.82       (4.2%)   1889.29                  (5.6%)    2.1% (  -7% - 12%)  0.187
AndHighLow                1487.40       (4.6%)   1519.08                  (3.1%)    2.1% (  -5% - 10%)  0.088
BrowseDateSSDVFacets      5.13          (4.2%)   5.25                     (5.5%)    2.2% (  -7% - 12%)  0.150
MedTerm                   3151.81       (6.1%)   3232.28                  (5.9%)    2.6% (  -8% - 15%)  0.178
OrNotHighMed              1169.40       (3.2%)   1199.40                  (4.4%)    2.6% (  -4% - 10%)  0.035
MedTermDayTaxoFacets      55.14         (4.1%)   56.59                    (4.1%)    2.6% (  -5% - 11%)  0.040
Prefix3                   206.64        (12.8%)  212.16                   (15.8%)   2.7% ( -23% - 35%)  0.558
HighTerm                  2405.51
[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522838#comment-17522838 ]

Chris Hegarty commented on LUCENE-10517:

On my M1 I get the following luceneutil benchmark results.

$ sw_vers
ProductName: macOS
ProductVersion: 11.5.2
BuildVersion: 20G95
$ uname -a
Darwin chegar-MBP.local 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:27 PDT 2021; root:xnu-7195.141.2~5/RELEASE_ARM64_T8101 arm64
$ sysctl -n machdep.cpu.brand_string
Apple M1
$ system_profiler SPHardwareDataType
Hardware:
  Hardware Overview:
    Model Name: MacBook Pro
    Model Identifier: MacBookPro17,1
    Chip: Apple M1
    Total Number of Cores: 8 (4 performance and 4 efficiency)
    Memory: 16 GB
    System Firmware Version: 6723.140.2
    OS Loader Version: 6723.140.2
    Serial Number (system): FVFG731MQ05P
    Hardware UUID: 1D7BA696-DBDB-5E9C-BD46-5A18758DE699
    Provisioning UDID: 8103-000A05E001C0801E
    Activation Lock Status: Disabled

{code:java}
Task                      QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff             p-value
LowPhrase                 148.35        (2.1%)   143.66                   (2.6%)   -3.2% (  -7% -  1%)  0.000
MedIntervalsOrdered       197.27        (3.7%)   191.24                   (5.7%)   -3.1% ( -12% -  6%)  0.044
HighIntervalsOrdered      11.55         (2.6%)   11.33                    (3.5%)   -1.9% (  -7% -  4%)  0.055
AndHighMed                447.74        (2.1%)   441.26                   (2.4%)   -1.4% (  -5% -  3%)  0.042
HighTerm                  2397.60       (4.0%)   2367.10                  (2.4%)   -1.3% (  -7% -  5%)  0.223
LowTerm                   3939.37       (2.7%)   3890.14                  (2.3%)   -1.2% (  -6% -  3%)  0.111
OrHighNotHigh             1917.21       (2.8%)   1893.94                  (3.2%)   -1.2% (  -6% -  4%)  0.198
HighPhrase                32.93         (1.9%)   32.55                    (1.1%)   -1.2% (  -4% -  1%)  0.022
PKLookup                  340.11        (4.5%)   336.69                   (4.3%)   -1.0% (  -9% -  8%)  0.471
TermDTSort                145.39        (4.1%)   144.09                   (2.3%)   -0.9% (  -7% -  5%)  0.394
HighSpanNear              10.38         (3.7%)   10.32                    (1.9%)   -0.6% (  -5% -  5%)  0.531
MedSpanNear               206.69        (2.8%)   205.70                   (1.5%)   -0.5% (  -4% -  3%)  0.500
Fuzzy2                    91.75         (2.5%)   91.41                    (1.4%)   -0.4% (  -4% -  3%)  0.562
OrHighNotMed              1975.22       (3.5%)   1968.91                  (2.7%)   -0.3% (  -6% -  6%)  0.744
OrHighMed                 66.62         (3.9%)   66.45                    (4.8%)   -0.3% (  -8% -  8%)  0.850
LowSloppyPhrase           62.60         (2.1%)   62.44                    (2.5%)   -0.3% (  -4% -  4%)  0.726
OrHighNotLow              1876.16       (2.5%)   1871.56                  (2.4%)   -0.2% (  -5% -  4%)  0.756
OrHighHigh                55.70         (3.9%)   55.64                    (4.9%)   -0.1% (  -8% -  9%)  0.940
Fuzzy1                    100.97        (2.2%)   100.88                   (2.1%)   -0.1% (  -4% -  4%)  0.898
LowIntervalsOrdered       42.24         (0.7%)   42.21                    (1.0%)   -0.1% (  -1% -  1%)  0.766
MedPhrase                 923.85        (1.3%)   923.14                   (1.6%)   -0.1% (  -2% -  2%)  0.867
OrNotHighMed              1427.45       (2.0%)   1428.11                  (2.5%)    0.0% (  -4% -  4%)  0.949
Respell                   82.74         (2.6%)   82.81                    (1.9%)    0.1% (  -4% -  4%)  0.903
LowSpanNear               373.63        (2.6%)   373.97                   (1.6%)    0.1% (  -4% -  4%)  0.893
HighTermDayOfYearSort     199.64        (1.7%)   199.83                   (2.5%)    0.1% (  -4% -  4%)  0.887
OrNotHighHigh             1523.02       (2.2%)   1526.12                  (2.0%)    0.2% (  -3% -  4%)  0.759
AndHighMedDayTaxoFacets   185.23        (0.9%)   185.79                   (1.4%)    0.3% (  -1% -  2%)  0.416
MedTerm                   3016.98       (3.4%)   3026.53                  (3.2%)    0.3% (  -6% -  7%)  0.761
OrNotHighLow              1867.65       (2.5%)   1876.63                  (2.4%)    0.5% (  -4% -  5%)  0.535
AndHighLow                1571.61       (3.1%)   1579.86                  (2.6%)    0.5% (  -5% -  6%)  0.564
OrHighLow                 1485.93       (3.7%)   1494.56                  (2.5%)    0.6% (  -5% -  7%)  0.559
AndHighHigh               80.42         (2.8%)   81.06                    (1.7%)    0.8% (  -3% -  5%)  0.273
HighSloppyPhrase          50.68         (4.0%)   51.14                    (4.7%)    0.9% (  -7% -  9%)  0.506
MedSloppyPhrase           40.76         (2.6%
[jira] [Created] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
Chris Hegarty created LUCENE-10517:
--

Summary: Improve performance of SortedSetDV faceting by iterating on class types
Key: LUCENE-10517
URL: https://issues.apache.org/jira/browse/LUCENE-10517
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 9.1
Reporter: Chris Hegarty

SortedSetDV faceting (and friends) can improve performance within tight loops by using _invokevirtual_ (rather than _invokeinterface_). The C2 JIT compiler can produce slightly more optimal code in this case, and since these loops are very hot, the impact can be significant (on the order of 10-20%).

The code change amounts to using the `SortedDocValues` or `SortedSetDocValues` class types, rather than the `DocIdSetIterator` interface type, in loops (specifically for invocation of `nextDoc()`) when the iterator type is known and not wrapped.

This issue is in some ways similar to, and builds upon, prior optimisations in this area, such as LUCENE-5300.
[GitHub] [lucene] rmuir commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori
rmuir commented on PR #805: URL: https://github.com/apache/lucene/pull/805#issuecomment-1100064836 I think @mikemccand actually created most of this code and is most familiar with it. Mike, if you have time can you look too? For the special n-best class, is the issue that `nori` simply doesn't offer this n-best feature? For the future, I wonder if there is some reason it doesn't make sense there too? Maybe it was just overlooked before?
[GitHub] [lucene] rmuir commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori
rmuir commented on PR #805: URL: https://github.com/apache/lucene/pull/805#issuecomment-1100062662 Sure, sorry for the slow response.
[GitHub] [lucene] mocobeta commented on pull request #811: Add some basic tasks to help/workflow
mocobeta commented on PR #811: URL: https://github.com/apache/lucene/pull/811#issuecomment-1100018613

@gautamworah96 thanks for your comments.

> I sometimes use the -Ptests.iters= param for beasting out multiple runs of a single test to catch random edge cases that I might have missed (this was another trick that I just stumbled upon through JIRA). Maybe we could add this to the workflow file as well?

I found that `help/tests.txt` (the source for the `gradlew helpTests` task) has a dedicated section about "Reiteration", where the `-Ptests.iters` option is explained. https://github.com/apache/lucene/blob/main/help/tests.txt#L87

The `test` task has many parameters, so I think it'd be better to encourage devs to refer to that file than to augment `workflow.txt`. (You can see a pointer that says `run "gradlew :helpTests" for more` at L12 in workflow.txt.)
[GitHub] [lucene] mocobeta commented on a diff in pull request #811: Add some basic tasks to help/workflow
mocobeta commented on code in PR #811: URL: https://github.com/apache/lucene/pull/811#discussion_r851184763

## help/workflow.txt: ##
@@ -25,11 +25,22 @@ Assemble a single module's JAR (here for lucene-core):
 gradlew -p lucene/core assemble
 ls lucene/core/build/libs
+Assemble all JARs:

Review Comment: Updated in https://github.com/apache/lucene/pull/811/commits/9c41951601d2b07deae84e8700c7342385141754. I didn't touch the existing description for the same command with `-p`.
[GitHub] [lucene] mocobeta commented on a diff in pull request #811: Add some basic tasks to help/workflow
mocobeta commented on code in PR #811: URL: https://github.com/apache/lucene/pull/811#discussion_r851181043

## help/workflow.txt: ##
@@ -25,11 +25,22 @@ Assemble a single module's JAR (here for lucene-core):
 gradlew -p lucene/core assemble
 ls lucene/core/build/libs
+Assemble all JARs:

Review Comment:
```suggestion
Assemble all Lucene artifacts (JARs, and so on):
```
[jira] [Resolved] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.
[ https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kkewwei resolved LUCENE-10448.
--
Resolution: Not A Problem

> MergeRateLimiter doesn't always limit instant rate.
> ---
>
> Key: LUCENE-10448
> URL: https://issues.apache.org/jira/browse/LUCENE-10448
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 8.11.1
> Reporter: kkewwei
> Priority: Major
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> We can see the code in *MergeRateLimiter*:
> {code:java}
> private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedException {
>
>   double rate = mbPerSec;
>   double secondsToPause = (bytes / 1024. / 1024.) / rate;
>   long targetNS = lastNS + (long) (1000000000 * secondsToPause);
>   long curPauseNS = targetNS - curNS;
>   // We don't bother with thread pausing if the pause is smaller than 2 msec.
>   if (curPauseNS <= MIN_PAUSE_NS) {
>     // Set to curNS, not targetNS, to enforce the instant rate, not
>     // the "averaged over all history" rate:
>     lastNS = curNS;
>     return -1;
>   }
>   ..
> }
> {code}
> Suppose a segment is being merged and *maybePause* is called at 7:00, so
> lastNS=7:00. If *maybePause* is called again at 7:05, the value of
> *targetNS = lastNS + (long) (1000000000 * secondsToPause)* will be smaller
> than *curNS*, so however large bytes is, we return -1 and skip the pause.
> I counted the total number of *maybePause* calls (callTimes), the number of
> calls that skipped the pause (ignorePauseTimes), and the bytes passed to each
> skipped call (detailBytes):
> {code:java}
> [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1]
> [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219
> docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec
> throttle], [callTimes=857], [ignorePauseTimes=25], [detailBytes(mb) =
> [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104,
> 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673,
> 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447,
> 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]]
> {code}
> Of the 857 calls to *maybePause*, 25 skipped the pause, and the byte counts
> for those skipped calls (such as 0.28125 MB) are not small.
> As long as the interval between two *maybePause* calls is relatively long,
> the pause that should be executed will not be executed.
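The arithmetic the reporter describes can be reproduced with a small standalone sketch. This is not Lucene's `MergeRateLimiter`: `InstantRateSketch` and `pauseNanos` are hypothetical names, the seconds-to-nanoseconds factor of 1_000_000_000 is assumed, and `MIN_PAUSE_NS` is assumed to be 2 ms based only on the "smaller than 2 msec" comment in the quoted snippet. The sample values (0.28125 MB written at an 11.2 MB/sec throttle) are taken from the log line in the report. Note that the issue was resolved as "Not A Problem", so the sketch only illustrates the observation and does not demonstrate a defect.

```java
import java.util.concurrent.TimeUnit;

class InstantRateSketch {
  // Assumption: 2 ms, inferred from the comment in the quoted snippet.
  static final long MIN_PAUSE_NS = TimeUnit.MILLISECONDS.toNanos(2);

  /**
   * Mirrors the arithmetic of the quoted maybePause snippet.
   * Returns the pause in nanoseconds, or -1 when the pause is skipped.
   */
  static long pauseNanos(long bytes, double mbPerSec, long lastNS, long curNS) {
    // Time the write "should" have taken at the configured rate.
    double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
    // Assumed conversion factor: seconds to nanoseconds.
    long targetNS = lastNS + (long) (1_000_000_000L * secondsToPause);
    long curPauseNS = targetNS - curNS;
    return curPauseNS <= MIN_PAUSE_NS ? -1 : curPauseNS;
  }

  public static void main(String[] args) {
    long bytes = 294_912L; // 0.28125 MB, as in the reporter's log
    double rate = 11.2;    // MB/sec throttle, as in the reporter's log

    // Called 1 ms after the previous call: roughly 25 ms of writing is
    // "owed", so a positive pause is returned.
    System.out.println(pauseNanos(bytes, rate, 0L, TimeUnit.MILLISECONDS.toNanos(1)));

    // Called 5 s after the previous call: targetNS is already in the past,
    // so the pause is skipped and -1 is returned.
    System.out.println(pauseNanos(bytes, rate, 0L, TimeUnit.SECONDS.toNanos(5)));
  }
}
```

The second call shows the case from the report: once the gap since the previous call exceeds the time the bytes would take at the configured rate, the computed pause is negative and the method returns -1 regardless of how many bytes were written.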