[GitHub] [lucene] mocobeta commented on pull request #811: Add some basic tasks to help/workflow

2022-04-15 Thread GitBox


mocobeta commented on PR #811:
URL: https://github.com/apache/lucene/pull/811#issuecomment-1100574008

   > I think we can still do some more to help new contributors but I have no 
specific action items in my head. I'll create separate JIRAs/PRs if I come up 
with something.
   
   Sounds great, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on pull request #813: Backport LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders…

2022-04-15 Thread GitBox


zhaih commented on PR #813:
URL: https://github.com/apache/lucene/pull/813#issuecomment-1100483369

   Thank you @gautamworah96 





[jira] [Commented] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522988#comment-17522988
 ] 

ASF subversion and git services commented on LUCENE-10482:
--

Commit 5c2cbd712590fe98d101f30b528bee976af173ee in lucene's branch 
refs/heads/branch_9x from Gautam Worah
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5c2cbd71259 ]

LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with 
empty taxoArrays instead of letting the taxoEpoch decide (#762) (#813)



> Allow users to create their own DirectoryTaxonomyReaders with empty 
> taxoArrays instead of letting the taxoEpoch decide
> --
>
> Key: LUCENE-10482
> URL: https://issues.apache.org/jira/browse/LUCENE-10482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 9.1
>Reporter: Gautam Worah
>Priority: Minor
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> I was experimenting with the taxonomy index and {{DirectoryTaxonomyReaders}} 
> in my day job where we were trying to replace the index underneath a reader 
> asynchronously and then call the {{doOpenIfChanged}} call on it.
> It turns out that the taxonomy index uses its own index based counter (the 
> {{{}taxonomyIndexEpoch{}}}) to determine if the index was opened in write 
> mode after the last time it was written and if not, it directly tries to 
> reuse the previous {{taxoArrays}} it had created. This logic fails in a 
> scenario where both the old and new index were opened just once but the index 
> itself is completely different in both the cases.
> In such a case, it would be good to give the user the flexibility to inform 
> the DTR to recreate its {{{}taxoArrays{}}}, {{ordinalCache}} and 
> {{{}categoryCache{}}} (not refreshing these arrays causes it to fail in 
> various ways). Luckily, such a constructor already exists! But it is private 
> today! The idea here is to allow subclasses of DTR to use this constructor.
> Curious to see what other folks think about this idea. 
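
To make the failure mode described above concrete, here is a minimal, Lucene-free sketch. Every name in it (ToyTaxoIndex, ToyTaxoReader, FreshArraysTaxoReader) is hypothetical and is not part of the Lucene API; the real DirectoryTaxonomyReader is considerably more involved. The toy reader reuses its cached arrays whenever the epoch is unchanged, which goes stale when a different index with the same epoch is swapped in, and the subclass uses the protected constructor to always start from empty caches.

```java
import java.util.List;

/** Hypothetical stand-in for a taxonomy index: an epoch counter plus its categories. */
final class ToyTaxoIndex {
  final long epoch;
  final List<String> categories;

  ToyTaxoIndex(long epoch, List<String> categories) {
    this.epoch = epoch;
    this.categories = categories;
  }
}

/** Toy reader that reuses its cached arrays whenever the epoch is unchanged. */
class ToyTaxoReader {
  protected final ToyTaxoIndex index;
  protected final String[] taxoArrays;

  ToyTaxoReader(ToyTaxoIndex index) {
    this(index, null);
  }

  /** Passing null for the cache forces it to be rebuilt from the index. */
  protected ToyTaxoReader(ToyTaxoIndex index, String[] cachedArrays) {
    this.index = index;
    this.taxoArrays =
        cachedArrays != null ? cachedArrays : index.categories.toArray(new String[0]);
  }

  ToyTaxoReader openIfChanged(ToyTaxoIndex newIndex) {
    // An equal epoch is taken to mean "same index", so the old arrays are reused.
    String[] reused = newIndex.epoch == index.epoch ? taxoArrays : null;
    return new ToyTaxoReader(newIndex, reused);
  }
}

/** Subclass that always asks for empty caches, whatever the epoch says. */
final class FreshArraysTaxoReader extends ToyTaxoReader {
  FreshArraysTaxoReader(ToyTaxoIndex index) {
    super(index, null);
  }

  @Override
  ToyTaxoReader openIfChanged(ToyTaxoIndex newIndex) {
    return new FreshArraysTaxoReader(newIndex);
  }
}
```

In this toy model, swapping in a different index that happens to carry the same epoch leaves ToyTaxoReader serving the old arrays, while FreshArraysTaxoReader rebuilds them on every reopen.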



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih merged pull request #813: Backport LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders…

2022-04-15 Thread GitBox


zhaih merged PR #813:
URL: https://github.com/apache/lucene/pull/813





[jira] [Closed] (LUCENE-7863) Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc

2022-04-15 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev closed LUCENE-7863.

Lucene Fields:   (was: New)

> Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc  
> 
>
> Key: LUCENE-7863
> URL: https://issues.apache.org/jira/browse/LUCENE-7863
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Mikhail Khludnev
>Priority: Major
> Attachments: LUCENE-7863.hazard, LUCENE-7863.patch, 
> LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, 
> LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, 
> LUCENE-7863.patch, bench-byte-array-long.out, bench-byte-array2.out, 
> benchmark-1m.out, byterefshash-bench.txt
>
>
> h2. Context
> \*suffix and \*infix\* searches on large indexes. 
> h2. Problem
> Obviously applying {{ReversedWildcardFilter}} doubles an index size, and I'm 
> shuddering to think about EdgeNGrams...
> h2. Proposal 
> _DR_-Y- postings






[jira] [Resolved] (LUCENE-7863) Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc

2022-04-15 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev resolved LUCENE-7863.
--
Resolution: Won't Fix







[GitHub] [lucene] gautamworah96 opened a new pull request, #813: Backport LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders…

2022-04-15 Thread GitBox


gautamworah96 opened a new pull request, #813:
URL: https://github.com/apache/lucene/pull/813

   Backport of PR: https://github.com/apache/lucene/pull/762, commit: 
10ebc099c846c7d96f4ff5f9b7853df850fa8442 for branch_9x.
   
   Changes entry is under 9.2 in both the earlier PR and this PR





[GitHub] [lucene] gautamworah96 commented on pull request #811: Add some basic tasks to help/workflow

2022-04-15 Thread GitBox


gautamworah96 commented on PR #811:
URL: https://github.com/apache/lucene/pull/811#issuecomment-1100391643

   Hmm. I had not looked at the `help/tests.txt` file. +1 on not adding all the 
comprehensive options to this workflow file.
   LGTM overall. 
   
   I think we can still do some more to help new contributors but I have no 
specific action items in my head. I'll create separate JIRAs/PRs if I come up 
with something. 





[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-04-15 Thread GitBox


mayya-sharipova commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1100389761

   @LuXugang Thanks a lot for your work. I was thinking that maybe a better way 
to present these changes is to leave all format changes to a later PR, and for 
this PR just make the changes directly in `Lucene91HnswVectorsWriter` and 
`Lucene91HnswVectorsReader`; otherwise it is difficult to see what exactly has 
been changed. WDYT?





[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-04-15 Thread GitBox


mayya-sharipova commented on code in PR #792:
URL: https://github.com/apache/lucene/pull/792#discussion_r851509289


##
lucene/core/src/java/org/apache/lucene/codecs/lucene92/Lucene92HnswVectorsWriter.java:
##
@@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs.lucene92;
+
+import static org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsFormat.DIRECT_MONOTONIC_BLOCK_SHIFT;
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.Arrays;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.codecs.KnnVectorsWriter;
+import org.apache.lucene.codecs.lucene90.IndexedDISI;
+import org.apache.lucene.index.DocsWithFieldSet;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.RandomAccessVectorValuesProducer;
+import org.apache.lucene.index.SegmentWriteState;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.hnsw.HnswGraph.NodesIterator;
+import org.apache.lucene.util.hnsw.HnswGraphBuilder;
+import org.apache.lucene.util.hnsw.NeighborArray;
+import org.apache.lucene.util.hnsw.OnHeapHnswGraph;
+import org.apache.lucene.util.packed.DirectMonotonicWriter;
+
+/**
+ * Writes vector values and knn graphs to index segments.
+ *
+ * @lucene.experimental
+ */
+public final class Lucene92HnswVectorsWriter extends KnnVectorsWriter {
+
+  private final SegmentWriteState segmentWriteState;
+  private final IndexOutput meta, vectorData, vectorIndex;
+  private final int maxDoc;
+
+  private final int maxConn;
+  private final int beamWidth;
+  private boolean finished;
+
+  Lucene92HnswVectorsWriter(SegmentWriteState state, int maxConn, int beamWidth)
+      throws IOException {
+    this.maxConn = maxConn;
+    this.beamWidth = beamWidth;
+
+    assert state.fieldInfos.hasVectorValues();
+    segmentWriteState = state;
+
+    String metaFileName =
+        IndexFileNames.segmentFileName(
+            state.segmentInfo.name, state.segmentSuffix, Lucene92HnswVectorsFormat.META_EXTENSION);
+
+    String vectorDataFileName =
+        IndexFileNames.segmentFileName(
+            state.segmentInfo.name,
+            state.segmentSuffix,
+            Lucene92HnswVectorsFormat.VECTOR_DATA_EXTENSION);
+
+    String indexDataFileName =
+        IndexFileNames.segmentFileName(
+            state.segmentInfo.name,
+            state.segmentSuffix,
+            Lucene92HnswVectorsFormat.VECTOR_INDEX_EXTENSION);
+
+    boolean success = false;
+    try {
+      meta = state.directory.createOutput(metaFileName, state.context);
+      vectorData = state.directory.createOutput(vectorDataFileName, state.context);
+      vectorIndex = state.directory.createOutput(indexDataFileName, state.context);
+
+      CodecUtil.writeIndexHeader(
+          meta,
+          Lucene92HnswVectorsFormat.META_CODEC_NAME,
+          Lucene92HnswVectorsFormat.VERSION_CURRENT,
+          state.segmentInfo.getId(),
+          state.segmentSuffix);
+      CodecUtil.writeIndexHeader(
+          vectorData,
+          Lucene92HnswVectorsFormat.VECTOR_DATA_CODEC_NAME,
+          Lucene92HnswVectorsFormat.VERSION_CURRENT,
+          state.segmentInfo.getId(),
+          state.segmentSuffix);
+      CodecUtil.writeIndexHeader(
+          vectorIndex,
+          Lucene92HnswVectorsFormat.VECTOR_INDEX_CODEC_NAME,
+          Lucene92HnswVectorsFormat.VERSION_CURRENT,
+          state.segmentInfo.getId(),
+          state.segmentSuffix);
+      maxDoc = state.segmentInfo.maxDoc();
+      success = true;
+    } finally {
+      if (success == false) {
+        IOUtils.closeWhileHandlingException(this);
+      }
+    }
+  }
+
+  @Override
+  public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
+      throws IOException {
+   

[jira] [Commented] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522911#comment-17522911
 ] 

ASF subversion and git services commented on LUCENE-10482:
--

Commit 10ebc099c846c7d96f4ff5f9b7853df850fa8442 in lucene's branch 
refs/heads/main from Gautam Worah
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=10ebc099c84 ]

LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with 
empty taxoArrays instead of letting the taxoEpoch decide (#762)



> Allow users to create their own DirectoryTaxonomyReaders with empty 
> taxoArrays instead of letting the taxoEpoch decide
> --
>
> Key: LUCENE-10482
> URL: https://issues.apache.org/jira/browse/LUCENE-10482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 9.1
>Reporter: Gautam Worah
>Priority: Minor
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> I was experimenting with the taxonomy index and {{DirectoryTaxonomyReaders}} 
> in my day job where we were trying to replace the index underneath a reader 
> asynchronously and then call the {{doOpenIfChanged}} call on it.
> It turns out that the taxonomy index uses its own index based counter (the 
> {{{}taxonomyIndexEpoch{}}}) to determine if the index was opened in write 
> mode after the last time it was written and if not, it directly tries to 
> reuse the previous {{taxoArrays}} it had created. This logic fails in a 
> scenario where both the old and new index were opened just once but the index 
> itself is completely different in both the cases.
> In such a case, it would be good to give the user the flexibility to inform 
> the DTR to recreate its {{{}taxoArrays{}}}, {{ordinalCache}} and 
> {{{}categoryCache{}}} (not refreshing these arrays causes it to fail in 
> various ways). Luckily, such a constructor already exists! But it is private 
> today! The idea here is to allow subclasses of DTR to use this constructor.
> Curious to see what other folks think about this idea. 






[GitHub] [lucene] zhaih commented on pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-15 Thread GitBox


zhaih commented on PR #762:
URL: https://github.com/apache/lucene/pull/762#issuecomment-1100258275

   Pushed, could you also open a backport PR? @gautamworah96 





[GitHub] [lucene] zhaih merged pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-15 Thread GitBox


zhaih merged PR #762:
URL: https://github.com/apache/lucene/pull/762





[GitHub] [lucene] mocobeta commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori

2022-04-15 Thread GitBox


mocobeta commented on PR #805:
URL: https://github.com/apache/lucene/pull/805#issuecomment-1100187804

   Hi Robert and Mike, thank you for your response. I think this can be kept 
open for a sufficient time period; large conflicts between these changes and 
the main branch are unlikely.
   
   I reviewed it several times while making this patch and I believe it does 
not change the tokenizers' behavior (although I can't fully guarantee that). 
I'd be glad if you could give feedback on the overall design, and ideas for 
making the interfaces safer for further refactoring or development.





[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread GitBox


ChrisHegarty commented on code in PR #812:
URL: https://github.com/apache/lucene/pull/812#discussion_r851301618


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java:
##
@@ -126,23 +126,41 @@ private void countAll(IndexReader reader) throws IOException {
 
   NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
   if (singleValued != null) {
-for (int doc = singleValued.nextDoc();
-doc != DocIdSetIterator.NO_MORE_DOCS;
-doc = singleValued.nextDoc()) {
-  if (liveDocs != null && liveDocs.get(doc) == false) {
-continue;
+if (liveDocs == null) {

Review Comment:
   This change just hoists the null check outside of the (potentially hot) 
loop. In our analysis, we did not observe C2 doing this automatically on a 
consistent basis.
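
   As a rough illustration of the hoisting described in this comment, here is a self-contained sketch that does not use Lucene. The class and method names are hypothetical, and a plain `int[]` of ordinals plus a `java.util.BitSet` stand in for the doc values iterator and `liveDocs`; it is not the code from the PR.

```java
import java.util.BitSet;

/** Hypothetical sketch: testing liveDocs once instead of on every iteration. */
final class HoistedLiveDocsCheck {

  /** Original shape: the null check is evaluated inside the loop body. */
  static int[] countInLoop(int[] ordinals, BitSet liveDocs, int numOrdinals) {
    int[] counts = new int[numOrdinals];
    for (int doc = 0; doc < ordinals.length; doc++) {
      if (liveDocs != null && liveDocs.get(doc) == false) {
        continue; // skip deleted documents
      }
      counts[ordinals[doc]]++;
    }
    return counts;
  }

  /** Hoisted shape: one null check up front, then a specialised loop per case. */
  static int[] countHoisted(int[] ordinals, BitSet liveDocs, int numOrdinals) {
    int[] counts = new int[numOrdinals];
    if (liveDocs == null) {
      // No deletions: the loop body carries no liveDocs test at all.
      for (int doc = 0; doc < ordinals.length; doc++) {
        counts[ordinals[doc]]++;
      }
    } else {
      for (int doc = 0; doc < ordinals.length; doc++) {
        if (liveDocs.get(doc)) {
          counts[ordinals[doc]]++;
        }
      }
    }
    return counts;
  }
}
```

   Both methods return the same counts; the second simply duplicates the loop so that neither copy needs the null test. Whether this is measurably faster depends on the JIT and the workload, so it should be benchmarked rather than assumed.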






[jira] [Updated] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread Chris Hegarty (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Hegarty updated LUCENE-10517:
---
Description: 
While analysing various profiles, [@grcevski|https://github.com/grcevski] and I 
came across this potential improvement.

SortedSetDV faceting (and friends), can improve performance within tight loops 
by using invokevirtual (rather than invokeinterface). The C2 JIT compiler can 
produce slightly more optimal code in this case, and since these loops are very 
hot, the impact can be significant (in the order of 10-30%).

This issue is in some ways similar to, and builds upon, prior optimisations in 
this area, like say LUCENE-5300 or more recently LUCENE-5309

  was:
SortedSetDV faceting (and friends), can improve performance within tight loops 
by using _invokevirtual_ (rather than _invokeinterface_). The C2 JIT compiler 
can produce slightly more optimal code in this case, and since these loops are 
very hot, the impact can be significant (in the order of 10-20%).

The code change amounts to using `SortedDocValues` or `SortedSetDocValues` 
class types, rather than the `DocIdSetIterator` interface type, in loops 
(specifically for invocation of `nextDoc()`, when the iterator type is known 
and not wrapped. 

This issue is in some ways similar, and builds upon, prior optimisations in 
this area, like say LUCENE-5300.


> Improve performance of SortedSetDV faceting by iterating on class types
> ---
>
> Key: LUCENE-10517
> URL: https://issues.apache.org/jira/browse/LUCENE-10517
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.1
>Reporter: Chris Hegarty
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While analysing various profiles, [@grcevski|https://github.com/grcevski] and 
> I came across this potential improvement.
> SortedSetDV faceting (and friends), can improve performance within tight 
> loops by using invokevirtual (rather than invokeinterface). The C2 JIT 
> compiler can produce slightly more optimal code in this case, and since these 
> loops are very hot, the impact can be significant (in the order of 10-30%).
> This issue is in some ways similar to, and builds upon, prior optimisations 
> in this area, like say LUCENE-5300 or more recently LUCENE-5309






[jira] [Updated] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread Chris Hegarty (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Hegarty updated LUCENE-10517:
---
Issue Type: Improvement  (was: Bug)

> Improve performance of SortedSetDV faceting by iterating on class types
> ---
>
> Key: LUCENE-10517
> URL: https://issues.apache.org/jira/browse/LUCENE-10517
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.1
>Reporter: Chris Hegarty
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SortedSetDV faceting (and friends), can improve performance within tight 
> loops by using _invokevirtual_ (rather than _invokeinterface_). The C2 JIT 
> compiler can produce slightly more optimal code in this case, and since these 
> loops are very hot, the impact can be significant (in the order of 10-20%).
> The code change amounts to using `SortedDocValues` or `SortedSetDocValues` 
> class types, rather than the `DocIdSetIterator` interface type, in loops 
> (specifically for invocation of `nextDoc()`, when the iterator type is known 
> and not wrapped. 
> This issue is in some ways similar, and builds upon, prior optimisations in 
> this area, like say LUCENE-5300.






[jira] [Comment Edited] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread Chris Hegarty (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522838#comment-17522838
 ] 

Chris Hegarty edited comment on LUCENE-10517 at 4/15/22 2:21 PM:
-

On my M1 I get the following luceneutil benchmark results.

Hardware Overview:

Chip: Apple M1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB


{code:java}
                    Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
               LowPhrase      148.35      (2.1%)      143.66      (2.6%)   -3.2% (  -7% -    1%) 0.000
     MedIntervalsOrdered      197.27      (3.7%)      191.24      (5.7%)   -3.1% ( -12% -    6%) 0.044
    HighIntervalsOrdered       11.55      (2.6%)       11.33      (3.5%)   -1.9% (  -7% -    4%) 0.055
              AndHighMed      447.74      (2.1%)      441.26      (2.4%)   -1.4% (  -5% -    3%) 0.042
                HighTerm     2397.60      (4.0%)     2367.10      (2.4%)   -1.3% (  -7% -    5%) 0.223
                 LowTerm     3939.37      (2.7%)     3890.14      (2.3%)   -1.2% (  -6% -    3%) 0.111
           OrHighNotHigh     1917.21      (2.8%)     1893.94      (3.2%)   -1.2% (  -6% -    4%) 0.198
              HighPhrase       32.93      (1.9%)       32.55      (1.1%)   -1.2% (  -4% -    1%) 0.022
                PKLookup      340.11      (4.5%)      336.69      (4.3%)   -1.0% (  -9% -    8%) 0.471
              TermDTSort      145.39      (4.1%)      144.09      (2.3%)   -0.9% (  -7% -    5%) 0.394
            HighSpanNear       10.38      (3.7%)       10.32      (1.9%)   -0.6% (  -5% -    5%) 0.531
             MedSpanNear      206.69      (2.8%)      205.70      (1.5%)   -0.5% (  -4% -    3%) 0.500
                  Fuzzy2       91.75      (2.5%)       91.41      (1.4%)   -0.4% (  -4% -    3%) 0.562
            OrHighNotMed     1975.22      (3.5%)     1968.91      (2.7%)   -0.3% (  -6% -    6%) 0.744
               OrHighMed       66.62      (3.9%)       66.45      (4.8%)   -0.3% (  -8% -    8%) 0.850
         LowSloppyPhrase       62.60      (2.1%)       62.44      (2.5%)   -0.3% (  -4% -    4%) 0.726
            OrHighNotLow     1876.16      (2.5%)     1871.56      (2.4%)   -0.2% (  -5% -    4%) 0.756
              OrHighHigh       55.70      (3.9%)       55.64      (4.9%)   -0.1% (  -8% -    9%) 0.940
                  Fuzzy1      100.97      (2.2%)      100.88      (2.1%)   -0.1% (  -4% -    4%) 0.898
     LowIntervalsOrdered       42.24      (0.7%)       42.21      (1.0%)   -0.1% (  -1% -    1%) 0.766
               MedPhrase      923.85      (1.3%)      923.14      (1.6%)   -0.1% (  -2% -    2%) 0.867
            OrNotHighMed     1427.45      (2.0%)     1428.11      (2.5%)    0.0% (  -4% -    4%) 0.949
                 Respell       82.74      (2.6%)       82.81      (1.9%)    0.1% (  -4% -    4%) 0.903
             LowSpanNear      373.63      (2.6%)      373.97      (1.6%)    0.1% (  -4% -    4%) 0.893
   HighTermDayOfYearSort      199.64      (1.7%)      199.83      (2.5%)    0.1% (  -4% -    4%) 0.887
           OrNotHighHigh     1523.02      (2.2%)     1526.12      (2.0%)    0.2% (  -3% -    4%) 0.759
 AndHighMedDayTaxoFacets      185.23      (0.9%)      185.79      (1.4%)    0.3% (  -1% -    2%) 0.416
                 MedTerm     3016.98      (3.4%)     3026.53      (3.2%)    0.3% (  -6% -    7%) 0.761
            OrNotHighLow     1867.65      (2.5%)     1876.63      (2.4%)    0.5% (  -4% -    5%) 0.535
              AndHighLow     1571.61      (3.1%)     1579.86      (2.6%)    0.5% (  -5% -    6%) 0.564
               OrHighLow     1485.93      (3.7%)     1494.56      (2.5%)    0.6% (  -5% -    7%) 0.559
             AndHighHigh       80.42      (2.8%)       81.06      (1.7%)    0.8% (  -3% -    5%) 0.273
        HighSloppyPhrase       50.68      (4.0%)       51.14      (4.7%)    0.9% (  -7% -    9%) 0.506
         MedSloppyPhrase       40.76      (2.6%)       41.13      (3.6%)    0.9% (  -5% -    7%) 0.356
                Wildcard      123.13      (7.3%)      124.34      (6.5%)    1.0% ( -11% -   15%) 0.654
AndHighHighDayTaxoFacets       17.77      (2.8%)       17.95      (2.7%)    1.0% (  -4% -    6%) 0.256
    MedTermDayTaxoFacets       46.83      (2.6%)       47.38      (1.8%)    1.2% (  -3% -    5%) 0.097
       HighTermMonthSort      193.35      (1.5%)      195.77      (5.4%)    1.2% (  -5% -    8%) 0.320
                  IntNRQ       69.13     (17.2%)       70.81     (16.2%)    2.4% ( -26% -   43%) 0.646
            High

[GitHub] [lucene] mikemccand commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori

2022-04-15 Thread GitBox


mikemccand commented on PR #805:
URL: https://github.com/apache/lucene/pull/805#issuecomment-1100135300

   Whoa, this sounds awesome!  I will try to review soon.  Thanks @mocobeta.





[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread GitBox


ChrisHegarty commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1100132616

   The perf improvements come from changing the target type of the `nextDoc` 
invocations, which results in an invokevirtual rather than an invokeinterface. 
This PR proposes adding variants of countXX where that is possible. 
Alternatively, branches could be added to the existing code (rather than 
adding new methods), which achieves similar perf results.





[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread GitBox


ChrisHegarty commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1100130595

   I added some luceneutil benchmark output to the JIRA issue. The results are 
positive, but someone more familiar with running these benchmarks should verify 
them in their own environment.





[GitHub] [lucene] ChrisHegarty opened a new pull request, #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread GitBox


ChrisHegarty opened a new pull request, #812:
URL: https://github.com/apache/lucene/pull/812

   # Description
   
   SortedSetDV faceting (and friends) can improve performance within tight 
loops by using invokevirtual (rather than invokeinterface). The C2 JIT compiler 
can produce slightly more optimal code in this case, and since these loops are 
very hot, the impact can be significant (in the order of 10-20%).
   
   This issue is in some ways similar to, and builds upon, prior optimisations 
in this area, such as 
[LUCENE-5300](https://issues.apache.org/jira/browse/LUCENE-5300).
   
   # Solution
   
   The code change amounts to using `SortedDocValues` or `SortedSetDocValues` 
class types, rather than the `DocIdSetIterator` interface type, in loops 
(specifically for invocation of `nextDoc()`), when the iterator type is known 
and not wrapped.
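
To make the dispatch difference concrete, here is a minimal, self-contained sketch. It does not use the Lucene classes: `DocIterator`, `ArrayDocValues`, and `DispatchSketch` are hypothetical stand-ins invented for illustration. It shows only the general Java behaviour that the call instruction follows the static type of the receiver, plus the "branch in existing code" variant mentioned in the PR discussion. It says nothing about the actual speedup, which would need a benchmark.

```java
// Hypothetical stand-in types for illustration; these are NOT the Lucene classes.
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    int nextDoc();
}

// A concrete iterator over a sorted array of doc IDs.
final class ArrayDocValues implements DocIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayDocValues(int[] docs) {
        this.docs = docs;
    }

    @Override
    public int nextDoc() {
        idx++;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
}

class DispatchSketch {
    // The receiver is statically typed as the interface, so javac emits
    // invokeinterface for the nextDoc() call in this loop.
    static int countViaInterface(DocIterator it) {
        int count = 0;
        while (it.nextDoc() != DocIterator.NO_MORE_DOCS) {
            count++;
        }
        return count;
    }

    // The receiver is statically typed as the class, so javac emits
    // invokevirtual for the same call.
    static int countViaClass(ArrayDocValues values) {
        int count = 0;
        while (values.nextDoc() != DocIterator.NO_MORE_DOCS) {
            count++;
        }
        return count;
    }

    // Branching variant: pick the class-typed loop when the concrete type is known.
    static int count(DocIterator it) {
        if (it instanceof ArrayDocValues) {
            return countViaClass((ArrayDocValues) it);
        }
        return countViaInterface(it);
    }

    public static void main(String[] args) {
        System.out.println(count(new ArrayDocValues(new int[] {1, 4, 9})));
    }
}
```

Running `javap -c DispatchSketch` on the compiled class should show the two different call instructions in the two loops.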
   
   # Tests
   
   No new tests. Existing tests all pass successfully.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread Chris Hegarty (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522842#comment-17522842
 ] 

Chris Hegarty commented on LUCENE-10517:


While the two sets of results above show significant improvements (~30%) on 
some benchmarks, we see some variation and somewhat smaller improvements on 
those benchmarks from run to run. Nevertheless, the results are always positive.

> Improve performance of SortedSetDV faceting by iterating on class types
> ---
>
> Key: LUCENE-10517
> URL: https://issues.apache.org/jira/browse/LUCENE-10517
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 9.1
>Reporter: Chris Hegarty
>Priority: Minor
>
> SortedSetDV faceting (and friends), can improve performance within tight 
> loops by using _invokevirtual_ (rather than _invokeinterface_). The C2 JIT 
> compiler can produce slightly more optimal code in this case, and since these 
> loops are very hot, the impact can be significant (in the order of 10-20%).
> The code change amounts to using `SortedDocValues` or `SortedSetDocValues` 
> class types, rather than the `DocIdSetIterator` interface type, in loops 
> (specifically for invocation of `nextDoc()`), when the iterator type is known 
> and not wrapped. 
> This issue is in some ways similar, and builds upon, prior optimisations in 
> this area, like say LUCENE-5300.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread Chris Hegarty (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522839#comment-17522839
 ] 

Chris Hegarty commented on LUCENE-10517:


[~grcevski] observes the following on his standalone Linux x64 machine:

 
{code:java}
                    Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
                  IntNRQ        113.25  (13.0%)                   109.09  (16.2%)     -3.7% ( -29% -   29%)  0.428
     MedIntervalsOrdered        219.98   (8.5%)                   212.53   (7.4%)     -3.4% ( -17% -   13%)  0.177
    HighIntervalsOrdered         13.46   (9.2%)                    13.07   (8.0%)     -2.9% ( -18% -   15%)  0.291
     LowIntervalsOrdered         47.61   (4.7%)                    46.68   (5.5%)     -2.0% ( -11% -    8%)  0.226
                  Fuzzy2         89.86   (2.0%)                    89.12   (1.7%)     -0.8% (  -4% -    2%)  0.161
               OrHighMed         82.18   (3.9%)                    81.69   (4.1%)     -0.6% (  -8% -    7%)  0.633
                  Fuzzy1        103.27   (1.7%)                   102.66   (1.7%)     -0.6% (  -3% -    2%)  0.282
               OrHighLow       1473.95   (2.1%)                  1468.06   (2.7%)     -0.4% (  -5% -    4%)  0.603
                 Respell         77.25   (1.7%)                    77.03   (2.0%)     -0.3% (  -3% -    3%)  0.619
                PKLookup        278.02   (1.6%)                   277.37   (2.5%)     -0.2% (  -4% -    3%)  0.721
   HighTermDayOfYearSort        208.62  (13.4%)                   208.28   (8.9%)     -0.2% ( -19% -   25%)  0.964
              AndHighMed        478.44   (3.4%)                   477.87   (3.4%)     -0.1% (  -6% -    6%)  0.912
            HighSpanNear         11.46   (3.8%)                    11.47   (2.9%)      0.0% (  -6% -    7%)  0.973
                 LowTerm       3361.20   (6.6%)                  3362.81   (4.4%)      0.0% ( -10% -   11%)  0.978
             MedSpanNear        175.70   (3.5%)                   175.99   (3.2%)      0.2% (  -6% -    7%)  0.877
              OrHighHigh         65.85   (3.6%)                    65.95   (3.9%)      0.2% (  -7% -    8%)  0.890
    HighTermTitleBDVSort        198.74  (14.7%)                   199.10  (11.1%)      0.2% ( -22% -   30%)  0.965
        HighSloppyPhrase         60.81   (3.0%)                    60.95   (3.4%)      0.2% (  -5% -    6%)  0.812
               MedPhrase        923.07   (3.2%)                   925.41   (2.8%)      0.3% (  -5% -    6%)  0.793
 AndHighMedDayTaxoFacets        173.34   (1.9%)                   174.54   (1.9%)      0.7% (  -3% -    4%)  0.249
              HighPhrase         35.97   (2.8%)                    36.29   (2.9%)      0.9% (  -4% -    6%)  0.315
AndHighHighDayTaxoFacets         19.91   (3.2%)                    20.10   (2.5%)      1.0% (  -4% -    6%)  0.287
             AndHighHigh         89.95   (3.7%)                    90.90   (3.2%)      1.1% (  -5% -    8%)  0.339
         MedSloppyPhrase         46.68   (3.9%)                    47.22   (4.4%)      1.2% (  -6% -    9%)  0.380
           OrNotHighHigh       1510.87   (3.7%)                  1530.06   (4.2%)      1.3% (  -6% -    9%)  0.310
                Wildcard        117.63   (3.1%)                   119.21   (4.0%)      1.3% (  -5% -    8%)  0.236
            OrNotHighLow       1702.07   (2.4%)                  1725.22   (2.9%)      1.4% (  -3% -    6%)  0.101
             LowSpanNear        377.84   (3.5%)                   383.09   (2.6%)      1.4% (  -4% -    7%)  0.157
               LowPhrase        132.15   (2.7%)                   134.16   (2.9%)      1.5% (  -3% -    7%)  0.086
       HighTermMonthSort        198.05  (14.9%)                   201.09  (13.1%)      1.5% ( -23% -   34%)  0.730
         LowSloppyPhrase         62.78   (1.8%)                    63.78   (2.1%)      1.6% (  -2% -    5%)  0.010
            OrHighNotMed       1951.27   (4.7%)                  1983.85   (5.4%)      1.7% (  -8% -   12%)  0.296
           OrHighNotHigh       1934.75   (3.8%)                  1974.20   (4.1%)      2.0% (  -5% -   10%)  0.102
            OrHighNotLow       1850.82   (4.2%)                  1889.29   (5.6%)      2.1% (  -7% -   12%)  0.187
              AndHighLow       1487.40   (4.6%)                  1519.08   (3.1%)      2.1% (  -5% -   10%)  0.088
    BrowseDateSSDVFacets          5.13   (4.2%)                     5.25   (5.5%)      2.2% (  -7% -   12%)  0.150
                 MedTerm       3151.81   (6.1%)                  3232.28   (5.9%)      2.6% (  -8% -   15%)  0.178
            OrNotHighMed       1169.40   (3.2%)                  1199.40   (4.4%)      2.6% (  -4% -   10%)  0.035
    MedTermDayTaxoFacets         55.14   (4.1%)                    56.59   (4.1%)      2.6% (  -5% -   11%)  0.040
                 Prefix3        206.64  (12.8%)                   212.16  (15.8%)      2.7% ( -23% -   35%)  0.558
HighTerm 2405.51  

[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread Chris Hegarty (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522838#comment-17522838
 ] 

Chris Hegarty commented on LUCENE-10517:


On my M1 I get the following luceneutil benchmark results.
$ sw_vers
ProductName:macOS
ProductVersion: 11.5.2
BuildVersion:   20G95

$ uname -a
Darwin chegar-MBP.local 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 
00:26:27 PDT 2021; root:xnu-7195.141.2~5/RELEASE_ARM64_T8101 arm64

$ sysctl -n machdep.cpu.brand_string
Apple M1

$ system_profiler SPHardwareDataType
Hardware:

Hardware Overview:

  Model Name: MacBook Pro
  Model Identifier: MacBookPro17,1
  Chip: Apple M1
  Total Number of Cores: 8 (4 performance and 4 efficiency)
  Memory: 16 GB
  System Firmware Version: 6723.140.2
  OS Loader Version: 6723.140.2
  Serial Number (system): FVFG731MQ05P
  Hardware UUID: 1D7BA696-DBDB-5E9C-BD46-5A18758DE699
  Provisioning UDID: 8103-000A05E001C0801E
  Activation Lock Status: Disabled
{code:java}
                            TaskQPS baseline      StdDevQPS my_modified_version 
     StdDev                Pct diff p-value
                       LowPhrase      148.35      (2.1%)      143.66      
(2.6%)   -3.2% (  -7% -    1%) 0.000
             MedIntervalsOrdered      197.27      (3.7%)      191.24      
(5.7%)   -3.1% ( -12% -    6%) 0.044
            HighIntervalsOrdered       11.55      (2.6%)       11.33      
(3.5%)   -1.9% (  -7% -    4%) 0.055
                      AndHighMed      447.74      (2.1%)      441.26      
(2.4%)   -1.4% (  -5% -    3%) 0.042
                        HighTerm     2397.60      (4.0%)     2367.10      
(2.4%)   -1.3% (  -7% -    5%) 0.223
                         LowTerm     3939.37      (2.7%)     3890.14      
(2.3%)   -1.2% (  -6% -    3%) 0.111
                   OrHighNotHigh     1917.21      (2.8%)     1893.94      
(3.2%)   -1.2% (  -6% -    4%) 0.198
                      HighPhrase       32.93      (1.9%)       32.55      
(1.1%)   -1.2% (  -4% -    1%) 0.022
                        PKLookup      340.11      (4.5%)      336.69      
(4.3%)   -1.0% (  -9% -    8%) 0.471
                      TermDTSort      145.39      (4.1%)      144.09      
(2.3%)   -0.9% (  -7% -    5%) 0.394
                    HighSpanNear       10.38      (3.7%)       10.32      
(1.9%)   -0.6% (  -5% -    5%) 0.531
                     MedSpanNear      206.69      (2.8%)      205.70      
(1.5%)   -0.5% (  -4% -    3%) 0.500
                          Fuzzy2       91.75      (2.5%)       91.41      
(1.4%)   -0.4% (  -4% -    3%) 0.562
                    OrHighNotMed     1975.22      (3.5%)     1968.91      
(2.7%)   -0.3% (  -6% -    6%) 0.744
                       OrHighMed       66.62      (3.9%)       66.45      
(4.8%)   -0.3% (  -8% -    8%) 0.850
                 LowSloppyPhrase       62.60      (2.1%)       62.44      
(2.5%)   -0.3% (  -4% -    4%) 0.726
                    OrHighNotLow     1876.16      (2.5%)     1871.56      
(2.4%)   -0.2% (  -5% -    4%) 0.756
                      OrHighHigh       55.70      (3.9%)       55.64      
(4.9%)   -0.1% (  -8% -    9%) 0.940
                          Fuzzy1      100.97      (2.2%)      100.88      
(2.1%)   -0.1% (  -4% -    4%) 0.898
             LowIntervalsOrdered       42.24      (0.7%)       42.21      
(1.0%)   -0.1% (  -1% -    1%) 0.766
                       MedPhrase      923.85      (1.3%)      923.14      
(1.6%)   -0.1% (  -2% -    2%) 0.867
                    OrNotHighMed     1427.45      (2.0%)     1428.11      
(2.5%)    0.0% (  -4% -    4%) 0.949
                         Respell       82.74      (2.6%)       82.81      
(1.9%)    0.1% (  -4% -    4%) 0.903
                     LowSpanNear      373.63      (2.6%)      373.97      
(1.6%)    0.1% (  -4% -    4%) 0.893
           HighTermDayOfYearSort      199.64      (1.7%)      199.83      
(2.5%)    0.1% (  -4% -    4%) 0.887
                   OrNotHighHigh     1523.02      (2.2%)     1526.12      
(2.0%)    0.2% (  -3% -    4%) 0.759
         AndHighMedDayTaxoFacets      185.23      (0.9%)      185.79      
(1.4%)    0.3% (  -1% -    2%) 0.416
                         MedTerm     3016.98      (3.4%)     3026.53      
(3.2%)    0.3% (  -6% -    7%) 0.761
                    OrNotHighLow     1867.65      (2.5%)     1876.63      
(2.4%)    0.5% (  -4% -    5%) 0.535
                      AndHighLow     1571.61      (3.1%)     1579.86      
(2.6%)    0.5% (  -5% -    6%) 0.564
                       OrHighLow     1485.93      (3.7%)     1494.56      
(2.5%)    0.6% (  -5% -    7%) 0.559
                     AndHighHigh       80.42      (2.8%)       81.06      
(1.7%)    0.8% (  -3% -    5%) 0.273
                HighSloppyPhrase       50.68      (4.0%)       51.14      
(4.7%)    0.9% (  -7% -    9%) 0.506
                 MedSloppyPhrase       40.76      (2.6%

[jira] [Created] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-15 Thread Chris Hegarty (Jira)
Chris Hegarty created LUCENE-10517:
--

 Summary: Improve performance of SortedSetDV faceting by iterating 
on class types
 Key: LUCENE-10517
 URL: https://issues.apache.org/jira/browse/LUCENE-10517
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 9.1
Reporter: Chris Hegarty


SortedSetDV faceting (and friends), can improve performance within tight loops 
by using _invokevirtual_ (rather than _invokeinterface_). The C2 JIT compiler 
can produce slightly more optimal code in this case, and since these loops are 
very hot, the impact can be significant (in the order of 10-20%).

The code change amounts to using `SortedDocValues` or `SortedSetDocValues` 
class types, rather than the `DocIdSetIterator` interface type, in loops 
(specifically for invocation of `nextDoc()`), when the iterator type is known 
and not wrapped. 

This issue is in some ways similar, and builds upon, prior optimisations in 
this area, like say LUCENE-5300.






[GitHub] [lucene] rmuir commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori

2022-04-15 Thread GitBox


rmuir commented on PR #805:
URL: https://github.com/apache/lucene/pull/805#issuecomment-1100064836

   I think @mikemccand actually created most of this code and is most familiar 
with it. Mike, if you have time can you look too?
   
   For the special n-best class, is the issue that `nori` simply doesn't offer 
this n-best feature? For the future, I wonder if there is some reason it 
doesn't make sense there too? Maybe it was just overlooked before?





[GitHub] [lucene] rmuir commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori

2022-04-15 Thread GitBox


rmuir commented on PR #805:
URL: https://github.com/apache/lucene/pull/805#issuecomment-1100062662

   Sure, sorry for the slow response.





[GitHub] [lucene] mocobeta commented on pull request #811: Add some basic tasks to help/workflow

2022-04-15 Thread GitBox


mocobeta commented on PR #811:
URL: https://github.com/apache/lucene/pull/811#issuecomment-1100018613

   @gautamworah96 thanks for your comments.
   
   > I sometimes use the -Ptests.iters= param for beasting out 
multiple runs of a single test to catch random edge cases that I might have 
missed (this was another trick that I just stumbled upon through JIRA). Maybe 
we could add this to the workflow file as well?
   
   `help/tests.txt` (the source for the `gradlew helpTests` task) has a 
dedicated section on "Reiteration", where the `-Ptests.iters` option is 
explained. 
   https://github.com/apache/lucene/blob/main/help/tests.txt#L87
   
   The `test` task has many parameters, so I think it'd be better to encourage 
devs to refer to that file rather than augment `workflow.txt`. (There is a 
pointer that says `run "gradlew :helpTests" for more` on line 12 of 
workflow.txt.)





[GitHub] [lucene] mocobeta commented on a diff in pull request #811: Add some basic tasks to help/workflow

2022-04-15 Thread GitBox


mocobeta commented on code in PR #811:
URL: https://github.com/apache/lucene/pull/811#discussion_r851184763


##
help/workflow.txt:
##
@@ -25,11 +25,22 @@ Assemble a single module's JAR (here for lucene-core):
 gradlew -p lucene/core assemble
 ls lucene/core/build/libs
 
+Assemble all JARs:

Review Comment:
   Updated in 
https://github.com/apache/lucene/pull/811/commits/9c41951601d2b07deae84e8700c7342385141754.
 I didn't touch the existing description for the same command with `-p`.






[GitHub] [lucene] mocobeta commented on a diff in pull request #811: Add some basic tasks to help/workflow

2022-04-15 Thread GitBox


mocobeta commented on code in PR #811:
URL: https://github.com/apache/lucene/pull/811#discussion_r851181043


##
help/workflow.txt:
##
@@ -25,11 +25,22 @@ Assemble a single module's JAR (here for lucene-core):
 gradlew -p lucene/core assemble
 ls lucene/core/build/libs
 
+Assemble all JARs:

Review Comment:
   ```suggestion
   Assemble all Lucene artifacts (JARs, and so on):
   ```






[jira] [Resolved] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.

2022-04-15 Thread kkewwei (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kkewwei resolved LUCENE-10448.
--
Resolution: Not A Problem

> MergeRateLimiter doesn't always limit instant rate.
> ---
>
> Key: LUCENE-10448
> URL: https://issues.apache.org/jira/browse/LUCENE-10448
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: 8.11.1
>Reporter: kkewwei
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We can see the code in *MergeRateLimiter*:
> {code:java}
> private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedException {
>
>   double rate = mbPerSec;
>   double secondsToPause = (bytes / 1024. / 1024.) / rate;
>   long targetNS = lastNS + (long) (1000000000 * secondsToPause);
>   long curPauseNS = targetNS - curNS;
>   // We don't bother with thread pausing if the pause is smaller than 2 msec.
>   if (curPauseNS <= MIN_PAUSE_NS) {
>     // Set to curNS, not targetNS, to enforce the instant rate, not
>     // the "averaged over all history" rate:
>     lastNS = curNS;
>     return -1;
>   }
>   ..
> }
> {code}
> Suppose a segment is being merged and *maybePause* is called at 7:00, so 
> lastNS=7:00. If *maybePause* is called again at 7:05, the value of 
> *targetNS=lastNS + (long) (1000000000 * secondsToPause)* must be smaller than 
> *curNS*, so no matter how large bytes is, we return -1 and skip the pause. 
> I counted the total number of *maybePause* calls (callTimes), the number of 
> skipped pauses (ignorePauseTimes), and the bytes for each skipped pause 
> (detailBytes):
> {code:java}
> [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] 
> [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219 
> docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec 
> throttle], [callTimes=857], [ignorePauseTimes=25],  [detailBytes(mb) = 
> [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104, 
> 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673, 
> 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447, 
> 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]]
> {code}
> Out of 857 calls to *maybePause*, 25 skipped the pause, and the bytes 
> involved in those skipped calls (such as 0.28125 MB) are not small.
> As long as the interval between two *maybePause* calls is relatively long, 
> the pause that should be executed will not be executed.
>  
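
The arithmetic in the report can be checked in isolation. The following is a minimal sketch that mirrors only the pause computation from the quoted snippet. The class and method names (`PauseSketch`, `pauseNanos`) are hypothetical, and `MIN_PAUSE_NS` is assumed to be 2 msec based on the comment in the quoted code. It is not the Lucene implementation and it does not sleep or track `lastNS`.

```java
class PauseSketch {
    // Assumption: 2 msec, taken from the comment in the quoted snippet.
    static final long MIN_PAUSE_NS = 2_000_000L;

    /** Returns the pause in nanoseconds, or -1 when the pause would be skipped. */
    static long pauseNanos(long bytes, double mbPerSec, long lastNS, long curNS) {
        // Time that writing this many bytes should take at the target rate.
        double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
        long targetNS = lastNS + (long) (1_000_000_000L * secondsToPause);
        long curPauseNS = targetNS - curNS;
        return curPauseNS <= MIN_PAUSE_NS ? -1 : curPauseNS;
    }

    public static void main(String[] args) {
        long bytes = 294_912L;   // 0.28125 MB, one of the values in the log above
        double rate = 11.2;      // MB/sec throttle from the log above

        // Called 10 ms after the previous call: a pause is still owed.
        System.out.println(pauseNanos(bytes, rate, 0L, 10_000_000L));

        // Called 50 ms after the previous call: more time has already passed
        // than the roughly 25 ms this write is worth, so the result is -1.
        System.out.println(pauseNanos(bytes, rate, 0L, 50_000_000L));
    }
}
```

Under these assumptions, a write of 0.28125 MB at 11.2 MB/sec is worth about 25 ms, so any call arriving more than roughly 23 ms after the previous one returns -1.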


