jpountz commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1223004317
##########
lucene/core/src/java/org/apache/lucene/document/FeatureField.java:
##########
@@ -622,10 +623,11 @@ public static Query newSigmoidQuery(
* @param featureField the field that stores features
* @param featureName the name of the feature
*/
- static float computePivotFeatureValue(IndexReader reader, String
featureField, String featureName)
+ static float computePivotFeatureValue(
+ IndexSearcher searcher, IndexReader reader, String featureField, String
featureName)
Review Comment:
Let's only pass the searcher, since the reader can be obtained from the
searcher? Also, can you update the javadocs to match?
##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +92,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf contexts will be visited
up-front to collect
* term statistics. Otherwise, the {@link TermState} objects will be
built only when requested
*/
- public static TermStates build(IndexReaderContext context, Term term,
boolean needsStats)
+ public static TermStates build(
+ IndexSearcher indexSearcher, IndexReaderContext context, Term term,
boolean needsStats)
throws IOException {
assert context != null && context.isTopLevel;
final TermStates perReaderTermState = new TermStates(needsStats ? null :
term, context);
if (needsStats) {
- for (final LeafReaderContext ctx : context.leaves()) {
- // if (DEBUG) System.out.println(" r=" + leaves[i].reader);
- TermsEnum termsEnum = loadTermsEnum(ctx, term);
- if (termsEnum != null) {
- final TermState termState = termsEnum.termState();
- // if (DEBUG) System.out.println(" found");
- perReaderTermState.register(
- termState, ctx.ord, termsEnum.docFreq(),
termsEnum.totalTermFreq());
+ Executor executor = indexSearcher.getExecutor();
+ boolean isShutdown = false;
+ if (executor instanceof ExecutorService) {
+ isShutdown = ((ExecutorService) executor).isShutdown();
+ }
+ if (executor != null && isShutdown == false) {
+ // build term states concurrently
+ List<FutureTask<Integer>> tasks =
+ context.leaves().stream()
+ .map(
+ ctx ->
+ new FutureTask<>(
+ () -> {
+ TermsEnum termsEnum = loadTermsEnum(ctx, term);
+ if (termsEnum != null) {
+ final TermState termState =
termsEnum.termState();
+ perReaderTermState.register(
Review Comment:
Instead of making `perReaderTermState.register` thread-safe, I'd prefer to
make the task return a wrapper around the term state, ord and term statistics,
and then only register this information against `perReaderTermState` when the
task returns?
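
A minimal sketch of that pattern, using only the JDK so it runs standalone. `LeafResult`, `Stats`, and `lookup` are hypothetical stand-ins for the wrapper, `TermStates`, and `loadTermsEnum`; none of them are Lucene APIs, and the statistics are made up. Each task returns an immutable result, and only the calling thread touches the accumulator, so `register` needs no synchronization:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

public class RegisterOnCaller {
  /** Hypothetical immutable wrapper: everything one per-leaf task hands back. */
  public static final class LeafResult {
    final int ord;
    final int docFreq;
    final long totalTermFreq;

    LeafResult(int ord, int docFreq, long totalTermFreq) {
      this.ord = ord;
      this.docFreq = docFreq;
      this.totalTermFreq = totalTermFreq;
    }
  }

  /** Hypothetical accumulator, deliberately not thread-safe. */
  public static final class Stats {
    private final List<Integer> ords = new ArrayList<>();
    private int docFreq;
    private long totalTermFreq;

    void register(LeafResult r) {
      ords.add(r.ord);
      docFreq += r.docFreq;
      totalTermFreq += r.totalTermFreq;
    }

    public List<Integer> ords() { return ords; }
    public int docFreq() { return docFreq; }
    public long totalTermFreq() { return totalTermFreq; }
  }

  /** Stand-in for the per-leaf term lookup; null means the leaf lacks the term. */
  static LeafResult lookup(int ord, int[] leafDocFreqs) {
    int df = leafDocFreqs[ord];
    // totalTermFreq is an arbitrary made-up value for the sketch.
    return df == 0 ? null : new LeafResult(ord, df, 2L * df);
  }

  public static Stats build(Executor executor, int[] leafDocFreqs) {
    List<FutureTask<LeafResult>> tasks = new ArrayList<>();
    for (int ord = 0; ord < leafDocFreqs.length; ord++) {
      final int leafOrd = ord;
      // The task only computes and returns; it never mutates shared state.
      FutureTask<LeafResult> task = new FutureTask<>(() -> lookup(leafOrd, leafDocFreqs));
      tasks.add(task);
      executor.execute(task);
    }
    Stats stats = new Stats();
    try {
      for (FutureTask<LeafResult> task : tasks) {
        LeafResult r = task.get(); // blocks until this leaf's task is done
        if (r != null) {
          stats.register(r); // always runs on the calling thread
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
    return stats;
  }
}
```

Collecting results in submission order also keeps registration deterministic, regardless of which task finishes first.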
##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +92,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf contexts will be visited
up-front to collect
* term statistics. Otherwise, the {@link TermState} objects will be
built only when requested
*/
- public static TermStates build(IndexReaderContext context, Term term,
boolean needsStats)
+ public static TermStates build(
+ IndexSearcher indexSearcher, IndexReaderContext context, Term term,
boolean needsStats)
throws IOException {
assert context != null && context.isTopLevel;
final TermStates perReaderTermState = new TermStates(needsStats ? null :
term, context);
if (needsStats) {
- for (final LeafReaderContext ctx : context.leaves()) {
- // if (DEBUG) System.out.println(" r=" + leaves[i].reader);
- TermsEnum termsEnum = loadTermsEnum(ctx, term);
- if (termsEnum != null) {
- final TermState termState = termsEnum.termState();
- // if (DEBUG) System.out.println(" found");
- perReaderTermState.register(
- termState, ctx.ord, termsEnum.docFreq(),
termsEnum.totalTermFreq());
+ Executor executor = indexSearcher.getExecutor();
+ boolean isShutdown = false;
+ if (executor instanceof ExecutorService) {
+ isShutdown = ((ExecutorService) executor).isShutdown();
+ }
Review Comment:
We shouldn't check whether the executor is shut down. It's the
responsibility of the user to pass an executor that is not shut down to the
IndexSearcher constructor.
##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +92,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf contexts will be visited
up-front to collect
* term statistics. Otherwise, the {@link TermState} objects will be
built only when requested
*/
- public static TermStates build(IndexReaderContext context, Term term,
boolean needsStats)
+ public static TermStates build(
+ IndexSearcher indexSearcher, IndexReaderContext context, Term term,
boolean needsStats)
throws IOException {
assert context != null && context.isTopLevel;
final TermStates perReaderTermState = new TermStates(needsStats ? null :
term, context);
if (needsStats) {
- for (final LeafReaderContext ctx : context.leaves()) {
- // if (DEBUG) System.out.println(" r=" + leaves[i].reader);
- TermsEnum termsEnum = loadTermsEnum(ctx, term);
- if (termsEnum != null) {
- final TermState termState = termsEnum.termState();
- // if (DEBUG) System.out.println(" found");
- perReaderTermState.register(
- termState, ctx.ord, termsEnum.docFreq(),
termsEnum.totalTermFreq());
+ Executor executor = indexSearcher.getExecutor();
+ boolean isShutdown = false;
+ if (executor instanceof ExecutorService) {
+ isShutdown = ((ExecutorService) executor).isShutdown();
+ }
+ if (executor != null && isShutdown == false) {
Review Comment:
For simplicity, we could do something like `if (executor == null) { executor
= Runnable::run; }` and then have a single code path for building term states?
I wouldn't expect it to hurt performance as `TermStates#build` is not called in
hot loops.
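
A rough sketch of that fallback, again JDK-only. `SameThreadFallback` and `runAll` are hypothetical names for illustration, not part of the PR. A `null` executor is replaced by `Runnable::run`, which executes each task inline on the calling thread, so the submit-then-collect code is identical in both cases:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

public class SameThreadFallback {
  /** Runs {@code count} tasks and returns the name of the thread each one ran on. */
  public static List<String> runAll(Executor executor, int count) {
    if (executor == null) {
      // Direct executor: execute(task) simply calls task.run() inline.
      executor = Runnable::run;
    }
    List<FutureTask<String>> tasks = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      FutureTask<String> task = new FutureTask<>(() -> Thread.currentThread().getName());
      tasks.add(task);
      executor.execute(task); // same call whether concurrent or not
    }
    List<String> threadNames = new ArrayList<>();
    try {
      for (FutureTask<String> task : tasks) {
        threadNames.add(task.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
    return threadNames;
  }
}
```

With a `null` executor, every task has already completed by the time `execute` returns, so the later `get()` calls never block.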
##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -86,19 +92,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf contexts will be visited
up-front to collect
* term statistics. Otherwise, the {@link TermState} objects will be
built only when requested
*/
- public static TermStates build(IndexReaderContext context, Term term,
boolean needsStats)
+ public static TermStates build(
+ IndexSearcher indexSearcher, IndexReaderContext context, Term term,
boolean needsStats)
Review Comment:
Let's remove the context from the signature since it can be obtained from
the IndexSearcher?