[jira] [Commented] (SOLR-15859) Add handler to dump filter cache
[ https://issues.apache.org/jira/browse/SOLR-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651088#comment-17651088 ] Shawn Heisey commented on SOLR-15859:
-
{quote}I think it might be possible (and preferable?) to implement this as a custom {{SolrCache}} implementation that wraps {{solr.CaffeineCache}}. I think [~ben.manes] was alluding to something like this "MetadataWrapper" approach in his comment above.{quote}

I have no idea how to go from those statements to actual usable code. And I don't want to ask you to write it for me; I'd like to do that myself. But if you can come up with back-of-the-envelope pseudocode very quickly that I can flesh out into actual code, that would be appreciated.

If what you're describing would involve changes to the way that specific caches (like filterCache) get implemented, then I'm REALLY going to be out of my depth. I once tried to look at that and got completely lost trying to follow the code, in much the same way as what happened when I tried to understand SolrCloud cluster management with the overseer.

> Add handler to dump filter cache
>
> Key: SOLR-15859
> URL: https://issues.apache.org/jira/browse/SOLR-15859
> Project: Solr
> Issue Type: Improvement
> Reporter: Andy Lester
> Assignee: Shawn Heisey
> Priority: Major
> Labels: FQ, cache, filtercache, metrics
> Attachments: cacheinfo-1.patch, cacheinfo-2.patch, cacheinfo.patch, fix_92_startup.patch
>
> It would be very helpful to be able to inspect the contents of the filterCache.
> I'd like to be able to query something like {{/admin/caches?type=filter&nentries=1000&sort=numHits+DESC}}. nentries would be allowed to be -1 to get everything.
> It would be nice to see these data items for each entry. I don't know which are available, but I'm thinking blue sky here:
> * cache key, exactly as stored
> * Timestamp when the entry was inserted
> * Whether the insertion of the entry evicted another entry, and if so which one
> * Timestamp of when this entry was last hit
> * Number of hits on this entry forever
> * Number of hits on this entry over some time period
> * Number of documents matched by the filter
> * Number of bytes of memory used by the filter
> These are the sorts of questions I'd like to be able to answer:
> * "I just did a query that I expect will have added a cache entry. Did it?"
> * "Are my queries hitting existing cache entries?"
> * "How big should I set my filterCache size? Should I limit it by number of entries or RAM usage?"
> * "Which of my FQs are getting used the most? These are the ones I want in my firstSearcher queries." (I currently determine this by processing my old solr logs)
> * "Which filters give me the most bang for the buck in terms of RAM usage?"
> * "I have filter X and filter Y, but would it be beneficial if I made a filter X AND Y?"
> * "Which FQs are used more at certain times of the day? (Assuming I take regular snapshots throughout the day)"
> I imagine a response might look like:
> {code:json}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 961
>   },
>   "response": {
>     "numFound": 12104,
>     "filterCacheKeys": [
>       {
>         "language:eng": {
>           "inserted": "2021-12-04T07:34:16Z",
>           "lastHit": "2021-12-04T18:17:43Z",
>           "numHits": 15065,
>           "numHitsInPastHour": 2319,
>           "evictedKey": "agelevel:4 shippable:Y",
>           "numRecordsMatchedByFilter": 24328753,
>           "bytesUsed": 3041094
>         }
>       },
>       { "is_set:N": { ... } },
>       { "language:spa": { ... } }
>     ]
>   }
> }
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
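[Editor's note] The "MetadataWrapper" idea discussed above might be sketched roughly as below. This is back-of-the-envelope only: all class and method names are hypothetical, and a plain ConcurrentHashMap stands in for Solr's CaffeineCache. The point is that per-entry stats live beside the wrapped cache, so the inner cache's hot path is untouched.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: wrap an inner cache and record per-entry metadata,
// without modifying the inner cache implementation itself.
class MetadataWrappingCache<K, V> {
    static final class EntryMeta {
        final Instant inserted = Instant.now();
        volatile Instant lastHit = null;
        final LongAdder hits = new LongAdder();
    }

    private final Map<K, V> inner = new ConcurrentHashMap<>();      // stand-in for CaffeineCache
    private final Map<K, EntryMeta> meta = new ConcurrentHashMap<>();

    public V put(K key, V value) {
        meta.put(key, new EntryMeta());
        return inner.put(key, value);
    }

    public V get(K key) {
        V v = inner.get(key);
        if (v != null) {
            EntryMeta m = meta.get(key);
            if (m != null) {               // entry may have been evicted concurrently
                m.hits.increment();
                m.lastHit = Instant.now();
            }
        }
        return v;
    }

    // The "dump" hook: copy a snapshot of keys and hit counts into a
    // caller-provided map, so internals are never exposed directly.
    public void dumpTo(Map<K, Long> out) {
        meta.forEach((k, m) -> out.put(k, m.hits.sum()));
    }
}
```
A real version would also need an eviction listener on the wrapped cache to drop the matching metadata entry, which Caffeine supports but this sketch omits.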
[GitHub] [solr] dsmiley commented on a diff in pull request #1215: DocRouter: strengthen abstraction
dsmiley commented on code in PR #1215: URL: https://github.com/apache/solr/pull/1215#discussion_r105508 ## solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java: ## @@ -263,8 +263,9 @@ private void handleGetRanges(CoreAdminHandler.CallInfo it, String coreName) thro DocCollection collection = clusterState.getCollection(collectionName); String sliceName = parentCore.getCoreDescriptor().getCloudDescriptor().getShardId(); Slice slice = collection.getSlice(sliceName); -DocRouter router = -collection.getRouter() != null ? collection.getRouter() : DocRouter.DEFAULT; +CompositeIdRouter router = Review Comment: Correcting myself. This method, `handleGetRanges`, is only called for splitByPrefix (as its javadocs say), which is what depends on CompositeIdRouter. I suppose if there was some weird/unexpected code path (maybe in the future), a ClassCastException wouldn't be particularly unfriendly? ## solr/core/src/java/org/apache/solr/cloud/api/collections/MigrateCmd.java: ## @@ -253,7 +252,7 @@ private void migrateKey( SHARD_ID_PROP, sourceSlice.getName(), "routeKey", -SolrIndexSplitter.getRouteKey(splitKey) + "!", +sourceRouter.getRouteKeyNoSuffix(splitKey) + "!", Review Comment: Line 108 checks for CompositeIdRouter and throws a friendly exception if it isn't. 
## solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java: ## @@ -765,18 +766,11 @@ static FixedBitSet[] split( return docSets; } - public static String getRouteKey(String idString) { -int idx = idString.indexOf(CompositeIdRouter.SEPARATOR); -if (idx <= 0) return null; -String part1 = idString.substring(0, idx); -int commaIdx = part1.indexOf(CompositeIdRouter.bitsSeparator); -if (commaIdx > 0 && commaIdx + 1 < part1.length()) { - char ch = part1.charAt(commaIdx + 1); - if (ch >= '0' && ch <= '9') { -part1 = part1.substring(0, commaIdx); - } + private static void checkRouterSupportsSplitKey(HashBasedRouter hashRouter, String splitKey) { Review Comment: SolrIndexSplitterTest tests the "plain" (hash) router. The test continues to pass. It's only the "splitKey" feature of shard splitting that requires CompositeIdRouter. The exception message tries to clarify that the expectation/requirement is tied to splitKey. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
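[Editor's note] For readers following the diff above: the removed SolrIndexSplitter.getRouteKey() logic, which the PR relocates into the router (the comments refer to a getRouteKeyNoSuffix on CompositeIdRouter), can be exercised standalone. The separator characters below mirror the constants referenced in the diff, but this is an illustrative sketch, not Solr's actual class:

```java
// Standalone sketch of composite-id route-key extraction. A composite id looks
// like "routeKey!docId", optionally with a bits suffix: "routeKey/2!docId".
final class RouteKeys {
    static final char SEPARATOR = '!';       // mirrors CompositeIdRouter.SEPARATOR
    static final char BITS_SEPARATOR = '/';  // mirrors CompositeIdRouter.bitsSeparator

    /** Returns the route key without any "/bits" suffix, or null if the id has no route key. */
    static String getRouteKeyNoSuffix(String idString) {
        int idx = idString.indexOf(SEPARATOR);
        if (idx <= 0) return null;                       // no route key present
        String part1 = idString.substring(0, idx);
        int bitsIdx = part1.indexOf(BITS_SEPARATOR);
        if (bitsIdx > 0 && bitsIdx + 1 < part1.length()) {
            char ch = part1.charAt(bitsIdx + 1);
            if (ch >= '0' && ch <= '9') {
                part1 = part1.substring(0, bitsIdx);     // strip the "/bits" suffix
            }
        }
        return part1;
    }
}
```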
[GitHub] [solr] noblepaul commented on pull request #1242: SOLR-16580: Avoid making copies of DocCollection for PRS updates
noblepaul commented on PR #1242: URL: https://github.com/apache/solr/pull/1242#issuecomment-1362371622

> Avoid making copies of DocCollections when copyWith is called (related to PRS updates)

Yes.

> Avoid fetching PRS states until the state is actually queried (the Lazy PRS provider part)

This behavior is not changed. What is changed is that all states are **always** queried just in time. Prior to this, the states were copied into the object when it was constructed.
[GitHub] [solr] noblepaul commented on a diff in pull request #1242: SOLR-16580: Avoid making copies of DocCollection for PRS updates
noblepaul commented on code in PR #1242: URL: https://github.com/apache/solr/pull/1242#discussion_r1055057270 ## solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java: ## @@ -139,30 +138,10 @@ public static String getCollectionPathRoot(String coll) { * only a replica is updated */ public DocCollection copyWith(PerReplicaStates newPerReplicaStates) { Review Comment: When per-replica states change, the slices remain exactly the same.
[GitHub] [solr] patsonluk commented on a diff in pull request #1242: SOLR-16580: Avoid making copies of DocCollection for PRS updates
patsonluk commented on code in PR #1242: URL: https://github.com/apache/solr/pull/1242#discussion_r1054994623 ## solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java: ## @@ -139,30 +138,10 @@ public static String getCollectionPathRoot(String coll) { * only a replica is updated */ public DocCollection copyWith(PerReplicaStates newPerReplicaStates) { Review Comment: In the old code, the modified replicas would be read from the provided newPerReplicaStates to construct a list of `modifiedShards` (in which each Slice value contains a map of replicas whose info relies on the input `newPerReplicaStates`). Since we are no longer constructing a new DocCollection here and instead return the same `this` instance, how can we ensure that slice lookups on this DocCollection instance (`getSlices`, `getSliceMap`, etc.) return the updated slice/replica info from `newPerReplicaStates`?
[GitHub] [solr] noblepaul commented on pull request #1215: DocRouter: strengthen abstraction
noblepaul commented on PR #1215: URL: https://github.com/apache/solr/pull/1215#issuecomment-1362270740

> To me this PR does not add a new type of DocRouter, as it is centered around CompositeIdRouter

I was confused by the original description and thought this was trying to introduce a new `DocRouter` by enhancing `CompositeIdRouter`. This PR is about moving all of the routing/splitting logic into `CompositeIdRouter`.
[GitHub] [solr] noblepaul commented on a diff in pull request #1215: DocRouter: strengthen abstraction
noblepaul commented on code in PR #1215: URL: https://github.com/apache/solr/pull/1215#discussion_r1054986541 ## solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java: ## @@ -765,18 +766,11 @@ static FixedBitSet[] split( return docSets; } - public static String getRouteKey(String idString) { -int idx = idString.indexOf(CompositeIdRouter.SEPARATOR); -if (idx <= 0) return null; -String part1 = idString.substring(0, idx); -int commaIdx = part1.indexOf(CompositeIdRouter.bitsSeparator); -if (commaIdx > 0 && commaIdx + 1 < part1.length()) { - char ch = part1.charAt(commaIdx + 1); - if (ch >= '0' && ch <= '9') { -part1 = part1.substring(0, commaIdx); - } + private static void checkRouterSupportsSplitKey(HashBasedRouter hashRouter, String splitKey) { Review Comment: The other one is the PLAIN router; I don't think split is possible for that.
[GitHub] [solr] noblepaul commented on a diff in pull request #1242: SOLR-16580: Avoid making copies of DocCollection for PRS updates
noblepaul commented on code in PR #1242: URL: https://github.com/apache/solr/pull/1242#discussion_r1054985346 ## solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java: ## @@ -261,9 +259,10 @@ private static DocCollection collectionFromObjects( if (log.isDebugEnabled()) { log.debug("a collection {} has per-replica state", name); } - // this collection has replica states stored outside - ReplicaStatesProvider rsp = REPLICASTATES_PROVIDER.get(); - if (rsp instanceof StatesProvider) ((StatesProvider) rsp).isPerReplicaState = true; +} else { + // prior to this call, PRS provider is set. We should unset it before + // deserializing the replicas and slices + DocCollection.clearReplicaStateProvider(); Review Comment: I'm aware of this problem. The ideal solution would be to pass the `PrsSupplier` in with the constructor; I'm trying to do that.
[GitHub] [solr] noblepaul commented on a diff in pull request #1242: SOLR-16580: Avoid making copies of DocCollection for PRS updates
noblepaul commented on code in PR #1242: URL: https://github.com/apache/solr/pull/1242#discussion_r1054984613 ## solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java: ## @@ -139,30 +138,10 @@ public static String getCollectionPathRoot(String coll) { * only a replica is updated */ public DocCollection copyWith(PerReplicaStates newPerReplicaStates) { Review Comment: Why would slices change when PRS changes?
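[Editor's note] The "pass the PrsSupplier with the constructor" idea discussed in this thread might look roughly like the sketch below. The types are hypothetical stand-ins (a Map of replica name to state stands in for PerReplicaStates); the point is that replica state is resolved through the supplier at query time, so copyWith() can return the same instance when only replica states changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of lazy per-replica-state resolution: instead of copying
// states into the collection object at construction time, hold a supplier and
// fetch the current state only when it is actually queried.
final class LazyPrsCollection {
    private final Supplier<Map<String, String>> prsSupplier; // stand-in for a PerReplicaStates supplier

    LazyPrsCollection(Supplier<Map<String, String>> prsSupplier) {
        this.prsSupplier = prsSupplier;
    }

    /** Resolved just in time through the supplier on every call. */
    String replicaState(String replicaName) {
        return prsSupplier.get().get(replicaName);
    }
}
```
Under this scheme the slices themselves never change when a replica state flips, which is why no new DocCollection needs to be built.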
[GitHub] [solr-operator] HoustonPutman opened a new pull request, #509: Fix non-recurring backups
HoustonPutman opened a new pull request, #509: URL: https://github.com/apache/solr-operator/pull/509 https://github.com/apache/solr-operator/pull/455 introduced a bug for non-recurring backups that was unearthed while working on https://github.com/apache/solr-operator/pull/507 Basically, we need to update the `NextScheduledTimestamp` only if `recurrence` is enabled. I also restructured the logic to hopefully make it clearer when backup logic should be run.
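[Editor's note] The control-flow point of the fix above can be illustrated abstractly. The real operator code is Go; this is only a language-neutral sketch with hypothetical names showing the condition being described:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: a one-shot (non-recurring) backup must never be given a
// next-scheduled timestamp, otherwise it would be re-triggered as if scheduled.
final class BackupScheduler {
    static Instant nextScheduled(boolean recurrenceEnabled, Instant lastRun, Duration interval) {
        if (!recurrenceEnabled) {
            return null; // non-recurring backup: never reschedule
        }
        return lastRun.plus(interval); // recurring backup: advance the schedule
    }
}
```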
[jira] [Commented] (SOLR-15859) Add handler to dump filter cache
[ https://issues.apache.org/jira/browse/SOLR-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650996#comment-17650996 ] Shawn Heisey commented on SOLR-15859: - [~magibney] I'm not going to rule anything out that hasn't been carefully considered. I fully admit that I am playing in a sandbox that has more complexity than I am used to thinking about, so I doubt I can actually get this done without some collaboration. Anything that reduces or eliminates the amount of synchronization that I have to worry about, especially if it actually makes the code simpler, is very welcome. I never feel confident about code for a threaded environment where I don't put some thought into thread safety issues, so I think I have a tendency to overthink it. I don't really mind if the cache dumper knows at least a little bit about the internals it is dealing with, but the more that can be abstracted, the better. I'm hoping to get to a point where it only knows about SolrCacheBase and doesn't care about CaffeineCache. But obviously some work will be required in the cache implementation to make that abstraction available. I expect where the dumper will be most connected to other Solr/Lucene internals is knowing how to dump each specific cache -- filterCache is very different than queryResultCache. What I envision with the dumper is initially making it an experimental feature. I think it might be useful to have a section of the ref guide dedicated to experimental features, where the API and internals of the feature may radically change from release to release if a better approach is found. Maybe treat it a little bit like the ASF does when incubating new projects. 
[jira] [Commented] (SOLR-15859) Add handler to dump filter cache
[ https://issues.apache.org/jira/browse/SOLR-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650990#comment-17650990 ] Michael Gibney commented on SOLR-15859:
---
I think it might be possible (and preferable?) to implement this as a custom {{SolrCache}} implementation that wraps {{solr.CaffeineCache}}. I think [~ben.manes] was alluding to something like this "MetadataWrapper" approach in his [comment above|#comment-17633401]. I've actually done something similar, and it can work quite well. It can be a bit tricky, but I think the "per-entry stats" part would be pretty straightforward done this way, and I really like the idea of implementing this functionality without modifying the hot path of what's currently the default/only cache implementation bundled with Solr. I think the only necessary modification to the existing {{solr.CaffeineCache}} class would be to provide a hook to actually dump the values, e.g., add them to a provided map or something (so as not to actually expose the internals)?

I do think the functionality you're pursuing with this could be useful. One benefit of implementing it as I'm suggesting above: I think this functionality would be almost entirely pluggable (as in, plugins), aside from some interface for actually dumping a snapshot of the contents of the cache, which I suspect would indeed need a public method added to {{solr.CaffeineCache}}. I would definitely recommend avoiding top-level {{synchronized (cache)}}, and I don't think that would be necessary if pursuing the "wrapping" approach.

Maybe a more tightly-scoped change that ignores for now the request handler and stats tracking, and instead focuses on figuring out a clean (if perhaps experimental?) method/interface for dumping the contents of {{solr.CaffeineCache}}? I suspect that would be easier to merge with confidence, and would open the door to iterating on different ways of achieving some of the more nuanced functionality.
[jira] [Commented] (SOLR-15859) Add handler to dump filter cache
[ https://issues.apache.org/jira/browse/SOLR-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650984#comment-17650984 ] Shawn Heisey commented on SOLR-15859:
-
[~ben.manes] Thank you for all the patience you have shown. I don't know if you remember, but you once helped me with a non-Solr question I had, where you pointed me at the Striped class in Guava ... which is exactly what I needed and extremely cool. https://github.com/google/guava/wiki/StripedExplained

Do you think you could help me with the most efficient way to implement the extra stats in CaffeineCache that I need to make this cache dumper give really useful info, so I can be sure it uses the least memory possible and is completely bulletproof? I'm willing to put in the work writing it, I just need a little bit of a nudge to find the right way to go about it.
[GitHub] [solr] alessandrobenedetti commented on pull request #1245: SOLR-16567: KnnQueryParser support for both pre-filters and post-filter
alessandrobenedetti commented on PR #1245: URL: https://github.com/apache/solr/pull/1245#issuecomment-1362008815 Here is a rough pull request, just to give you the idea, @dsmiley: https://github.com/apache/solr/pull/1246/files
[GitHub] [solr] alessandrobenedetti opened a new pull request, #1246: Jira/solr 16567 tentative
alessandrobenedetti opened a new pull request, #1246: URL: https://github.com/apache/solr/pull/1246 Just for @dsmiley to get an idea of what I meant.
[GitHub] [solr] alessandrobenedetti commented on pull request #1245: SOLR-16567: KnnQueryParser support for both pre-filters and post-filter
alessandrobenedetti commented on PR #1245: URL: https://github.com/apache/solr/pull/1245#issuecomment-1361889301

OK, I'll produce another branch with the example code, assuming the Lucene changes are there. What I am trying to accomplish:

1) The Knn Query has a Query filter as a constructor parameter (and instance variable). This filter is meant to be a pre-filter (in Approximate Nearest Neighbour search, that means a filter that is executed before the top K nearest neighbors are returned). It is used internally in the approximate nearest neighbor search Lucene code to only accept certain neighbors from the graph (along with the alive bitSet).

2) In Apache Solr we need to make sure that all filter queries except explicit post-filters are processed and set in the Knn Query.

3) But parsing happens before the Searcher processes the filters. So if we are able to modify the Lucene KnnQuery in org.apache.solr.search.QueryUtils#combineQueryAndFilter (or create a new one), we are done.

Right now we process the filters at parsing time (and do it again in the Searcher). Potentially we could process and remove the filters from the request in the query parser, but that seems nasty to me, hence my idea of modifying combineQueryAndFilter. That method has the responsibility of building a new Query, combining the main Query and all filters (except post-filters), so it seems the perfect place for implementing the custom logic for the KnnQuery, which behaves differently when you combine it with filters.

Hope this helps with context, @dsmiley!
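[Editor's note] The combineQueryAndFilter idea being discussed might be sketched with stand-in types as below. None of these classes are Solr's or Lucene's real API; this is a hypothetical illustration of folding non-post-filters into the KNN query as a pre-filter while other query types get a plain conjunction:

```java
import java.util.List;

interface Query {}

// Stand-in for a parsed fq; postFilter marks an explicit post-filter (e.g. high cost).
record FilterQuery(String expr, boolean postFilter) implements Query {}

// Stand-in for Lucene's KNN query, which takes its filter as a constructor-time pre-filter.
record KnnQuery(String field, float[] vector, List<FilterQuery> preFilters) implements Query {
    KnnQuery withPreFilters(List<FilterQuery> filters) {
        return new KnnQuery(field, vector, filters);
    }
}

// Stand-in for the ordinary "main query AND filters" combination.
record BooleanConjunction(Query main, List<FilterQuery> filters) implements Query {}

final class QueryCombiner {
    static Query combineQueryAndFilter(Query main, List<FilterQuery> filters) {
        List<FilterQuery> preFilters =
            filters.stream().filter(f -> !f.postFilter()).toList();
        if (main instanceof KnnQuery knn) {
            // KNN behaves differently: the filters must restrict the graph search itself,
            // so they are rebuilt into the query rather than wrapped around it.
            return knn.withPreFilters(preFilters);
        }
        return new BooleanConjunction(main, preFilters);
    }
}
```
This is the kind of special case dsmiley pushes back on below; the sketch only makes the trade-off concrete, it does not argue for it.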
[GitHub] [solr] dsmiley commented on pull request #1245: SOLR-16567: KnnQueryParser support for both pre-filters and post-filter
dsmiley commented on PR #1245: URL: https://github.com/apache/solr/pull/1245#issuecomment-1361863063 A special case nearly anywhere (except directly in KNN-oriented code, of course) is a design/maintenance issue. Some special cases like MatchAllDocs are understandable, but a check for KNN in QueryUtils... eh... :-/ Maybe you could show in a new PR what this would look like so I could see. Perhaps when I understand better what you are trying to accomplish, I'll see a better solution. > (me:) Are you trying to basically move certain FQs out of their top level position and into/embedded in a particular parsed query? Could you respond to that please?
[GitHub] [solr] alessandrobenedetti commented on pull request #1245: SOLR-16567: KnnQueryParser support for both pre-filters and post-filter
alessandrobenedetti commented on PR #1245: URL: https://github.com/apache/solr/pull/1245#issuecomment-1361848344 And yes, in the workaround I can use the getProcessedFilters and I will, but once we have the Lucene side it will go away.
[GitHub] [solr] alessandrobenedetti commented on pull request #1245: SOLR-16567: KnnQueryParser support for both pre-filters and post-filter
alessandrobenedetti commented on PR #1245: URL: https://github.com/apache/solr/pull/1245#issuecomment-1361847032 Hi @dsmiley , once we have the Lucene changes, the idea is to change this method: org.apache.solr.search.QueryUtils#combineQueryAndFilter with an additional if clause: `else { return new BooleanQuery.Builder() .add(scoreQuery, Occur.MUST) .add(filterQuery, Occur.FILTER) .build(); }` will become: `else if (scoreQuery instanceof KnnVectorQuery) { return new KnnVectorQuery(scoreQuery, filterQuery); } else { return new BooleanQuery.Builder() .add(scoreQuery, Occur.MUST) .add(filterQuery, Occur.FILTER) .build(); }` Just to give an idea; the final code will look different, as we will have to create a new instance of KnnVectorQuery using the getters of the old one. With this change, we will be able to simplify the KnnQueryParser, removing all the special handling for pre-filters and post-filters. getProcessedFilter will be called just once as usual, and we'll get the benefit of caching and post-filter separation automatically. SolrIndexSearcher won't be touched at all.
[jira] [Commented] (SOLR-16567) java.lang.StackOverflowError when combining KnnQParser and FunctionRangeQParser
[ https://issues.apache.org/jira/browse/SOLR-16567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650966#comment-17650966 ] Alessandro Benedetti commented on SOLR-16567: - Just a mistake from IntelliJ! Removed the branch from the upstream repo! > java.lang.StackOverflowError when combining KnnQParser and > FunctionRangeQParser > --- > > Key: SOLR-16567 > URL: https://issues.apache.org/jira/browse/SOLR-16567 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query >Affects Versions: 9.1 > Environment: Solr Cloud with `solr:9.1` Docker image >Reporter: Gabriel Magno >Priority: Major > Attachments: create_example-solr_9_0.sh, create_example-solr_9_1.sh, > error_full.txt, response-error.json, run_query.sh > > Time Spent: 1.5h > Remaining Estimate: 0h > > Hello there! > I had a Solr 9.0 cluster running, using the new Dense Vector feature. > Recently I have migrated to Solr 9.1. Most of the things are working fine, > except for a special case I have here. > *Error Description* > The problem happens when I try making an Edismax query with a KNN sub-query > and a Function Range filter. For example, I try making this query. > * defType=edismax > * df=name > * q=the > * similarity_vector=\{!knn f=vector topK=10}[1.1,2.2,3.3,4.4] > * {!frange l=0.99}$similarity_vector > In other words, I want all the documents matching the term "the" in the > "name" field, and I filter to return only documents having a vector > similarity of at least 0.99. 
This query was working fine on Solr 9.0, but on Solr 9.1, I get this error:
>
> {code:java}
> java.lang.RuntimeException: java.lang.StackOverflowError
>     at org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:840)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:641)
>     at org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:250)
>     at org.apache.solr.servlet.SolrDispatchFilter.lambda/usr/bin/zsh(SolrDispatchFilter.java:218)
>     at org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:257)
>     at org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:227)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:213)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>     at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
>     ... (manually suppressed for brevity)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.StackOverflowError
>     at org.apache.solr.search.StrParser.getId(StrParser.java:172)
>     at org.apache.solr.search.StrParser.getId(StrParser.java:168)
>     at org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:100)
>     at org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:65)
>     at org.apache.solr.search.QParser.getParser(QParser.java:364)
>     at org.apache.solr.search.QParser.getParser(QParser.java:334)
>     at org.apache.solr.search.QParser.getParser(QParser.java:321)
>     at org.apache.solr.search.QueryUtils.parseFilterQueries(QueryUtils.java:244)
>     at org.apache.solr.search.neural.KnnQParser.getFilterQuery(KnnQParser.java:93)
>     at org.apache.solr.search.neural.KnnQParser.parse(KnnQParser.java:83)
>     at org.apache.solr.search.QParser.getQuery(QParser.java:188)
>     at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:384)
>     at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:94)
>     at org.apache.solr.search.QParser.getQuery(QParser.java:188)
>     at org.apache.solr.search.FunctionRangeQParserPlugin.parse(FunctionRangeQParserPlugin.java:53)
>     at org.apache.solr.search.QParser.getQuery(QParser.java:188)
>     at org.apache.solr.search.QueryUtils.parseFilterQueries(QueryUtils.java:246)
>     at org.apache.solr.search.neural.KnnQParser.getFilterQuery(KnnQParser.java:93)
>     at org.apache.solr.search.neural.KnnQParser.parse(KnnQParser.java:83)
>     at org.apache.solr.search.QParser.getQuery(QParser.java:188)
>     at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:384)
>     at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:94)
>     at org.apache.solr.search.QParser.getQuery(QParser.java:188)
>     at org.apache.solr.search.FunctionRangeQParserPlugin.parse(FunctionRangeQParserPlugin.java:53)
>     at org.apache.solr.search.QParser.getQuery(QParser.java:188)
>     ... (manually suppressed for brevity)
> {code}
>
> The backtrace is much bigger, I'm attaching the raw Solr response in JS
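The trace above shows a cycle: KnnQParser parses its filter queries, which include the frange filter, whose parseValueSource dereferences $similarity_vector and re-parses the knn query, which parses its filters again, and so on until the stack overflows. A tiny self-contained sketch of that mutual recursion (RecursionSketch and its methods are invented for illustration, not Solr code; the depth counter stands in for the real JVM stack limit so the demo stays safe):

```java
import java.util.List;

class RecursionSketch {
    static int depth = 0;
    static final int LIMIT = 1000; // stand-in for the real stack limit

    // "knn" parser: parses its filter queries, including the frange filter
    static void parseKnn(List<String> filters) {
        for (String f : filters) {
            if (f.startsWith("{!frange")) parseFrange(filters);
        }
    }

    // "frange" parser: dereferences $similarity_vector, re-parsing the knn
    // query together with its filters -- closing the cycle
    static void parseFrange(List<String> filters) {
        if (++depth > LIMIT) throw new StackOverflowError("simulated");
        parseKnn(filters);
    }
}
```

Neither parser has a base case that excludes the filter currently being parsed, which is why the 9.1 change to process filters at parse time exposed the loop.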
[jira] [Commented] (SOLR-16594) eDismax should use startOffset when converting per-field to per-term queries
[ https://issues.apache.org/jira/browse/SOLR-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650967#comment-17650967 ] Rudi Seitz commented on SOLR-16594: --- This is a rough outline of the code changes that might be needed to implement the proposal in this ticket: # Create a subclass of org.apache.lucene.index.Term that is capable of holding a startOffset. Possibly name it TermWithOffset # Update or subclass org.apache.lucene.util.QueryBuilder so that createFieldQuery() returns a Query that contains one or more TermWithOffset instead of simple Terms, where appropriate. This is the place where we iterate through the token stream and have access to the offsets to potentially store them on the generated Terms. # Update org.apache.solr.search.ExtendedDismaxQParser so that getAliasedMultiTermQuery() builds clauses based on startOffset instead of the current approach of calling allSameQueryStructure() and then doing "{color:#808080}Make a dismax query for each clause position in the boolean per-field queries"{color} > eDismax should use startOffset when converting per-field to per-term queries > > > Key: SOLR-16594 > URL: https://issues.apache.org/jira/browse/SOLR-16594 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Reporter: Rudi Seitz >Priority: Major > > When parsing a multi-term query that spans multiple fields, edismax sometimes > switches from a "term-centric" to a "field-centric" approach. This creates > inconsistent semantics for the {{mm}} or "min should match" parameter and may > have an impact on scoring. The goal of this ticket is to improve the approach > that edismax uses for generating term-centric queries so that edismax would > less frequently "give up" and resort to the field-centric approach. 
> Specifically, we propose that edismax should create a dismax query for each > distinct startOffset found among the tokens emitted by the field analyzers. > Since the relevant code in edismax works with Query objects that contain > Terms, and since Terms do not hold the startOffset of the Token from which > Term was derived, some plumbing work would need to be done to make the > startOffsets available to edismax. > > BACKGROUND: > > If a user searches for "foo bar" with {{{}qf=f1 f2{}}}, a field-centric > interpretation of the query would contain a clause for each field: > {{ (f1:foo f1:bar) (f2:foo f2:bar)}} > while a term-centric interpretation would contain a clause for each term: > {{ (f1:foo f2:foo) (f1:bar f2:bar)}} > The challenge in generating a term-centric query is that we need to take the > tokens that emerge from each field's analysis chain and group them according > to the terms in the user's original query. However, the tokens that emerge > from an analysis chain do not store a reference to their corresponding input > terms. For example, if we pass "foo bar" through an ngram analyzer we would > get a token stream containing "f", "fo", "foo", "b", "ba", "bar". While it > may be obvious to a human that "f", "fo", and "foo" all come from the "foo" > input term, and that "b", "ba", and "bar" come from the "bar" input term, > there is not always an easy way for edismax to see this connection. When > {{{}sow=true{}}}, edismax passes each whitespace-separated term through each > analysis chain separately, and therefore edismax "knows" that the output > tokens from any given analysis chain are all derived from the single input > term that was passed into that chain. However, when {{{}sow=false{}}}, > edismax passes the entire multi-term query through each analysis chain as a > whole, resulting in multiple output tokens that are not "connected" to their > source term. 
> Edismax still tries to generate a term-centric query when {{sow=false}} by > first generating a boolean query for each field, and then checking whether > all of these per-field queries have the same structure. The structure will > generally be uniform if each analysis chain emits the same number of tokens > for the given input. If one chain has a synonym filter and another doesn’t, > this uniformity may depend on whether a synonym rule happened to match a term > in the user's input. > Assuming the per-field boolean queries _do_ have the same structure, edismax > reorganizes them into a new boolean query. The new query contains a dismax > for each clause position in the original queries. If the original queries are > {{(f1:foo f1:bar)}} and {{(f2:foo f2:bar)}} we can see they have two clauses > each, so we would get a dismax containing all the first position clauses > {{(f1:foo f1:bar)}} and another dismax containing all the
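The proposal above — one dismax clause per distinct startOffset — can be sketched with stand-in types (Tok is a hypothetical token record carrying what a "TermWithOffset" would carry; none of these are real Lucene classes). The key property: the ngram tokens "f", "fo", "foo" all have startOffset 0, so they regroup under the "foo" source term even when per-field token counts differ, which is exactly the case where the current same-structure check gives up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative token: analysis output plus the startOffset of its source term.
class Tok {
    final String field, text;
    final int startOffset;
    Tok(String field, String text, int startOffset) {
        this.field = field; this.text = text; this.startOffset = startOffset;
    }
}

class OffsetGrouper {
    // Group tokens from all fields by the startOffset of their source term:
    // each group would become one dismax clause in the term-centric query.
    static Map<Integer, List<Tok>> byStartOffset(List<Tok> tokens) {
        Map<Integer, List<Tok>> groups = new TreeMap<>();
        for (Tok t : tokens) {
            groups.computeIfAbsent(t.startOffset, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }
}
```

For "foo bar" with a plain field f1 and an ngram field f2, the grouping yields two groups (offsets 0 and 4) regardless of how many tokens each analyzer emitted, so no structure comparison is needed.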
[GitHub] [solr] alessandrobenedetti closed pull request #129: SOLR-15407 untokenized field type with sow=false fix + tests
alessandrobenedetti closed pull request #129: SOLR-15407 untokenized field type with sow=false fix + tests URL: https://github.com/apache/solr/pull/129
[jira] [Commented] (SOLR-16567) java.lang.StackOverflowError when combining KnnQParser and FunctionRangeQParser
[ https://issues.apache.org/jira/browse/SOLR-16567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650961#comment-17650961 ] David Smiley commented on SOLR-16567: - The above comment by ASF's git integration shows a commit was done to a branch {{jira/SOLR-16567}}. [~abenedetti] feature/bug branches on our main repo are generally not needed unless there's going to be broad collaboration. Even repo committers (like me) can contribute to a PR branch you keep on your fork (a GitHub feature). Extraneous branches pollute the view and ultimately need grooming. I see you already have [a fork with a branch there for this PR|https://github.com/SeaseLtd/solr/tree/jira/SOLR-16567] so I'm confused why this {{jira/SOLR-16567}} branch is here as well (redundant).
[GitHub] [solr] dsmiley commented on pull request #1245: SOLR-16567: KnnQueryParser support for both pre-filters and post-filter
dsmiley commented on PR #1245: URL: https://github.com/apache/solr/pull/1245#issuecomment-1361766495 Thanks for your compliments. Can't you simply use SolrIndexSearcher#getProcessedFilter now? As to your proposal, I am confused as to exactly where you propose inserting the logic you provided a snippet of. If you propose SolrIndexSearcher somewhere, then I don't like it, because it's clearly special-casing a specific query, which is a design problem. Are you trying to basically *move* certain FQs out of their top level position and into/embedded in a particular parsed query?
[GitHub] [solr] patsonluk commented on a diff in pull request #1242: SOLR-16580: Avoid making copies of DocCollection for PRS updates
patsonluk commented on code in PR #1242: URL: https://github.com/apache/solr/pull/1242#discussion_r1054663494 ## solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java: ## @@ -261,9 +259,10 @@ private static DocCollection collectionFromObjects( if (log.isDebugEnabled()) { log.debug("a collection {} has per-replica state", name); } - // this collection has replica states stored outside - ReplicaStatesProvider rsp = REPLICASTATES_PROVIDER.get(); - if (rsp instanceof StatesProvider) ((StatesProvider) rsp).isPerReplicaState = true; +} else { + // prior to this call, PRS provider is set. We should unset it before + // deserializing the replicas and slices + DocCollection.clearReplicaStateProvider(); Review Comment: To my understanding, this is required as otherwise the Provider might interfere and override the input values here? I agree that using ThreadLocals could avoid modification of method signatures as you pointed out, but I also share a similar concern as @hiteshk25 that it's a bit hard to track code flow with ThreadLocal, as it requires "internal knowledge" of the code in order to know where things get added/modified. (since the method signature/contract no longer suggests the "full input", and we might start doing fetching at places that used to be only assigning fields locally etc) This invocation of `clearReplicaStateProvider` could be one of the places that could be hard for devs that are not familiar with the ThreadLocal to reason about. I do understand the goal of this PR is NOT the removal of threadlocal usage 😊 , it would be nice though to consider other designs as a replacement of ThreadLocal (in the future!). That could include bigger refactoring (subclassing ClusterState that includes PRS, or adding an overloading method etc). 
For the moment, more comments like these to explain the rationale could be very helpful (this comment has already done a pretty good job, but perhaps it could also mention how the thread-local PRS provider could override the values if not cleared?)
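A minimal, self-contained illustration of the hazard under discussion (class and method names here are invented, not the real Solr API): a ThreadLocal provider set earlier on the same thread silently overrides freshly deserialized values unless it is cleared first, which is exactly why the code flow is hard to follow from method signatures alone.

```java
// Sketch of a thread-local "provider" pattern similar to the one discussed
// above. If the provider set by an earlier request on this thread is not
// removed, it wins over the value the caller actually passed in.
class ReplicaStateProviderSketch {
    private static final ThreadLocal<String> PROVIDER = new ThreadLocal<>();

    static void setProvider(String p) { PROVIDER.set(p); }

    // the equivalent of clearReplicaStateProvider(): must be called before
    // deserializing, or stale thread-local state leaks into the result
    static void clearProvider() { PROVIDER.remove(); }

    // Deserialization path: prefers the thread-local provider when present,
    // ignoring the "deserialized" argument -- the hidden input problem.
    static String resolveState(String deserialized) {
        String fromProvider = PROVIDER.get();
        return fromProvider != null ? fromProvider : deserialized;
    }
}
```

Nothing in resolveState's signature reveals that its result can depend on thread-local state, which is the maintainability concern raised in the review.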
[GitHub] [solr] patsonluk commented on pull request #1242: SOLR-16580: Avoid making copies of DocCollection for PRS updates
patsonluk commented on PR #1242: URL: https://github.com/apache/solr/pull/1242#issuecomment-1361754709 > @justinrsweeney @patsonluk this is still WIP , reviews are welcome Thanks for the work @noblepaul ! Just want to confirm that there are mainly 2 goals here: 1. Avoid making copies of DocCollections when `copyWith` is called (related to PRS updates) 2. Avoid fetching PRS states until the state is actually queried (the Lazy PRS provider part) Do we know the general overhead of the current designs and whether they are causing issues? I agree that both changes will for sure reduce resource usage!
[GitHub] [solr] patsonluk commented on a diff in pull request #1242: SOLR-16580: Avoid making copies of DocCollection for PRS updates
patsonluk commented on code in PR #1242: URL: https://github.com/apache/solr/pull/1242#discussion_r1054654494 ## solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java: ## @@ -139,30 +138,10 @@ public static String getCollectionPathRoot(String coll) { * only a replica is updated */ public DocCollection copyWith(PerReplicaStates newPerReplicaStates) { Review Comment: I assume we will need to modify `getSlices()` so it would return the correct "view" of slices from replica states too? Anyway, this is probably still WIP ! 😊
[GitHub] [solr] alessandrobenedetti commented on pull request #1245: SOLR-16567: KnnQueryParser support for both pre-filters and post-filter
alessandrobenedetti commented on PR #1245: URL: https://github.com/apache/solr/pull/1245#issuecomment-1361661204 Ok, I updated the PR; this is what I have done: 1) a workaround to fix the bug, plus the renames suggested by @dsmiley 2) opened a pull request in Lucene to implement getters in KnnVectorQuery: https://github.com/apache/lucene/pull/12029/files Unless there are any additional good ideas, I would go with this for now. Then, as soon as the Lucene side is sorted out and available in Solr, I would implement the optimal approach, removing all the redundant code and just managing the filters in org.apache.solr.search.QueryUtils#combineQueryAndFilter (it will be so easy and clean, it's a shame I can't do it immediately)
[jira] [Commented] (SOLR-16567) java.lang.StackOverflowError when combining KnnQParser and FunctionRangeQParser
[ https://issues.apache.org/jira/browse/SOLR-16567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650929#comment-17650929 ] ASF subversion and git services commented on SOLR-16567: Commit e683af1c7bdece1d7100852323c0a32b12bfd566 in solr's branch refs/heads/jira/SOLR-16567 from Elia Porciani [ https://gitbox.apache.org/repos/asf?p=solr.git;h=e683af1c7bd ] SOLR-16567: KnnQueryParser support for both pre-filters and post-filters(cost>0) > java.lang.StackOverflowError when combining KnnQParser and > FunctionRangeQParser > --- > > Key: SOLR-16567 > URL: https://issues.apache.org/jira/browse/SOLR-16567 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query >Affects Versions: 9.1 > Environment: Solr Cloud with `solr:9.1` Docker image >Reporter: Gabriel Magno >Priority: Major > Attachments: create_example-solr_9_0.sh, create_example-solr_9_1.sh, > error_full.txt, response-error.json, run_query.sh > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Hello there! > I had a Solr 9.0 cluster running, using the new Dense Vector feature. > Recently I have migrated to Solr 9.1. Most of the things are working fine, > except for a special case I have here. > *Error Description* > The problem happens when I try making an Edismax query with a KNN sub-query > and a Function Range filter. For example, I try making this query. > * defType=edismax > * df=name > * q=the > * similarity_vector=\{!knn f=vector topK=10}[1.1,2.2,3.3,4.4] > * {!frange l=0.99}$similarity_vector > In other words, I want all the documents matching the term "the" in the > "name" field, and I filter to return only documents having a vector > similarity of at least 0.99. 
This query was working fine on Solr 9.0, but on > Solr 9.1, I get this error: > > {code:java} > java.lang.RuntimeException: java.lang.StackOverflowError at > org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:840) at > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:641) at > org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:250) > at > org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:218) > at > org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:257) > at > org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:227) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:213) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) > at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201) >... (manually suppressed for brevity) at > java.base/java.lang.Thread.run(Unknown Source) Caused by: > java.lang.StackOverflowError at > org.apache.solr.search.StrParser.getId(StrParser.java:172) at > org.apache.solr.search.StrParser.getId(StrParser.java:168) at > org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:100) > at > org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:65) > at org.apache.solr.search.QParser.getParser(QParser.java:364) at > org.apache.solr.search.QParser.getParser(QParser.java:334) at > org.apache.solr.search.QParser.getParser(QParser.java:321) at > org.apache.solr.search.QueryUtils.parseFilterQueries(QueryUtils.java:244) > at > org.apache.solr.search.neural.KnnQParser.getFilterQuery(KnnQParser.java:93) > at org.apache.solr.search.neural.KnnQParser.parse(KnnQParser.java:83) at > org.apache.solr.search.QParser.getQuery(QParser.java:188) at > org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:384) > at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:94) > at 
org.apache.solr.search.QParser.getQuery(QParser.java:188) at > org.apache.solr.search.FunctionRangeQParserPlugin.parse(FunctionRangeQParserPlugin.java:53) > at org.apache.solr.search.QParser.getQuery(QParser.java:188) at > org.apache.solr.search.QueryUtils.parseFilterQueries(QueryUtils.java:246) > at > org.apache.solr.search.neural.KnnQParser.getFilterQuery(KnnQParser.java:93) > at org.apache.solr.search.neural.KnnQParser.parse(KnnQParser.java:83) at > org.apache.solr.search.QParser.getQuery(QParser.java:188) at > org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:384) > at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:94) > at org.apache.solr.search.QParser.getQuery(QParser.java:188) at > org.apache.solr.search.FunctionRangeQParserPlugin.parse(FunctionRangeQParse
[jira] [Updated] (SOLR-16594) eDismax should use startOffset when converting per-field to per-term queries
[ https://issues.apache.org/jira/browse/SOLR-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rudi Seitz updated SOLR-16594: -- Description: When parsing a multi-term query that spans multiple fields, edismax sometimes switches from a "term-centric" to a "field-centric" approach. This creates inconsistent semantics for the {{mm}} or "min should match" parameter and may have an impact on scoring. The goal of this ticket is to improve the approach that edismax uses for generating term-centric queries so that edismax would less frequently "give up" and resort to the field-centric approach. Specifically, we propose that edismax should create a dismax query for each distinct startOffset found among the tokens emitted by the field analyzers. Since the relevant code in edismax works with Query objects that contain Terms, and since Terms do not hold the startOffset of the Token from which Term was derived, some plumbing work would need to be done to make the startOffsets available to edismax. BACKGROUND: If a user searches for "foo bar" with {{{}qf=f1 f2{}}}, a field-centric interpretation of the query would contain a clause for each field: {{ (f1:foo f1:bar) (f2:foo f2:bar)}} while a term-centric interpretation would contain a clause for each term: {{ (f1:foo f2:foo) (f1:bar f2:bar)}} The challenge in generating a term-centric query is that we need to take the tokens that emerge from each field's analysis chain and group them according to the terms in the user's original query. However, the tokens that emerge from an analysis chain do not store a reference to their corresponding input terms. For example, if we pass "foo bar" through an ngram analyzer we would get a token stream containing "f", "fo", "foo", "b", "ba", "bar". While it may be obvious to a human that "f", "fo", and "foo" all come from the "foo" input term, and that "b", "ba", and "bar" come from the "bar" input term, there is not always an easy way for edismax to see this connection. 
When {{{}sow=true{}}}, edismax passes each whitespace-separated term through each analysis chain separately, and therefore edismax "knows" that the output tokens from any given analysis chain are all derived from the single input term that was passed into that chain. However, when {{{}sow=false{}}}, edismax passes the entire multi-term query through each analysis chain as a whole, resulting in multiple output tokens that are not "connected" to their source term. Edismax still tries to generate a term-centric query when {{sow=false}} by first generating a boolean query for each field, and then checking whether all of these per-field queries have the same structure. The structure will generally be uniform if each analysis chain emits the same number of tokens for the given input. If one chain has a synonym filter and another doesn’t, this uniformity may depend on whether a synonym rule happened to match a term in the user's input. Assuming the per-field boolean queries _do_ have the same structure, edismax reorganizes them into a new boolean query. The new query contains a dismax for each clause position in the original queries. If the original queries are {{(f1:foo f1:bar)}} and {{(f2:foo f2:bar)}} we can see they have two clauses each, so we would get a dismax containing all the first position clauses {{(f1:foo f1:bar)}} and another dismax containing all the second position clauses {{{}(f2:foo f2:bar){}}}. We can see that edismax is using clause position as a heuristic to reorganize the per-field boolean queries into per-term ones, even though it doesn't know for sure which clauses inside those per-field boolean queries are related to which input terms. We propose that a better way of reorganizing the per-field boolean queries is to create a dismax for each distinct startOffset seen among the tokens in the token streams emitted by each field analyzer. 
The startOffset of a token (or rather, a PackedTokenAttributeImpl) is "the position of the first character corresponding to this token in the source text". We propose that startOffset is a reasonable way of matching output tokens up with the input terms that gave rise to them. For example, if we pass "foo bar" through an ngram analysis chain we see that the foo-related tokens all have startOffset=0 while the bar-related tokens all have startOffset=4. Likewise, tokens that are generated via synonym expansion have a startOffset that points to the beginning of the matching input term. For example, if the query "GB" generates "GB gib gigabyte gigabytes" via synonym expansion, all of those four tokens would have startOffset=0. Here's an example of how the proposed edismax logic would work. Let's say a user searches for "foo bar" across two fields, f1 and f2, where f1 uses a standard text analysis chain while f2 generates ngrams. We would get field-centric queries {{(f1:foo f1:bar)}} and ({{{}f
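The core of the proposal above can be sketched in a few lines. This is an illustrative stand-alone sketch only, using a hypothetical `Token` record rather than Lucene's attribute classes: bucket the tokens emitted by a field's analysis chain by their startOffset, so that all tokens derived from the same input term land in the same per-term group (each group would then become one dismax clause).

```java
// Hedged sketch of grouping analyzer output tokens by startOffset, as the
// ticket proposes. Token values below are illustrative ngram output for
// the input "foo bar"; this is not Lucene's TokenStream API.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StartOffsetGrouping {
    record Token(String text, int startOffset) {}

    // Group tokens by startOffset; keys sorted so groups follow query order.
    static Map<Integer, List<String>> groupByStartOffset(List<Token> tokens) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Token t : tokens) {
            groups.computeIfAbsent(t.startOffset(), k -> new ArrayList<>()).add(t.text());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Ngram analysis of "foo bar": foo-derived tokens start at offset 0,
        // bar-derived tokens at offset 4.
        List<Token> ngrams = List.of(
            new Token("f", 0), new Token("fo", 0), new Token("foo", 0),
            new Token("b", 4), new Token("ba", 4), new Token("bar", 4));
        System.out.println(groupByStartOffset(ngrams));
        // {0=[f, fo, foo], 4=[b, ba, bar]}
    }
}
```

Unlike the clause-position heuristic, this grouping does not require every field's query to have the same number of clauses: synonym expansion or ngram filters can add tokens freely, and they still fall into the group of the input term they came from.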
[jira] [Resolved] (SOLR-16585) All docs query with any nonzero positive start value throws NPE with "this.docs is null"
[ https://issues.apache.org/jira/browse/SOLR-16585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gibney resolved SOLR-16585. --- Fix Version/s: main (10.0) 9.2 Resolution: Fixed > All docs query with any nonzero positive start value throws NPE with > "this.docs is null" > > > Key: SOLR-16585 > URL: https://issues.apache.org/jira/browse/SOLR-16585 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query >Affects Versions: 9.1 >Reporter: Shawn Heisey >Assignee: Michael Gibney >Priority: Major > Fix For: main (10.0), 9.2, 9.1.1 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > An all docs query that has a nonzero positive value in the start parameter > will throw an NPE. Below is a slightly redacted query sent by the admin UI > and the exception. This is from 9.2.0-SNAPSHOT installed as a service on > Ubuntu, a user reported the problem on solr-user with the 9.1.0 docker image. > {code:none} > http://server:port/solr/corename/select?indent=true&q.op=OR&q=*%3A*&rows=10&start=1&useParams={code} > {code:none} > java.lang.NullPointerException: Cannot invoke > "org.apache.solr.search.DocList.iterator()" because "this.docs" is null at > org.apache.solr.response.DocsStreamer.<init>(DocsStreamer.java:74) at > org.apache.solr.response.ResultContext.getProcessedDocuments(ResultContext.java:55) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:246) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:196) > at org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:47) at > org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:117) at > org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:30) > at > org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:71) > at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:980) > at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:585) at > org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:251) > at > org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:219) > at > org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:257) > at > org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:227) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196) > at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210) at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:527) at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1383) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176) > at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1544) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1305) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129) > at > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149) > at > org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:228) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:141) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) > at > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:301) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) > at > org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:822) > at > org.eclip
[jira] [Updated] (SOLR-16585) All docs query with any nonzero positive start value throws NPE with "this.docs is null"
[ https://issues.apache.org/jira/browse/SOLR-16585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gibney updated SOLR-16585: -- Fix Version/s: 9.1.1 > All docs query with any nonzero positive start value throws NPE with > "this.docs is null"
[GitHub] [solr] magibney commented on pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination
magibney commented on PR #1236: URL: https://github.com/apache/solr/pull/1236#issuecomment-1361455049 Thanks everyone; committed and backported to `branch_9x` and `branch_9_1`.
[jira] [Commented] (SOLR-16585) All docs query with any nonzero positive start value throws NPE with "this.docs is null"
[ https://issues.apache.org/jira/browse/SOLR-16585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650906#comment-17650906 ] ASF subversion and git services commented on SOLR-16585: Commit 8093e782eeef212f2d978aaf79e2a8d0aacba6bb in solr's branch refs/heads/branch_9_1 from Michael Gibney [ https://gitbox.apache.org/repos/asf?p=solr.git;h=8093e782eee ] SOLR-16585: Fix NPE in MatchAllDocs pagination (#1236) (cherry picked from commit ced26f7132a4162dd7eaa96de2c87712bd8525fa) > All docs query with any nonzero positive start value throws NPE with > "this.docs is null"
[jira] [Commented] (SOLR-16585) All docs query with any nonzero positive start value throws NPE with "this.docs is null"
[ https://issues.apache.org/jira/browse/SOLR-16585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650905#comment-17650905 ] ASF subversion and git services commented on SOLR-16585: Commit ced26f7132a4162dd7eaa96de2c87712bd8525fa in solr's branch refs/heads/branch_9x from Michael Gibney [ https://gitbox.apache.org/repos/asf?p=solr.git;h=ced26f7132a ] SOLR-16585: Fix NPE in MatchAllDocs pagination (#1236) (cherry picked from commit bfccca2837e3f1625145454e75e2d602689f3781) > All docs query with any nonzero positive start value throws NPE with > "this.docs is null" > > > Key: SOLR-16585 > URL: https://issues.apache.org/jira/browse/SOLR-16585 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query >Affects Versions: 9.1 >Reporter: Shawn Heisey >Assignee: Michael Gibney >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > An all docs query that has a nonzero positive value in the start parameter > will throw an NPE. Below is a slightly redacted query sent by the admin UI > and the exception. This is from 9.2.0-SNAPSHOT installed as a service on > Ubuntu, a user reported the problem on solr-user with the 9.1.0 docker image. 
> {code:none}
> http://server:port/solr/corename/select?indent=true&q.op=OR&q=*%3A*&rows=10&start=1&useParams=
> {code}
> {code:none}
> java.lang.NullPointerException: Cannot invoke "org.apache.solr.search.DocList.iterator()" because "this.docs" is null
>   at org.apache.solr.response.DocsStreamer.<init>(DocsStreamer.java:74)
>   at org.apache.solr.response.ResultContext.getProcessedDocuments(ResultContext.java:55)
>   at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:246)
>   at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:196)
>   at org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:47)
>   at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:117)
>   at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:30)
>   at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:71)
>   at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:980)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:585)
>   at org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:251)
>   at org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:219)
>   at org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:257)
>   at org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:227)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>   at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210)
>   at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635)
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:527)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>   at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571)
>   at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1383)
>   at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1544)
>   at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1305)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
>   at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:228)
>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:141)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>   at org.eclipse.je
[jira] [Commented] (SOLR-16585) All docs query with any nonzero positive start value throws NPE with "this.docs is null"
[ https://issues.apache.org/jira/browse/SOLR-16585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650903#comment-17650903 ] ASF subversion and git services commented on SOLR-16585: Commit bfccca2837e3f1625145454e75e2d602689f3781 in solr's branch refs/heads/main from Michael Gibney [ https://gitbox.apache.org/repos/asf?p=solr.git;h=bfccca2837e ] SOLR-16585: Fix NPE in MatchAllDocs pagination (#1236) > All docs query with any nonzero positive start value throws NPE with > "this.docs is null" > > > Key: SOLR-16585 > URL: https://issues.apache.org/jira/browse/SOLR-16585 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query >Affects Versions: 9.1 >Reporter: Shawn Heisey >Assignee: Michael Gibney >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > An all docs query that has a nonzero positive value in the start parameter > will throw an NPE. Below is a slightly redacted query sent by the admin UI > and the exception. This is from 9.2.0-SNAPSHOT installed as a service on > Ubuntu, a user reported the problem on solr-user with the 9.1.0 docker image. 
> {code:none}
> http://server:port/solr/corename/select?indent=true&q.op=OR&q=*%3A*&rows=10&start=1&useParams=
> {code}
> {code:none}
> java.lang.NullPointerException: Cannot invoke "org.apache.solr.search.DocList.iterator()" because "this.docs" is null
>   at org.apache.solr.response.DocsStreamer.<init>(DocsStreamer.java:74)
> {code}
[GitHub] [solr] magibney merged pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination
magibney merged PR #1236: URL: https://github.com/apache/solr/pull/1236 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Comment Edited] (SOLR-16594) eDismax should use startOffset when converting per-field to per-term queries
[ https://issues.apache.org/jira/browse/SOLR-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650895#comment-17650895 ] Rudi Seitz edited comment on SOLR-16594 at 12/21/22 2:37 PM:
-
Steps to reproduce the inconsistent {{mm}} behavior caused by the term-centric to field-centric shift. Tested in Solr 9.1.

Create a collection using the default schema and index the following documents:

{{{"id":"1", "field1_ws":"XY GB"}}}
{{{"id":"2", "field1_ws":"XY", "field2_ws":"GB", "field2_txt":"GB"}}}
{{{"id":"3", "field1_ws":"XY GC"}}}
{{{"id":"4", "field1_ws":"XY", "field2_ws":"GC", "field2_txt":"GC"}}}

Note that the default schema contains a synonym rule for GB which will be applied in _txt fields: {{GB,gib,gigabyte,gigabytes}}

Now try the following edismax query for "XY GB" with "minimum should match" set to 100%:

{{q=XY GB}}
{{mm=100%}}
{{qf=field1_ws field2_ws}}
{{defType=edismax}}

{{[http://localhost:8983/solr/test/select?defType=edismax&indent=true&mm=100%25&q.op=OR&q=XY%20GB&qf=field1_ws%20field2_ws]}}

Notice that BOTH document 1 and document 2 are returned. This is because edismax generates a term-centric query which allows the terms "XY" and "GB" to match in any of the qf fields.

Now add the txt version of field2 to the qf:

{{qf=field1_ws field2_ws field2_txt}}

{{[http://localhost:8983/solr/test/select?defType=edismax&indent=true&mm=100%25&q.op=OR&q=XY%20GB&qf=field1_ws%20field2_ws%20field2_txt]}}

Rerun the query and notice that ONLY document 1 is returned. This is because field2_txt expands synonyms, which leads to a different number of tokens than the _ws fields produce, which causes edismax to generate a field-centric query, which requires that the terms "XY" and "GB" both match in _one_ of the provided qf fields.
It is counterintuitive that expanding the range of the search to include more fields actually _reduces_ recall here, but not elsewhere: Repeat this experiment with {{q=XY GC}}. In this case, notice that BOTH documents 3 and 4 are returned for both versions of qf – there is no change in recall when we add field2_txt to qf. That is because there is no synonym rule for GC, so even though the _ws and _txt fields have "incompatible" analysis chains, they happen to generate the same number of tokens for this particular query, and edismax is able to stay with the term-centric approach. In these experiments we have been assuming the default {{sow=false}}. If we set {{sow=true}} we would see that the term-centric approach is used throughout and there is no change in behavior when we add field2_txt to qf, whether we are searching for "XY GB" or "XY GC".
[GitHub] [solr-operator] tiimbz commented on issue #483: Servicemonitor for prometheus exporter is referring to cluster port instead of metrics pod port
tiimbz commented on issue #483: URL: https://github.com/apache/solr-operator/issues/483#issuecomment-1361389034

Looking at the code, it looks like the `prometheus.io/port` value is set from `ExtSolrMetricsPort`, not `SolrMetricsPort`, which would have fixed the problem. Any attempt to overwrite this by using custom `serviceAnnotations` does not work, as custom annotations can only supplement the default ones, not overwrite them: https://github.com/apache/solr-operator/blob/main/controllers/util/prometheus_exporter_util.go#L400
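The "supplement but never overwrite" behavior described above comes down to merge order. A minimal Java sketch (illustrative only — the operator itself is written in Go, and the names here are invented): when the default annotations are applied after the user-supplied map, a custom value for the same key can never win.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of annotation merging where defaults are applied last.
// Any key present in both maps ends up with the default value,
// which is why a custom prometheus.io/port cannot override port 80.
public class AnnotationMerge {
    static Map<String, String> merge(Map<String, String> custom, Map<String, String> defaults) {
        Map<String, String> merged = new HashMap<>(custom);
        merged.putAll(defaults); // defaults win for any shared key
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> custom = Map.of("prometheus.io/port", "8080");
        Map<String, String> defaults = Map.of("prometheus.io/port", "80");
        System.out.println(merge(custom, defaults).get("prometheus.io/port")); // prints 80
    }
}
```

Flipping the `putAll` order (custom applied last) would be the usual way to let user annotations take precedence.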
[jira] [Commented] (SOLR-16594) eDismax should use startOffset when converting per-field to per-term queries
[ https://issues.apache.org/jira/browse/SOLR-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650895#comment-17650895 ] Rudi Seitz commented on SOLR-16594:
---
Steps to reproduce the inconsistent {{mm}} behavior caused by the term-centric to field-centric shift. Tested in Solr 9.1.

Create a collection using the default schema and index the following documents:

{{{"id":"1", "field1_ws":"XY GB"}}}
{{{"id":"2", "field1_ws":"XY", "field2_ws":"GB", "field2_txt":"GB"}}}
{{{"id":"3", "field1_ws":"XY GC"}}}
{{{"id":"4", "field1_ws":"XY", "field2_ws":"GC", "field2_txt":"GC"}}}

Note that the default schema contains a synonym rule for GB which will be applied in _txt fields: {{GB,gib,gigabyte,gigabytes}}

Now try the following edismax query for "XY GB" with "minimum should match" set to 100%:

{{q=XY GB}}
{{mm=100%}}
{{qf=field1_ws field2_ws}}
{{defType=edismax}}

{{http://localhost:8983/solr/test/select?defType=edismax&indent=true&mm=100%25&q.op=OR&q=XY%20GB&qf=field1_ws%20field2_ws}}

Notice that BOTH document 1 and document 2 are returned. This is because edismax generates a term-centric query which allows the terms "XY" and "GB" to match in any of the qf fields.

Now add the txt version of field2 to the qf:

{{qf=field1_ws field2_ws field2_txt}}

{{http://localhost:8983/solr/test/select?defType=edismax&indent=true&mm=100%25&q.op=OR&q=XY%20GB&qf=field1_ws%20field2_ws%20field2_txt}}

Rerun the query and notice that ONLY document 1 is returned. This is because field2_txt expands synonyms, which leads to a different number of tokens than the _ws fields produce, which causes edismax to generate a field-centric query, which requires that the terms "XY" and "GB" both match in _one_ of the provided qf fields.
It is counterintuitive that expanding the range of the search to include more fields actually _reduces_ recall here, but not elsewhere: Repeat this experiment with {{q=XY GC}}. In this case, notice that BOTH documents are returned for both versions of qf – there is no change when we add field2_txt to qf. That is because there is no synonym rule for GC, so even though the _ws and _txt fields have "incompatible" analysis chains, they happen to generate the same number of tokens for this particular query, and edismax is able to stay with the term-centric approach. In these experiments we have been assuming the default {{sow=false}}. If we set {{sow=true}} we would see that the term-centric approach is used throughout and there is no change in behavior when we add field2_txt to qf, whether we are searching for "XY GB" or "XY GC".
> eDismax should use startOffset when converting per-field to per-term queries
>
> Key: SOLR-16594
> URL: https://issues.apache.org/jira/browse/SOLR-16594
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Reporter: Rudi Seitz
> Priority: Major
>
> When parsing a multi-term query that spans multiple fields, edismax sometimes switches from a "term-centric" to a "field-centric" approach. This creates inconsistent semantics for the {{mm}} or "min should match" parameter and may have an impact on scoring. The goal of this ticket is to improve the approach that edismax uses for generating term-centric queries so that edismax would less frequently "give up" and resort to the field-centric approach. Specifically, we propose that edismax should create a dismax query for each distinct startOffset found among the tokens emitted by the field analyzers.
> Since the relevant code in edismax works with Query objects that contain Terms, and since Terms do not hold the startOffset of the Token from which the Term was derived, some plumbing work would need to be done to make the startOffsets available to edismax.
>
> BACKGROUND:
>
> If a user searches for "foo bar" with {{qf=f1 f2}}, a field-centric interpretation of the query would contain a clause for each field:
> {{ (f1:foo f1:bar) (f2:foo f2:bar)}}
> while a term-centric interpretation would contain a clause for each term:
> {{ (f1:foo f2:foo) (f1:bar f2:bar)}}
> The challenge in generating a term-centric query is that we need to take the tokens that emerge from each field's analysis chain and group them according to the terms in the user's original query. However, the tokens that emerge from an analysis chain do not store a reference to their corresponding input terms. For example, if we pass "foo bar" through an ngram analyzer we would get a token stream containing "f", "fo", "foo", "b", "ba", "bar". While it may be obvious to a human that "f", "fo", and "foo" all come from the "foo" input term, and that "b", "ba", and "bar" come from t
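The grouping proposed in the ticket can be sketched without any Lucene plumbing by modeling a token as a (text, startOffset) pair. This is purely illustrative — the class and method names are invented, not the actual Solr/Lucene API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of grouping analyzed tokens by startOffset.
// The ngram output "f", "fo", "foo", "b", "ba", "bar" for the input
// "foo bar" collapses back into one bucket per original query term,
// even though the analysis chain emitted six tokens for two terms.
public class OffsetGrouping {
    record Token(String text, int startOffset) {}

    static Map<Integer, List<String>> groupByStartOffset(List<Token> tokens) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Token t : tokens) {
            groups.computeIfAbsent(t.startOffset(), k -> new ArrayList<>()).add(t.text());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Token> ngrams = List.of(
            new Token("f", 0), new Token("fo", 0), new Token("foo", 0),
            new Token("b", 4), new Token("ba", 4), new Token("bar", 4));
        System.out.println(groupByStartOffset(ngrams)); // {0=[f, fo, foo], 4=[b, ba, bar]}
    }
}
```

With groups keyed by startOffset, a dismax clause could then be built per group rather than per field, which is the term-centric shape the ticket is after.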
[GitHub] [solr-operator] tiimbz commented on issue #483: Servicemonitor for prometheus exporter is referring to cluster port instead of metrics pod port
tiimbz commented on issue #483: URL: https://github.com/apache/solr-operator/issues/483#issuecomment-1361368911

We are having the same issue. The `prometheus.io/port` annotation is set to port `80`, which doesn't correspond with the port of the pod. This causes Prometheus to fail to scrape the service endpoint. We've also bypassed the problem by enabling scraping of the pods directly:
```
customKubeOptions:
  podOptions:
    annotations:
      prometheus.io/port: "8080"
      prometheus.io/path: /metrics
      prometheus.io/scrape: "true"
      prometheus.io/scheme: http
```
[jira] [Commented] (SOLR-16556) Solr stream expression: Implement Page Streaming Decorator to allow results to be displayed with pagination.
[ https://issues.apache.org/jira/browse/SOLR-16556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650816#comment-17650816 ] Maulin commented on SOLR-16556:
---
Hi [~jbernste],

Thanks for reviewing this and providing feedback. Your understanding is correct about the start parameter: tuples will start flowing from the start param. But line 229 has nothing to do with the start parameter. That for loop (lines 229 to 232) polls the top `rows` records. If you notice, it's using the poll method (line 230) to poll records from the "top" priority queue. Here is the logic:

/*
 * 1. Read the stream and add N (rows + start) tuples into the priority queue.
 * 2. If a new tuple from the stream is greater than a tuple in the 'top' priority queue, replace the tuple in the queue with the new tuple.
 * 3. Add the required number of tuples (specified by the rows param) into the 'topList' queue.
 */

Please let me know if you need more clarification on this.

Regards,
Maulin
> Solr stream expression: Implement Page Streaming Decorator to allow results to be displayed with pagination.
>
> Key: SOLR-16556
> URL: https://issues.apache.org/jira/browse/SOLR-16556
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: streaming expressions
> Reporter: Maulin
> Priority: Major
> Labels: Streamingexpression, decorator, paging
> Attachments: Page Decorator Performance Reading.xlsx
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Solr stream expression: Implement Page Streaming Decorator to allow results to be displayed with pagination.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
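The three steps above can be sketched with plain Java collections. This is illustrative only — the actual patch works on Solr Tuple streams with a stream comparator; plain ints and natural ordering stand in for both here:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the page-decorator logic described above:
// 1. keep the best (start + rows) values seen so far in a bounded min-heap;
// 2. a new value replaces the heap head only if it beats the worst kept value;
// 3. poll `rows` values off the heap — they are the requested page, worst first.
public class PageSketch {
    static List<Integer> page(Iterable<Integer> stream, int start, int rows) {
        int capacity = start + rows;
        PriorityQueue<Integer> top = new PriorityQueue<>(); // head = worst kept value
        for (int t : stream) {
            if (top.size() < capacity) {
                top.offer(t);
            } else if (t > top.peek()) {
                top.poll();   // evict the worst kept value
                top.offer(t);
            }
        }
        // The first `rows` polls form the page; addFirst restores best-first order.
        Deque<Integer> pageDesc = new ArrayDeque<>();
        for (int i = 0; i < rows && !top.isEmpty(); i++) {
            pageDesc.addFirst(top.poll());
        }
        return new ArrayList<>(pageDesc);
    }

    public static void main(String[] args) {
        // values 1..10, start=2, rows=3: skip the best two (10, 9), emit [8, 7, 6]
        List<Integer> values = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(page(values, 2, 3));
    }
}
```

Note the heap only ever holds start + rows entries, so memory stays bounded regardless of how many tuples flow through the stream.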
[GitHub] [solr] bruno-roustant commented on pull request #1215: DocRouter: strengthen abstraction
bruno-roustant commented on PR #1215: URL: https://github.com/apache/solr/pull/1215#issuecomment-1361142463

@noblepaul I don't fully understand your point re: _"Introducing a new type of Router needs more reviews"_. To me this PR does not add a new type of DocRouter, as it is centered around CompositeIdRouter. It makes it clearer that only CompositeIdRouter is supported for split operations in SplitOp, SolrIndexSplitter, and MigrateCmd. I see no new code, only some refactoring to move key parsing logic inside CompositeIdRouter. Am I missing something?
[GitHub] [solr] bruno-roustant commented on a diff in pull request #1215: DocRouter: strengthen abstraction
bruno-roustant commented on code in PR #1215: URL: https://github.com/apache/solr/pull/1215#discussion_r1054213209

## solr/core/src/java/org/apache/solr/cloud/api/collections/MigrateCmd.java:

```
@@ -253,7 +252,7 @@ private void migrateKey(
         SHARD_ID_PROP,
         sourceSlice.getName(),
         "routeKey",
-        SolrIndexSplitter.getRouteKey(splitKey) + "!",
+        sourceRouter.getRouteKeyNoSuffix(splitKey) + "!",
```

Review Comment: Above, in the existing code, there is a cast to CompositeIdRouter. It is not clear to me whether it is known that the router must be of type CompositeIdRouter. Should we add a check like checkRouterSupportsSplitKey()?

## solr/core/src/java/org/apache/solr/update/SolrIndexSplitter.java:

```
@@ -765,18 +766,11 @@ static FixedBitSet[] split(
     return docSets;
   }

-  public static String getRouteKey(String idString) {
-    int idx = idString.indexOf(CompositeIdRouter.SEPARATOR);
-    if (idx <= 0) return null;
-    String part1 = idString.substring(0, idx);
-    int commaIdx = part1.indexOf(CompositeIdRouter.bitsSeparator);
-    if (commaIdx > 0 && commaIdx + 1 < part1.length()) {
-      char ch = part1.charAt(commaIdx + 1);
-      if (ch >= '0' && ch <= '9') {
-        part1 = part1.substring(0, commaIdx);
-      }
+  private static void checkRouterSupportsSplitKey(HashBasedRouter hashRouter, String splitKey) {
```

Review Comment: The expectation is much clearer with this method. I'm not familiar with the other DocRouter types. Does this mean that split is not supported at all with other DocRouter types?

## solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java:

```
@@ -263,8 +263,9 @@ private void handleGetRanges(CoreAdminHandler.CallInfo it, String coreName) thro
     DocCollection collection = clusterState.getCollection(collectionName);
     String sliceName = parentCore.getCoreDescriptor().getCloudDescriptor().getShardId();
     Slice slice = collection.getSlice(sliceName);
-    DocRouter router =
-        collection.getRouter() != null ? collection.getRouter() : DocRouter.DEFAULT;
+    CompositeIdRouter router =
```

Review Comment: As I understand it, the router was 'expected' to be a CompositeIdRouter, even if that is not clear here, because the code below manipulates the terms and expects to find a CompositeIdRouter.SEPARATOR. It becomes clearer. Should we add a check like checkRouterSupportsSplitKey() with a clearer exception?
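For readers following along, the parsing being moved out of `SolrIndexSplitter.getRouteKey` can be reproduced as a self-contained sketch. The class and method names here are illustrative, and the constants assume the CompositeIdRouter conventions ('!' as the route-key separator, '/' as the bits separator):

```java
// Standalone sketch of the route-key parsing being moved into
// CompositeIdRouter; names and constants are assumptions, not the real class.
public class RouteKeys {
    static final char SEPARATOR = '!';       // route key / doc id separator
    static final char BITS_SEPARATOR = '/';  // optional numeric bits suffix

    // "tenant!doc123"   -> "tenant"
    // "tenant/8!doc123" -> "tenant"  (numeric bits suffix stripped)
    // "doc123"          -> null     (no route key present)
    public static String getRouteKeyNoSuffix(String idString) {
        int idx = idString.indexOf(SEPARATOR);
        if (idx <= 0) {
            return null; // no separator, or it is the first character
        }
        String part1 = idString.substring(0, idx);
        int bitsIdx = part1.indexOf(BITS_SEPARATOR);
        // Strip a trailing "/<digits>" bits spec, e.g. "tenant/8" -> "tenant".
        if (bitsIdx > 0 && bitsIdx + 1 < part1.length()
                && Character.isDigit(part1.charAt(bitsIdx + 1))) {
            part1 = part1.substring(0, bitsIdx);
        }
        return part1;
    }
}
```

Seen this way, the refactoring makes sense: the parsing depends entirely on CompositeIdRouter's separator conventions, so it belongs on that router rather than in SolrIndexSplitter.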
[GitHub] [solr] alessandrobenedetti commented on pull request #1245: SOLR-16567: KnnQueryParser support for both pre-filters and post-filter
alessandrobenedetti commented on PR #1245: URL: https://github.com/apache/solr/pull/1245#issuecomment-1361107904

> I'm looking closer at KnnQParser; ...

David, your help has been invaluable! You are absolutely right, and this is an oversight of mine from when I did the original review of the pre-filtering work (this naming is quite common in the neural search community).

I think I have found the perfect place for this fix, and it would literally require a few lines of code rather than the complicated methods that are in place now: org.apache.solr.search.QueryUtils#combineQueryAndFilter

```
...
} else if (scoreQuery instanceof KnnVectorQuery) {
  ((KnnVectorQuery) scoreQuery).setFilter(filterQuery);
} else {
  return new BooleanQuery.Builder()
      .add(scoreQuery, Occur.MUST)
      .add(filterQuery, Occur.FILTER)
      .build();
}
```

Basically, when we combine the query with the filter, we handle things differently for KNN queries, and the filter (excluding post-filters) is set on the Lucene KnnVectorQuery. So far so good: elegant, minimal code change, filters are processed once, everyone is happy... BUT currently KnnVectorQuery in Lucene has no getters or setters and has all of its fields final! This is extremely annoying, but we are where we are (to be honest, for a library class I would have gone with private fields plus getters/setters from the beginning). Also, Lucene is now a separate project, so I would basically have to make the change in Lucene, then wait for a Lucene release, include it in Solr, etc.

So, long story short, my suggestion:
1) we proceed with the current hack, renaming where necessary to make it nicer, but with no massive change
2) contribute the change on the Lucene side, probably removing final and adding getters/setters, or, if the community disagrees, just adding getters so that it's possible to create a new query from the input one
3) once Lucene releases and we have the code in Solr, do the nice and clean implementation

Let me know what you think!
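Because the fields are final, the eventual clean version of this branch would have to rebuild the KNN query rather than mutate it, which is exactly why getters on the Lucene side are needed first. A toy sketch of that design choice, using stand-in classes rather than the real Lucene/Solr ones:

```java
// Toy stand-ins (NOT the real Lucene classes) illustrating the design
// choice discussed above: an immutable KNN query is combined with a filter
// by constructing a new query that carries the filter.
class Query {}

final class KnnVectorQuery extends Query {
    final String field;
    final int k;
    final Query filter; // null means unfiltered

    KnnVectorQuery(String field, int k, Query filter) {
        this.field = field;
        this.k = k;
        this.filter = filter;
    }
}

// Stand-in for the BooleanQuery MUST + FILTER combination.
class BooleanFilteredQuery extends Query {
    final Query scoreQuery;
    final Query filterQuery;

    BooleanFilteredQuery(Query scoreQuery, Query filterQuery) {
        this.scoreQuery = scoreQuery;
        this.filterQuery = filterQuery;
    }
}

public class QueryUtilsSketch {
    // Sketch of the combineQueryAndFilter branch proposed in the comment:
    // KNN queries absorb the filter; everything else gets a boolean wrapper.
    public static Query combineQueryAndFilter(Query scoreQuery, Query filterQuery) {
        if (filterQuery == null) {
            return scoreQuery;
        }
        if (scoreQuery instanceof KnnVectorQuery) {
            KnnVectorQuery knn = (KnnVectorQuery) scoreQuery;
            // Immutable query: rebuild it with the filter attached.
            return new KnnVectorQuery(knn.field, knn.k, filterQuery);
        }
        return new BooleanFilteredQuery(scoreQuery, filterQuery);
    }
}
```

This is only a sketch of the shape of the change; the real fix depends on Lucene exposing the KnnVectorQuery field, target, and k values, as proposed in point 2 above.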
[GitHub] [solr] bszabo97 commented on pull request #1196: SOLR-11029 Create a v2 API equivalent for DELETENODE API
bszabo97 commented on PR #1196: URL: https://github.com/apache/solr/pull/1196#issuecomment-1361092894

Hello @gerlowskija,

Thanks for the heads-up and for the great description of what should be changed in the tests. I have added a commit which changes the test and implementation according to your suggestions. If there is anything around the performance blocker that I can help with, I am more than happy to do so!