[jira] [Resolved] (SOLR-17150) Create MemQueryLimit implementation

2024-10-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-17150.
-
Resolution: Fixed

> Create MemQueryLimit implementation
> ---
>
> Key: SOLR-17150
> URL: https://issues.apache.org/jira/browse/SOLR-17150
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.8
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> An implementation of {{QueryTimeout}} that terminates misbehaving queries 
> that allocate too much memory for their execution.
> This is a bit more complicated than {{CpuQueryLimits}} because the first time 
> a query is submitted it may legitimately allocate many sizeable objects 
> (caches, field values, etc). So we want to catch and terminate queries that 
> either exceed any reasonable threshold (eg. 2GB), or significantly exceed a 
> time-weighted percentile of the recent queries.
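As an illustration of the hard-threshold part of this idea, here is a minimal
sketch assuming the {{com.sun.management.ThreadMXBean}} extension is available;
the class name and constructor are hypothetical and merely mirror the
{{QueryTimeout.shouldExit()}} contract named above, not Solr's actual
implementation:

{code}
import java.lang.management.ManagementFactory;

// Hypothetical sketch: track per-thread allocation against a hard budget.
public class MemQueryLimitSketch {
  private static final com.sun.management.ThreadMXBean TMX =
      (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

  private final long limitBytes;  // absolute threshold, eg. 2GB
  private final long startBytes;  // allocation counter at query start

  public MemQueryLimitSketch(long limitBytes) {
    this.limitBytes = limitBytes;
    this.startBytes = TMX.getThreadAllocatedBytes(Thread.currentThread().getId());
  }

  // Mirrors QueryTimeout.shouldExit(): true once the budget is exceeded.
  public boolean shouldExit() {
    long id = Thread.currentThread().getId();
    return TMX.getThreadAllocatedBytes(id) - startBytes > limitBytes;
  }
}
{code}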






[jira] [Updated] (SOLR-17150) Create MemQueryLimit implementation

2024-10-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17150:

Fix Version/s: 9.8

> Create MemQueryLimit implementation
> ---
>
> Key: SOLR-17150
> URL: https://issues.apache.org/jira/browse/SOLR-17150
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.8
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> An implementation of {{QueryTimeout}} that terminates misbehaving queries 
> that allocate too much memory for their execution.
> This is a bit more complicated than {{CpuQueryLimits}} because the first time 
> a query is submitted it may legitimately allocate many sizeable objects 
> (caches, field values, etc). So we want to catch and terminate queries that 
> either exceed any reasonable threshold (eg. 2GB), or significantly exceed a 
> time-weighted percentile of the recent queries.






[jira] [Commented] (SOLR-17430) Redesign ExportWriter / ExportBuffers to work better with large batchSizes and slow consumption

2024-09-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878556#comment-17878556
 ] 

Andrzej Bialecki commented on SOLR-17430:
-

Originally this design was an evolution of an older, single-buffer design in 
which the "filler" and "writer" phases ran sequentially in the same thread. I 
agree that something we initially thought would be a simple extension ended up 
quite complicated :) 

[~jbernste] and I ran several benchmarks of the old and the current design, 
which showed big performance improvements in the current design. I think these 
speedups came from the bulk (buffer-based) operations on both the read and 
write sides of the process. Using a queue definitely simplifies the design, but 
I'm worried we may lose some of these performance gains when processing is done 
item-by-item and not in bulk. OTOH this may not be such a huge factor overall, 
and if it allows us to simplify the code and better control the flow, then it 
may be worth it even with some performance penalty.

> Redesign ExportWriter / ExportBuffers to work better with large batchSizes 
> and slow consumption
> ---
>
> Key: SOLR-17430
> URL: https://issues.apache.org/jira/browse/SOLR-17430
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> As mentioned in SOLR-17416, the design of the {{ExportBuffers}} class used by 
> the {{ExportHandler}} is brittle, and the absolute time limit on how long 
> the buffer-swapping threads will wait for each other isn't suitable for very 
> long-running streaming expressions...
> {quote}The problem however is that this 600 second timeout may not be enough 
> to account for really slow downstream consumption of the data.  With really 
> large collections, and really complicated streaming expressions, this can 
> happen even with well-behaved clients that are actively trying to consume 
> data.
> {quote}
> ...but another sub-optimal aspect of this buffer-swapping design is that the 
> "writer" thread is initially completely blocked, and can't write out a single 
> document, until the "filler" thread has read the full {{batchSize}} of 
> documents into its buffer and opted to swap.  Likewise, after buffer 
> swapping has occurred at least once, any document in the {{outputBuffer}} that 
> the writer has already processed hangs around, taking up RAM, until the next 
> swap, while one of the threads is idle.  If {{batchSize=3}}, and the 
> "filler" thread is ready to go with a full {{fillBuffer}} while the "writer" 
> has only been able to emit 2 of the documents in its {{outputBuffer}} 
> before being blocked and forced to wait (due to the downstream 
> consumer of the output bytes) before it can emit the last document in its 
> batch, then both the "writer" thread and the "filler" thread are 
> stalled, taking up 2x the batchSize of RAM, even though half of that is data 
> that is no longer needed.
> The bigger the {{batchSize}}, the worse the initial delay (and steady-state 
> wasted RAM) becomes.






[jira] [Commented] (SOLR-17416) Streaming Expressions: Exception swallowed and not propagated back to the client leading to inconsistent results

2024-09-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878554#comment-17878554
 ] 

Andrzej Bialecki commented on SOLR-17416:
-

+1 for the proposed immediate fix.

> Streaming Expressions:  Exception swallowed and not propagated back to the 
> client leading to inconsistent results
> -
>
> Key: SOLR-17416
> URL: https://issues.apache.org/jira/browse/SOLR-17416
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Export Writer, streaming expressions
>Reporter: Lamine
>Priority: Major
> Attachments: SOLR-17416.patch
>
>
> There appears to be a bug in the _ExportWriter/ExportBuffers_ implementation 
> within the Streaming Expressions plugin. Specifically, when an 
> InterruptedException occurs due to an ExportBuffers timeout, the exception is 
> swallowed and not propagated back to the client (still logged on the server 
> side though).
> As a result, the client receives an EOF marker, thinking that it has received 
> the full set of results, when in fact it has only received partial results. 
> This leads to inconsistent search results, as the client is unaware that the 
> export process was interrupted and terminated prematurely.  






[jira] [Commented] (SOLR-13350) Explore collector managers for multi-threaded search

2024-05-22 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848668#comment-17848668
 ] 

Andrzej Bialecki commented on SOLR-13350:
-

{quote}As of now, the timeAllowed requests are anyway executed without 
multithreading
{quote}
This is based on a {{QueryCommand.timeAllowed}} flag that is set only from the 
{{timeAllowed}} param. However, this concept was extended in SOLR-17138 to 
{{QueryLimits}}, which is now initialized from other params as well. There is 
indeed some inconsistency here that's a left-over from that change: 
{{QueryCommand.timeAllowed}} should have been either removed completely or 
replaced with something like {{queryLimits}} that checks the current 
{{SolrRequestInfo}} for {{QueryLimits}}.

In any case, the minimal workaround for this could be to check 
{{QueryLimits.getCurrentLimits().isLimitsEnabled()}} instead of 
{{QueryCommand.timeAllowed}}. But a better fix would be to properly unbreak 
the tracking of the parent {{SolrRequestInfo}} in MT search.
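For illustration, the shape of that minimal workaround; only
{{QueryLimits.getCurrentLimits().isLimitsEnabled()}} comes from the comment
above, the rest is invented stub scaffolding:

{code}
// Stand-in stub; the real QueryLimits reads the current SolrRequestInfo.
class QueryLimitsStub {
  static QueryLimitsStub getCurrentLimits() { return new QueryLimitsStub(); }
  boolean isLimitsEnabled() { return false; }
}

class MultiThreadedSearchGate {
  // Before: gated on QueryCommand.timeAllowed (set only by the timeAllowed param).
  // Workaround: gate on whether any query limit is in effect.
  static boolean mustSearchSingleThreaded() {
    return QueryLimitsStub.getCurrentLimits().isLimitsEnabled();
  }
}
{code}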

> Explore collector managers for multi-threaded search
> 
>
> Key: SOLR-13350
> URL: https://issues.apache.org/jira/browse/SOLR-13350
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13350.patch, SOLR-13350.patch, SOLR-13350.patch
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> AFAICT, SolrIndexSearcher can be used only to search all the segments of an 
> index in series. However, using CollectorManagers, segments can be searched 
> concurrently and result in reduced latency. Opening this issue to explore the 
> effectiveness of using CollectorManagers in SolrIndexSearcher from a latency 
> and throughput perspective.






[jira] [Commented] (SOLR-13350) Explore collector managers for multi-threaded search

2024-05-10 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845258#comment-17845258
 ] 

Andrzej Bialecki commented on SOLR-13350:
-

This is caused by breaking the end-to-end tracking of request context in 
{{SolrRequestInfo}}, which uses a thread-local deque to provide the same 
context for both the main request and all sub-requests. This tracking is needed 
to set up the correct query timeout instance ({{QueryLimits}}) on the searcher 
for time-limited searches at {{SolrIndexSearcher:727}}. However, now that this 
method is executed in a separate "searcherCollector" thread, the 
{{SolrRequestInfo}} instance it obtains is empty because that thread is not the 
original thread that set it.
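The failure mode is easy to reproduce outside Solr; this self-contained demo
(names are illustrative, not Solr's) shows why a thread-local context set on
the request thread reads as empty from a worker thread:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalLossDemo {
  // Stand-in for SolrRequestInfo's thread-local storage.
  private static final ThreadLocal<String> REQUEST_INFO = new ThreadLocal<>();

  public static void main(String[] args) throws Exception {
    REQUEST_INFO.set("query limits for request 42"); // set on the request thread

    ExecutorService collectorPool = Executors.newSingleThreadExecutor();
    // The "searcherCollector" stand-in thread sees no context at all.
    String seen = collectorPool.submit(REQUEST_INFO::get).get();
    System.out.println("worker thread sees: " + seen); // prints: null
    collectorPool.shutdown();
  }
}
{code}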

> Explore collector managers for multi-threaded search
> 
>
> Key: SOLR-13350
> URL: https://issues.apache.org/jira/browse/SOLR-13350
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13350.patch, SOLR-13350.patch, SOLR-13350.patch
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> AFAICT, SolrIndexSearcher can be used only to search all the segments of an 
> index in series. However, using CollectorManagers, segments can be searched 
> concurrently and result in reduced latency. Opening this issue to explore the 
> effectiveness of using CollectorManagers in SolrIndexSearcher from a latency 
> and throughput perspective.






[jira] [Commented] (SOLR-17150) Create MemQueryLimit implementation

2024-04-30 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842339#comment-17842339
 ] 

Andrzej Bialecki commented on SOLR-17150:
-

After discussing this with other people, it looks like dynamic limits would be 
tricky to set properly: the interaction between occasional legitimate spikes in 
heavier query traffic, updates (which trigger a searcher re-open and a memory 
usage spike), and other factors could cause too many spurious failures.

Still, support for a hard limit that prevents a total runaway ending in OOM 
seems useful. I'll prepare another patch that contains just the hard limit.

> Create MemQueryLimit implementation
> ---
>
> Key: SOLR-17150
> URL: https://issues.apache.org/jira/browse/SOLR-17150
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> An implementation of {{QueryTimeout}} that terminates misbehaving queries 
> that allocate too much memory for their execution.
> This is a bit more complicated than {{CpuQueryLimits}} because the first time 
> a query is submitted it may legitimately allocate many sizeable objects 
> (caches, field values, etc). So we want to catch and terminate queries that 
> either exceed any reasonable threshold (eg. 2GB), or significantly exceed a 
> time-weighted percentile of the recent queries.






[jira] [Commented] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-04-04 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833944#comment-17833944
 ] 

Andrzej Bialecki commented on SOLR-17158:
-

[~dsmiley] these are not exactly equivalent - when a limit is reached it 
doesn't have to be related in any way to per-shard processing.

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted), processing both in the shards and in the 
> query coordinator is terminated as quickly as possible, and Solr should 
> minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.






[jira] [Commented] (SOLR-17199) EnvUtils in solr-solrj is missing EnvToSyspropMappings.properties from solr-core

2024-03-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824384#comment-17824384
 ] 

Andrzej Bialecki commented on SOLR-17199:
-

I didn't see it - thanks for fixing it!

> EnvUtils in solr-solrj is missing EnvToSyspropMappings.properties from 
> solr-core
> 
>
> Key: SOLR-17199
> URL: https://issues.apache.org/jira/browse/SOLR-17199
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 9.5.0
>Reporter: Andrzej Bialecki
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 9.6.0
>
>
> Initially in SOLR-15960 {{EnvUtils}} was located in solr-core, together with 
> its configuration resource {{EnvToSyspropMappings.properties}}. It was then 
> moved from solr-core to solr-solrj, but the configuration resource was left 
> in solr-core.
> This unfortunately means that {{EnvUtils}} cannot be used without a dependency 
> on solr-core, unless the user adds their own copy of the configuration 
> resource to the classpath. Right now trying to use it (or using 
> {{PropertiesUtil}} for property substitution) results in an exception from 
> the static initializer:
> {code}
> Caused by: java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:209)
>   at org.apache.solr.common.util.EnvUtils.<clinit>(EnvUtils.java:51)
> {code}






[jira] [Created] (SOLR-17199) EnvUtils in solr-solrj is missing EnvToSyspropMappings.properties from solr-core

2024-03-07 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17199:
---

 Summary: EnvUtils in solr-solrj is missing 
EnvToSyspropMappings.properties from solr-core
 Key: SOLR-17199
 URL: https://issues.apache.org/jira/browse/SOLR-17199
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki


Initially in SOLR-15960 {{EnvUtils}} was located in solr-core, together with 
its configuration resource {{EnvToSyspropMappings.properties}}. It was then 
moved from solr-core to solr-solrj, but the configuration resource was left in 
solr-core.

This unfortunately means that {{EnvUtils}} cannot be used without a dependency 
on solr-core, unless the user adds their own copy of the configuration resource 
to the classpath. Right now trying to use it (or using {{PropertiesUtil}} for 
property substitution) results in an exception from the static initializer:
{code}
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:209)
at org.apache.solr.common.util.EnvUtils.<clinit>(EnvUtils.java:51)
{code}






[jira] [Updated] (SOLR-17182) Eliminate the need for 'solr.useExitableDirectoryReader' sysprop

2024-02-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17182:

Component/s: Query Limits

> Eliminate the need for 'solr.useExitableDirectoryReader' sysprop
> 
>
> Key: SOLR-17182
> URL: https://issues.apache.org/jira/browse/SOLR-17182
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Chris M. Hostetter
>Priority: Major
>
> As the {{QueryLimit}} functionality in Solr gets beefed up, and supports 
> multiple types of limits, it would be nice if we could find a way to 
> eliminate the need for the {{solr.useExitableDirectoryReader}} sysprop, and 
> instead just have codepaths that use the underlying IndexReader  (like 
> faceting, spellcheck, etc...)  automatically get a reader that enforces the 
> limits if/when limits are in use.






[jira] [Resolved] (SOLR-17172) Add QueryLimits termination to existing heavy SearchComponent-s

2024-02-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-17172.
-
Resolution: Fixed

> Add QueryLimits termination to existing heavy SearchComponent-s
> ---
>
> Key: SOLR-17172
> URL: https://issues.apache.org/jira/browse/SOLR-17172
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The purpose of this ticket is to review the existing {{SearchComponent}}-s 
> that perform intensive tasks to see if they could be modified to check the 
> {{QueryLimits.shouldExit()}} inside their execution.
> This is not meant to be included in tight loops but to prevent individual 
> components from completing multiple stages of costly work that will be 
> discarded anyway on the exit from the component due to the exceeded limits 
> (SOLR-17151).






[jira] [Updated] (SOLR-17172) Add QueryLimits termination to existing heavy SearchComponent-s

2024-02-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17172:

Fix Version/s: 9.6.0

> Add QueryLimits termination to existing heavy SearchComponent-s
> ---
>
> Key: SOLR-17172
> URL: https://issues.apache.org/jira/browse/SOLR-17172
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The purpose of this ticket is to review the existing {{SearchComponent}}-s 
> that perform intensive tasks to see if they could be modified to check the 
> {{QueryLimits.shouldExit()}} inside their execution.
> This is not meant to be included in tight loops but to prevent individual 
> components from completing multiple stages of costly work that will be 
> discarded anyway on the exit from the component due to the exceeded limits 
> (SOLR-17151).






[jira] [Updated] (SOLR-17172) Add QueryLimits termination to existing heavy SearchComponent-s

2024-02-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17172:

Component/s: Query Limits

> Add QueryLimits termination to existing heavy SearchComponent-s
> ---
>
> Key: SOLR-17172
> URL: https://issues.apache.org/jira/browse/SOLR-17172
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The purpose of this ticket is to review the existing {{SearchComponent}}-s 
> that perform intensive tasks to see if they could be modified to check the 
> {{QueryLimits.shouldExit()}} inside their execution.
> This is not meant to be included in tight loops but to prevent individual 
> components from completing multiple stages of costly work that will be 
> discarded anyway on the exit from the component due to the exceeded limits 
> (SOLR-17151).






[jira] [Commented] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-26 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820781#comment-17820781
 ] 

Andrzej Bialecki commented on SOLR-17158:
-

I'm not convinced we need a sysprop here... why shouldn't we use the request 
handler's {{defaults}} and {{invariants}} sections in {{solrconfig.xml}}? 
Using a sysprop effectively enforces the same default behavior for all replicas 
of all collections managed by a given Solr node.

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted), processing both in the shards and in the 
> query coordinator is terminated as quickly as possible, and Solr should 
> minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.






[jira] [Commented] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-23 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820030#comment-17820030
 ] 

Andrzej Bialecki commented on SOLR-17158:
-

FYI, it was necessary to add this parameter in SOLR-17172. I used 
{{partialResults=true}} to mean that we should stop processing and return 
partial results with a "success" code and the "partialResults" flag in the 
response, and {{partialResults=false}} to mean that we should throw an 
exception and discard any partial results.

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted), processing both in the shards and in the 
> query coordinator is terminated as quickly as possible, and Solr should 
> minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.






[jira] [Commented] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819319#comment-17819319
 ] 

Andrzej Bialecki commented on SOLR-17158:
-

Adding some observations from reading the code in {{SolrIndexSearcher}} and 
{{HttpShardHandler}}.

It appears that currently when {{timeAllowed}} is reached it doesn't cause 
termination of all other pending shard requests. I found this section at 
{{SolrIndexSearcher:284}}:

{code}
try {
  super.search(query, collector);
} catch (TimeLimitingCollector.TimeExceededException
    | ExitableDirectoryReader.ExitingReaderException
    | CancellableCollector.QueryCancelledException x) {
  log.warn("Query: [{}]; ", query, x);
  qr.setPartialResults(true);
{code}

When the {{timeAllowed}} limit is reached (and our new {{QueryLimits}}, too) it 
simply sets {{partialResults=true}} and does NOT throw any exception, so all 
the layers above think that the result is a success.

I suspect the reason for this was that when {{timeAllowed}} was set we still 
wanted to retrieve partial results when the limit was hit, and throwing an 
exception here would prevent that.

OTOH, if we had a request param saying "discard everything when you reach a 
limit and cancel any ongoing requests" then we could throw an exception here, 
and {{ShardHandler}} would recognize this as an error and cancel all other 
shard requests that are still pending, so that replicas could avoid sending 
back results that would be discarded anyway.
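A sketch of that control flow with hypothetical stand-in types - only the
throw-vs-setPartialResults distinction comes from the discussion above:

{code}
class LimitReachedException extends RuntimeException {
  LimitReachedException(String msg) { super(msg); }
}

class QueryResultStub {
  boolean partial;
  void setPartialResults(boolean p) { this.partial = p; }
}

class LimitHandlingSketch {
  // discardOnLimit stands in for the proposed "discard everything" param.
  static void onLimitReached(boolean discardOnLimit, QueryResultStub qr) {
    if (discardOnLimit) {
      // Error path: ShardHandler sees a failure and can cancel the other
      // pending per-shard requests.
      throw new LimitReachedException("query limit reached, discarding results");
    }
    // Current behavior: mark partial results, return "success" to upper layers.
    qr.setPartialResults(true);
  }
}
{code}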

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted), processing both in the shards and in the 
> query coordinator is terminated as quickly as possible, and Solr should 
> minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.






[jira] [Comment Edited] (SOLR-17151) Review current usage of QueryLimits to ensure complete coverage

2024-02-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819299#comment-17819299
 ] 

Andrzej Bialecki edited comment on SOLR-17151 at 2/21/24 3:49 PM:
--

Let's focus here on improving the checking between components as opposed to 
SOLR-17172.


was (Author: ab):
Let's focus here on improving the checking between components as opposed on 
SOLR-17172.

> Review current usage of QueryLimits to ensure complete coverage
> ---
>
> Key: SOLR-17151
> URL: https://issues.apache.org/jira/browse/SOLR-17151
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Resource usage by a query is not limited to the actual search within 
> {{QueryComponent}}. Other components invoked by {{SearchHandler}} may 
> significantly contribute to this usage, either before or after the 
> {{QueryComponent}}.
> Those components that already use {{QueryTimeout}} either directly or 
> indirectly will properly observe the limits and terminate if needed. However, 
> other components may be expensive or misbehaving but fail to observe the 
> limits imposed on the end-to-end query processing.
> One such obvious place where we could add this check is where the 
> {{SearchHandler}} loops over {{SearchComponent}}-s - it should explicitly call 
> {{QueryLimits.shouldExit()}} to ensure that even if a previously executed 
> component ignored the limits they will still be enforced at the 
> {{SearchHandler}} level. There may be other places like this, too.
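A sketch of that {{SearchHandler}}-level check; the types here are simplified
stand-ins for the Solr classes named in the issue, not the actual code:

{code}
interface SearchComponentStub {
  void process();
}

interface QueryLimitsStub {
  boolean shouldExit();
}

class SearchHandlerLoopSketch {
  static void processComponents(Iterable<SearchComponentStub> components,
                                QueryLimitsStub limits) {
    for (SearchComponentStub component : components) {
      // Enforce limits between components, even if the previous component
      // never checked them itself.
      if (limits.shouldExit()) {
        throw new RuntimeException("Query limits exceeded during processing");
      }
      component.process();
    }
  }
}
{code}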






[jira] [Commented] (SOLR-17151) Review current usage of QueryLimits to ensure complete coverage

2024-02-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819299#comment-17819299
 ] 

Andrzej Bialecki commented on SOLR-17151:
-

Let's focus here on improving the checking between components as opposed on 
SOLR-17172.

> Review current usage of QueryLimits to ensure complete coverage
> ---
>
> Key: SOLR-17151
> URL: https://issues.apache.org/jira/browse/SOLR-17151
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Gus Heck
>Priority: Major
>
> Resource usage by a query is not limited to the actual search within 
> {{QueryComponent}}. Other components invoked by {{SearchHandler}} may 
> significantly contribute to this usage, either before or after the 
> {{QueryComponent}}.
> Those components that already use {{QueryTimeout}} either directly or 
> indirectly will properly observe the limits and terminate if needed. However, 
> other components may be expensive or misbehaving but fail to observe the 
> limits imposed on the end-to-end query processing.
> One such obvious place where we could add this check is where the 
> {{SearchHandler}} loops over {{SearchComponent}}-s - it should explicitly call 
> {{QueryLimits.shouldExit()}} to ensure that even if a previously executed 
> component ignored the limits they will still be enforced at the 
> {{SearchHandler}} level. There may be other places like this, too.






[jira] [Created] (SOLR-17172) Add QueryLimits termination to existing heavy SearchComponent-s

2024-02-21 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17172:
---

 Summary: Add QueryLimits termination to existing heavy 
SearchComponent-s
 Key: SOLR-17172
 URL: https://issues.apache.org/jira/browse/SOLR-17172
 Project: Solr
  Issue Type: Sub-task
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


The purpose of this ticket is to review the existing {{SearchComponent}}-s that 
perform intensive tasks to see if they could be modified to check the 
{{QueryLimits.shouldExit()}} inside their execution.

This is not meant to be included in tight loops but to prevent individual 
components from completing multiple stages of costly work that will be 
discarded anyway on the exit from the component due to the exceeded limits 
(SOLR-17151).






[jira] [Resolved] (SOLR-17141) Create CpuAllowedLimit implementation

2024-02-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-17141.
-
Resolution: Fixed

> Create CpuAllowedLimit implementation
> -
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.






[jira] [Updated] (SOLR-17141) Create CpuAllowedLimit implementation

2024-02-19 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17141:

Fix Version/s: 9.6.0

> Create CpuAllowedLimit implementation
> -
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 9.6.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.






[jira] [Updated] (SOLR-17141) Create CpuAllowedLimit implementation

2024-02-19 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17141:

Summary: Create CpuAllowedLimit implementation  (was: Create CpuQueryLimit 
implementation)

> Create CpuAllowedLimit implementation
> -
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.






[jira] [Comment Edited] (SOLR-17141) Create CpuQueryLimit implementation

2024-02-12 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816644#comment-17816644
 ] 

Andrzej Bialecki edited comment on SOLR-17141 at 2/12/24 3:29 PM:
--

[~gus] and I discussed this issue - the way {{ThreadStats}} is used in 
SOLR-16986 gives incomplete results because it ignores nested queries (which 
use the stack in {{SolrRequestInfo}}). We would like to fix this as part of the 
SOLR-17138 refactoring, and to avoid potential confusion when the logged CPU 
time differs from the CPU time limit set here. This can be done by having both 
{{CpuQueryTimeLimit}} and {{ThreadStats}} use the same starting point while 
keeping track of nested requests.


was (Author: ab):
[~gus] and I discussed this issue - the way {{ThreadStats}} is used in 
SOLR-16986 gives incomplete results because it ignores nested queries (which 
use the stack in {{{}SolrRequestInfo{}}}. We would like to fix this as part of 
the SOLR-17138 refactoring, and to avoid potential confusion when logged CPU 
time is different than the CPU time limit set here. This can be done when both 
the {{CpuQueryTimeLimit}} and {{ThreadStats}} use the same starting point but 
keep track of nested requests.

> Create CpuQueryLimit implementation
> ---
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.






[jira] [Commented] (SOLR-17141) Create CpuQueryLimit implementation

2024-02-12 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816644#comment-17816644
 ] 

Andrzej Bialecki commented on SOLR-17141:
-

[~gus] and I discussed this issue - the way {{ThreadStats}} is used in 
SOLR-16986 gives incomplete results because it ignores nested queries (which 
use the stack in {{{}SolrRequestInfo{}}}. We would like to fix this as part of 
the SOLR-17138 refactoring, and to avoid potential confusion when logged CPU 
time is different than the CPU time limit set here. This can be done when both 
the {{CpuQueryTimeLimit}} and {{ThreadStats}} use the same starting point but 
keep track of nested requests.

> Create CpuQueryLimit implementation
> ---
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.






[jira] [Comment Edited] (SOLR-16986) Measure and aggregate thread CPU time in distributed search

2024-02-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816424#comment-17816424
 ] 

Andrzej Bialecki edited comment on SOLR-16986 at 2/11/24 2:16 PM:
--

Ok. In any case, this has a bug: it ignores all but the first time measurement 
when there are nested requests. [~gus] and I will look into reusing 
{{ThreadStats}} if possible and fixing this in SOLR-17140, so that the CPU time 
logged and the CPU time limit enforced by {{CpuQueryTimeLimit}} are consistent.


was (Author: ab):
Ok. In any case, this has a bug in that it ignores all but the first time 
measure when there are nested requests. [~gus] and I will look into reusing 
{{ThreadStats}} if possible and fixing this in SOLR-17140.

> Measure and aggregate thread CPU time in distributed search
> ---
>
> Key: SOLR-16986
> URL: https://issues.apache.org/jira/browse/SOLR-16986
> Project: Solr
>  Issue Type: New Feature
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Solr responses include "QTime", which in retrospect might have been better 
> named "elapsedTime".  We propose adding here a "cpuTime" to return the amount 
> of time consumed by 
> ManagementFactory.getThreadMXBean().[getThreadCpuTime|https://docs.oracle.com/en/java/javase/11/docs/api/java.management/java/lang/management/ThreadMXBean.html]().
>   Unlike QTime, this will need to be aggregated across distributed requests.  
> This work item will only do the aggregation work for distributed search, 
> although it could be extended for other scenarios in future work items.






[jira] [Commented] (SOLR-16986) Measure and aggregate thread CPU time in distributed search

2024-02-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816424#comment-17816424
 ] 

Andrzej Bialecki commented on SOLR-16986:
-

Ok. In any case, this has a bug in that it ignores all but the first time 
measure when there are nested requests. [~gus] and I will look into reusing 
{{ThreadStats}} if possible and fixing this in SOLR-17140.

> Measure and aggregate thread CPU time in distributed search
> ---
>
> Key: SOLR-16986
> URL: https://issues.apache.org/jira/browse/SOLR-16986
> Project: Solr
>  Issue Type: New Feature
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Solr responses include "QTime", which in retrospect might have been better 
> named "elapsedTime".  We propose adding here a "cpuTime" to return the amount 
> of time consumed by 
> ManagementFactory.getThreadMXBean().[getThreadCpuTime|https://docs.oracle.com/en/java/javase/11/docs/api/java.management/java/lang/management/ThreadMXBean.html]().
>   Unlike QTime, this will need to be aggregated across distributed requests.  
> This work item will only do the aggregation work for distributed search, 
> although it could be extended for other scenarios in future work items.






[jira] [Assigned] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-09 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-17158:
---

Assignee: Andrzej Bialecki

> Terminate distributed processing quickly when query limit is reached
> 
>
> Key: SOLR-17158
> URL: https://issues.apache.org/jira/browse/SOLR-17158
> Project: Solr
>  Issue Type: Sub-task
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Solr should make sure that when query limits are reached and partial results 
> are not needed (and not wanted), processing both in the shards and in the 
> query coordinator is terminated as quickly as possible, and Solr should 
> minimize wasted resources spent on eg. returning data from the 
> remaining shards, merging responses in the coordinator, or returning any data 
> back to the user.






[jira] [Commented] (SOLR-17138) Support other QueryTimeout criteria

2024-02-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816106#comment-17816106
 ] 

Andrzej Bialecki commented on SOLR-17138:
-

Here are results from a set of simple JMH benchmarks in which 100 threads 
concurrently call the respective method. Results are in nanoseconds / call / 
thread. Both methods are supported and enabled by default on all tested JVMs.
 * Results on a MacBook M1 Max, macOS Sonoma.

||*Java version*||*getThreadAllocatedBytes*||*getThreadCpuTime*||
|Azul Zulu 11|95|757|
|OpenJDK 17|72|730|
|OpenJDK 21|83|819|

 * Results on a Linux VM (on a Kubernetes cluster) running Ubuntu 22.04.

||*Java version*||*getThreadAllocatedBytes*||*getThreadCpuTime*||
|OpenJDK 11|40|238|
|OpenJDK 17|36|239|
|OpenJDK 21|41|236|

 * Results on a Windows VM (on a Kubernetes cluster) running Windows Server 
Core 10.

||*Java version*||*getThreadAllocatedBytes*||*getThreadCpuTime*||
|OpenJDK 11|108|440|
|Oracle Java 17|103|426|
|Oracle Java 21|105|447|
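For reference, a sketch of what such a JMH microbenchmark can look like; the
class and method names are assumptions for illustration, not the actual
benchmark code:

{code}
import java.lang.management.ManagementFactory;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
public class ThreadMXBeanBenchmark {
  private static final com.sun.management.ThreadMXBean TMX =
      (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

  @Benchmark
  @Threads(100) // matches the 100-thread setup described above
  public long threadAllocatedBytes() {
    return TMX.getThreadAllocatedBytes(Thread.currentThread().getId());
  }

  @Benchmark
  @Threads(100)
  public long threadCpuTime() {
    return TMX.getCurrentThreadCpuTime();
  }
}
{code}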

> Support other QueryTimeout criteria
> ---
>
> Key: SOLR-17138
> URL: https://issues.apache.org/jira/browse/SOLR-17138
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Priority: Major
>
> Complex Solr queries can consume significant memory and CPU while being 
> processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
> which further compounds the problem. Often such “killer queries” are not 
> written to logs, which makes them difficult to diagnose. This happens even 
> with best practices in place.
> It should be possible to set limits in Solr that cannot be exceeded by 
> individual queries. This mechanism would monitor an accumulating “cost” of a 
> query while it’s being executed and compare it to the configured maximum cost 
> (budget), expressed in terms of CPU and/or memory usage that can be 
> attributed to this query. Should these limits be exceeded the individual 
> query execution should be terminated, without affecting other concurrently 
> executing queries.
> The CircuitBreakers functionality doesn't distinguish the source of the load 
> and can't protect other query executions from a particular runaway query. We 
> need a more fine-grained mechanism.
> The existing {{QueryTimeout}} API enables such termination of individual 
> queries. However, the existing implementation ({{SolrQueryTimeoutImpl}} used 
> with {{timeAllowed}} query param) only uses elapsed wall-clock time as the 
> termination criterion. This is insufficient - under resource contention 
> the wall-clock time doesn't correctly represent the actual CPU cost of 
> executing a particular query. A query may produce results after a long time 
> not because of its complexity or bad behavior but because of the general 
> resource contention caused by other concurrently executing queries. OTOH a 
> single runaway query may consume all resources and cause all other valid 
> queries to fail if they exceed the wall-clock {{timeAllowed}}.
> I propose adding two additional criteria for limiting the maximum "query 
> budget":
>  * per-thread CPU time: using {{getThreadCpuTime}} to periodically check 
> ({{QueryTimeout.shouldExit()}}) the current CPU consumption since the start 
> of the query execution.
>  * per-thread memory allocation: using {{getThreadAllocatedBytes}}.
> I ran some JMH microbenchmarks to ensure that these two methods are available 
> on modern OS/JVM combinations and their cost is negligible (less than 0.5 
> us/call). This means that the initial implementation may call these methods 
> directly for every {{shouldExit()}} call without undue burden. If we decide 
> that this still adds too much overhead we can change this to periodic updates 
> in a background thread.
> These two "query budget" constraints can be implemented as subclasses of 
> {{QueryTimeout}}. Initially we can use a similar configuration mechanism as 
> with {{timeAllowed}}, i.e. pass the max value as a query param, or add it to 
> the search handler's invariants.
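As an illustration of the per-thread CPU criterion proposed above, a minimal
sketch using {{getThreadCpuTime}} via the current-thread variant; the class is
a stand-in mirroring the {{QueryTimeout.shouldExit()}} contract, not Solr's
actual implementation:

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuAllowedLimitSketch {
  private static final ThreadMXBean TMX = ManagementFactory.getThreadMXBean();

  private final long limitNanos;
  private final long startNanos;

  public CpuAllowedLimitSketch(long cpuAllowedMillis) {
    this.limitNanos = cpuAllowedMillis * 1_000_000L;
    this.startNanos = TMX.getCurrentThreadCpuTime(); // CPU time, not wall-clock
  }

  // Mirrors QueryTimeout.shouldExit(): checks CPU consumed since query start.
  public boolean shouldExit() {
    return TMX.getCurrentThreadCpuTime() - startNanos > limitNanos;
  }
}
{code}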






[jira] [Created] (SOLR-17158) Terminate distributed processing quickly when query limit is reached

2024-02-09 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17158:
---

 Summary: Terminate distributed processing quickly when query limit 
is reached
 Key: SOLR-17158
 URL: https://issues.apache.org/jira/browse/SOLR-17158
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Query Limits
Reporter: Andrzej Bialecki


Solr should make sure that when query limits are reached and partial results 
are not needed (and not wanted), processing both in the shards and in the query 
coordinator is terminated as quickly as possible, and Solr should minimize 
wasted resources spent on eg. returning data from the remaining shards, merging 
responses in the coordinator, or returning any data back to the user.






[jira] [Commented] (SOLR-17150) Create MemQueryLimit implementation

2024-02-08 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815673#comment-17815673
 ] 

Andrzej Bialecki commented on SOLR-17150:
-

Here's the proposed approach to implement two thresholds:
 * an absolute max limit to terminate any query that exceeds this allocation
 * a relative dynamic limit to terminate queries that exceed "typical" 
allocation

For the absolute limit: as with other implementations, {{memAllowed}} would set 
the absolute limit per query (float value in megabytes?). In order to 
accommodate initial queries this should be set to a relatively high value, 
which isn't optimal later for typical queries - this higher limit will 
eventually catch runaway queries, but not before they consume significant 
memory.

For the dynamic limit: a histogram would be added to the metrics to track the 
recent memory usage per query (using an exponentially decaying reservoir). The 
life-cycle of the histogram could be tied either to SolrCore or to 
SolrIndexSearcher (the latter seems more appropriate because warmup queries 
would skew the longer-term stats in SolrCore's life-cycle).

After collecting a sufficient number of data points (eg. {{N = 100}}) the 
component could start enforcing a dynamic limit based on a formula that takes 
into account the "typical" recent queries. For example: {{dynamicThreshold = 
X * p99}}, where {{X = 2.0}} by default (see the sketch after the list of open 
issues below).

Open issues:
 * does the dynamic threshold make sense? does the formula make sense?
 * I think that both the static and dynamic limits should be optional, ie. some 
combination of query params should allow the user to skip enforcement of 
either or both.
 * since the dynamic limit involves parameters (at least N and X above) that 
determine long-term tracking, it can no longer be expressed just as short-lived 
query params; it needs a configuration with a life-cycle of SolrCore or longer. 
Where should we put this configuration?
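A sketch of the dynamic-limit bookkeeping using Dropwizard Metrics (which
Solr's metrics build on); the class and method names are hypothetical, with the
{{N}} and {{X}} defaults taken from the example above:

{code}
import com.codahale.metrics.ExponentiallyDecayingReservoir;
import com.codahale.metrics.Histogram;

public class DynamicMemThresholdSketch {
  // Time-weighted sample of recent per-query allocation, in bytes.
  private final Histogram perQueryBytes =
      new Histogram(new ExponentiallyDecayingReservoir());
  private final long minSamples = 100;  // "N = 100" above
  private final double factor = 2.0;    // "X = 2.0" above

  public void recordCompletedQuery(long allocatedBytes) {
    perQueryBytes.update(allocatedBytes);
  }

  // dynamicThreshold = X * p99, enforced only after enough data points.
  public boolean exceedsDynamicLimit(long allocatedSoFar) {
    if (perQueryBytes.getCount() < minSamples) {
      return false;
    }
    return allocatedSoFar > factor * perQueryBytes.getSnapshot().get99thPercentile();
  }
}
{code}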

> Create MemQueryLimit implementation
> ---
>
> Key: SOLR-17150
> URL: https://issues.apache.org/jira/browse/SOLR-17150
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Query Limits
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> An implementation of {{QueryTimeout}} that terminates misbehaving queries 
> that allocate too much memory for their execution.
> This is a bit more complicated than {{CpuQueryLimits}} because the first time 
> a query is submitted it may legitimately allocate many sizeable objects 
> (caches, field values, etc). So we want to catch and terminate queries that 
> either exceed any reasonable threshold (eg. 2GB), or significantly exceed a 
> time-weighted percentile of the recent queries.






[jira] [Created] (SOLR-17151) Review current usage of QueryLimits to ensure complete coverage

2024-02-05 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17151:
---

 Summary: Review current usage of QueryLimits to ensure complete 
coverage
 Key: SOLR-17151
 URL: https://issues.apache.org/jira/browse/SOLR-17151
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Query Budget
Reporter: Andrzej Bialecki


Resource usage by a query is not limited to the actual search within 
{{QueryComponent}}. Other components invoked by {{SearchHandler}} may 
significantly contribute to this usage, either before or after the 
{{QueryComponent}}.

Those components that already use {{QueryTimeout}} either directly or 
indirectly will properly observe the limits and terminate if needed. However, 
other components may be expensive or misbehaving but fail to observe the limits 
imposed on the end-to-end query processing.

One such obvious place where we could add this check is where the 
{{SearchHandler}} loops over {{SearchComponent}}-s - it should explicitly call 
{{QueryLimits.shouldExit()}} to ensure that even if a previously executed 
component ignored the limits they will still be enforced at the 
{{SearchHandler}} level. There may be other places like this, too.






[jira] [Updated] (SOLR-17138) Support other QueryTimeout criteria

2024-02-05 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17138:

Description: 
Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing {{QueryTimeout}} API enables such termination of individual 
queries. However, the existing implementation ({{SolrQueryTimeoutImpl}} used 
with {{timeAllowed}} query param) only uses elapsed wall-clock time as the 
termination criterion. This is insufficient - under resource contention 
the wall-clock time doesn't correctly represent the actual CPU cost of 
executing a particular query. A query may produce results after a long time not 
because of its complexity or bad behavior but because of the general resource 
contention caused by other concurrently executing queries. OTOH a single 
runaway query may consume all resources and cause all other valid queries to 
fail if they exceed the wall-clock {{timeAllowed}}.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using {{getThreadCpuTime}} to periodically check 
({{QueryTimeout.shouldExit()}}) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using {{getThreadAllocatedBytes}}.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call). This means that the initial implementation may call these methods 
directly for every {{shouldExit()}} call without undue burden. If we decide 
that this still adds too much overhead we can change this to periodic updates 
in a background thread.

These two "query budget" constraints can be implemented as subclasses of 
{{QueryTimeout}}. Initially we can use a similar configuration mechanism as 
with {{timeAllowed}}, i.e. pass the max value as a query param, or add it to 
the search handler's invariants.
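
As a rough illustration of the CPU criterion (a sketch, assuming that 
{{QueryTimeout}} exposes a single {{shouldExit()}} method - not the actual 
implementation):
{code:java}
// Sketch of a per-thread CPU-time limit (assumed names, not the SOLR-17141 code).
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuQueryLimit implements org.apache.lucene.index.QueryTimeout {
  private static final ThreadMXBean TMX = ManagementFactory.getThreadMXBean();
  private final long limitNanos;
  private final long startCpuNanos;

  public CpuQueryLimit(long limitMillis) {
    this.limitNanos = limitMillis * 1_000_000L;
    this.startCpuNanos = TMX.getCurrentThreadCpuTime(); // CPU time, in nanos
  }

  @Override
  public boolean shouldExit() {
    // terminate once this thread has consumed more CPU time than allowed
    return TMX.getCurrentThreadCpuTime() - startCpuNanos > limitNanos;
  }
}
{code}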

  was:
Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing `QueryTimeout` API enables such termination of individual queries. 
However, the existing implementation (`SolrQueryTimeoutImpl` used with 
`timeAllowed` query param) only uses elapsed wall-clock time as the termination 
criterion. This is insufficient - in case of resource contention the wall-clock 
time doesn’t represent correctly the actual CPU cost of executing a particular 
query. A query may produce results after a long time not because of its 
complexity or bad behavior but because of the general resource contention 
caused by other concurrently executing queries. OTOH a single runaway query may 
consume all resources and cause all other valid queries to fail if they exceed 
the wall-clock `timeAllowed`.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using `getThreadCpuTime` to periodically check 
(`QueryTimeout.shouldExit()`) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using `getThreadAllocatedBytes`.

[jira] [Updated] (SOLR-17141) Create CpuQueryLimit implementation

2024-02-05 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17141:

Summary: Create CpuQueryLimit implementation  (was: Create CpuQueryTimeout 
implementation)

> Create CpuQueryLimit implementation
> ---
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-17150) Create MemQueryLimit implementation

2024-02-05 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17150:
---

 Summary: Create MemQueryLimit implementation
 Key: SOLR-17150
 URL: https://issues.apache.org/jira/browse/SOLR-17150
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Query Budget
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


An implementation of {{QueryTimeout}} that terminates misbehaving queries that 
allocate too much memory for their execution.

This is a bit more complicated than {{CpuQueryLimits}} because the first time a 
query is submitted it may legitimately allocate many sizeable objects (caches, 
field values, etc). So we want to catch and terminate queries that either 
exceed any reasonable threshold (eg. 2GB), or significantly exceed a 
time-weighted percentile of the recent queries.
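
For the absolute-threshold part, a minimal sketch (assuming the JDK-specific 
{{com.sun.management.ThreadMXBean}} is available; the percentile-based 
criterion is omitted, and the names are placeholders):
{code:java}
// Sketch of a per-thread allocation limit (not the final implementation).
import java.lang.management.ManagementFactory;

public class MemQueryLimit implements org.apache.lucene.index.QueryTimeout {
  private final com.sun.management.ThreadMXBean tmx =
      (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
  private final long limitBytes;
  private final long startBytes;

  public MemQueryLimit(long limitBytes) {
    this.limitBytes = limitBytes;
    this.startBytes = tmx.getThreadAllocatedBytes(Thread.currentThread().getId());
  }

  @Override
  public boolean shouldExit() {
    // bytes allocated by this thread since the query started
    long allocated =
        tmx.getThreadAllocatedBytes(Thread.currentThread().getId()) - startBytes;
    return allocated > limitBytes;
  }
}
{code}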



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Assigned] (SOLR-17141) Create CpuQueryTimeout implementation

2024-01-30 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-17141:
---

Assignee: Andrzej Bialecki

> Create CpuQueryTimeout implementation
> -
>
> Key: SOLR-17141
> URL: https://issues.apache.org/jira/browse/SOLR-17141
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This class will use `getThreadCpuTime` to determine when to signal 
> `shouldExit`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-17141) Create CpuQueryTimeout implementation

2024-01-30 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17141:
---

 Summary: Create CpuQueryTimeout implementation
 Key: SOLR-17141
 URL: https://issues.apache.org/jira/browse/SOLR-17141
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki


This class will use `getThreadCpuTime` to determine when to signal `shouldExit`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-17140) Refactor SolrQueryTimeoutImpl to support other implementations

2024-01-30 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17140:
---

 Summary: Refactor SolrQueryTimeoutImpl to support other 
implementations
 Key: SOLR-17140
 URL: https://issues.apache.org/jira/browse/SOLR-17140
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17138) Support other QueryTimeout criteria

2024-01-30 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-17138:

Description: 
Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing `QueryTimeout` API enables such termination of individual queries. 
However, the existing implementation (`SolrQueryTimeoutImpl` used with 
`timeAllowed` query param) only uses elapsed wall-clock time as the termination 
criterion. This is insufficient - under resource contention the wall-clock time 
doesn't correctly represent the actual CPU cost of executing a particular 
query. A query may produce results after a long time not because of its
complexity or bad behavior but because of the general resource contention 
caused by other concurrently executing queries. OTOH a single runaway query may 
consume all resources and cause all other valid queries to fail if they exceed 
the wall-clock `timeAllowed`.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using `getThreadCpuTime` to periodically check 
(`QueryTimeout.shouldExit()`) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using `getThreadAllocatedBytes`.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call). This means that the initial implementation may call these methods 
directly for every `shouldExit()` call without undue burden. If we decide that 
this still adds too much overhead we can change this to periodic updates in a 
background thread.

These two "query budget" constraints can be implemented as subclasses of 
`QueryTimeout`. Initially we can use a similar configuration mechanism as with 
`timeAllowed`, i.e. pass the max value as a query param, or add it to the 
search handler's invariants.

  was:
Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing `QueryTimeout` API enables such termination of individual queries. 
However, the existing implementation (`SolrQueryTimeoutImpl` used with 
`timeAllowed` query param) only uses elapsed wall-clock time as the termination 
criterion. This is insufficient - in case of resource contention the wall-clock 
time doesn’t represent correctly the actual CPU cost of executing a particular 
query. A query may produce results after a long time not because of its 
complexity or bad behavior but because of the general resource contention 
caused by other concurrently executing queries. OTOH a single runaway query may 
consume all resources and cause all other valid queries to fail if they exceed 
the wall-clock `timeAllowed`.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using `getThreadCpuTime` to periodically check 
(`QueryTimeout.shouldExit()`) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using `getThreadAllocatedBytes`.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call).

[jira] [Created] (SOLR-17138) Support other QueryTimeout criteria

2024-01-30 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-17138:
---

 Summary: Support other QueryTimeout criteria
 Key: SOLR-17138
 URL: https://issues.apache.org/jira/browse/SOLR-17138
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Query Budget
Reporter: Andrzej Bialecki


Complex Solr queries can consume significant memory and CPU while being 
processed. When OOM or CPU saturation is reached Solr becomes unresponsive, 
which further compounds the problem. Often such “killer queries” are not 
written to logs, which makes them difficult to diagnose. This happens even with 
best practices in place.

It should be possible to set limits in Solr that cannot be exceeded by 
individual queries. This mechanism would monitor an accumulating “cost” of a 
query while it’s being executed and compare it to the configured maximum cost 
(budget), expressed in terms of CPU and/or memory usage that can be attributed 
to this query. Should these limits be exceeded the individual query execution 
should be terminated, without affecting other concurrently executing queries.

The CircuitBreakers functionality doesn't distinguish the source of the load 
and can't protect other query executions from a particular runaway query. We 
need a more fine-grained mechanism.

The existing `QueryTimeout` API enables such termination of individual queries. 
However, the existing implementation (`SolrQueryTimeoutImpl` used with 
`timeAllowed` query param) only uses elapsed wall-clock time as the termination 
criterion. This is insufficient - under resource contention the wall-clock time 
doesn't correctly represent the actual CPU cost of executing a particular 
query. A query may produce results after a long time not because of its
complexity or bad behavior but because of the general resource contention 
caused by other concurrently executing queries. OTOH a single runaway query may 
consume all resources and cause all other valid queries to fail if they exceed 
the wall-clock `timeAllowed`.

I propose adding two additional criteria for limiting the maximum "query 
budget":
 * per-thread CPU time: using `getThreadCpuTime` to periodically check 
(`QueryTimeout.shouldExit()`) the current CPU consumption since the start of 
the query execution.
 * per-thread memory allocation: using `getThreadAllocatedBytes`.

I ran some JMH microbenchmarks to ensure that these two methods are available 
on modern OS/JVM combinations and their cost is negligible (less than 0.5 
us/call). This means that the initial implementation may call these methods 
directly for every `shouldExit()` call without undue burden. If we decide that 
this still adds too much overhead we can change this to periodic updates in a 
background thread.

These two "query budget" constraints can be implemented as subclasses of 
`QueryTimeout`. Initially we can use a similar configuration mechanism as with 
`timeAllowed`, i.e. pass the max value as a query param, or add it to the 
search handler's invariants.
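
A JMH harness along these lines can verify the per-call cost (an illustrative 
sketch, not the benchmark actually used):
{code:java}
// Illustrative JMH benchmark measuring the per-call overhead of reading the
// current thread's CPU time; getThreadAllocatedBytes can be measured the same way.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import org.openjdk.jmh.annotations.Benchmark;

public class ThreadCpuTimeBenchmark {
  private static final ThreadMXBean TMX = ManagementFactory.getThreadMXBean();

  @Benchmark
  public long currentThreadCpuTime() {
    return TMX.getCurrentThreadCpuTime();
  }
}
{code}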



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16507) Remove NodeStateProvider & Snitch

2023-03-22 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703746#comment-17703746
 ] 

Andrzej Bialecki commented on SOLR-16507:
-

bq. Do you think there's a point to SplitShardCmd using NodeStateProvider vs 
just going to the metrics API?

You're making my point for me ;) You could do that, but the SplitShardCmd would 
become very complex with a lot of non-reusable code - because you would still 
have to bake in making HTTP requests to other nodes and parsing / extracting 
metrics values. So it's better to hide this complexity in a high-level utility 
API.

IMHO NodeStateProvider is a good abstraction, just its implementation needs to 
be cleaned up.

> Remove NodeStateProvider & Snitch
> -
>
> Key: SOLR-16507
> URL: https://issues.apache.org/jira/browse/SOLR-16507
> Project: Solr
>  Issue Type: Task
>Reporter: David Smiley
>Priority: Major
>  Labels: newdev
>
> The NodeStateProvider is a relic relating to the old autoscaling framework 
> that was removed in Solr 9.  The only remaining usage of it is for 
> SplitShardCmd to check the disk space.  For this, it could use the metrics 
> api.
> I think we'll observe that Snitch and other classes in 
> org.apache.solr.common.cloud.rule can be removed as well, as it's related to 
> NodeStateProvider.
> Only 
> org.apache.solr.cluster.placement.impl.AttributeFetcherImpl#getMetricSnitchTag
>  and org.apache.solr.cluster.placement.impl.NodeMetricImpl refer to some 
> constants in the code to be removed.  Those constants could move out, 
> consolidated somewhere we think is appropriate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16507) Remove NodeStateProvider & Snitch

2023-03-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703331#comment-17703331
 ] 

Andrzej Bialecki commented on SOLR-16507:
-

Thanks [~dsmiley] for bringing this to my attention.
 
The reason we used the existing NodeStateProvider abstraction in the replica 
placement code was that retrieving per-node metrics is messy and quirky, all of 
which is hidden in SolrClientNodeStateProvider.
 
The internal structure (snitches and co) can and should be refactored and 
simplified because these concepts are not used anywhere else anymore, they are 
legacy abstractions from the time when they were used for collection rules DSL.
 
However, IMHO something like NodeStateProvider still has its place. No matter 
what you replace it with, the complexity of retrieving per-node attributes will 
still be present somewhere - and hiding it in a NodeStateProvider (or similar 
concept) as a high-level API at least gives us a possibility of reuse. If we 
were to put all this nasty code into AttributeFetcherImpl then we would pretty 
much limit its usefulness only to the placement code.
 
 
SolrCloudManager is perhaps no longer useful and can be factored out, but IMHO 
something equivalent to NodeStateProvider is still needed.
 
Re. "snitchSession" - this is now used only in `ImplicitSnitch` for caching the 
node roles, in order to avoid loading this data from ZK for every node.

> Remove NodeStateProvider & Snitch
> -
>
> Key: SOLR-16507
> URL: https://issues.apache.org/jira/browse/SOLR-16507
> Project: Solr
>  Issue Type: Task
>Reporter: David Smiley
>Priority: Major
>  Labels: newdev
>
> The NodeStateProvider is a relic relating to the old autoscaling framework 
> that was removed in Solr 9.  The only remaining usage of it is for 
> SplitShardCmd to check the disk space.  For this, it could use the metrics 
> api.
> I think we'll observe that Snitch and other classes in 
> org.apache.solr.common.cloud.rule can be removed as well, as it's related to 
> NodeStateProvider.
> Only 
> org.apache.solr.cluster.placement.impl.AttributeFetcherImpl#getMetricSnitchTag
>  and org.apache.solr.cluster.placement.impl.NodeMetricImpl refer to some 
> constants in the code to be removed.  Those constants could move out, 
> consolidated somewhere we think is appropriate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-08 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685747#comment-17685747
 ] 

Andrzej Bialecki commented on SOLR-16649:
-

Oops, right - I attached the new patch.

> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-16649-1.patch, SOLR-16649.patch
>
>
> {{Http2SolrClient:800}} calls {{wantStream(...)}} method but passes the wrong 
> argument to it - instead of passing the local {{processor}} arg it uses the 
> instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that 
> overshadows the instance field, and then this local {{parser}} is passed 
> around as argument to various operations. However, in this particular method 
> the argument passed from the caller is named differently  ({{processor}}) and 
> thus does not overshadow the instance field, which leads to this mistake.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-08 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-16649:

Attachment: SOLR-16649-1.patch

> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-16649-1.patch, SOLR-16649.patch
>
>
> {{Http2SolrClient:800}} calls {{wantStream(...)}} method but passes the wrong 
> argument to it - instead of passing the local {{processor}} arg it uses the 
> instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that 
> overshadows the instance field, and then this local {{parser}} is passed 
> around as argument to various operations. However, in this particular method 
> the argument passed from the caller is named differently  ({{processor}}) and 
> thus does not overshadow the instance field, which leads to this mistake.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685416#comment-17685416
 ] 

Andrzej Bialecki commented on SOLR-16649:
-

Simple patch with a test case - it fails with stock code, succeeds with the fix.

> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-16649.patch
>
>
> {{Http2SolrClient:800}} calls {{wantStream(...)}} method but passes the wrong 
> argument to it - instead of passing the local {{processor}} arg it uses the 
> instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that 
> overshadows the instance field, and then this local {{parser}} is passed 
> around as argument to various operations. However, in this particular method 
> the argument passed from the caller is named differently  ({{processor}}) and 
> thus does not overshadow the instance field, which leads to this mistake.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-07 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-16649:

Attachment: SOLR-16649.patch

> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-16649.patch
>
>
> {{Http2SolrClient:800}} calls {{wantStream(...)}} method but passes the wrong 
> argument to it - instead of passing the local {{processor}} arg it uses the 
> instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that 
> overshadows the instance field, and then this local {{parser}} is passed 
> around as argument to various operations. However, in this particular method 
> the argument passed from the caller is named differently  ({{processor}}) and 
> thus does not overshadow the instance field, which leads to this mistake.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-07 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-16649:

Description: 
{{Http2SolrClient:800}} calls {{wantStream(...)}} method but passes the wrong 
argument to it - instead of passing the local {{processor}} arg it uses the 
instance field {{parser}}.

Throughout this class there's a repeated pattern that easily leads to this 
confusion - in many methods a local var {{parser}} is created that overshadows 
the instance field, and then this local {{parser}} is passed around as argument 
to various operations. However, in this particular method the argument passed 
from the caller is named differently  ({{processor}}) and thus does not 
overshadow the instance field, which leads to this mistake.
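
To illustrate the pattern generically (simplified stand-in code, not the actual 
{{Http2SolrClient}} source):
{code:java}
// Stand-in illustration of the shadowing pattern described above.
class Client {
  Object parser; // instance field

  void request(Object parser) { // local arg overshadows the field
    wantStream(parser);         // resolves to the argument, as intended
  }

  void processErrorsAndResponse(Object processor) { // no overshadowing here
    wantStream(parser); // compiles fine, but uses the FIELD,
                        // not the 'processor' argument - the bug
  }

  boolean wantStream(Object p) {
    return p != null;
  }
}
{code}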

  was:
`Http2SolrClient:800` calls `wantStream(...)` method but passes the wrong 
argument to it - instead of passing the local `processor` arg it uses the 
instance field `parser`.

Throughout this class there's a repeated pattern that easily leads to this 
confusion - in many methods a local var `parser` is created that overshadows 
the instance field, and then this local `parser` is passed around as argument 
to various operations. However, in this method the argument passed from the 
caller is named differently  (`processor`) and thus does not overshadow the 
instance field, which leads to this mistake.


> Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser
> --
>
> Key: SOLR-16649
> URL: https://issues.apache.org/jira/browse/SOLR-16649
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: main (10.0), 9.1.1
>Reporter: Andrzej Bialecki
>Priority: Major
>
> {{Http2SolrClient:800}} calls {{wantStream(...)}} method but passes the wrong 
> argument to it - instead of passing the local {{processor}} arg it uses the 
> instance field {{parser}}.
> Throughout this class there's a repeated pattern that easily leads to this 
> confusion - in many methods a local var {{parser}} is created that 
> overshadows the instance field, and then this local {{parser}} is passed 
> around as argument to various operations. However, in this particular method 
> the argument passed from the caller is named differently  ({{processor}}) and 
> thus does not overshadow the instance field, which leads to this mistake.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-16649) Http2SolrClient.processErrorsAndResponse uses wrong instance of ResponseParser

2023-02-07 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-16649:
---

 Summary: Http2SolrClient.processErrorsAndResponse uses wrong 
instance of ResponseParser
 Key: SOLR-16649
 URL: https://issues.apache.org/jira/browse/SOLR-16649
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: clients - java
Affects Versions: 9.1.1, main (10.0)
Reporter: Andrzej Bialecki


`Http2SolrClient:800` calls `wantStream(...)` method but passes the wrong 
argument to it - instead of passing the local `processor` arg it uses the 
instance field `parser`.

Throughout this class there's a repeated pattern that easily leads to this 
confusion - in many methods a local var `parser` is created that overshadows 
the instance field, and then this local `parser` is passed around as argument 
to various operations. However, in this method the argument passed from the 
caller is named differently  (`processor`) and thus does not overshadow the 
instance field, which leads to this mistake.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15616) Allow thread metrics to be cached

2023-01-17 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677899#comment-17677899
 ] 

Andrzej Bialecki commented on SOLR-15616:
-

LGTM, thanks for seeing this through, Ishan!

One minor suggestion: since the interval is expressed in seconds (whereas often 
other intervals are expressed in millis) maybe we should use 
`threadIntervalSec` or something like that? I leave it up to you - the docs say 
it's in seconds but if it's in the name then it's self-explanatory.

> Allow thread metrics to be cached
> -
>
> Key: SOLR-15616
> URL: https://issues.apache.org/jira/browse/SOLR-15616
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-15616-2.patch, SOLR-15616-9x.patch, 
> SOLR-15616.patch, SOLR-15616.patch
>
>
> Computing JVM metrics for threads can be expensive, and we should provide 
> option to users to avoid doing so on every call to the metrics API 
> (group=jvm).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16305) MODIFYCOLLECTION with 'property.*' changes can't change values used in config file variables (even though they can be set during collection CREATE)

2022-10-17 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618812#comment-17618812
 ] 

Andrzej Bialecki commented on SOLR-16305:
-

{quote}Should this Jira have a linked "converse" issue: "CREATE collection 
with property.* doesn't set values in DocCollection (even though 
MODIFYCOLLECTION can change them)" ?
{quote}
I think so.
{quote}WTF the {{COLLECTIONPROP}} command's purpose / expected usage is?
{quote}
AFAIK they are currently used only for maintaining routed aliases. We could 
extend it to cover a use case of "I want to maintain arbitrary props per 
collection" but then we would have to add the reading API and document it. And 
probably do some other work too, because this API is isolated from the main 
DocCollection model.

(For me one reason for ab-using DocCollection to keep properties was that 
there's currently no connection between props that you can set with 
COLLECTIONPROP and the replica placement API model, which purposely uses API 
disconnected from Solr internals. So if I want to mark some collection as 
having this or other replica placement properties, the 
SolrCollection.getCustomProperty ONLY returns props set in DocCollection and 
not those set with COLLECTIONPROP. Of course, I can always keep these special 
props in a config file specific to the placement plugin ... but this 
complicates the lifecycle of these properties as you create / delete 
collections, so keeping them in DocCollection is convenient).

> MODIFYCOLLECTION with 'property.*' changes can't change values used in config 
> file variables (even though they can be set during collection CREATE)
> ---
>
> Key: SOLR-16305
> URL: https://issues.apache.org/jira/browse/SOLR-16305
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-16305_test.patch
>
>
> Consider a configset with a  {{solrconfig.xml}} that includes a snippet like 
> this...
> {code:java}
> ${custom.prop:customDefVal}
> {code}
> ...this {{custom.prop}} can be set when doing a {{CREATE}} command for a 
> collection that uses this configset, using the {{property.*}} prefix as noted 
> in the reg-guide...
> {quote}{{property.{_}name{_}={_}value{_}}}
> |Optional|Default: none|
> Set core property _name_ to {_}value{_}. See the section [Core 
> Discovery|https://solr.apache.org/guide/solr/latest/configuration-guide/core-discovery.html]
>  for details on supported properties and values.
> {quote}
> ...BUT
> These values can *not* be changed by using the {{MODIFYCOLLECTION}} command, 
> in spite of the ref-guide indicating that it can be used to modify custom 
> {{property.*}} attributes...
> {quote}The attributes that can be modified are:
>  * {{replicationFactor}}
>  * {{collection.configName}}
>  * {{readOnly}}
>  * other custom properties that use a {{property.}} prefix
> See the [CREATE 
> action|https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#create]
>  section above for details on these attributes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16305) MODIFYCOLLECTION with 'property.*' changes can't change values used in config file variables (even though they can be set during collection CREATE)

2022-10-13 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617242#comment-17617242
 ] 

Andrzej Bialecki commented on SOLR-16305:
-

{quote}I think you mean the exact opposite of what you just said?
{quote}
No, I meant that they cannot be set as DocCollection properties, they are 
silently skipped there (while they are indeed propagated to cores). If you want 
to set a DocCollection property you have to use MODIFYCOLLECTION, and while 
this works for setting `property.*` in DocCollection it indeed does not 
propagate these custom props to cores.

Whichever way you look at it, it's a mess.

> MODIFYCOLLECTION with 'property.*' changes can't change values used in config 
> file variables (even though they can be set during collection CREATE)
> ---
>
> Key: SOLR-16305
> URL: https://issues.apache.org/jira/browse/SOLR-16305
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-16305_test.patch
>
>
> Consider a configset with a  {{solrconfig.xml}} that includes a snippet like 
> this...
> {code:java}
> ${custom.prop:customDefVal}
> {code}
> ...this {{custom.prop}} can be set when doing a {{CREATE}} command for a 
> collection that uses this configset, using the {{property.*}} prefix as noted 
> in the reg-guide...
> {quote}{{property.{_}name{_}={_}value{_}}}
> |Optional|Default: none|
> Set core property _name_ to {_}value{_}. See the section [Core 
> Discovery|https://solr.apache.org/guide/solr/latest/configuration-guide/core-discovery.html]
>  for details on supported properties and values.
> {quote}
> ...BUT
> These values can *not* be changed by using the {{MODIFYCOLLECTION}} command, 
> in spite of the ref-guide indicating that it can be used to modify custom 
> {{property.*}} attributes...
> {quote}The attributes that can be modified are:
>  * {{replicationFactor}}
>  * {{collection.configName}}
>  * {{readOnly}}
>  * other custom properties that use a {{property.}} prefix
> See the [CREATE 
> action|https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#create]
>  section above for details on these attributes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16305) MODIFYCOLLECTION with 'property.*' changes can't change values used in config file variables (even though they can be set during collection CREATE)

2022-10-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615955#comment-17615955
 ] 

Andrzej Bialecki commented on SOLR-16305:
-

AFAIK the propagation of `property.*` values to cores is accidental, the 
original purpose (again, AFAIK) was to be able to set aux properties in the 
`state.json`, to keep additional per-collection state that could be used by 
other components. The advantage of this is that they would automatically appear 
in DocCollection at the API level (unlike the COLLECTIONPROP API, which is 
incomplete because only the "write" part is supported - there is no "read" 
short of going directly to ZK. AFAIK COLLECTIONPROP was added because routed 
aliases needed some place to keep additional state, potentially too large / 
inconvenient to stick into state.json.)

However, even using these `property.*` values is half-broken, as I recently 
discovered - it's supported in MODIFYCOLLECTION but not in CREATE, due to 
`ClusterStateMutator.createCollection()` copying only the predefined properties 
and ignoring anything else.

This should be fixed in some way - I'm inclined to say in both ways ;) that is, 
the COLLECTIONPROP API should be completed so that it includes the reading 
part, and the CREATE should be fixed to accept `property.*`. And I don't see 
the purpose of propagating these collection-level props to individual cores, so 
this part could be removed until it's needed.

> MODIFYCOLLECTION with 'property.*' changes can't change values used in config 
> file variables (even though they can be set during collection CREATE)
> ---
>
> Key: SOLR-16305
> URL: https://issues.apache.org/jira/browse/SOLR-16305
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-16305_test.patch
>
>
> Consider a configset with a  {{solrconfig.xml}} that includes a snippet like 
> this...
> {code:java}
> ${custom.prop:customDefVal}
> {code}
> ...this {{custom.prop}} can be set when doing a {{CREATE}} command for a 
> collection that uses this configset, using the {{property.*}} prefix as noted 
> in the reg-guide...
> {quote}{{property.{_}name{_}={_}value{_}}}
> |Optional|Default: none|
> Set core property _name_ to {_}value{_}. See the section [Core 
> Discovery|https://solr.apache.org/guide/solr/latest/configuration-guide/core-discovery.html]
>  for details on supported properties and values.
> {quote}
> ...BUT
> These values can *not* be changed by using the {{MODIFYCOLLECTION}} command, 
> in spite of the ref-guide indicating that it can be used to modify custom 
> {{property.*}} attributes...
> {quote}The attributes that can be modified are:
>  * {{replicationFactor}}
>  * {{collection.configName}}
>  * {{readOnly}}
>  * other custom properties that use a {{property.}} prefix
> See the [CREATE 
> action|https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#create]
>  section above for details on these attributes.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16073) totalTime metric should be milliseconds (not nano)

2022-03-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503586#comment-17503586
 ] 

Andrzej Bialecki commented on SOLR-16073:
-

Removing the conversion may have been a mistake - we should consistently report 
time intervals using the same units. Currently we report the intervals inside 
histograms in milliseconds, but the elapsed times of Timers in nanoseconds.

Changing the units may have some back-compat consequences, and I'm not sure how 
to address them. Also, I can't say whether this metric is useful enough to be 
included by default in the exporter - generally speaking, since exporting 
metrics via the Prometheus exporter is a relatively heavyweight process, IMHO 
we should attempt to cut down the number of exported metrics to a bare minimum 
(whatever that means ;) ).
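
For reference, the removed conversion amounted to something like the following 
sketch (assuming a Dropwizard {{Timer.Context}} named {{timer}} and a 
{{Counter}} named {{totalTime}}):
{code:java}
// Timer.Context.stop() returns the elapsed time in nanoseconds; converting
// before incrementing keeps totalTime in milliseconds.
long elapsedNs = timer.stop();
totalTime.inc(java.util.concurrent.TimeUnit.NANOSECONDS.toMillis(elapsedNs));
{code}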

> totalTime metric should be milliseconds (not nano)
> --
>
> Key: SOLR-16073
> URL: https://issues.apache.org/jira/browse/SOLR-16073
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: David Smiley
>Priority: Minor
>
> I observed that the "totalTime" metric has been a nanosecond number in recent 
> years, yet once upon a time it was milliseconds. This change was very likely 
> inadvertent. Our prometheus solr-exporter-config.xml shows that it thinks 
> it's milliseconds. It's not; RequestHandlerBase increments this counter by 
> "elapsed", the response of timer.stop() -- nanoseconds. Years ago it had 
> invoked {{MetricUtils.nsToMs}} but it appears [~ab] removed this as a part of 
> other changes in 2017 sometime -- 
> https://github.com/apache/solr/commit/d8df9f8c9963c2fc1718fd471316bf5d964125ba
> Also, I question the value/purpose of this metric.  Is it so useful that it 
> deserves to be among our relatively few metrics exported in our default 
> prometheus exporter config?  It's been there since the initial config but I 
> wonder why anyone wants it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-16013) Overseer gives up election node before closing - inflight commands can be processed twice

2022-02-16 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493279#comment-17493279
 ] 

Andrzej Bialecki edited comment on SOLR-16013 at 2/16/22, 3:30 PM:
---

Additionally, `OverseerElectionContext.close()` has this implementation:
{code:java}
@Override
public synchronized void close() {
  this.isClosed = true;
  overseer.close();
} {code}
So it marks itself as closed before the Overseer is closed; I agree that it 
should do it the other way around, and then simply check in 
`runLeaderProcess:76` that the Overseer is not closed.

Edit: I think the idea in `OverseerElectionContext` was primarily to avoid 
re-electing this Overseer and then wait until all its tasks are completed. But 
this allows another overseer to be elected that keeps processing the in-flight 
tasks as new.
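
A sketch of the suggested ordering (not a tested patch):
{code:java}
@Override
public synchronized void close() {
  // Close the Overseer first so that in-flight tasks drain before this
  // context is marked closed and the election node can be given up.
  overseer.close();
  this.isClosed = true;
}
{code}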


was (Author: ab):
Additionally, `OverseerElectionContext.close()` has this implementation:
{code:java}
@Override
public synchronized void close() {
  this.isClosed = true;
  overseer.close();
} {code}
So it marks itself as closed before the Overseer is closed, and I agree that it 
seems to me it should do it the other way around, and then simply check in 
`runLeaderProcess:76` if the Overseer is not closed.

> Overseer gives up election node before closing - inflight commands can be 
> processed twice
> -
>
> Key: SOLR-16013
> URL: https://issues.apache.org/jira/browse/SOLR-16013
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> {{ZkController}} shutdown currently has these two lines (in this order)...
> {code:java}
> customThreadPool.submit(() -> 
> IOUtils.closeQuietly(overseerElector.getContext()));
> customThreadPool.submit(() -> IOUtils.closeQuietly(overseer));
> {code}
> AFAICT this means that the overseer nodeX will give up its election node (via 
> overseerElector) allowing some other nodeY to be elected a new overseer, 
> **BEFORE** Overseer nodeX shuts down its {{Overseer}} object, which waits for 
> the {{OverseerThread}} to finish processing any tasks in process.
> In practice, this seems to make it possible for a single command in the 
> overseer queue to get processed twice.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-16013) Overseer gives up election node before closing - inflight commands can be processed twice

2022-02-16 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493279#comment-17493279
 ] 

Andrzej Bialecki commented on SOLR-16013:
-

Additionally, `OverseerElectionContext.close()` has this implementation:
{code:java}
@Override
public synchronized void close() {
  this.isClosed = true;
  overseer.close();
} {code}
So it marks itself as closed before the Overseer is closed; I agree that it 
should do it the other way around, and then simply check in 
`runLeaderProcess:76` that the Overseer is not closed.

> Overseer gives up election node before closing - inflight commands can be 
> processed twice
> -
>
> Key: SOLR-16013
> URL: https://issues.apache.org/jira/browse/SOLR-16013
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> {{ZkController}} shutdown currently has these two lines (in this order)...
> {code:java}
> customThreadPool.submit(() -> 
> IOUtils.closeQuietly(overseerElector.getContext()));
> customThreadPool.submit(() -> IOUtils.closeQuietly(overseer));
> {code}
> AFAICT this means that the overseer nodeX will give up its election node (via 
> overseerElector) allowing some other nodeY to be elected a new overseer, 
> **BEFORE** Overseer nodeX shuts down its {{Overseer}} object, which waits for 
> the {{OverseerThread}} to finish processing any tasks in process.
> In practice, this seems to make it possible for a single command in the 
> overseer queue to get processed twice.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15858) ConfigSetsHandler requires DIR entries in the uploaded ZIPs

2021-12-20 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15858:

Description: 
If you try uploading a configset zip that contains resources in sub-folders - 
but doesn't contain explicit DIR entries in the zip file - the upload will fail 
with {{{}NoNodeException{}}}.

This is caused by {{ConfigSetsHandler.createZkNodeIfNotExistsAndSetData}} which 
assumes the entry path doesn't contain sub-path elements. If the corresponding 
DIR entries are present (and they occur earlier in the zip than their child 
resource entries!) the handler will work properly because it recognizes DIR 
entries and creates ZK paths as needed.

The fix would be to always check for the presence of `/` characters in the 
entry name and make sure the ZK path already exists.
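
A sketch of that fix (hypothetical helper names, not the actual handler code):
{code:java}
// Sketch: ensure every parent ZK node exists for entries like
// "lang/stopwords.txt", even when the zip contains no explicit DIR entries.
void writeEntry(String configPath, String entryName, byte[] data) throws Exception {
  String[] segments = entryName.split("/");
  StringBuilder parent = new StringBuilder(configPath);
  for (int i = 0; i < segments.length - 1; i++) {
    parent.append('/').append(segments[i]);
    ensureZkNode(parent.toString()); // hypothetical create-if-missing helper
  }
  setZkData(configPath + "/" + entryName, data); // hypothetical write helper
}
{code}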

  was:
If you try uploading a configset zip that contains resources in sub-folders - 
but doesn't contain explicit DIR entries in the zip file - the upload will fail 
with `NoNodeException`.

This is caused by `ConfigSetsHandler.createZkNodeIfNotExistsAndSetData` which 
assumes the entry path doesn't contain sub-path elements. If the corresponding 
DIR entries are present (and they occur earlier in the zip than their child 
resource entries!) the handler will work properly because it recognizes DIR 
entries and creates ZK paths as needed.

The fix would be to always check for the presence of `/` characters in the 
entry name and make sure the ZK path already exists.


> ConfigSetsHandler requires DIR entries in the uploaded ZIPs
> ---
>
> Key: SOLR-15858
> URL: https://issues.apache.org/jira/browse/SOLR-15858
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: configset-api
>Affects Versions: 8.11.1
>Reporter: Andrzej Bialecki
>Priority: Major
>
> If you try uploading a configset zip that contains resources in sub-folders - 
> but doesn't contain explicit DIR entries in the zip file - the upload will 
> fail with {{{}NoNodeException{}}}.
> This is caused by {{ConfigSetsHandler.createZkNodeIfNotExistsAndSetData}} 
> which assumes the entry path doesn't contain sub-path elements. If the 
> corresponding DIR entries are present (and they occur earlier in the zip than 
> their child resource entries!) the handler will work properly because it 
> recognizes DIR entries and creates ZK paths as needed.
> The fix would be to always check for the presence of `/` characters in the 
> entry name and make sure the ZK path already exists.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15858) ConfigSetsHandler requires DIR entries in the uploaded ZIPs

2021-12-20 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15858:
---

 Summary: ConfigSetsHandler requires DIR entries in the uploaded 
ZIPs
 Key: SOLR-15858
 URL: https://issues.apache.org/jira/browse/SOLR-15858
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: configset-api
Affects Versions: 8.11.1
Reporter: Andrzej Bialecki


If you try uploading a configset zip that contains resources in sub-folders - 
but doesn't contain explicit DIR entries in the zip file - the upload will fail 
with `NoNodeException`.

This is caused by `ConfigSetsHandler.createZkNodeIfNotExistsAndSetData` which 
assumes the entry path doesn't contain sub-path elements. If the corresponding 
DIR entries are present (and they occur earlier in the zip than their child 
resource entries!) the handler will work properly because it recognizes DIR 
entries and creates ZK paths as needed.

The fix would be to always check for the presence of `/` characters in the 
entry name and make sure the ZK path already exists.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-10-26 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15572.
-
Resolution: Fixed

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0), 8.10
>
> Attachments: SOLR-15572.patch
>
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-10-26 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15572:

Fix Version/s: main (9.0)

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: main (9.0), 8.10
>
> Attachments: SOLR-15572.patch
>
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15708) Add ConfigSetAdminRequest.Upload

2021-10-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432646#comment-17432646
 ] 

Andrzej Bialecki commented on SOLR-15708:
-

This looks good overall. A couple minor nits:
 * there's {{ModifiableParams.setNonNull}} that we can use instead of explicit 
if-s, which makes the code cleaner (see the sketch after this list).
 * I prefer using {{something == null}} instead of {{null == something}} as it 
feels more natural to me, and I think this is also prevalent in Solr code.
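
E.g. (illustrative only, assuming {{setNonNull}} skips null values, as the name 
suggests):
{code:java}
// Before: explicit null guard
if (overwrite != null) {
  params.set("overwrite", overwrite);
}
// After: the guard is folded into the call
params.setNonNull("overwrite", overwrite);
{code}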

> Add ConfigSetAdminRequest.Upload
> 
>
> Key: SOLR-15708
> URL: https://issues.apache.org/jira/browse/SOLR-15708
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15708.patch
>
>
> There's currently no SolrJ support for the {{/admin/configs?action=UPLOAD}} 
> API
> The workaround is to use a {{ContentStreamUpdateRequest}} and set the 
> appropriate path and params



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15701) ConfigSet SolrJ API is incomplete

2021-10-20 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431268#comment-17431268
 ] 

Andrzej Bialecki commented on SOLR-15701:
-

Well, what would make the most sense in 9.0 would be to re-purpose CREATE so that 
it actually means CREATE - i.e. it corresponds to the user's action of uploading 
a new configset to Solr. But that may be too confusing, so COPY and UPDATE 
feel somewhat less disruptive...

Deprecation should go into 8.11, together with the new COPY as a simple alias 
for CREATE, and I think the UPDATE should go into both 9.0 and 8.11.

> ConfigSet SolrJ API is incomplete
> -
>
> Key: SOLR-15701
> URL: https://issues.apache.org/jira/browse/SOLR-15701
> Project: Solr
>  Issue Type: Bug
>  Components: configset-api
>Affects Versions: 8.10
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> SolrJ ConfigSet API allows users to manage configsets but it's curiously 
> incomplete (and somewhat misleading):
>  * CREATE action doesn't actually allow users to create a wholly new 
> configset; it only allows copying one of the existing configsets under a new 
> name. It should really be renamed to COPY.
>  * there's no support for UPLOAD action in the SolrJ API, which is a strange 
> omission - there's support for every other action but there's no (easy) way 
> to upload a new configset. {{ConfigSetHandler}} supports this action so the 
> problem lies only in the missing client classes.
> I'll work on adding support for UPLOAD, but I'm not sure what to do with the 
> mis-named CREATE - I'd love to rename it but this would cause back-compat 
> issues. Maybe deprecate it, rename it only in 9.0 and keep supporting CREATE 
> -> COPY until 9.1 release?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15701) ConfigSet SolrJ API is incomplete

2021-10-19 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15701:
---

 Summary: ConfigSet SolrJ API is incomplete
 Key: SOLR-15701
 URL: https://issues.apache.org/jira/browse/SOLR-15701
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: configset-api
Affects Versions: 8.10
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


SolrJ ConfigSet API allows users to manage configsets but it's curiously 
incomplete (and somewhat misleading):
 * CREATE action doesn't actually allow users to create a wholly new configset; 
it only allows copying one of the existing configsets under a new name. It 
should really be renamed to COPY.
 * there's no support for UPLOAD action in the SolrJ API, which is a strange 
omission - there's support for every other action but there's no (easy) way to 
upload a new configset. {{ConfigSetHandler}} supports this action so the 
problem lies only in the missing client classes.

I'll work on adding support for UPLOAD, but I'm not sure what to do with the 
mis-named CREATE - I'd love to rename it but this would cause back-compat 
issues. Maybe deprecate it, rename it only in 9.0 and keep supporting CREATE -> 
COPY until 9.1 release?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15301) Eliminate repetitive index size calculation for Solr metrics

2021-10-04 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423905#comment-17423905
 ] 

Andrzej Bialecki commented on SOLR-15301:
-

We should not use two different implementations that try to address the same 
problem. In this particular case we could either add a 
{{SolrMetricsContext.cachedGauge}} utility method to wrap expensive gauges 
(although this would still cause multiple recalculations when the value 
expires), or add a utility method to {{SolrCore}} that creates a single 
instance of this gauge and registers it directly in all three places.
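
For illustration, a minimal sketch of the single-instance variant using 
Dropwizard's {{CachedGauge}} - the registry names and the size supplier are 
illustrative, not the actual Solr wiring:
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import com.codahale.metrics.CachedGauge;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class IndexSizeGaugeSketch {
  /** Wraps an expensive computation; recomputes at most once per interval. */
  static Gauge<Long> cached(Supplier<Long> indexSize, long intervalSec) {
    return new CachedGauge<Long>(intervalSec, TimeUnit.SECONDS) {
      @Override
      protected Long loadValue() {
        return indexSize.get(); // the expensive directory walk happens here
      }
    };
  }

  static void registerOnce(MetricRegistry registry, Supplier<Long> indexSize) {
    // One shared instance registered under several names: all metric paths
    // then report the same value from a single computation.
    Gauge<Long> gauge = cached(indexSize, 30);
    registry.register("CORE.indexSize", gauge);
    registry.register("REPLICATION.replication.indexSize", gauge);
  }
}
{code}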

> Eliminate repetitive index size calculation for Solr metrics
> 
>
> Key: SOLR-15301
> URL: https://issues.apache.org/jira/browse/SOLR-15301
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andras Salamon
>Assignee: Andras Salamon
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> During metrics calculation Solr asks for core indexSize three times. Twice in 
> SolrCore and once in ReplicationHandler. It slows down metrics calculation 
> and it is also possible that these three reported values are not exactly the 
> same if size changes during calculation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15301) Eliminate repetitive index size calculation for Solr metrics

2021-10-04 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423896#comment-17423896
 ] 

Andrzej Bialecki commented on SOLR-15301:
-

I agree, using {{CachedGauge}} is the optimal solution here. This Jira issue 
can address just the index size metric, but we should audit other metrics and 
wrap them in {{CachedGauge}}-s too, anywhere the computation is anything 
other than trivial.

> Eliminate repetitive index size calculation for Solr metrics
> 
>
> Key: SOLR-15301
> URL: https://issues.apache.org/jira/browse/SOLR-15301
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andras Salamon
>Assignee: Andras Salamon
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> During metrics calculation Solr asks for core indexSize three times. Twice in 
> SolrCore and once in ReplicationHandler. It slows down metrics calculation 
> and it is also possible that these three reported values are not exactly the 
> same if size changes during calculation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-09-29 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422030#comment-17422030
 ] 

Andrzej Bialecki commented on SOLR-15572:
-

[~thelabdude] this actually ended up in 8.10. I found the reason for the test 
failures: it's a problem with the test itself (incorrect parsing when spaces 
are present in the metric's item). I'll fix it shortly.

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15572.patch
>
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-09-28 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15572:

Fix Version/s: 8.10

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15572.patch
>
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-09-08 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15572:

Fix Version/s: 8.10

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15572.patch
>
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15616) Allow thread metrics to be cached

2021-09-06 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410478#comment-17410478
 ] 

Andrzej Bialecki commented on SOLR-15616:
-

The XML config changes are ok, but the way it is passed to MetricConfig bothers 
me ... it adds a very specific arg to the constructor, which is (presumably) 
only one of a potentially larger class of "caching" params we may want to add 
later. Are we going to keep extending the list of args as we add the caching of 
other gauge types?

I would probably model the caching parameters as a separate bean, or at least a 
Map.
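
For illustration, a minimal sketch of such a bean - the field name mirrors the 
thread-metrics interval discussed here; the class name and everything else are 
hypothetical:
{code:java}
// One bean that can grow new caching knobs without touching the
// MetricConfig constructor arg list each time.
public class MetricsCachingConfig {
  private final int threadsIntervalSeconds; // caching interval for JVM thread metrics

  public MetricsCachingConfig(int threadsIntervalSeconds) {
    this.threadsIntervalSeconds = threadsIntervalSeconds;
  }

  public int getThreadsIntervalSeconds() {
    return threadsIntervalSeconds;
  }
}
{code}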

> Allow thread metrics to be cached
> -
>
> Key: SOLR-15616
> URL: https://issues.apache.org/jira/browse/SOLR-15616
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-15616.patch, SOLR-15616.patch
>
>
> Computing JVM metrics for threads can be expensive, and we should provide 
> option to users to avoid doing so on every call to the metrics API 
> (group=jvm).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-15616) Allow thread metrics to be cached

2021-09-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408710#comment-17408710
 ] 

Andrzej Bialecki edited comment on SOLR-15616 at 9/2/21, 9:39 AM:
--

Maybe something like this?
{code:java}
<solr>
  <metrics>
    <caching>
      <threadsIntervalSeconds>2</threadsIntervalSeconds>
    </caching>
    ..
  </metrics>
</solr>
{code}

Adding more and more attributes to the main element would eventually look very 
ugly.


was (Author: ab):
Maybe something like this?
{code}
<solr>
  <metrics>
    <caching>
      <threadsIntervalSeconds>2</threadsIntervalSeconds>
    </caching>
    ..
  </metrics>
</solr>
{code}

> Allow thread metrics to be cached
> -
>
> Key: SOLR-15616
> URL: https://issues.apache.org/jira/browse/SOLR-15616
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-15616.patch
>
>
> Computing JVM metrics for threads can be expensive, and we should provide 
> option to users to avoid doing so on every call to the metrics API 
> (group=jvm).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15616) Allow thread metrics to be cached

2021-09-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408710#comment-17408710
 ] 

Andrzej Bialecki commented on SOLR-15616:
-

Maybe something like this?
{code}
<solr>
  <metrics>
    <caching>
      <threadsIntervalSeconds>2</threadsIntervalSeconds>
    </caching>
    ..
  </metrics>
</solr>
{code}

> Allow thread metrics to be cached
> -
>
> Key: SOLR-15616
> URL: https://issues.apache.org/jira/browse/SOLR-15616
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-15616.patch
>
>
> Computing JVM metrics for threads can be expensive, and we should provide 
> option to users to avoid doing so on every call to the metrics API 
> (group=jvm).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15616) Allow thread metrics to be cached

2021-09-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408686#comment-17408686
 ] 

Andrzej Bialecki commented on SOLR-15616:
-

This makes sense, and the patch looks good - however, I'm not sure about the 
configuration mechanism...

I know that the story about Solr configuration is messy and contentious, and we 
have several competing mechanisms for configuring various subsystems. For 
better or worse, for metrics the main config source is in 
{{solr.xml:/solr/metrics}}. I don't feel comfortable adding another config 
mechanism via sysprops.

I'm of two minds here, but I think I would prefer to put this as a property 
somewhere in {{solr.xml:/solr/metrics}} - it can still be parameterized by 
sysprops using the existing variable substitution mechanism.

> Allow thread metrics to be cached
> -
>
> Key: SOLR-15616
> URL: https://issues.apache.org/jira/browse/SOLR-15616
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-15616.patch
>
>
> Computing JVM metrics for threads can be expensive, and we should provide 
> option to users to avoid doing so on every call to the metrics API 
> (group=jvm).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-12628) Support multiple ranges in a slice

2021-08-23 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403246#comment-17403246
 ] 

Andrzej Bialecki commented on SOLR-12628:
-

Correct. However ... after dealing with {{SplitShardCmd}} and its convoluted 
logic and error handling - I think the merge Cmd would introduce a similar 
level of complexity and fragility. So yeah, the routing logic could remain the 
same but the command itself would still be complex to write / test / debug & 
harden so that it's useful in production.

> Support multiple ranges in a slice
> --
>
> Key: SOLR-12628
> URL: https://issues.apache.org/jira/browse/SOLR-12628
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Currently {{Slice}} supports only a single {{Range}}. In order to effectively 
> merge shards with non-adjacent ranges we need to support multiple ranges per 
> slice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-12628) Support multiple ranges in a slice

2021-08-23 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403246#comment-17403246
 ] 

Andrzej Bialecki edited comment on SOLR-12628 at 8/23/21, 3:38 PM:
---

Correct. However ... after dealing with {{SplitShardCmd}} and its convoluted 
logic and error handling - I think the merge Cmd would introduce a similar 
level of complexity and fragility. So yeah, the routing logic could remain the 
same but the command itself would still be complex to write / test / debug & 
harden so that it's useful in production.


was (Author: ab):
Correct. However ... after dealing with {{SplitShardCmd}} and it's convoluted 
logic and error handling - I think the merge Cmd would introduce a similar 
level of complexity and fragility. So yeah, the routing logic could remain the 
same but the command itself would still be complex to write / test / debug & 
harden so that it's useful in production.

> Support multiple ranges in a slice
> --
>
> Key: SOLR-12628
> URL: https://issues.apache.org/jira/browse/SOLR-12628
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Currently {{Slice}} supports only a single {{Range}}. In order to effectively 
> merge shards with non-adjacent ranges we need to support multiple ranges per 
> slice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-9407) New collection action MERGESHARDS

2021-08-23 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403152#comment-17403152
 ] 

Andrzej Bialecki commented on SOLR-9407:


[~noble.paul] I don't think there's any strong motivation to pursue this 
anymore - perhaps you should close this as Won't Do?

> New collection action MERGESHARDS
> -
>
> Key: SOLR-9407
> URL: https://issues.apache.org/jira/browse/SOLR-9407
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Fix For: 6.1
>
>
> This would join 2 or more shards to form a single shard
> command parameters
> * collection: collection name
> * shard : multivalued parameters {{shard=shard2&shard=shard3}} etc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-12628) Support multiple ranges in a slice

2021-08-23 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-12628.
-
  Assignee: Andrzej Bialecki
Resolution: Won't Do

Initial investigation shows that this would introduce a lot of complexity in 
the routing and shard handling - it seems that for re-sharding it's easier to 
simply re-index from scratch.

> Support multiple ranges in a slice
> --
>
> Key: SOLR-12628
> URL: https://issues.apache.org/jira/browse/SOLR-12628
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Currently {{Slice}} supports only a single {{Range}}. In order to effectively 
> merge shards with non-adjacent ranges we need to support multiple ranges per 
> slice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-08-23 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403122#comment-17403122
 ] 

Andrzej Bialecki commented on SOLR-15572:
-

This is a list of tight selection expressions that tries to strike a balance 
between the minimum amount of data exported and maximum usefulness:
{code:xml}



solr\.jetty:.*\.DefaultHandler\.(dispatches|.*-requests|.*xx-responses):count

solr\.jvm:(buffers|gc).*
solr\.jvm:memory\.(heap|non-heap|pools).*\.usage
solr\.jvm:memory\.total
solr\.jvm:os\..*(FileDescriptorCount|Load.*|Size|processCpuTime)
solr\.jvm:threads\..*count

solr\.node:CONTAINER\.(cores|fs).*

solr\.core\..*:CORE\.fs\..*Space
solr\.core\..*:INDEX\.sizeInBytes
solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\..*requestTimes:(count|1minRate|5minRate|median_ms|meanRate|p75_ms|p95_ms|p99_ms)
solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\.totalTime
solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\..*rrors:(count|1minRate)
solr\.core\..*:SEARCHER\.searcher\..*Doc.*
solr\.core\..*:UPDATE\.updateHandler\..*
solr\.core\..*:CACHE\..*
{code}

This list completely removes the core ADMIN, REPLICATION and HIGHLIGHTER 
metrics, and selects only the most common QUERY handlers.

I think that for the best back-compat experience we should use the list I 
included in the previous comment, but I'd like to put this optimized list 
somewhere, too. Any suggestions?

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15572.patch
>
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-08-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15572:

Attachment: SOLR-15572.patch

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-15572.patch
>
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-08-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15572:

Fix Version/s: 8.10

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15572.patch
>
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15577) Avoid ClusterState.getCollectionsMap() when not necessary

2021-08-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15577:

Summary: Avoid ClusterState.getCollectionsMap() when not necessary  (was: 
Avoid {{ClusterState.getCollectionsMap()}} when not necessary)

> Avoid ClusterState.getCollectionsMap() when not necessary
> -
>
> Key: SOLR-15577
> URL: https://issues.apache.org/jira/browse/SOLR-15577
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> I noticed that in a few places (eg. CollectionsHandler LIST_OP, 
> ClusterStatus) when a simple list of available collections is needed the code 
> invokes {{clusterState.getCollectionsMap()}}, which has to retrieve all 
> collections information into memory. This should be replaced with something 
> like {{clusterState.getCollectionStates().keySet()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15577) Avoid ClusterState.getCollectionsMap() when not necessary

2021-08-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15577:

Fix Version/s: 8.10

> Avoid ClusterState.getCollectionsMap() when not necessary
> -
>
> Key: SOLR-15577
> URL: https://issues.apache.org/jira/browse/SOLR-15577
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
>
> I noticed that in a few places (eg. CollectionsHandler LIST_OP, 
> ClusterStatus) when a simple list of available collections is needed the code 
> invokes {{clusterState.getCollectionsMap()}}, which has to retrieve all 
> collections information into memory. This should be replaced with something 
> like {{clusterState.getCollectionStates().keySet()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15577) Avoid {{ClusterState.getCollectionsMap()}} when not necessary

2021-08-04 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15577:
---

 Summary: Avoid {{ClusterState.getCollectionsMap()}} when not 
necessary
 Key: SOLR-15577
 URL: https://issues.apache.org/jira/browse/SOLR-15577
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


I noticed that in a few places (eg. CollectionsHandler LIST_OP, ClusterStatus) 
when a simple list of available collections is needed the code invokes 
{{clusterState.getCollectionsMap()}}, which has to retrieve all collections 
information into memory. This should be replaced with something like 
{{clusterState.getCollectionStates().keySet()}}.
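
A minimal sketch of the proposed change - the wrapping class and method 
placement are illustrative:
{code:java}
import java.util.Set;
import org.apache.solr.common.cloud.ClusterState;

public class ListCollectionsSketch {
  static Set<String> collectionNames(ClusterState clusterState) {
    // Before: materializes every DocCollection just to read the keys.
    //   return clusterState.getCollectionsMap().keySet();
    // After: the lazy CollectionRef keys are available without loading
    // each collection's full state.
    return clusterState.getCollectionStates().keySet();
  }
}
{code}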



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Comment Edited] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-08-03 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392436#comment-17392436
 ] 

Andrzej Bialecki edited comment on SOLR-15572 at 8/3/21, 5:34 PM:
--

These are the equivalent expressions that select the metrics that the default 
config knows how to process:
{code:xml}
solr\.jetty:.*DefaultHandler.*
solr\.jvm:.*
solr\.node:.*
solr\.overseer:.*
solr\.core\..*:QUERY\..*
solr\.core\..*:ADMIN\..*
solr\.core\..*:CACHE\..*
solr\.core\..*:UPDATE\.updateHandler\..*
solr\.core\..*:CORE\.fs\..*
solr\.core\..*:HIGHLIGHTER\..*
solr\.core\..*:INDEX\..*
solr\.core\..*:REPLICATION\.replication\..*
solr\.core\..*:SEARCHER\.searcher\..*
{code}
These metrics populate various panels in the default Grafana dashboard, but 
IMHO they are overkill for routine monitoring - OTOH they may be handy if you 
want to collect the history of these metrics for post-mortem analysis.

I'm open to suggestions on how to modify this list - this change at least avoids 
exporting exotic metrics such as TLOG, and makes it much easier to trim down 
those that are not needed.

 


was (Author: ab):
These are the equivalent expressions that select the metrics that the default 
config knows how to process:
{code:xml}
solr\.jetty:.*DefaultHandler.*
solr\.jvm:.*
solr\.overseer:.*
solr\.core\..*:QUERY\..*
solr\.core\..*:ADMIN\..*
solr\.core\..*:CACHE\..*
solr\.core\..*:UPDATE\.updateHandler\..*
solr\.core\..*:CORE\.fs\..*
solr\.core\..*:HIGHLIGHTER\..*
solr\.core\..*:INDEX\..*
solr\.core\..*:REPLICATION\.replication\..*
solr\.core\..*:SEARCHER\.searcher\..*
{code}
These metrics populate various panels in the default Grafana dashboard, but 
IMHO they are overkill for routine monitoring - OTOH they may be handy if you 
want to collect the history of these metrics for post-mortem analysis.

I'm open to suggestions on how to modify this list - this change at least avoids 
exporting exotic metrics such as TLOG, and makes it much easier to trim down 
those that are not needed.

 

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-08-03 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392436#comment-17392436
 ] 

Andrzej Bialecki commented on SOLR-15572:
-

These are the equivalent expressions that select the metrics that the default 
config knows how to process:
{code:xml}
solr\.jetty:.*DefaultHandler.*
solr\.jvm:.*
solr\.overseer:.*
solr\.core\..*:QUERY\..*
solr\.core\..*:ADMIN\..*
solr\.core\..*:CACHE\..*
solr\.core\..*:UPDATE\.updateHandler\..*
solr\.core\..*:CORE\.fs\..*
solr\.core\..*:HIGHLIGHTER\..*
solr\.core\..*:INDEX\..*
solr\.core\..*:REPLICATION\.replication\..*
solr\.core\..*:SEARCHER\.searcher\..*
{code}
These metrics populate various panels in the default Grafana dashboard, but 
IMHO they are overkill for routine monitoring - OTOH they may be handy if you 
want to collect the history of these metrics for post-mortem analysis.

I'm open to suggestions on how to modify this list - this change at least avoids 
exporting exotic metrics such as TLOG, and makes it much easier to trim down 
those that are not needed.

 

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Reopened] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-08-03 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reopened SOLR-15572:
-

Mistakenly resolved as duplicate, re-opening.

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Follow-up to SOLR-15564. Using {{expr}} parameters it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-14506) COLSTATUS Null Pointer Exception

2021-08-03 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14506.
-
Resolution: Fixed

> COLSTATUS Null Pointer Exception
> 
>
> Key: SOLR-14506
> URL: https://issues.apache.org/jira/browse/SOLR-14506
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI, JSON Request API, Schema and Analysis
>Affects Versions: 8.5, 8.5.1
> Environment: *"Incidents" collection setup* 
>  
>  "incidents": {
> "stateFormat": 2,
> "znodeVersion": 5,
> "properties": {
>   "autoAddReplicas": "false",
>   "maxShardsPerNode": "-1",
>   "nrtReplicas": "1",
>   "pullReplicas": "0",
>   "replicationFactor": "1",
>   "router": {
> "field": "slug",
> "name": "implicit"
>   },
>   "tlogReplicas": "0"
> },
> "activeShards": 1,
> "inactiveShards": 0
>   },
>Reporter: Austin Weidler
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 8.10
>
>
> When querying for collection status, a null pointer exception is returned. I 
> believe it is caused by the use of "implicit" routing for the shards and the 
> Admin Handler trying to set the "Range" attribute of a shard (when one 
> doesn't exist).
> {code:java}
> // org.apache.solr.handler.admin.ColStatus.getColStatus(ColStatus.java:152)
> sliceMap.add("range", s.getRange().toString());
> {code}
> I believe "getRange()" is NULL since implicit routing is used.
>  
> {code:java}
> "trace": "java.lang.NullPointerException\n\tat 
> org.apache.solr.handler.admin.ColStatus.getColStatus(ColStatus.java:152)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.lambda$static$1(CollectionsHandler.java:547)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.execute(CollectionsHandler.java:1326)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:266)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:254)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:842)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:808)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:559)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
>  
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat 
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
>  org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat 
> o

[jira] [Resolved] (SOLR-15565) Optimize CLUSTERSTATUS when filtering by collection

2021-08-03 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15565.
-
Resolution: Invalid

It turns out this is already the case, i.e. when the {{collection}} param is 
used then only that collection is retrieved from {{ClusterState}}.

> Optimize CLUSTERSTATUS when filtering by collection
> ---
>
> Key: SOLR-15565
> URL: https://issues.apache.org/jira/browse/SOLR-15565
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> When user specifies {{collection}} parameter in the {{CLUSTERSTATUS}} command 
> one would expect that Solr retrieves and materializes only that collection 
> state. Unfortunately, {{ClusterStatus.getClusterStatus}} retrieves the 
> complete map, which in turn materializes all collection statuses. On large 
> clusters with many large collections this is wasteful.
> In this case, when user requests state information for some collection only, 
> we should iterate just on the collection names and materialize only these 
> collections' information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-15564) Improve filtering expressions in /admin/metrics

2021-08-03 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15564.
-
Resolution: Fixed

> Improve filtering expressions in /admin/metrics
> ---
>
> Key: SOLR-15564
> URL: https://issues.apache.org/jira/browse/SOLR-15564
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15564.patch
>
>
> There are currently several ways to filter what metrics should be returned by 
> /admin/metrics but they have one important limitation: all types of filtering 
> criteria (group, type, prefix, regex, key) use implicit OR, so it's only ever 
> possible to broaden the filters but not to narrow them down.
> This issue came up while I was reviewing the default Prometheus exporter 
> config and I noticed that it pulls ALL metrics. This is extremely wasteful 
> and puts unnecessary load on the nodes, especially since only some of the 
> metrics are then used, either by the exporter itself or by the default 
> dashboard. In addition to that, the exporter needs to build an object tree 
> from all these metrics in order to apply the export rules, which leads to 
> excessive memory / cpu consumption in the reporter process.
> We should come up with a way to make these filters more expressive so that 
> it's possible e.g. to select only some metrics from a particular registry.
> The simplest way to implement this would be to extend the syntax of the 
> {{key}} parameter to support regex expressions in the parts of the key that 
> specify the metric name and the property name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Resolved] (SOLR-15574) NPE in ColStatus when collection uses routing other than hashing

2021-08-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15574.
-
Resolution: Duplicate

> NPE in ColStatus when collection uses routing other than hashing
> 
>
> Key: SOLR-15574
> URL: https://issues.apache.org/jira/browse/SOLR-15574
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.9
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> The code in {{ColStatus:152}} wrongly assumes the slice range is always 
> present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15564) Improve filtering expressions in /admin/metrics

2021-08-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15564:

Attachment: (was: SOLR-15564.patch)

> Improve filtering expressions in /admin/metrics
> ---
>
> Key: SOLR-15564
> URL: https://issues.apache.org/jira/browse/SOLR-15564
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15564.patch
>
>
> There are currently several ways to filter what metrics should be returned by 
> /admin/metrics but they have one important limitation: all types of filtering 
> criteria (group, type, prefix, regex, key) use implicit OR, so it's only ever 
> possible to broaden the filters but not to narrow them down.
> This issue came up while I was reviewing the default Prometheus exporter 
> config and I noticed that it pulls ALL metrics. This is extremely wasteful 
> and puts unnecessary load on the nodes, especially since only some of the 
> metrics are then used, either by the exporter itself or by the default 
> dashboard. In addition to that, the exporter needs to build an object tree 
> from all these metrics in order to apply the export rules, which leads to 
> excessive memory / cpu consumption in the reporter process.
> We should come up with a way to make these filters more expressive so that 
> it's possible e.g. to select only some metrics from a particular registry.
> The simplest way to implement this would be to extend the syntax of the 
> {{key}} parameter to support regex expressions in the parts of the key that 
> specify the metric name and the property name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15564) Improve filtering expressions in /admin/metrics

2021-08-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15564:

Attachment: SOLR-15564.patch

> Improve filtering expressions in /admin/metrics
> ---
>
> Key: SOLR-15564
> URL: https://issues.apache.org/jira/browse/SOLR-15564
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15564.patch
>
>
> There are currently several ways to filter what metrics should be returned by 
> /admin/metrics but they have one important limitation: all types of filtering 
> criteria (group, type, prefix, regex, key) use implicit OR, so it's only ever 
> possible to broaden the filters but not to narrow them down.
> This issue came up while I was reviewing the default Prometheus exporter 
> config and I noticed that it pulls ALL metrics. This is extremely wasteful 
> and puts unnecessary load on the nodes, especially since only some of the 
> metrics are then used, either by the exporter itself or by the default 
> dashboard. In addition to that, the exporter needs to build an object tree 
> from all these metrics in order to apply the export rules, which leads to 
> excessive memory / cpu consumption in the reporter process.
> We should come up with a way to make these filters more expressive so that 
> it's possible e.g. to select only some metrics from a particular registry.
> The simplest way to implement this would be to extend the syntax of the 
> {{key}} parameter to support regex expressions in the parts of the key that 
> specify the metric name and the property name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-14506) COLSTATUS Null Pointer Exception

2021-08-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14506:

Fix Version/s: 8.10

> COLSTATUS Null Pointer Exception
> 
>
> Key: SOLR-14506
> URL: https://issues.apache.org/jira/browse/SOLR-14506
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI, JSON Request API, Schema and Analysis
>Affects Versions: 8.5, 8.5.1
> Environment: *"Incidents" collection setup* 
>  
>  "incidents": {
> "stateFormat": 2,
> "znodeVersion": 5,
> "properties": {
>   "autoAddReplicas": "false",
>   "maxShardsPerNode": "-1",
>   "nrtReplicas": "1",
>   "pullReplicas": "0",
>   "replicationFactor": "1",
>   "router": {
> "field": "slug",
> "name": "implicit"
>   },
>   "tlogReplicas": "0"
> },
> "activeShards": 1,
> "inactiveShards": 0
>   },
>Reporter: Austin Weidler
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 8.10
>
>
> When querying for collection status, a null pointer exception is returned. I 
> believe it is caused by the use of "implicit" routing for the shards and the 
> Admin Handler trying to set the "Range" attribute of a shard (when one 
> doesn't exist).
> {code:java}
> // org.apache.solr.handler.admin.ColStatus.getColStatus(ColStatus.java:152)
> sliceMap.add("range", s.getRange().toString());
> {code}
> I believe "getRange()" is NULL since implicit routing is used.
>  
> {code:java}
> "trace": "java.lang.NullPointerException\n\tat 
> org.apache.solr.handler.admin.ColStatus.getColStatus(ColStatus.java:152)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.lambda$static$1(CollectionsHandler.java:547)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.execute(CollectionsHandler.java:1326)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:266)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:254)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:842)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:808)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:559)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
>  
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat 
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
>  org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat 
> o

[jira] [Assigned] (SOLR-14506) COLSTATUS Null Pointer Exception

2021-08-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-14506:
---

Assignee: Andrzej Bialecki

> COLSTATUS Null Pointer Exception
> 
>
> Key: SOLR-14506
> URL: https://issues.apache.org/jira/browse/SOLR-14506
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI, JSON Request API, Schema and Analysis
>Affects Versions: 8.5, 8.5.1
> Environment: *"Incidents" collection setup* 
>  
>  "incidents": {
> "stateFormat": 2,
> "znodeVersion": 5,
> "properties": {
>   "autoAddReplicas": "false",
>   "maxShardsPerNode": "-1",
>   "nrtReplicas": "1",
>   "pullReplicas": "0",
>   "replicationFactor": "1",
>   "router": {
> "field": "slug",
> "name": "implicit"
>   },
>   "tlogReplicas": "0"
> },
> "activeShards": 1,
> "inactiveShards": 0
>   },
>Reporter: Austin Weidler
>Assignee: Andrzej Bialecki
>Priority: Critical
>
> When querying for collection status, a null pointer exception is returned. I 
> believe it is caused by the use of "implicit" routing for the shards and the 
> Admin Handler trying to set the "Range" attribute of a shard (when one 
> doesn't exist).
> {code:java}
> // org.apache.solr.handler.admin.ColStatus.getColStatus(ColStatus.java:152)
> sliceMap.add("range", s.getRange().toString());
> {code}
> I believe "getRange()" is NULL since implicit routing is used.
>  
> {code:java}
> "trace": "java.lang.NullPointerException\n\tat 
> org.apache.solr.handler.admin.ColStatus.getColStatus(ColStatus.java:152)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.lambda$static$1(CollectionsHandler.java:547)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.execute(CollectionsHandler.java:1326)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:266)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:254)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:842)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:808)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:559)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
>  
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat 
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
>  org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat 
> org.eclipse.jetty

[jira] [Resolved] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-08-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15572.
-
Resolution: Duplicate

> Improve the default Prometheus exporter config performance
> --
>
> Key: SOLR-15572
> URL: https://issues.apache.org/jira/browse/SOLR-15572
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Follow-up to SOLR-15564. Using the {{expr}} parameter it's possible to very 
> efficiently pick only the metrics we are interested in. This should 
> drastically reduce the load on Solr nodes and on the exporter process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15574) NPE in ColStatus when collection uses routing other than hashing

2021-08-02 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15574:
---

 Summary: NPE in ColStatus when collection uses routing other than 
hashing
 Key: SOLR-15574
 URL: https://issues.apache.org/jira/browse/SOLR-15574
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.9
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


The code in {{ColStatus:152}} wrongly assumes the slice range is always present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15572) Improve the default Prometheus exporter config performance

2021-07-30 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15572:
---

 Summary: Improve the default Prometheus exporter config performance
 Key: SOLR-15572
 URL: https://issues.apache.org/jira/browse/SOLR-15572
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - prometheus-exporter
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


Follow-up to SOLR-15564. Using the {{expr}} parameter it's possible to very 
efficiently pick only the metrics we are interested in. This should drastically 
reduce the load on Solr nodes and on the exporter process.
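
For illustration, requests of this shape (the metric names here are examples, 
not a vetted list) would replace the current scrape-everything call:
{code}
http://localhost:8983/solr/admin/metrics?expr=solr\.jvm:memory\.heap\..*:.*
http://localhost:8983/solr/admin/metrics?expr=solr\.core\..*:QUERY\..*\.requestTimes:.*Rate
{code}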



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15564) Improve filtering expressions in /admin/metrics

2021-07-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15564:

Fix Version/s: 8.10

> Improve filtering expressions in /admin/metrics
> ---
>
> Key: SOLR-15564
> URL: https://issues.apache.org/jira/browse/SOLR-15564
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15564.patch
>
>
> There are currently several ways to filter what metrics should be returned by 
> /admin/metrics but they have one important limitation: all types of filtering 
> criteria (group, type, prefix, regex, key) use implicit OR, so it's only ever 
> possible to broaden the filters but not to narrow them down.
> This issue came up while I was reviewing the default Prometheus exporter 
> config and I noticed that it pulls ALL metrics. This is extremely wasteful 
> and puts unnecessary load on the nodes, especially since only some of the 
> metrics are then used, either by the exporter itself or by the default 
> dashboard. In addition to that, the exporter needs to build an object tree 
> from all these metrics in order to apply the export rules, which leads to 
> excessive memory / cpu consumption in the reporter process.
> We should come up with a way to make these filters more expressive so that 
> it's possible e.g. to select only some metrics from a particular registry.
> The simplest way to implement this would be to extend the syntax of the 
> {{key}} parameter to support regex expressions in the parts of the key that 
> specify the metric name and the property name.
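> A hedged illustration of the idea (the key values are made up):
> {code}
> # today: keys must be spelled out exactly (registry:metricName:propertyName)
> key=solr.core.mycoll:QUERY./select.requestTimes:count
> # proposed: allow regexes in the metric-name and property-name parts
> key=solr.core.mycoll:QUERY\..*\.requestTimes:.*Rate
> {code}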



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15564) Improve filtering expressions in /admin/metrics

2021-07-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15564:

Component/s: metrics

> Improve filtering expressions in /admin/metrics
> ---
>
> Key: SOLR-15564
> URL: https://issues.apache.org/jira/browse/SOLR-15564
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.10
>
> Attachments: SOLR-15564.patch
>
>
> There are currently several ways to filter what metrics should be returned by 
> /admin/metrics but they have one important limitation: all types of filtering 
> criteria (group, type, prefix, regex, key) use implicit OR, so it's only ever 
> possible to broaden the filters but not to narrow them down.
> This issue came up while I was reviewing the default Prometheus exporter 
> config and I noticed that it pulls ALL metrics. This is extremely wasteful 
> and puts unnecessary load on the nodes, especially since only some of the 
> metrics are then used, either by the exporter itself or by the default 
> dashboard. In addition to that, the exporter needs to build an object tree 
> from all these metrics in order to apply the export rules, which leads to 
> excessive memory / cpu consumption in the reporter process.
> We should come up with a way to make these filters more expressive so that 
> it's possible e.g. to select only some metrics from a particular registry.
> The simplest way to implement this would be to extend the syntax of the 
> {{key}} parameter to support regex expressions in the parts of the key that 
> specify the metric name and the property name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-15564) Improve filtering expressions in /admin/metrics

2021-07-29 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389951#comment-17389951
 ] 

Andrzej Bialecki commented on SOLR-15564:
-

This patch adds support for {{expr}} filtering, which is essentially the same 
as {{key}} but supports regex expressions in each part of the key.

For example, previously it was basically impossible to select just a subset of 
per-core metrics without knowing the core names; now you can do it as follows:
{code:java}
http://localhost:8983/solr/admin/metrics?expr=solr\.core\..*:QUERY\..*\.requestTimes:.*Rate
 {code}
This expression selects metrics from any replica located on the node and 
returns only the 1-, 5-, and 15-minute request rates for QUERY handlers.
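
To unpack the example, the three colon-separated parts follow the same 
registry:metricName:propertyName layout as {{key}}:
{code}
solr\.core\..*            -> registry part: every core registry on the node
QUERY\..*\.requestTimes   -> metric part: requestTimes of all QUERY handlers
.*Rate                    -> property part: oneMinuteRate / fiveMinuteRate / fifteenMinuteRate
{code}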

This implementation is fully back-compatible because it uses a different 
parameter name than the existing ones. Similarly to the {{key}} parameter, when 
{{expr}} is used any other filtering parameters are ignored.

If there are no objections I'll commit this shortly. I'm also planning to 
back-port this to 8x.

> Improve filtering expressions in /admin/metrics
> ---
>
> Key: SOLR-15564
> URL: https://issues.apache.org/jira/browse/SOLR-15564
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-15564.patch
>
>
> There are currently several ways to filter what metrics should be returned by 
> /admin/metrics but they have one important limitation: all types of filtering 
> criteria (group, type, prefix, regex, key) use implicit OR, so it's only ever 
> possible to broaden the filters but not to narrow them down.
> This issue came up while I was reviewing the default Prometheus exporter 
> config and I noticed that it pulls ALL metrics. This is extremely wasteful 
> and puts unnecessary load on the nodes, especially since only some of the 
> metrics are then used, either by the exporter itself or by the default 
> dashboard. In addition to that, the exporter needs to build an object tree 
> from all these metrics in order to apply the export rules, which leads to 
> excessive memory / cpu consumption in the reporter process.
> We should come up with a way to make these filters more expressive so that 
> it's possible e.g. to select only some metrics from a particular registry.
> The simplest way to implement this would be to extend the syntax of the 
> {{key}} parameter to support regex expressions in the parts of the key that 
> specify the metric name and the property name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-15564) Improve filtering expressions in /admin/metrics

2021-07-29 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15564:

Attachment: SOLR-15564.patch

> Improve filtering expressions in /admin/metrics
> ---
>
> Key: SOLR-15564
> URL: https://issues.apache.org/jira/browse/SOLR-15564
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-15564.patch
>
>
> There are currently several ways to filter what metrics should be returned by 
> /admin/metrics but they have one important limitation: all types of filtering 
> criteria (group, type, prefix, regex, key) use implicit OR, so it's only ever 
> possible to broaden the filters but not to narrow them down.
> This issue came up while I was reviewing the default Prometheus exporter 
> config and I noticed that it pulls ALL metrics. This is extremely wasteful 
> and puts unnecessary load on the nodes, especially since only some of the 
> metrics are then used, either by the exporter itself or by the default 
> dashboard. In addition to that, the exporter needs to build an object tree 
> from all these metrics in order to apply the export rules, which leads to 
> excessive memory / cpu consumption in the reporter process.
> We should come up with a way to make these filters more expressive so that 
> it's possible e.g. to select only some metrics from a particular registry.
> The simplest way to implement this would be to extend the syntax of the 
> {{key}} parameter to support regex expressions in the parts of the key that 
> specify the metric name and the property name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Created] (SOLR-15565) Optimize CLUSTERSTATUS when filtering by collection

2021-07-26 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15565:
---

 Summary: Optimize CLUSTERSTATUS when filtering by collection
 Key: SOLR-15565
 URL: https://issues.apache.org/jira/browse/SOLR-15565
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


When a user specifies the {{collection}} parameter in the {{CLUSTERSTATUS}} 
command, one would expect Solr to retrieve and materialize only that 
collection's state. Unfortunately, {{ClusterStatus.getClusterStatus}} retrieves 
the complete map, which in turn materializes the status of every collection. On 
large clusters with many large collections this is wasteful.

When a user requests state information for specific collections only, we should 
iterate over just those collection names and materialize only their state.
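
A minimal sketch of the intended change ({{getCollectionOrNull}} is an existing 
{{ClusterState}} method; the status-building helper is hypothetical):
{code:java}
// Materialize only the requested collections instead of the whole map.
for (String name : requestedCollections) {
  DocCollection coll = clusterState.getCollectionOrNull(name);
  if (coll == null) continue; // unknown collection, skip it
  collectStatus(results, coll); // hypothetical helper building per-collection status
}
{code}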



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org


