[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866531#comment-17866531
 ] 

ASF subversion and git services commented on SOLR-17160:


Commit 338fbdacbecaba76741fe5f8b755e2374f88d8a0 in solr's branch 
refs/heads/backport_SOLR-16842_to_branch_9x from Pierre Salagnac
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=338fbdacbec ]

SOLR-17160: Core admin async ID status, 10k limit and time expire (#2304)

Core Admin "async" request status tracking is no longer capped at 100; it's 10k.
Statuses are now removed 5 minutes after the read of a completed/failed status.
Helps collection async backup/restore and other operations scale to 100+ shards.

Co-authored-by: David Smiley 
(cherry picked from commit d3b4c2e1ae39b8ecc5428798531f8b7cf723d787)


> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866499#comment-17866499
 ] 

ASF subversion and git services commented on SOLR-17160:


Commit 338fbdacbecaba76741fe5f8b755e2374f88d8a0 in solr's branch 
refs/heads/branch_9x from Pierre Salagnac
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=338fbdacbec ]

SOLR-17160: Core admin async ID status, 10k limit and time expire (#2304)

Core Admin "async" request status tracking is no longer capped at 100; it's 10k.
Statuses are now removed 5 minutes after the read of a completed/failed status.
Helps collection async backup/restore and other operations scale to 100+ shards.

Co-authored-by: David Smiley 
(cherry picked from commit d3b4c2e1ae39b8ecc5428798531f8b7cf723d787)


> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-07-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866175#comment-17866175
 ] 

ASF subversion and git services commented on SOLR-17160:


Commit d3b4c2e1ae39b8ecc5428798531f8b7cf723d787 in solr's branch 
refs/heads/main from Pierre Salagnac
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=d3b4c2e1ae3 ]

SOLR-17160: Core admin async ID status, 10k limit and time expire (#2304)

Core Admin "async" request status tracking is no longer capped at 100; it's 10k.
Statuses are now removed 5 minutes after the read of a completed/failed status.
Helps collection async backup/restore and other operations scale to 100+ shards.

Co-authored-by: David Smiley 

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-07-08 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864007#comment-17864007
 ] 

David Smiley commented on SOLR-17160:
-

FYI I plan to merge the linked PR soon with the following CHANGES.txt summary:
bq. Core Admin "async" request status tracking is no longer capped at 100; it's 
10k.  Statuses are now removed 5 minutes after the read of a completed/failed 
status.   

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-05-01 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842732#comment-17842732
 ] 

David Smiley commented on SOLR-17160:
-

BTW I think Denial of Service should be a non-concern for admin APIs.

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-05-01 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842698#comment-17842698
 ] 

Jason Gerlowski commented on SOLR-17160:


bq. 100 is clearly way too low.

+1...to tweaking the "maxItems".  A value like 500 (or even 1000) makes it much 
much less likely that IDs will "disappear" in the normal case, and still 
provides some protection against DOS or memory leak concerns.

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-03-01 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822738#comment-17822738
 ] 

David Smiley commented on SOLR-17160:
-

Even at the top level (ZK), the async IDs are capped at 10K before getting 
purged -- SizeLimitedDistributedMap with 
org.apache.solr.cloud.Overseer#NUM_RESPONSES_TO_STORE (10K).

100 is clearly way too low.

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-02-29 Thread Pierre Salagnac (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822029#comment-17822029
 ] 

Pierre Salagnac commented on SOLR-17160:


I added a max limit at 10K tracked requests in the pull requests.
Please let me know if you still have concerns on memory utilization.

Thanks

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-02-28 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821831#comment-17821831
 ] 

Gus Heck commented on SOLR-17160:
-

Interfering with some requests is perhaps less problematic than running the 
server out of memory IMHO YMMV. Also, the client sees failure, but the original 
request still gets executed right? So client code that *really* cares can trap 
this case and re-verify heuristically, or re-send if idempotent...

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-02-28 Thread Pierre Salagnac (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821812#comment-17821812
 ] 

Pierre Salagnac commented on SOLR-17160:


Thanks for raising this concern [~gus].

Currently, this code is already somehow vulnerable to DOS. Even if we don't 
grow forever in memory, we only track 100 requests. So if we bomb Solr with 
unwanted admin requests, the legitimate ones with still fail from the client 
point of view because the client won't be able to the completion statuses?

But I agree not having any limited memory footprint at all is not ideal. In 
addition to time based eviction, I can add a max size for the cache. I was 
thinking it doing this initially, but I dropped it as it showed to be 
unnecessary.

 

I thinking of adding a limit much higher than the 100 requests tracked tracked, 
can be 10K. Most of the time, we should never hit this limit, and in case of a 
DOS, it should be sufficient to protect the host from an abusive memory 
utilization here.

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

2024-02-28 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821680#comment-17821680
 ] 

Gus Heck commented on SOLR-17160:
-

This concept worries me. I think it's fine to have this as an option, but not 
as a default. By default it should be limited to a set memory footprint 
otherwise it becomes a DOS target.

> Bulk admin operations may fail because of max tracked requests
> --
>
> Key: SOLR-17160
> URL: https://issues.apache.org/jira/browse/SOLR-17160
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 8.11, 9.5
>Reporter: Pierre Salagnac
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}{_}' - 'notfound'. Retried 
> __ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org