rzo1 opened a new issue, #1698:
URL: https://github.com/apache/stormcrawler/issues/1698

   Allow the waitAck cache in 
[StatusUpdaterBolt](https://github.com/apache/stormcrawler/blob/main/external/opensearch/src/main/java/org/apache/stormcrawler/opensearch/persistence/StatusUpdaterBolt.java#L163)
  to be configured via a cache specification string (similar to how 
AbstractStatusUpdaterBolt handles its internal cache), instead of being 
hard-coded.
   
   In the current implementation, waitAck is initialized as follows:
   
   ```java
   waitAck = Caffeine.newBuilder()
       .expireAfterWrite(60, TimeUnit.SECONDS)
       .removalListener(this)
       .build();
   ```
   
   This configuration is static: the 60-second expiration is hard-coded and no 
size bound is set, and users cannot adjust either through configuration. Under 
heavy indexing loads or high latency in OpenSearch responses, this can lead to:
   
   - Premature eviction of pending tuples.
   - “Purged from waitAck” errors in logs.
   - Storm replaying tuples and OpenSearch returning 409 CONFLICT or 429 TOO 
MANY REQUESTS responses.
   
   **Idea**:
   
   Add a configuration parameter (e.g. 
`opensearch.status.updater.waitack.cache.spec`) that allows the cache to be 
built from a Caffeine specification string, just like the one used in 
`AbstractStatusUpdaterBolt`.
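
A minimal sketch of how the lookup could work, assuming the configuration is available as a plain `Map<String, Object>` and that the default should mirror the current 60-second expiry. The class name `WaitAckSpec` and the helper `resolveSpec` are hypothetical and only illustrate the idea; the key name is the one proposed above. The resolved string would then be passed to Caffeine's `Caffeine.from(String)`, which builds a cache builder from a specification such as `maximumSize=50000,expireAfterWrite=300s`.

```java
import java.util.HashMap;
import java.util.Map;

public class WaitAckSpec {
    // Key name as proposed in this issue (not an existing setting).
    static final String SPEC_KEY = "opensearch.status.updater.waitack.cache.spec";

    // Assumed default: same behaviour as the current hard-coded builder.
    static final String DEFAULT_SPEC = "expireAfterWrite=60s";

    // Returns the configured spec, or the default when the key is missing or blank.
    static String resolveSpec(Map<String, Object> conf) {
        Object value = conf.get(SPEC_KEY);
        if (value == null) {
            return DEFAULT_SPEC;
        }
        String spec = value.toString().trim();
        return spec.isEmpty() ? DEFAULT_SPEC : spec;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        System.out.println(resolveSpec(conf)); // falls back to the default

        conf.put(SPEC_KEY, "maximumSize=50000,expireAfterWrite=300s");
        System.out.println(resolveSpec(conf)); // uses the configured value

        // Inside the bolt, the builder call could then become (untested sketch):
        // waitAck = Caffeine.from(resolveSpec(conf)).removalListener(this).build();
    }
}
```

Falling back to the current value when the key is absent would keep existing deployments unchanged, while slow clusters could raise the expiry or add a size bound without a code change.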

