rzo1 opened a new issue, #1698: URL: https://github.com/apache/stormcrawler/issues/1698
Allow the waitAck cache in [StatusUpdaterBolt](https://github.com/apache/stormcrawler/blob/main/external/opensearch/src/main/java/org/apache/stormcrawler/opensearch/persistence/StatusUpdaterBolt.java#L163) to be configured via a cache specification string (similar to how AbstractStatusUpdaterBolt handles its internal cache), instead of being hard-coded.

In the current implementation, waitAck is initialized as follows:

```
waitAck =
        Caffeine.newBuilder()
                .expireAfterWrite(60, TimeUnit.SECONDS)
                .removalListener(this)
                .build();
```

This configuration is static: the 60-second expiration is hard-coded and no size limit is exposed, so users cannot adjust either through configuration. Under heavy indexing loads or high latency in OpenSearch responses, this can lead to:

- Premature eviction of pending tuples.
- “Purged from waitAck” errors in logs.
- Storm replaying tuples and OpenSearch returning 409 CONFLICT or 429 TOO MANY REQUESTS responses.

**Idea**: Add a configuration parameter (e.g. `opensearch.status.updater.waitack.cache.spec`) that allows the cache to be built from a Caffeine specification string, just like the one used in `AbstractStatusUpdaterBolt`.
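A minimal sketch of how the spec string might be resolved from the topology configuration. The class name `WaitAckCacheSpec`, the `resolve` helper and the default value are hypothetical; the default is assumed to be `expireAfterWrite=60s` so that behaviour stays the same when the parameter is not set. The parameter name is the one suggested above and does not exist yet.

```java
import java.util.HashMap;
import java.util.Map;

public class WaitAckCacheSpec {

    // Parameter name proposed in this issue; not an existing StormCrawler option.
    static final String SPEC_KEY = "opensearch.status.updater.waitack.cache.spec";

    // Assumed default, mirroring the hard-coded expireAfterWrite(60, TimeUnit.SECONDS).
    static final String DEFAULT_SPEC = "expireAfterWrite=60s";

    /** Returns the configured spec, or the default when the key is absent or blank. */
    static String resolve(Map<String, Object> conf) {
        Object value = conf.get(SPEC_KEY);
        if (value == null || value.toString().trim().isEmpty()) {
            return DEFAULT_SPEC;
        }
        return value.toString().trim();
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();

        // No override: falls back to the current 60-second expiry.
        System.out.println(resolve(conf));

        // Override: longer expiry plus a size bound, in Caffeine spec syntax.
        conf.put(SPEC_KEY, "maximumSize=10000,expireAfterWrite=300s");
        System.out.println(resolve(conf));

        // In the bolt, the resolved string could then be handed to Caffeine:
        //   waitAck = Caffeine.from(resolve(conf)).removalListener(this).build();
    }
}
```

Falling back to a spec equivalent to the current builder call would keep existing topologies unchanged, while deployments with slow OpenSearch responses could raise the expiry without patching the bolt.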
