[ https://issues.apache.org/jira/browse/UNOMI-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerome Blanchard updated UNOMI-908:
-----------------------------------
Description: 

h3. Context

Currently, some Unomi entities (rules, segments, propertyTypes, ...) are polled from Elasticsearch every second by a scheduled job, to ensure that modifications made by another Unomi node are refreshed locally. It is easy to see that this approach, while functional, is inefficient, especially when the entities are updated infrequently. Worse, without a strong scheduler engine capable of watchdog and failover, a naive scheduler implementation can die silently, causing invisible integrity problems that lead to corrupted data. We have already faced such production issues, with nodes running different rule sets.

A second point: with the removal of Karaf Cellar (the Karaf cluster bundle and config propagation feature) in Unomi 3, another cluster topology monitoring mechanism was introduced. This new implementation relies on a dedicated entity, ClusterNode, stored in a dedicated Elasticsearch index. Every 10 seconds (also via a scheduled job), the ClusterNode document is updated by setting its heartbeat field to the current timestamp. At the same time, the other ClusterNode documents are checked to decide whether their latest heartbeat is fresh enough to keep them. This topology management is very resource-intensive and does not follow the state of the art in terms of architecture. Topology information is not something that needs to be persisted (except for audit purposes, which do not apply here) and must be managed in memory using dedicated and proven algorithms.

h3. Proposal

The goal here is to propose a solution that addresses both problems by relying on an external, proven and widely used solution: distributed caching with Infinispan. Because distributed caching libraries embed a cluster topology manager, we can use the same tool both to manage entity caches without polling AND to discover and monitor the cluster topology.

We propose to use the generic caching features already available for Karaf: Infinispan. It will be packaged as a dedicated generic caching service based on annotated methods, directly inspired by the current Unomi entity cache. The underlying JGroups library used by Infinispan will also be exposed in order to refactor the Unomi ClusterService so it no longer relies on a persistent entity.

By externalizing caching into a dedicated, widely used and proven solution, the Unomi code will become lighter and more robust for cluster-oriented operations on entities. Using a distributed cache in front of persistent entities has been common practice for decades and is integrated in all enterprise-level frameworks (EJB, Spring, ...). This is proven technology with very strong implementations and support, and Infinispan is one of the best references in that domain (used in WildFly, Hibernate, Apache Camel, ...).
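To make the proposal concrete, here is a minimal sketch of what such a generic, cluster-aware caching service could look like with embedded Infinispan. Everything here is illustrative, not an existing Unomi API: the class name UnomiCacheService, the cluster name, and the esLoader/esSave callbacks are hypothetical placeholders for the real persistence calls, and the actual service would be wired as an OSGi/Karaf service.

{code:java}
import java.util.function.Function;

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

// Hypothetical sketch of a generic, cluster-aware cache service for Unomi
// entities (rules, segments, propertyTypes, ...). Names are illustrative.
public class UnomiCacheService {

    private final EmbeddedCacheManager cacheManager;

    public UnomiCacheService() {
        // Clustered cache manager: JGroups discovers the other Unomi nodes.
        cacheManager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder()
                        .transport().clusterName("unomi")
                        .build());
    }

    private <T> Cache<String, T> getOrCreateCache(String name) {
        // Replicated so every node sees updates without polling Elasticsearch.
        if (cacheManager.getCacheConfiguration(name) == null) {
            cacheManager.defineConfiguration(name, new ConfigurationBuilder()
                    .clustering().cacheMode(CacheMode.REPL_SYNC)
                    .build());
        }
        return cacheManager.getCache(name);
    }

    // Read-through: e.g. listRules() hits Elasticsearch only on a cache miss.
    public <T> T get(String cacheName, String itemId, Function<String, T> esLoader) {
        Cache<String, T> cache = getOrCreateCache(cacheName);
        return cache.computeIfAbsent(itemId, esLoader);
    }

    // Write-through: e.g. updateRule() persists to Elasticsearch, then the
    // replicated put propagates the new value to every node, with no polling latency.
    public <T> void put(String cacheName, String itemId, T item, Runnable esSave) {
        esSave.run();                                           // persist in Elasticsearch
        this.<T>getOrCreateCache(cacheName).put(itemId, item); // propagate cluster-wide
    }
}
{code}

A rule lookup would then read something like cacheService.get("rules", ruleId, id -> loadRuleFromEs(id)), where loadRuleFromEs is whatever persistence call Unomi already uses; the Elasticsearch round-trip happens at most once per node instead of once per second.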
h3. Tasks

* Package a Unomi cache feature that relies on the existing Karaf Infinispan feature.
* Refactor ClusterServiceImpl to take advantage of the Infinispan cluster manager, or simply store ClusterNode in the distributed cache instead of in Elasticsearch.
* Remove the Elasticsearch-based persistence logic for ClusterNode.
* Ensure heartbeat updates are managed via the distributed cache; if not, rely on the cluster management underlying the distributed cache (JGroups for Infinispan) to manage ClusterNode entities (see the sketch after the Definition of Done below).
* Remove the entity polling feature and use the distributed caching strategy for the operations that load entities from storage:
** The current listRules() is refactored to simply load entities from Elasticsearch, but through the distributed cache.
** The updateRule() operation also propagates the update through the distributed cache, avoiding any polling latency (both patterns are illustrated in the sketch above).
* Update the documentation to reflect the new architecture.

h3. Definition of Done

* ClusterNode information is available and updated without Elasticsearch.
* No additional Elasticsearch index is created for cluster nodes.
* The heartbeat mechanism works reliably.
* All 'cacheable' entities rely on the dedicated cluster-aware cache feature based on the Infinispan Karaf feature.
* All polling jobs are removed.
* A test for entity update propagation across the cluster is set up.
* All relevant documentation is updated.
* Integration tests confirm correct cluster node management and heartbeat updates.
* No regression in cluster management functionality.
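As a starting point for the ClusterServiceImpl refactoring, the sketch below shows how cluster membership could be tracked purely in memory through Infinispan's view-change notifications (backed by JGroups failure detection), with no ClusterNode index and no 10-second heartbeat job. This is a sketch under assumptions: it reuses the embedded cache manager from the previous example, the class name is hypothetical, and the println calls stand in for the real ClusterService bookkeeping.

{code:java}
import org.infinispan.manager.EmbeddedCacheManager;
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachemanagerlistener.annotation.ViewChanged;
import org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent;
import org.infinispan.remoting.transport.Address;

// Hypothetical sketch: cluster topology handled in memory by JGroups through
// Infinispan, replacing the persisted ClusterNode entity and its heartbeat job.
@Listener
public class ClusterTopologyListener {

    public ClusterTopologyListener(EmbeddedCacheManager cacheManager) {
        cacheManager.addListener(this);
        // The current view is available at any time, with no Elasticsearch round-trip.
        System.out.println("Local node:  " + cacheManager.getAddress());
        System.out.println("All members: " + cacheManager.getMembers());
    }

    @ViewChanged
    public void onViewChanged(ViewChangedEvent event) {
        // JGroups failure detection replaces the heartbeat polling: nodes that
        // crash or leave simply disappear from the new view.
        for (Address member : event.getNewMembers()) {
            if (!event.getOldMembers().contains(member)) {
                System.out.println("Node joined: " + member);
            }
        }
        for (Address member : event.getOldMembers()) {
            if (!event.getNewMembers().contains(member)) {
                System.out.println("Node left: " + member);
            }
        }
    }
}
{code}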
Implementation tips
In the Unomi services

> Introduce Distributed Cache to avoid intensive polling
> ------------------------------------------------------
>
>                 Key: UNOMI-908
>                 URL: https://issues.apache.org/jira/browse/UNOMI-908
>             Project: Apache Unomi
>          Issue Type: Improvement
>          Components: unomi(-core)
>    Affects Versions: unomi-3.0.0
>            Reporter: Jerome Blanchard
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)