[ https://issues.apache.org/jira/browse/UNOMI-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerome Blanchard updated UNOMI-908:
-----------------------------------
Description: 

h3. Context

Currently, some Unomi entities (rules, segments, propertyTypes, ...) are polled from Elasticsearch every second by a scheduled job, to ensure that modifications made by another Unomi node are refreshed locally. It is easy to see that this approach, while functional, is inefficient, especially when the entities are updated infrequently. Worse, without a strong scheduler engine capable of watchdog and failover, a naive scheduler implementation can die silently, causing invisible integrity problems that lead to corrupted data. We have already faced such production issues, with nodes running different rule sets.

A second point: with the removal of Karaf Cellar (the Karaf cluster bundle and config propagation feature) in Unomi 3, another cluster topology monitoring mechanism was introduced. This new implementation relies on a dedicated entity, ClusterNode, stored in a dedicated Elasticsearch index. Every 10 seconds (also via a scheduled job), the ClusterNode document is updated by setting its heartbeat field to the current timestamp. At the same time, the other ClusterNode documents are checked to decide whether their latest heartbeat is fresh enough to keep them. This topology management is very resource-intensive and does not follow the state of the art in terms of architecture. Topology information is not something that needs to be persisted (except for audit purposes, which do not apply here) and must be managed in memory using dedicated and proven algorithms.

h3. Proposal

The goal here is to propose a solution that addresses both problems by relying on an external, proven and widely used solution: distributed caching with Infinispan. Because distributed caching libraries embed a cluster topology manager, we can use the same tool both to manage entity caches without polling AND to discover and monitor the cluster topology.

We propose to use the generic caching features already available for Karaf: Infinispan. It will be packaged as a dedicated generic caching service based on annotated methods, directly inspired by the current Unomi entity cache. The underlying JGroups library used by Infinispan will also be exposed in order to refactor the Unomi ClusterService so it no longer relies on a persistent entity.

By externalizing caching into a dedicated, widely used and proven solution, the Unomi code will become lighter and more robust for cluster-oriented operations on entities. Using a distributed cache in front of persistent entities has been common practice for decades and is integrated in all enterprise-level frameworks (EJB, Spring, ...). This is proven technology with very strong implementations and support, and Infinispan is one of the best references in that domain (used in WildFly, Hibernate, Apache Camel, ...).
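To make the proposal concrete, here is a minimal sketch of what such a generic, cluster-aware caching service could look like with embedded Infinispan. Everything here is illustrative, not an existing Unomi API: the class name UnomiCacheService, the cluster name, and the esLoader/esSave callbacks are hypothetical placeholders for the real persistence calls, and the actual service would be wired as an OSGi/Karaf service.

{code:java}
import java.util.function.Function;

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

// Hypothetical sketch of a generic, cluster-aware cache service for Unomi
// entities (rules, segments, propertyTypes, ...). Names are illustrative.
public class UnomiCacheService {

    private final EmbeddedCacheManager cacheManager;

    public UnomiCacheService() {
        // Clustered cache manager: JGroups discovers the other Unomi nodes.
        cacheManager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder()
                        .transport().clusterName("unomi")
                        .build());
    }

    private <T> Cache<String, T> getOrCreateCache(String name) {
        // Replicated so every node sees updates without polling Elasticsearch.
        if (cacheManager.getCacheConfiguration(name) == null) {
            cacheManager.defineConfiguration(name, new ConfigurationBuilder()
                    .clustering().cacheMode(CacheMode.REPL_SYNC)
                    .build());
        }
        return cacheManager.getCache(name);
    }

    // Read-through: e.g. listRules() hits Elasticsearch only on a cache miss.
    public <T> T get(String cacheName, String itemId, Function<String, T> esLoader) {
        Cache<String, T> cache = getOrCreateCache(cacheName);
        return cache.computeIfAbsent(itemId, esLoader);
    }

    // Write-through: e.g. updateRule() persists to Elasticsearch, then the
    // replicated put propagates the new value to every node, with no polling latency.
    public <T> void put(String cacheName, String itemId, T item, Runnable esSave) {
        esSave.run();                                           // persist in Elasticsearch
        this.<T>getOrCreateCache(cacheName).put(itemId, item); // propagate cluster-wide
    }
}
{code}

A rule lookup would then read something like cacheService.get("rules", ruleId, id -> loadRuleFromEs(id)), where loadRuleFromEs is whatever persistence call Unomi already uses; the Elasticsearch round-trip happens at most once per node instead of once per second.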
h3. Tasks

* Package a Unomi cache feature that relies on the existing Karaf Infinispan feature.
* Refactor ClusterServiceImpl to take advantage of the Infinispan cluster manager, or simply store ClusterNode in the distributed cache instead of in Elasticsearch.
* Remove the Elasticsearch-based persistence logic for ClusterNode.
* Ensure heartbeat updates are managed via the distributed cache; if not, rely on the cluster management underlying the distributed cache (JGroups for Infinispan) to manage ClusterNode entities (see the sketch after the Definition of Done below).
* Remove the entity polling feature and use the distributed caching strategy for the operations that load entities from storage:
** The current listRules() is refactored to simply load entities from Elasticsearch, but through the distributed cache.
** The updateRule() operation also propagates the update through the distributed cache, avoiding any polling latency (both patterns are illustrated in the sketch above).
* Update the documentation to reflect the new architecture.

h3. Definition of Done

* ClusterNode information is available and updated without Elasticsearch.
* No additional Elasticsearch index is created for cluster nodes.
* The heartbeat mechanism works reliably.
* All 'cacheable' entities rely on the dedicated cluster-aware cache feature based on the Infinispan Karaf feature.
* All polling jobs are removed.
* A test for entity update propagation across the cluster is set up.
* All relevant documentation is updated.
* Integration tests confirm correct cluster node management and heartbeat updates.
* No regression in cluster management functionality.
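As a starting point for the ClusterServiceImpl refactoring, the sketch below shows how cluster membership could be tracked purely in memory through Infinispan's view-change notifications (backed by JGroups failure detection), with no ClusterNode index and no 10-second heartbeat job. This is a sketch under assumptions: it reuses the embedded cache manager from the previous example, the class name is hypothetical, and the println calls stand in for the real ClusterService bookkeeping.

{code:java}
import org.infinispan.manager.EmbeddedCacheManager;
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachemanagerlistener.annotation.ViewChanged;
import org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent;
import org.infinispan.remoting.transport.Address;

// Hypothetical sketch: cluster topology handled in memory by JGroups through
// Infinispan, replacing the persisted ClusterNode entity and its heartbeat job.
@Listener
public class ClusterTopologyListener {

    public ClusterTopologyListener(EmbeddedCacheManager cacheManager) {
        cacheManager.addListener(this);
        // The current view is available at any time, with no Elasticsearch round-trip.
        System.out.println("Local node:  " + cacheManager.getAddress());
        System.out.println("All members: " + cacheManager.getMembers());
    }

    @ViewChanged
    public void onViewChanged(ViewChangedEvent event) {
        // JGroups failure detection replaces the heartbeat polling: nodes that
        // crash or leave simply disappear from the new view.
        for (Address member : event.getNewMembers()) {
            if (!event.getOldMembers().contains(member)) {
                System.out.println("Node joined: " + member);
            }
        }
        for (Address member : event.getOldMembers()) {
            if (!event.getNewMembers().contains(member)) {
                System.out.println("Node left: " + member);
            }
        }
    }
}
{code}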
Implementation tips
In the Unomi services

> Introduce Distributed Cache to avoid intensive polling
> ------------------------------------------------------
>
>                 Key: UNOMI-908
>                 URL: https://issues.apache.org/jira/browse/UNOMI-908
>             Project: Apache Unomi
>          Issue Type: Improvement
>          Components: unomi(-core)
>    Affects Versions: unomi-3.0.0
>            Reporter: Jerome Blanchard
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)