Mikhail Petrov created IGNITE-15966:
---------------------------------------

             Summary: [Security] Node can hang with authentication enabled after user drop operation
                 Key: IGNITE-15966
                 URL: https://issues.apache.org/jira/browse/IGNITE-15966
             Project: Ignite
          Issue Type: Bug
         Environment: 
            Reporter: Mikhail Petrov


Reproducer:
{code:java}
/** */
public class UserDropTest extends GridCommonAbstractTest {
    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

        cfg.setAuthenticationEnabled(true);

        // Built-in authentication requires persistence to be enabled.
        cfg.setDataStorageConfiguration(new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                .setPersistenceEnabled(true)));

        return cfg;
    }

    /** */
    @Test
    public void test() throws Exception {
        startGrid(0);
        startGrid(1);

        grid(0).cluster().state(ClusterState.ACTIVE);

        grid(0).createCache(DEFAULT_CACHE_NAME);

        // Create the user on behalf of the default "ignite" superuser.
        try (AutoCloseable ignored = withSecurityContextOnAllNodes(authenticate(grid(0), "ignite", "ignite"))) {
            grid(0).context().security().createUser("cli", "pwd".toCharArray());
        }

        // Connect a thin client as the newly created user.
        IgniteClient client = Ignition.startClient(new ClientConfiguration()
            .setAddresses("127.0.0.1:10800")
            .setUserName("cli")
            .setUserPassword("pwd"));

        ClientCache<Object, Object> cache = client.cache(DEFAULT_CACHE_NAME);

        // Drop the user while the client connection is still open.
        try (AutoCloseable ignored = withSecurityContextOnAllNodes(authenticate(grid(0), "ignite", "ignite"))) {
            grid(0).context().security().dropUser("cli");
        }

        Map<Integer, Integer> entries = new HashMap<>();

        for (int i = 0; i < 10000; i++)
            entries.put(i, i);

        // The operation initiated by the dropped user reaches the other node and hangs.
        cache.putAll(entries);
    }

    /** {@inheritDoc} */
    @Override protected void beforeTest() throws Exception {
        super.beforeTest();

        cleanPersistenceDir();
    }
}
{code}

Exception:
{code:java}
[2021-11-22 11:04:32,390][ERROR][sys-stripe-3-#92%ignite.UserDropTest1%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Failed to find security context for subject with given ID : 0898b227-30d5-3afc-9394-d8e4889ece4a]]
java.lang.IllegalStateException: Failed to find security context for subject with given ID : 0898b227-30d5-3afc-9394-d8e4889ece4a
    at org.apache.ignite.internal.processors.security.IgniteSecurityProcessor.withContext(IgniteSecurityProcessor.java:167)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1906)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
    at org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
    at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
    at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:569)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
    at java.lang.Thread.run(Thread.java:748)
{code}

The main problem: the authentication plugin implementation ties each security user to a subject ID that is propagated across cluster nodes. If a node receives an operation initiated by a dropped user, it fails to obtain the security context by subject ID, because that context has already been removed, and hangs with the exception shown above.

This is a security implementation problem with two sides. We have no mechanism to determine that a security subject is no longer needed and can be safely removed. At the same time, when a security subject is not found, we throw an unrecoverable exception that kills the system worker and hangs the node.

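The failure mode described above can be illustrated in isolation with a minimal sketch. It is not Ignite code: SubjectRegistrySketch and its methods are hypothetical names, and a plain String stands in for the security context. It only shows why a lookup that throws on a missing subject ID becomes fatal once the subject has been dropped, and how a lookup that reports absence would let the caller reject the single operation instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a per-node registry of security contexts (not an Ignite class). */
class SubjectRegistrySketch {
    /** Security contexts keyed by subject ID; a String stands in for the real context. */
    private final Map<UUID, String> ctxs = new ConcurrentHashMap<>();

    /** Registers a context and returns the subject ID that would travel with messages. */
    UUID register(String login) {
        UUID subjId = UUID.randomUUID();

        ctxs.put(subjId, login);

        return subjId;
    }

    /** Removes the context, as happens when the user is dropped. */
    void drop(UUID subjId) {
        ctxs.remove(subjId);
    }

    /** Strict lookup: a missing subject is treated as an illegal state, as in the stack trace. */
    String strictLookup(UUID subjId) {
        String ctx = ctxs.get(subjId);

        if (ctx == null)
            throw new IllegalStateException("Failed to find security context for subject with given ID : " + subjId);

        return ctx;
    }

    /** Lenient lookup (an assumption, not a proposed patch): the caller decides how to reject the operation. */
    Optional<String> lookup(UUID subjId) {
        return Optional.ofNullable(ctxs.get(subjId));
    }
}
```

In this sketch, a message that arrives after drop() makes strictLookup() throw on the thread processing it, which mirrors the SYSTEM_WORKER_TERMINATION above. Whether a lenient lookup fits the real code depends on how subject lifetimes are tracked, which the issue leaves open.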
--
This message was sent by Atlassian Jira
(v8.20.1#820001)