This is an automated email from the ASF dual-hosted git repository. btellier pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/james-project.git
The following commit(s) were added to refs/heads/master by this push: new 65d7a3d9c3 [BOYSCOUT] Write some missing ADRs regarding recent work (#1196) 65d7a3d9c3 is described below commit 65d7a3d9c389517c9c100bf3fc2c703d4eb33c34 Author: Benoit TELLIER <btell...@linagora.com> AuthorDate: Thu Sep 22 09:50:23 2022 +0700 [BOYSCOUT] Write some missing ADRs regarding recent work (#1196) - ADR 57: Reactive IMAP - ADR 58: Upgrade to Netty 4 - ADR 59: Upgrade to Cassandra driver 4 - ADR 60: Adopt bounded elastic - ADR 61: Delegation - ADR 62: OIDC token introspection - ADR 63: Prevent temporary file leaks --- src/adr/0051-oidc.md | 4 +- src/adr/0053-email-rate-limiting.md | 2 +- src/adr/0057-reactive-imap.md | 101 +++++++++++++++++++++++++++ src/adr/0058-upgrade-to-netty-4.md | 66 +++++++++++++++++ src/adr/0059-upgrade-to-cassadra-driver-4.md | 52 ++++++++++++++ src/adr/0060-adopt-bounded-elastic.md | 60 ++++++++++++++++ src/adr/0061-delegation.md | 61 ++++++++++++++++ src/adr/0062-oidc-token-introspection.md | 50 +++++++++++++ src/adr/0063-temporary-file-leaks.md | 75 ++++++++++++++++++++ 9 files changed, 469 insertions(+), 2 deletions(-) diff --git a/src/adr/0051-oidc.md b/src/adr/0051-oidc.md index ef3f76c837..533474e9b3 100644 --- a/src/adr/0051-oidc.md +++ b/src/adr/0051-oidc.md @@ -6,7 +6,9 @@ Date: 2021-12-06 Accepted (lazy consensus). -Not yet implemented. +Implemented. + +Complemented by [ADR-62](0062-oidc-token-introspection.md). ## Context diff --git a/src/adr/0053-email-rate-limiting.md b/src/adr/0053-email-rate-limiting.md index e79276ade3..bb3e91e5af 100644 --- a/src/adr/0053-email-rate-limiting.md +++ b/src/adr/0053-email-rate-limiting.md @@ -6,7 +6,7 @@ Date: 2021-01-10 Accepted (lazy consensus). -Not yet implemented. +Implemented. ## Context diff --git a/src/adr/0057-reactive-imap.md b/src/adr/0057-reactive-imap.md new file mode 100644 index 0000000000..e54cb28c1b --- /dev/null +++ b/src/adr/0057-reactive-imap.md @@ -0,0 +1,101 @@ +# 57. Reactive IMAP + +Date: 2022-09-13 + +## Status + +Accepted (lazy consensus). + +Implemented. + +## Context + +Classic java programming involves doing some blocking calls. When performing some input-output, the +thread performing the tasks at hand will hang until a response is received. This means that a server +performing IOs, which is the case of James, would end up with a high number of threads: for an IMAP +server, one IMAP request in flight equals one thread. + +The widely documented issue with such an approach is that, in Java threads correspond to system threads, +and are expensive. Typically, each thread takes up to ~1MB of memory. To cap the count of threads at +hand, the Thread Pool model is used to cap the count of threads. Process at most N IMAP requests with +N threads, and queue requests that cannot yet be handled. + +The blocking thread pool model was the concurrency paradigm used by James IMAP implementation in 3.7.x. + +Reactive programming instead does not perform blocking operation, thus one thread can handle several +IO tasks. This can be seen as callbacks executed once the request is received. Reactive programming model, +amongst others, leads to efficient resource usage. It helps keeping the count of applicative threads +low, helps reduce context switches. + +James have been, among the past few years, transitioning to a reactive programming model using the +Reactor library. Migrated components involves: + - Cassandra/Distributed storage + - JMAP protocol stacks + - Some (webadmin) tasks also relies on Reactor + +It is to be noted that our IMAP network stack uses Netty 4.x library that allows handling requests +asynchronously. + +## Decision + +Migrate IMAP to a reactive model. + +Implement a way to limit IMAP concurrency to protect James server from bursts. + +## Consequences + +Significant rewrite of the IMAP stack. + +Going reactive on the IMAP stacks yields the following improvements: + - Reduced thread count. IMAP request handling can now be dealt with from the Netty IO threads. + - Improve latency/performance. Removing the need of an extra scheduling and of synchronisation + leads to improved tail latencies and better throughput. + +Diagnostic tools will be harder to use on top of the IMAP stack: + - Stack traces will be more complex to read + - Glowroot APM would not work anymore as asynchronous logic are notoriously badly handled + - Flame graphs will be messier. + +Associated risk: + - Badly managed blocking calls would now directly impact the Netty event loop thus strongly degrading the performance + of the server. Tools like blockhound can help to mitigate this risk. + +## Alternatives + +From Java 19 onward, similar concerns are addressed by project Loom. This project delivers "Virtual Threads" scheduled +by the Java Virtual Machine on top of carrier system threads. This approach allow preserving the imperative programing +style. Tooling also works better. + +Yet, this promissing feature: + + - Would only be available for use in Java 21 (LTS), not to be delivered prior to September 2023. Upgrading the James + ecosystem to Java 21 Java Development Kit would be a controversial decision in itself and might raise significant debates + within the community. + - Operational feedback would likely be required prior adoption. + - Do not have yet performance comparison with asynchronous libraries + +## Follow up work + +Some other components have not yet been migrated to a reactive style. This includes: + + - Mail processing within the mailet container + - Remote Delivery (outgoing SMTP) + - SMTP server (incoming SMTP / LMTP) + - WebAdmin + +With lower throughput, the benefits of migrating such components to a reactive style are lower thus might not match the +associated rewrite costs and operational risk. As such, the effort had not yet been undertaken, but might be in the future. + +It is to be mentioned that APPEND command is buffered to a file, which, when happening is done on a dedicated thread. The +associated code is complex as Netty 4 ByteToMessageDecoder is meant to be synchronous. Upcoming Netty 5 would enable us +to simplify associated code. + +## References + +- [JIRA JAMES-3737](https://issues.apache.org/jira/browse/JAMES-3737) +- [JIRA JAMES-3816](https://issues.apache.org/jira/browse/JAMES-3816) +- [Project reactor](https://projectreactor.io/) +- [Project loom](https://wiki.openjdk.org/display/loom/Main) +- [Netty](https://netty.io/) +- [Blockhound](https://github.com/reactor/BlockHound) +- [Discussion on the mailing list](https://www.mail-archive.com/server-dev@james.apache.org/msg72113.html) \ No newline at end of file diff --git a/src/adr/0058-upgrade-to-netty-4.md b/src/adr/0058-upgrade-to-netty-4.md new file mode 100644 index 0000000000..248991b460 --- /dev/null +++ b/src/adr/0058-upgrade-to-netty-4.md @@ -0,0 +1,66 @@ +# 58. Upgrade to netty 4 + +Date: 2022-09-13 + +## Status + +Accepted (lazy consensus). + +Implemented. + +## Context + +James 3.7.0 uses Netty 3 for its TCP stacks (IMAP, SMTP, LMTP, MANAGESIEVE, POP3). + +Latests Netty 3 minor releases dates from 20XX. This brings concerns regarding security +(unmaintained dependency). + +Netty 4 delivers many enhancements compared to Netty 3. + + - We can mention here the buffer APIs + - Netty 4 empowers the use of native SSL, native event loops + + +## Decision + +Upgrade James to Netty 4. + +## Consequences + +Significant rewrite of the protocol stack. Though the count of lines of code involved is low, the code is +tricky and would need several iteration to be stabilized. For this reason, James 3.7.0 was released prior +the Netty 4 adoption. + +Associated risk: + - Buffer leaks. Manual buffer management can be error prone. Netty project includes relevant + tooling. + - We encountered issues with write ordering (when execute on, or outside of, the event loop) + +Benefits: Netty 4 natively supports graceful shutdown, which enables easily implementing such a feature in +James. + +Netty 4 upgrades was also the opportunity to significantly improve test coverage for IMAP protocol stacks +and IMAP 4 extensions, which allowed fixing numerous bugs. To be mentioned: + - COMPRESSION + - STARTTLS and SSL + - IMAP IDLE + - QRESYNC and CONDSTORE + +To be noted that Netty version used by James needs to be aligned with those of our dependencies: + - S3 driver + - Cassandra 4 driver + - Lettuce driver (Redis for rate limiting) + - Reactor HTTP + +Netty upgrades thus needs to be carefully done. So far we successfully aligned versions, so the overall +compatibility looks good. + +## Follow up work + +We did not yet succeed to enable native epoll, native SSL. + +## References + +- [JIRA JAMES-3715](https://issues.apache.org/jira/browse/JAMES-3715) +- [Netty](https://netty.io) +- [Netty 4 migration](https://netty.io/wiki/new-and-noteworthy-in-4.0.html) diff --git a/src/adr/0059-upgrade-to-cassadra-driver-4.md b/src/adr/0059-upgrade-to-cassadra-driver-4.md new file mode 100644 index 0000000000..d7bf8e2436 --- /dev/null +++ b/src/adr/0059-upgrade-to-cassadra-driver-4.md @@ -0,0 +1,52 @@ +# 59. Upgrade to Cassandra driver 4 + +Date: 2022-09-13 + +## Status + +Accepted (lazy consensus). + +Implemented. + +## Context + +Cassandra driver 3 latest release took pace in 201X. An outdated dependency pauses a security risk. + +Moreover Cassandra driver 4: + - Have prime reactive support and do not need to pass through an intermediate "CompletableFuture" + representation which is more optimised. + - In Cassandra driver 3, paging is blocking. Handling paging the reactive way means we do not + need to aggressively switch the processing to an elastic thread, which helps keeping the count of + threads low. + - Cassandra driver 4 has built in support for advanced driver configuration tuning without the + need for the application to programmatically configure the driver. This means great flexibility without + the need for the application to proxy the driver configuration parameters. + +## Decision + +Upgrade James to Cassandra driver 4. + +Handle Cassandra requests callbacks on the Cassandra driver thread. + +## Consequences + +We expect a performance enhancement for native reactive support from the Cassandra driver, as well +as a better thread handling which translate to improved latencies/throughput. + +This code change needs a rewrite of all the Cassandra data access layer. + +Associated risk: + - Blocking on the Cassandra driver thread is prohibited. Tools like Blockhound can be adapted to + mitigate this. + - Blocking a Cassandra request from within a Cassandra driver thread result in a request timeout. + - Long running computation should be switched to other threads as it can prevent other Cassandra + requests from being executed. + +Mechanisms related to Cassandra driver configuration will need to be reworked. This leads to a breaking +change but our Cassandra driver configuration will be simpler and more standard. + +## References + +- [JIRA JAMES-3774](https://issues.apache.org/jira/browse/JAMES-3774) +- [Cassandra driver upgrade instructions](https://docs.datastax.com/en/developer/java-driver/4.0/upgrade_guide/) +- [BlockHound](https://github.com/reactor/BlockHound) diff --git a/src/adr/0060-adopt-bounded-elastic.md b/src/adr/0060-adopt-bounded-elastic.md new file mode 100644 index 0000000000..6dcea13171 --- /dev/null +++ b/src/adr/0060-adopt-bounded-elastic.md @@ -0,0 +1,60 @@ +# 60. Adopt bounded elastic + +Date: 2022-09-13 + +## Status + +Accepted (lazy consensus). + +Implemented. + +## Context + +See description of blocking vs reactive programming odel described in [ADR-57](0057-reactive-imap.md). + +Sometimes reactive code needs to execute blocking tasks. + + - This is the case when inter-operating with blocking libraries + - James reactive adoption is progressive, some parts of our application + are thus not reactive yet, which can result in reactive code calling + blocking code calling reactive code. + +Historically James used the elastic scheduler for scheduling such blocking calls. This scheduler +starts a thread for each task submitted (and attempt to reuse threads when possible) and results +in high thread count. + +That is why project reactor deprecated the elastic scheduler in favor of bounded-elastic (similar to +a thread pool). + +## Decision + +Migrate from elastic scheduler to bounded-elastic scheduler. + +## Consequences + +We expect a reduction in used threads under load from such a change. + +Also, getting rid of elastic scheduler might be a requirement to upgrade to reactor 3.5 upward. + +Associated risk: + - In some places, James protocol code is reactive, calls blocking intermediates API, to + end up calling reactive data access code. This results in nested blocking calls. Nested blocking + calls, when using the same scheduler with a bounded thread cap, can result in a dead lock. + +To prevent such a dead-lock code executed on bounded-elastic should not depend on scheduling nested +blocking call on bounded-elastic scheduler for its completion. We can thus avoid such a situation by +using a scheduler dedicated by nested blocking calls. + +## Alternatives + +Alternatives to the "blocking call wrapper" scheduler described above includes a full reactive +migration for Apache James (ie no blocking calls meaning no nested blocking calls). + +While this clearly is the target such a work is not realistically done within a limited time frame. +As such we need a transitional model allowing reactive code to inter-operate with legacy blocking +James code. + +## References + +- [JIRA JAMES-3773](https://issues.apache.org/jira/browse/JAMES-3773) +- [Reactor deprecating elastic scheduler](https://github.com/reactor/reactor-core/issues/1893) diff --git a/src/adr/0061-delegation.md b/src/adr/0061-delegation.md new file mode 100644 index 0000000000..f18b757288 --- /dev/null +++ b/src/adr/0061-delegation.md @@ -0,0 +1,61 @@ +# 61. Delegation + +Date: 2022-09-13 + +## Status + +Accepted (lazy consensus). + +Implemented. + +## Context + +Delegation is a common feature for email servers: + +``` +As user A I want to access mailbox of user B. +``` + +James currently supports a similar feature called impersonation: + +``` +As an administrator I want to acces mailbox of user B. +``` + +Impersonation can for instance be used to perform migrations with tools like IMAP-Sync. + +## Decision + +Implement delegation in Apache James (opt in). + +Reuse APIs used for impersonation to also back delegation up. Technically if user B delegates his +account to user A then user A can impersonate user B. + +Stored delegated access in a Cassandra database and expose it through webadmin. + +Support delegation while logging in IMAP/SMTP. Both LOGIN/PLAIN authentication and OIDC +authentication are supported. + +## Consequences + +Logging traces belongs to the target user and not the user that really authenticated +though an intermediate log upon logging should allow a correlation. + +Associated risk: + - Special care needs to be paid to delegation as it can fall into the `improper authorization` + attack class and might result in data leaks / modification / deletion. However, as delegation + is performed upon logging the attack surface is limited, and typically simpler than traditional + right management systems. + +## Follow-up work + +JMAP integration. We can expose delegated accounts through the JMAP session object and support +using non-default JMAP accounts. + +This might come at the price of one Cassandra read per JMAP request when interacting with delegated +accounts. + +## References + +- [JIRA JAMES-3756](https://issues.apache.org/jira/browse/JAMES-3756) +- [IMAP Sync](https://imapsync.lamiral.info/) diff --git a/src/adr/0062-oidc-token-introspection.md b/src/adr/0062-oidc-token-introspection.md new file mode 100644 index 0000000000..c9533dde37 --- /dev/null +++ b/src/adr/0062-oidc-token-introspection.md @@ -0,0 +1,50 @@ +# 62. OIDC token introspection + +Date: 2022-09-13 + +## Status + +Accepted (lazy consensus). + +Implemented. + +Complements [ADR 51](0051-oidc.md). + +## Context + +[ADR 51](0051-oidc.md) describes work required for OIDC adoption within James. + +This work enables the uses of an OIDC access token to authenticate using IMAP and SMTP. +It validates the signature of the token using cryptographic materials exposes by the +Identity Provider server through the mean of a JWKS endpoint. Yet no effort is made to +see if the access token in question was revoked or not, which can pause a security threat. + +OIDC ecosystem can support the following mechanisms to determine if an access token had been +revoked: + + - Use of an introspection endpoint: the application asks the OIDC server to validate the token + through an HTTP call. This result in load on the identity provider, which becomes central to the + authentication process. This can be assimilated to a 'synchronous' mode. + - Use of back-channel upon token revocation. The OIDC provider is then responsible to call an + applicative endpoint to invalidate a token. Invalidated tokens are then stored by the application + and access token are challenged against that storage. This approach scales better yet is harder + to implement and can be seen as less secure (a network incident can prevent revoked token + propagation to applications for instance). This can be seen as an 'asynchronous' mode. + +Also, we need to keep in mind that OIDC validation is only done upon establishing the connection in +IMAP/SMTP (as they are connected protocols) which defers from stateless protocols like HTTP. Performance +impact of token introspection is thus lower for connected protocols. + +## Decision + +Allow opt-in use of an introspection endpoint to further secure IMAP/SMTP OIDC implementation. + +## Consequences + +Security gains for the IMAP/SMTP OIDC implementation in James. + +## References + +- [JIRA JAMES-3755](https://issues.apache.org/jira/browse/JAMES-3755) +- [RFC-7628](https://www.rfc-editor.org/rfc/rfc7628.html) SASL OATH mechanism for SMTP and IMAP: https://datatracker.ietf.org/doc/html/rfc7628 +- [RFC-7662](https://datatracker.ietf.org/doc/html/rfc7662) OAUTH token introspection \ No newline at end of file diff --git a/src/adr/0063-temporary-file-leaks.md b/src/adr/0063-temporary-file-leaks.md new file mode 100644 index 0000000000..6faa3b1bd2 --- /dev/null +++ b/src/adr/0063-temporary-file-leaks.md @@ -0,0 +1,75 @@ +# 63. Prevent temporary file leaks + +Date: 2022-09-13 + +## Status + +Accepted (lazy consensus). + +Implemented. + +## Context + +In order to reduce memory allocation, Apache James buffers big emails into temporary files, in both SMTP +and during the email processing. + +This means that all James entities relying on `Mail` object needs to have resource management in place +to prevent temporary file leaks. If not this can result in both unreasonable disk occupation and file +descriptor leaks. + +Core components were found to badly manage those resources. Some mailets (bounce), mail queue APIs, mail +repository APIs were found to be causing temporary file leaks. + +James allows users to define custom mailets, that them too can badly manage emails. If insiders tend to +commit such errors, then we should expect our users to commit them too. + +This points toward the need of a systematic approach to detect and mitigate temporary file leaks. + +Similar leak detection is performed by other libraries. One can mention here Netty buffer libraries, +which relies on phantom references to detect leaks. Phantom references allows the Java garbage collector +to report recently GCed object. Upon phantom reference allocation, a check can then be done to check recently +GCed object, and release related resources if need be. + +## Decision + +Implement a leak detector "a la Netty" using phantom references. + +Allow several modes: + - Off: does nothing which is James 3.6.x behaviour. + - Simple: logs leaks and deletes associated files + - Advanced: Same as simple but with the allocation stack trace which comes at a performance cost. + Useful for diagnostic. + - Testing: Assertion errors upon leaks. Can be used to streangthen test suite and detect leaks as part + of a build. + +## Consequences + +Makes James core components "leak proofed". + +Allows user to safely write extensions involving emails (eg duplication / sending). + +Performance impact of turning on "simple" leak detection (default behaviour) is expected not +to be significant. + +## Alternatives + +Similar functionalities can be implemented in a simpler way by relying on Java finalize method, called +upon garbage collection. + +Yet, such a move should not be done as many operational problems come through the use of 'finalize': + - Longer GC pauses as finalise runs + - GC is significantly slower when finalize method exists + - During finalizing stage objects are still live which ease GC promotion to old generation and can result + in so called 'kill the world GCs'. + +## Follow up work + +Mail management in James test suite is poor which leads to many false positive and prevents +us from leveraging leaks detector benefits as part of our test suite. Significant work would +be needed to do so. + +## References + +- [JIRA JAMES-3724](https://issues.apache.org/jira/browse/JAMES-3724) +- [Leak detection in Netty](https://netty.io/wiki/reference-counted-objects.html) +- [Phantom references](https://www.baeldung.com/java-phantom-reference) --------------------------------------------------------------------- To unsubscribe, e-mail: notifications-unsubscr...@james.apache.org For additional commands, e-mail: notifications-h...@james.apache.org