Re: IMAP FETCH management

2024-03-22 Thread Benoit TELLIER

Hello all.

The streaming approach turned out to be a terrible idea!

ChunkedStream was doing massive blocking reads on the event loop.
InputStream is just not the right abstraction for non-blocking reads.


Rewriting everything to support something like Flux looks doable but
requires a major refactoring. Moreover, the AES transformations would
need to be adapted to that new format...


https://github.com/apache/james-project/pull/2149 is taking the exact 
opposite approach:


 - accept that we represent things internally as byte[]
 - and carry that information to the Netty stack so that it can adapt
accordingly


This means that a big IMAP FETCH would have an overhead of a few
messages held in memory, which is likely acceptable.


If everyone agrees, I will carry on with this approach.

Best regards,

Benoit TELLIER




-
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org



Re: IMAP FETCH management

2024-03-20 Thread Benoit TELLIER

Hello all,

Today I put together a POC in which the following IMAP command

    a0 FETCH 1:* (BODY[])

would directly stream content from the S3 storage without storing the 
full input in a byte array.


I tested it a bit manually on top of the S3 AES implementation.

Link: https://github.com/apache/james-project/pull/2137

While working on this I stumbled across ReactorUtils::toInputStream,
which does not implement available (it returns 0) and always blocks
when trying to access the next chunk of data.
This would defeat most of the benefits of Netty's ChunkedStream
abstraction: a reliable available method allows polling on it in the
event loop and sending data as it becomes ready.
Feeling brave, I decided to experiment with a subscriber bridging the
gap between the NIO world and the Reactor world.
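
For illustration only, here is a minimal sketch of what such a bridge
could look like. It is not the code from the PR. To stay self-contained
it uses the JDK's java.util.concurrent.Flow.Subscriber rather than
Reactor, and the class name SubscriberInputStream is made up. It assumes
a single reader thread and a publisher that serializes its signals.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bridge (not the PR's code): buffers published chunks and exposes
// them as an InputStream whose available() is reliable and never blocks.
class SubscriberInputStream extends InputStream implements Flow.Subscriber<ByteBuffer> {
    private final ConcurrentLinkedQueue<ByteBuffer> chunks = new ConcurrentLinkedQueue<>();
    private final AtomicInteger buffered = new AtomicInteger();
    private volatile Flow.Subscription subscription;
    private volatile boolean completed;
    private volatile Throwable failure;

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1); // prefetch a single chunk; more is requested as chunks drain
    }

    @Override
    public void onNext(ByteBuffer chunk) {
        if (!chunk.hasRemaining()) {
            subscription.request(1); // skip empty chunks
            return;
        }
        int size = chunk.remaining(); // capture before the reader can touch the chunk
        chunks.add(chunk);
        buffered.addAndGet(size);
    }

    @Override
    public void onError(Throwable t) {
        failure = t;
    }

    @Override
    public void onComplete() {
        completed = true;
    }

    // Number of bytes that can be read right now without blocking.
    @Override
    public int available() {
        return Math.max(0, buffered.get());
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) < 0 ? -1 : one[0] & 0xFF;
    }

    // Never blocks: returns -1 at end of stream and fails fast if called
    // while nothing is buffered, instead of parking the calling thread.
    @Override
    public int read(byte[] dst, int off, int len) throws IOException {
        if (failure != null) {
            throw new IOException(failure);
        }
        if (len == 0) {
            return 0;
        }
        boolean done = completed; // read before peeking so a last chunk is not missed
        ByteBuffer head = chunks.peek();
        if (head == null) {
            if (done) {
                return -1;
            }
            throw new IllegalStateException("nothing buffered, poll available() first");
        }
        int n = Math.min(len, head.remaining());
        head.get(dst, off, n);
        buffered.addAndGet(-n);
        if (!head.hasRemaining()) {
            chunks.poll();
            subscription.request(1); // chunk fully drained: pull the next one
        }
        return n;
    }
}
```

A caller running on the event loop would check available() before each
read and simply come back later when it returns 0. Requesting one chunk
at a time keeps at most one undrained chunk in memory per stream, at the
price of keeping the upstream connection open until the client has read
everything.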

This work is incomplete, as usage in real-life situations causes crashes.

Link: https://github.com/apache/james-project/pull/2138

Another consideration with this approach is the need to increase the
number of S3 connections, as they are going to stay open longer...


Those are advanced topics and I believe they would be crucial to
making Apache James a better IMAP server...


Best regards,

Benoit TELLIER







IMAP FETCH management

2024-03-19 Thread Benoit TELLIER

Hello all,

As I already wrote here, I encountered significant issues during a
recent deployment [1].


[1] https://www.mail-archive.com/server-dev@james.apache.org/msg73848.html

This led to [2], implementing backpressure for IMAP FETCH, which
mitigated the issue.


[2] https://issues.apache.org/jira/projects/JAMES/issues/JAMES-3997

But not well enough. As the number of users/mails increased, I ended up
with new OutOfMemory exceptions related to IMAP usage this weekend.


I thus took the time to write a test for backpressure [3] (not reading
the socket and instrumenting the mailbox layer to see what is actually
pulled) and started playing with some related Netty settings [4].


[3] https://github.com/apache/james-project/pull/2128

[4] https://github.com/apache/james-project/pull/2129

However, the high/low write buffer watermarks seem ineffective: it
takes dozens of multi-MB messages being written for the backpressure
to kick in, and the default values (32KB/64KB) are very low compared to
a problematic message size. Netty expertise is more than welcome here!
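
To make the watermark semantics concrete, here is a simplified model,
written from my understanding of how Netty accounts for pending outbound
bytes. It is not Netty's code and the class name WatermarkModel is made
up. In Netty itself the thresholds are set through
ChannelOption.WRITE_BUFFER_WATER_MARK with a WriteBufferWaterMark(low,
high). The model only shows that the watermarks are a signal evaluated
after a whole message has been queued; it does not explain why dozens
of messages were needed in the test above.

```java
// Simplified model of high/low write-buffer watermarks (not Netty's code).
class WatermarkModel {
    private final long low;
    private final long high;
    private long pendingBytes;
    private boolean writable = true;

    WatermarkModel(long low, long high) {
        if (low < 0 || high < low) {
            throw new IllegalArgumentException("expected 0 <= low <= high");
        }
        this.low = low;
        this.high = high;
    }

    // A message is queued as a whole: the buffer can overshoot the high
    // watermark by the full message size before writability flips.
    void enqueue(long messageBytes) {
        pendingBytes += messageBytes;
        if (pendingBytes > high) {
            writable = false;
        }
    }

    // Called when bytes have actually been written to the socket.
    void flushed(long bytes) {
        pendingBytes = Math.max(0, pendingBytes - bytes);
        if (pendingBytes < low) {
            writable = true;
        }
    }

    boolean isWritable() {
        return writable;
    }

    long pendingBytes() {
        return pendingBytes;
    }
}
```

With the 32KB/64KB defaults, queuing a single 5 MB message in this model
makes the channel unwritable at once, yet the 5 MB are already held in
memory. The watermarks therefore only bound memory if the producer
checks writability and stops pulling further messages.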


Another problem is that, as of today, message content is loaded as a
byte array by the mailbox layer. For a request like IMAP FETCH (BODY[])
this is inefficient: we could instead stream it straight from the
object store (even applying backpressure within a single message
write). Yet this would require a major refactoring of the mailbox /
imap code, as well as bulletproof lifecycle management for
connections / temporary files.


Thoughts?

Benoit

