Thank you Arnout for starting this thread.

I think it's going to be hard to come up with a sensible statement for all 20+ 
Commons components without categorizing them (some higher/lower-level 
classification), even though this thread only refers to four components. 

We can make some general statements. For my money, we should deprecate all 
serialization and deserialization, and remove that support in the next major 
version of every component. This is the simplest solution. We can recommend 
using serialization proxies a la Effective Java (Bloch). In practical terms, 
this means no longer implementing Serializable and not deserializing anything.
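
As a rough sketch of the serialization proxy idea from Effective Java: the 
class below is hypothetical (PortRange, its fields and the roundTrip helper 
are made up for illustration and are not from any Commons component). The 
point is that only the proxy is ever written to the stream, and the real 
object is always rebuilt through its validating constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical value class used only to illustrate the pattern.
final class PortRange implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int low;
    private final int high;

    PortRange(int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("low > high");
        }
        this.low = low;
        this.high = high;
    }

    int low() { return low; }
    int high() { return high; }

    // Write the proxy to the stream instead of this object.
    private Object writeReplace() {
        return new SerializationProxy(low, high);
    }

    // Reject any stream that claims to contain a PortRange directly.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Serialization proxy required");
    }

    private static final class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int low;
        private final int high;

        SerializationProxy(int low, int high) {
            this.low = low;
            this.high = high;
        }

        // Rebuild through the constructor so the invariant is re-checked.
        private Object readResolve() {
            return new PortRange(low, high);
        }
    }

    // Helper for trying the pattern out: serialize and deserialize in memory.
    static PortRange roundTrip(PortRange value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PortRange) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A forged stream cannot bypass the low <= high check this way, because 
readResolve always goes through the constructor.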

For RCEs and for our own calls to APIs like Runtime#exec(...), it's not so 
simple, because we have some components that are in effect languages (JEXL, 
for example). There we can start by saying you must sanitize your inputs, 
which is "hard", and move toward saying we'll provide allow and/or deny lists 
in the future.
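
To make "allow and/or deny lists" a bit more concrete, here is a minimal 
sketch. NameFilter and its methods are hypothetical and are not an existing 
JEXL or Commons API; it only assumes that whatever is being guarded (class 
names, function names, commands) can be identified by a string, and that a 
deny entry wins over an allow entry.

```java
import java.util.Set;

// Hypothetical filter: a name passes only if allowed and not denied.
final class NameFilter {
    private final Set<String> allowed;
    private final Set<String> denied;

    NameFilter(Set<String> allowed, Set<String> denied) {
        // Defensive, immutable copies so callers cannot change the policy later.
        this.allowed = Set.copyOf(allowed);
        this.denied = Set.copyOf(denied);
    }

    boolean permits(String name) {
        return name != null && allowed.contains(name) && !denied.contains(name);
    }

    // Fail closed: anything not explicitly permitted is rejected.
    String requirePermitted(String name) {
        if (!permits(name)) {
            throw new SecurityException("Not permitted: " + name);
        }
        return name;
    }
}
```

A component would call requirePermitted(...) before resolving a class or 
running a command on behalf of untrusted input.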

For memory and CPU consumption issues, we want to avoid zip-bomb issues, and 
that's where code inspections and fuzzing will help and have already helped. 
Commons Imaging and Compress are two obvious targets here. It's going to take 
some rework of internals to let call sites say something like: 
"process this blob but only use a max of 10 MB".
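
From the caller's side, the kind of limit described above might look like the 
following sketch. It uses only java.util.zip from the JDK, not Commons 
Compress; BoundedGunzip and its methods are hypothetical names, and it caps 
decompressed output size only, which is one simple way to bound memory use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: decompress gzip data but stop once a byte budget is exceeded.
final class BoundedGunzip {

    static byte[] gunzip(byte[] compressed, long maxBytes) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
                // Check before buffering so we never hold more than maxBytes.
                if (total > maxBytes) {
                    throw new IllegalStateException("Output exceeds " + maxBytes + " bytes");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small helper to produce test input.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For example, BoundedGunzip.gunzip(data, 10L * 1024 * 1024) would express 
"only use a max of 10 MB" for the decompressed result.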

Gary

On 2023/12/14 11:09:18 Arnout Engelen wrote:
> Hello Commons developers,
> 
> I'd like to discuss what our security ambitions are for components like
> Commons Imaging, Compress, Codec and IO:
> 
> Generally for Commons, we say that unless otherwise specified it is up to
> the user of the library to make sure any input is either trusted or
> correctly validated/sanitized (https://commons.apache.org/security.html).
> 
> For these modules it might make sense to be a little more nuanced:
> https://commons.apache.org/proper/commons-imaging/ already explicitly says
> it intends to be "more secure against corrupt/malicious images", and while
> the others don't seem to say it explicitly AFAICS in practice we consider
> it OK to decompress/decode/... untrusted input at least to some degree.
> 
> So what does that mean?
> 
> * I'd say parsing/decompression/decoding should never allow malicious input
> to trigger arbitrary code execution(?)
> * Should parsing/decompression/decoding protect against 'disproportionate'
> CPU usage?
> * Should parsing/decompression/decoding protect against 'disproportionate'
> memory usage?
> * Should parsing/decompression/decoding protect against 'disproportionate'
> disk usage?
> 
> Where we say 'yes', we should also decide whether we intend to treat such
> issues as security problems (that should be fixed with some priority and,
> after release, disclosed in an advisory) or bugs/improvements (where we can
> possibly take more of an 'issues and patches welcome' position).
> 
> I'm curious about your thoughts!
> 
> 
> -- 
> Arnout Engelen
> ASF Security Response
> Committer on Apache Pekko
> Committer on NixOS
> Independent Open Source consultant
> 
