Being able to have something that can encode any object (or at least a large class of objects) is extremely powerful so requiring SerializableCoder<T> to only encode T.class would hurt our users.
I believe someone looked at this kind of problem before and we came to an agreement of usng an explicit approve/deny list on the class names which would address the security concern. I don't remember the thread though and couldn't find the thread after a few minutes of searching. On Mon, Mar 23, 2020 at 1:07 PM Kenneth Knowles <[email protected]> wrote: > So you think the spec for SerializableCoder<T> (currently doesn't really > have one) should be that it dynamically dispatches what it deserializes? I > had imagined we would treat it more as a statically determined coder, so > because it is invariant in T we would not allow up or down casts (they are > unsafe). But we probably don't actually have the static information to do > that anyhow so you are probably right. > > I wonder about the threat model here. Is this the event that the runner > (managed service or bespoke cluster) is compromised and is attempting RCE > on the Java SDK harness or runner-specific Java-based worker? > > Kenn > > On Mon, Mar 23, 2020 at 8:09 AM Luke Cwik <[email protected]> wrote: > >> I don't think this is going to work since SerializableCoder<T> should be >> able to decode T and all objects that implement/extend T. I'm pretty sure >> SerializableCoder<Set/List/...> is common enough while the concrete type is >> HashSet/ArrayList/... >> I'm pretty sure there is some way you could come up with some way for >> making this optin though. >> >> On Mon, Mar 23, 2020 at 12:19 AM Colm O hEigeartaigh <[email protected]> >> wrote: >> >>> Thanks Kenn. I submitted a PR here: >>> https://github.com/apache/beam/pull/11191 >>> >>> Colm. >>> >>> On Thu, Mar 19, 2020 at 8:13 PM Kenneth Knowles <[email protected]> wrote: >>> >>>> I think this is fine. The same coder is used for encode and decode, so >>>> the Class object should be the same as well. Inheritance is not part of the >>>> Beam model (thank goodness) so this is a language-specific concern. As far >>>> as the model is concerned, the full URN and the payload of the coder is its >>>> identity and coders with different identities have no inheritance or >>>> compatibility relationship. Pipeline snapshot/update is an edge case, but >>>> changing coder is not supported by any runner I know of, and probably won't >>>> be until we have some rather large new ideas. >>>> >>>> Kenn >>>> >>>> On Thu, Mar 19, 2020 at 4:50 AM Colm O hEigeartaigh < >>>> [email protected]> wrote: >>>> >>>>> Hi, >>>>> >>>>> I have a question on SerializableCoder. I'm looking at hardening the >>>>> Java Object deserialization that is taking place. We have a "Class<T> >>>>> type" >>>>> that is used to decode the input stream: >>>>> >>>>> ObjectInputStream ois = new ObjectInputStream(inStream); >>>>> return type.cast(ois.readObject()); >>>>> >>>>> What I would like to do would be something like: >>>>> >>>>> ObjectInputStream ois = new ObjectInputStream(inStream) { >>>>> @Override >>>>> protected Class<?> resolveClass(ObjectStreamClass desc) throws >>>>> IOException, ClassNotFoundException { >>>>> if (!desc.getName().equals(type.getName())) { >>>>> throw new InvalidClassException("Unauthorized >>>>> deserialization attempt", desc.getName()); >>>>> } >>>>> return super.resolveClass(desc); >>>>> } >>>>> }; >>>>> return type.cast(ois.readObject()); >>>>> >>>>> This would prevent a possible security hole where an attacker could >>>>> try to force the recipient of the input stream to deserialize to a gadget >>>>> class or the like for a RCE. >>>>> >>>>> The question is - does the deserialized type have to correspond >>>>> exactly to the supplied Class? Or is it supported that it's a base type / >>>>> abstract class? If the latter then my idea won't really work. But if the >>>>> type corresponds exactly then it should work OK. >>>>> >>>>> Thanks, >>>>> >>>>> Colm. >>>>> >>>>
