The link to the previous covnersation (discussion happened in private@ and I suppose we can bring some relevant bits here if needed) https://lists.apache.org/thread.html/2e1c00999e992e15b08938866bfe7bd3c3d3b3d4d7aa2f8f6eb4600d%40%3Cprivate.beam.apache.org%3E
I remember Robert had some points there, but I am not sure we found/agreed on a solution that was relevant and did not break current users and their user experience (like the case of blacklists). On Mon, Mar 23, 2020 at 9:22 PM Luke Cwik <[email protected]> wrote: > > Being able to have something that can encode any object (or at least a large > class of objects) is extremely powerful so requiring SerializableCoder<T> to > only encode T.class would hurt our users. > > I believe someone looked at this kind of problem before and we came to an > agreement of usng an explicit approve/deny list on the class names which > would address the security concern. I don't remember the thread though and > couldn't find the thread after a few minutes of searching. > > > > On Mon, Mar 23, 2020 at 1:07 PM Kenneth Knowles <[email protected]> wrote: >> >> So you think the spec for SerializableCoder<T> (currently doesn't really >> have one) should be that it dynamically dispatches what it deserializes? I >> had imagined we would treat it more as a statically determined coder, so >> because it is invariant in T we would not allow up or down casts (they are >> unsafe). But we probably don't actually have the static information to do >> that anyhow so you are probably right. >> >> I wonder about the threat model here. Is this the event that the runner >> (managed service or bespoke cluster) is compromised and is attempting RCE on >> the Java SDK harness or runner-specific Java-based worker? >> >> Kenn >> >> On Mon, Mar 23, 2020 at 8:09 AM Luke Cwik <[email protected]> wrote: >>> >>> I don't think this is going to work since SerializableCoder<T> should be >>> able to decode T and all objects that implement/extend T. I'm pretty sure >>> SerializableCoder<Set/List/...> is common enough while the concrete type is >>> HashSet/ArrayList/... >>> I'm pretty sure there is some way you could come up with some way for >>> making this optin though. >>> >>> On Mon, Mar 23, 2020 at 12:19 AM Colm O hEigeartaigh <[email protected]> >>> wrote: >>>> >>>> Thanks Kenn. I submitted a PR here: >>>> https://github.com/apache/beam/pull/11191 >>>> >>>> Colm. >>>> >>>> On Thu, Mar 19, 2020 at 8:13 PM Kenneth Knowles <[email protected]> wrote: >>>>> >>>>> I think this is fine. The same coder is used for encode and decode, so >>>>> the Class object should be the same as well. Inheritance is not part of >>>>> the Beam model (thank goodness) so this is a language-specific concern. >>>>> As far as the model is concerned, the full URN and the payload of the >>>>> coder is its identity and coders with different identities have no >>>>> inheritance or compatibility relationship. Pipeline snapshot/update is an >>>>> edge case, but changing coder is not supported by any runner I know of, >>>>> and probably won't be until we have some rather large new ideas. >>>>> >>>>> Kenn >>>>> >>>>> On Thu, Mar 19, 2020 at 4:50 AM Colm O hEigeartaigh <[email protected]> >>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I have a question on SerializableCoder. I'm looking at hardening the >>>>>> Java Object deserialization that is taking place. We have a "Class<T> >>>>>> type" that is used to decode the input stream: >>>>>> >>>>>> ObjectInputStream ois = new ObjectInputStream(inStream); >>>>>> return type.cast(ois.readObject()); >>>>>> >>>>>> What I would like to do would be something like: >>>>>> >>>>>> ObjectInputStream ois = new ObjectInputStream(inStream) { >>>>>> @Override >>>>>> protected Class<?> resolveClass(ObjectStreamClass desc) throws >>>>>> IOException, ClassNotFoundException { >>>>>> if (!desc.getName().equals(type.getName())) { >>>>>> throw new InvalidClassException("Unauthorized >>>>>> deserialization attempt", desc.getName()); >>>>>> } >>>>>> return super.resolveClass(desc); >>>>>> } >>>>>> }; >>>>>> return type.cast(ois.readObject()); >>>>>> >>>>>> This would prevent a possible security hole where an attacker could try >>>>>> to force the recipient of the input stream to deserialize to a gadget >>>>>> class or the like for a RCE. >>>>>> >>>>>> The question is - does the deserialized type have to correspond exactly >>>>>> to the supplied Class? Or is it supported that it's a base type / >>>>>> abstract class? If the latter then my idea won't really work. But if the >>>>>> type corresponds exactly then it should work OK. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Colm.
