I have seen people ingest data using SerializableCoder. On Mon, Mar 23, 2020 at 2:51 PM Kenneth Knowles <[email protected]> wrote:
> I won't bring other people's words from private@, but can share mine. I > don't believe it exposes anything new. > > > If it is SerializableCoder - attacker controls the other end of e.g. > Kafka or Pubsub that is decoding w/ ObjectInputStream - [then we could have > an allowlist or try to automatically construct an allowlist] and otherwise > there is no vulnerability for internal coders. > > I have never seen or heard of a user doing dynamic deserialization > dispatch on ingestion, but that doesn't mean it doesn't happen. If it is > important to someone then they would need a more secure solution than > SerializableCoder. > > Side note: it would be great to provide an efficient and usable solution > for the problem of wanting to dynamically dispatch serde in the middle of a > pipeline. It is actually independent from being able to provide coders for > a wide variety of types, which we can do a bunch of different (mostly > better) ways. (has a better solution been built since the last time I > thought about this?) > > Kenn > > On Mon, Mar 23, 2020 at 1:36 PM Ismaël Mejía <[email protected]> wrote: > >> The link to the previous covnersation (discussion happened in private@ >> and I suppose we can bring some relevant bits here if needed) >> >> https://lists.apache.org/thread.html/2e1c00999e992e15b08938866bfe7bd3c3d3b3d4d7aa2f8f6eb4600d%40%3Cprivate.beam.apache.org%3E >> >> I remember Robert had some points there, but I am not sure we >> found/agreed on a solution that was relevant and did not break current >> users and their user experience (like the case of blacklists). >> >> On Mon, Mar 23, 2020 at 9:22 PM Luke Cwik <[email protected]> wrote: >> > >> > Being able to have something that can encode any object (or at least a >> large class of objects) is extremely powerful so requiring >> SerializableCoder<T> to only encode T.class would hurt our users. >> > >> > I believe someone looked at this kind of problem before and we came to >> an agreement of usng an explicit approve/deny list on the class names which >> would address the security concern. I don't remember the thread though and >> couldn't find the thread after a few minutes of searching. >> > >> > >> > >> > On Mon, Mar 23, 2020 at 1:07 PM Kenneth Knowles <[email protected]> >> wrote: >> >> >> >> So you think the spec for SerializableCoder<T> (currently doesn't >> really have one) should be that it dynamically dispatches what it >> deserializes? I had imagined we would treat it more as a statically >> determined coder, so because it is invariant in T we would not allow up or >> down casts (they are unsafe). But we probably don't actually have the >> static information to do that anyhow so you are probably right. >> >> >> >> I wonder about the threat model here. Is this the event that the >> runner (managed service or bespoke cluster) is compromised and is >> attempting RCE on the Java SDK harness or runner-specific Java-based worker? >> >> >> >> Kenn >> >> >> >> On Mon, Mar 23, 2020 at 8:09 AM Luke Cwik <[email protected]> wrote: >> >>> >> >>> I don't think this is going to work since SerializableCoder<T> should >> be able to decode T and all objects that implement/extend T. I'm pretty >> sure SerializableCoder<Set/List/...> is common enough while the concrete >> type is HashSet/ArrayList/... >> >>> I'm pretty sure there is some way you could come up with some way for >> making this optin though. >> >>> >> >>> On Mon, Mar 23, 2020 at 12:19 AM Colm O hEigeartaigh < >> [email protected]> wrote: >> >>>> >> >>>> Thanks Kenn. I submitted a PR here: >> https://github.com/apache/beam/pull/11191 >> >>>> >> >>>> Colm. >> >>>> >> >>>> On Thu, Mar 19, 2020 at 8:13 PM Kenneth Knowles <[email protected]> >> wrote: >> >>>>> >> >>>>> I think this is fine. The same coder is used for encode and decode, >> so the Class object should be the same as well. Inheritance is not part of >> the Beam model (thank goodness) so this is a language-specific concern. As >> far as the model is concerned, the full URN and the payload of the coder is >> its identity and coders with different identities have no inheritance or >> compatibility relationship. Pipeline snapshot/update is an edge case, but >> changing coder is not supported by any runner I know of, and probably won't >> be until we have some rather large new ideas. >> >>>>> >> >>>>> Kenn >> >>>>> >> >>>>> On Thu, Mar 19, 2020 at 4:50 AM Colm O hEigeartaigh < >> [email protected]> wrote: >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> I have a question on SerializableCoder. I'm looking at hardening >> the Java Object deserialization that is taking place. We have a "Class<T> >> type" that is used to decode the input stream: >> >>>>>> >> >>>>>> ObjectInputStream ois = new ObjectInputStream(inStream); >> >>>>>> return type.cast(ois.readObject()); >> >>>>>> >> >>>>>> What I would like to do would be something like: >> >>>>>> >> >>>>>> ObjectInputStream ois = new ObjectInputStream(inStream) { >> >>>>>> @Override >> >>>>>> protected Class<?> resolveClass(ObjectStreamClass desc) throws >> IOException, ClassNotFoundException { >> >>>>>> if (!desc.getName().equals(type.getName())) { >> >>>>>> throw new InvalidClassException("Unauthorized >> deserialization attempt", desc.getName()); >> >>>>>> } >> >>>>>> return super.resolveClass(desc); >> >>>>>> } >> >>>>>> }; >> >>>>>> return type.cast(ois.readObject()); >> >>>>>> >> >>>>>> This would prevent a possible security hole where an attacker >> could try to force the recipient of the input stream to deserialize to a >> gadget class or the like for a RCE. >> >>>>>> >> >>>>>> The question is - does the deserialized type have to correspond >> exactly to the supplied Class? Or is it supported that it's a base type / >> abstract class? If the latter then my idea won't really work. But if the >> type corresponds exactly then it should work OK. >> >>>>>> >> >>>>>> Thanks, >> >>>>>> >> >>>>>> Colm. >> >
