Being able to have something that can encode any object (or at least a
large class of objects) is extremely powerful so requiring
SerializableCoder<T> to only encode T.class would hurt our users.

I believe someone looked at this kind of problem before and we came to an
agreement of usng an explicit approve/deny list on the class names which
would address the security concern. I don't remember the thread though and
couldn't find the thread after a few minutes of searching.



On Mon, Mar 23, 2020 at 1:07 PM Kenneth Knowles <[email protected]> wrote:

> So you think the spec for SerializableCoder<T> (currently doesn't really
> have one) should be that it dynamically dispatches what it deserializes? I
> had imagined we would treat it more as a statically determined coder, so
> because it is invariant in T we would not allow up or down casts (they are
> unsafe). But we probably don't actually have the static information to do
> that anyhow so you are probably right.
>
> I wonder about the threat model here. Is this the event that the runner
> (managed service or bespoke cluster) is compromised and is attempting RCE
> on the Java SDK harness or runner-specific Java-based worker?
>
> Kenn
>
> On Mon, Mar 23, 2020 at 8:09 AM Luke Cwik <[email protected]> wrote:
>
>> I don't think this is going to work since SerializableCoder<T> should be
>> able to decode T and all objects that implement/extend T. I'm pretty sure
>> SerializableCoder<Set/List/...> is common enough while the concrete type is
>> HashSet/ArrayList/...
>> I'm pretty sure there is some way you could come up with some way for
>> making this optin though.
>>
>> On Mon, Mar 23, 2020 at 12:19 AM Colm O hEigeartaigh <[email protected]>
>> wrote:
>>
>>> Thanks Kenn. I submitted a PR here:
>>> https://github.com/apache/beam/pull/11191
>>>
>>> Colm.
>>>
>>> On Thu, Mar 19, 2020 at 8:13 PM Kenneth Knowles <[email protected]> wrote:
>>>
>>>> I think this is fine. The same coder is used for encode and decode, so
>>>> the Class object should be the same as well. Inheritance is not part of the
>>>> Beam model (thank goodness) so this is a language-specific concern. As far
>>>> as the model is concerned, the full URN and the payload of the coder is its
>>>> identity and coders with different identities have no inheritance or
>>>> compatibility relationship. Pipeline snapshot/update is an edge case, but
>>>> changing coder is not supported by any runner I know of, and probably won't
>>>> be until we have some rather large new ideas.
>>>>
>>>> Kenn
>>>>
>>>> On Thu, Mar 19, 2020 at 4:50 AM Colm O hEigeartaigh <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a question on SerializableCoder. I'm looking at hardening the
>>>>> Java Object deserialization that is taking place. We have a "Class<T> 
>>>>> type"
>>>>> that is used to decode the input stream:
>>>>>
>>>>> ObjectInputStream ois = new ObjectInputStream(inStream);
>>>>> return type.cast(ois.readObject());
>>>>>
>>>>> What I would like to do would be something like:
>>>>>
>>>>> ObjectInputStream ois = new ObjectInputStream(inStream) {
>>>>>     @Override
>>>>>     protected Class<?> resolveClass(ObjectStreamClass desc) throws
>>>>> IOException, ClassNotFoundException {
>>>>>         if (!desc.getName().equals(type.getName())) {
>>>>>             throw new InvalidClassException("Unauthorized
>>>>> deserialization attempt", desc.getName());
>>>>>         }
>>>>>         return super.resolveClass(desc);
>>>>>     }
>>>>> };
>>>>> return type.cast(ois.readObject());
>>>>>
>>>>> This would prevent a possible security hole where an attacker could
>>>>> try to force the recipient of the input stream to deserialize to a gadget
>>>>> class or the like for a RCE.
>>>>>
>>>>> The question is - does the deserialized type have to correspond
>>>>> exactly to the supplied Class? Or is it supported that it's a base type /
>>>>> abstract class? If the latter then my idea won't really work. But if the
>>>>> type corresponds exactly then it should work OK.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Colm.
>>>>>
>>>>

Reply via email to