Kenneth Knowles created BEAM-3227:
-------------------------------------

             Summary: Consider sharing Udf/SdkFunctionSpec records via pointer
                 Key: BEAM-3227
                 URL: https://issues.apache.org/jira/browse/BEAM-3227
             Project: Beam
          Issue Type: Sub-task
          Components: beam-model
            Reporter: Kenneth Knowles


Coders are stored by pointer, because they are often repeated and a common 
source of huge pipeline descriptions.

We considered doing the same for all UDFs but decided not to, based on the 
logic that they are not as often identical and will rarely implement the 
equals() needed to actually share encoded versions.

However, in the presence of generated code, it is very likely that DoFns and 
CombineFns are repeated, and also much more likely that they have meaningful 
equals(), so there could be size savings.
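
To illustrate the idea, here is a minimal sketch in Java of storing each 
distinct function spec once and referring to it by id. The FunctionSpec and 
SpecTable classes, the "fn" id scheme, and the URN strings used with them are 
hypothetical stand-ins, not Beam's actual classes or proto messages. The 
sketch only shows that sharing depends on a meaningful equals()/hashCode().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an encoded UDF record; not a real Beam class.
final class FunctionSpec {
    final String urn;
    final String payload;

    FunctionSpec(String urn, String payload) {
        this.urn = urn;
        this.payload = payload;
    }

    // Value equality is what makes sharing possible: two specs with the
    // same urn and payload are treated as one record.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FunctionSpec)) {
            return false;
        }
        FunctionSpec other = (FunctionSpec) o;
        return urn.equals(other.urn) && payload.equals(other.payload);
    }

    @Override
    public int hashCode() {
        return Objects.hash(urn, payload);
    }
}

// Hypothetical component table: each distinct spec is stored once, and
// callers keep only the returned id (the "pointer").
final class SpecTable {
    private final Map<FunctionSpec, String> idsBySpec = new LinkedHashMap<>();

    String register(FunctionSpec spec) {
        String id = idsBySpec.get(spec);
        if (id == null) {
            id = "fn" + idsBySpec.size();
            idsBySpec.put(spec, id);
        }
        return id;
    }

    int size() {
        return idsBySpec.size();
    }
}
```

Registering two equal specs yields the same id, so a pipeline description 
with many copies of a generated DoFn would carry its encoded form only once. 
Without a value-based equals(), every instance would get its own entry and 
nothing would be saved.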

None of this matters much for storage or transmission. The concern is the 
small, arbitrary size limits imposed by some API frameworks and database 
column types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
