Hi Oscar, I assume the "dependency" in your description refers to the custom fields in your ProcessFunction implementation. You are right: since ProcessFunction implements the `Serializable` interface, every field must be either serializable or transient. As for performance, I have no data, but in theory there should not be much difference in most cases (what you are really asking about may be the performance of default JDK serialization). The function instance is serialized when the job is submitted and deserialized when each parallel task starts, not once per record, so for a long-running streaming job neither the constructor nor the open() method is usually on the performance-critical path. As for best practice and clean code, in Flink's abstraction the open() method is designed for one-time setup work. So it is usually better to mark these fields as transient and initialize them in open(), especially when the setup involves extra work such as creating a database connection.
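To make the pattern concrete, here is a minimal sketch that uses only the JDK, not the Flink API. `EnrichFunction` and `LookupClient` are hypothetical names, and the explicit `open()` call only stands in for what Flink does when it calls open() on a rich function after deserializing it on the task. Serializable configuration goes through the constructor, while the non-serializable client is transient and built in open().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientOpenSketch {

    // Hypothetical non-serializable dependency, e.g. a database client.
    static class LookupClient {
        private final String url;
        LookupClient(String url) { this.url = url; }
        String lookup(String key) { return url + "/" + key; }
    }

    // Stand-in for a Flink function: only serializable state is shipped.
    static class EnrichFunction implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String url;              // serializable config, passed via constructor
        private transient LookupClient client; // skipped by Java serialization

        EnrichFunction(String url) { this.url = url; }

        // One-time setup; in Flink this would live in the function's open() method.
        void open() { client = new LookupClient(url); }

        boolean isOpened() { return client != null; }

        String process(String key) { return client.lookup(key); }
    }

    // Serializes and deserializes the function, roughly as happens when a job is shipped to a task.
    static EnrichFunction roundTrip(EnrichFunction fn) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(fn);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (EnrichFunction) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        EnrichFunction shipped = roundTrip(new EnrichFunction("jdbc:example"));
        System.out.println(shipped.isOpened());    // false: transient field was not shipped
        shipped.open();                            // one-time setup on the "task" side
        System.out.println(shipped.process("k1")); // jdbc:example/k1
    }
}
```

The constructor still exposes what the function depends on (here the URL), so the clean-code goal is mostly kept; only the heavyweight object itself is created lazily in open().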
Hope it helps!

Best,
Biao Geng

Oscar Perez via user <user@flink.apache.org> wrote on Thu, Apr 4, 2024 at 17:14:

> Hi,
>
> We would like to adhere to clean code and expose all dependencies in the
> constructor of the process functions
>
> In flink, however, all dependencies passed to process functions must be
> serializable. Another workaround is to instantiate these dependencies in
> the open method of the process function and declare this dependency
> transient
>
> I wonder how, performance wise, would impact the performance of the job if
> we declare all dependencies in the constructor and make them serializable.
> Is this a wrong pattern to do? Has anybody run any experiment on
> performance degradation of dependency exposed in the constructor vs
> declaring it in the open method?
>
> Thanks!
> Oscar
>