Hi Yunfeng,

Thanks for your clarification. It might make sense to add one more
performance test with object reuse disabled, alongside the existing one, so
we can see how large the improvement is in that setting.

Best regards,
Jing

On Thu, Jul 13, 2023 at 11:51 AM Matt Wang <wang...@163.com> wrote:

> Hi Yunfeng,
>
> Thanks for the proposal. The POC showed a performance improvement of 20%,
> which is very exciting. But I have some questions:
> 1. Is the performance improvement here mainly due to the reduction of
> serialization, or is it due to the overhead of the tag checks?
> 2. Watermarks are not needed in some scenarios, but the latency marker is
> a useful function. If the latency marker cannot be used, it will greatly
> limit the usage scenarios. Can the design retain the latency marker
> capability?
> 3. The data in the POC test is of long type. I would like to see how much
> gain there is if it is a string with a length of 100 B or 1 KB.
>
>
> --
>
> Best,
> Matt Wang
>
>
> ---- Replied Message ----
> | From | Yunfeng Zhou<flink.zhouyunf...@gmail.com> |
> | Date | 07/13/2023 14:52 |
> | To | <dev@flink.apache.org> |
> | Subject | Re: [DISCUSS] FLIP-330: Support specifying record timestamp requirement |
> Hi Jing,
>
> Thanks for reviewing this FLIP.
>
> 1. I did rename some APIs in the FLIP after implementing the POC against
> the original version. As the core optimization logic remains the same and
> the POC's performance can still reflect the current FLIP's expected
> improvement, I have not updated the POC code after that. I'll add a note
> to the benchmark section of the FLIP saying that the names in the POC
> code might be outdated, and that the FLIP is still the source of truth
> for our proposed design.
>
> 2. This FLIP brings a fixed reduction in the workload of the per-record
> serialization path in Flink, so the lower the absolute time cost of the
> non-optimized components, the more visible the performance improvement of
> this FLIP. That's why I chose to enable object reuse and to transmit
> Boolean values in serialization.
> If a benchmark is more widely accepted when it adopts the more commonly
> applied behavior (for object reuse, I believe disabled is more common), I
> would be glad to update the benchmark results with object reuse disabled.
>
> Best regards,
> Yunfeng
>
>
> On Thu, Jul 13, 2023 at 6:37 AM Jing Ge <j...@ververica.com.invalid>
> wrote:
>
> Hi Yunfeng,
>
> Thanks for the proposal. It makes sense to offer the optimization. I have
> some nit questions.
>
> 1. I guess you changed your mind while coding the POC: I found
> pipeline.enable-operator-timestamp in the code, but
> pipeline.force-timestamp-support is defined in the FLIP.
> 2. About the benchmark example, why did you enable object reuse? Since it
> is an optimization of serde, will the benchmark be better if it is
> disabled?
>
> Best regards,
> Jing
>
> On Mon, Jul 10, 2023 at 11:54 AM Yunfeng Zhou <flink.zhouyunf...@gmail.com>
> wrote:
>
> Hi all,
>
> Dong (cc'ed) and I are opening this thread to discuss our proposal to
> support optimizing StreamRecord's serialization performance.
>
> Currently, a StreamRecord is converted into a 1-byte tag (+ 8-byte
> timestamp) + N-byte serialized value during the serialization process. In
> scenarios where timestamps and watermarks are not needed, and latency
> tracking is disabled, this process includes unnecessary information in
> the serialized byte array. This FLIP aims to avoid such overhead and
> increase Flink jobs' performance during serialization.
>
> Please refer to the FLIP document [1] for more details about the proposed
> design and implementation. We welcome any feedback and opinions on this
> proposal.
>
> Best regards,
> Dong and Yunfeng
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-330%3A+Support+specifying+record+timestamp+requirement
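
For readers following the "1-byte tag (+ 8-byte timestamp) + N-byte serialized value" description in the original proposal, here is a minimal sketch of that byte layout next to a value-only layout. It is an illustration under assumptions, not Flink's actual serializer: the class name RecordLayoutSketch, the two tag constants, and both method names are hypothetical, and it only counts bytes rather than measuring throughput.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustration only: mimics the layout described in the thread, not Flink's real code. */
final class RecordLayoutSketch {
    // Hypothetical tag values chosen for this sketch.
    static final int TAG_WITH_TIMESTAMP = 0;
    static final int TAG_WITHOUT_TIMESTAMP = 1;

    /** Tagged layout: 1-byte tag (+ 8-byte timestamp if present) + N-byte payload. */
    static byte[] writeTagged(byte[] payload, Long timestamp) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (timestamp != null) {
                out.writeByte(TAG_WITH_TIMESTAMP);
                out.writeLong(timestamp);
            } else {
                out.writeByte(TAG_WITHOUT_TIMESTAMP);
            }
            out.write(payload);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Value-only layout: the N-byte payload with no tag and no timestamp. */
    static byte[] writeValueOnly(byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Payload sizes mirror the ones raised in the thread: a long, 100 B, 1 KB.
        for (int n : new int[] {8, 100, 1024}) {
            byte[] payload = new byte[n];
            System.out.println("payload=" + n
                    + " tag+timestamp=" + writeTagged(payload, 1000L).length
                    + " tag-only=" + writeTagged(payload, null).length
                    + " value-only=" + writeValueOnly(payload).length);
        }
    }
}
```

The saving in bytes is fixed at 1 or 9 bytes per record regardless of payload size, so it is a larger share of an 8-byte payload (17 vs. 8 bytes) than of a 1 KB payload (1033 vs. 1024 bytes). Byte counts alone say nothing about where the measured 20% comes from, which is the open question about serialization cost versus tag checks.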