wesm commented on PR #13753: URL: https://github.com/apache/arrow/pull/13753#issuecomment-1206908421
I do not think the implementation is less robust — the only change is that ephemeral `DataType` instances cannot be used to instantiate `InputType` without creating a TypeMatcher that wraps them. If the developer does create a KernelSignature / InputType with an ephemeral `shared_ptr<DataType>`, the signature would be unusable so any unit test which attempts to use the kernel would segfault. The probability of an Arrow user ever facing a bug related to this essentially zero — it would amount to us merging untested kernels code (because tests would immediately reveal the programming error). It also comes down to a cost philosophy in this code: * Pay the cost of using shared_ptr always — both in generated code (because most shared_ptr operations are inlined by the compiler) and in performance * Pay the cost only when it's needed It turns out that in this code, that retaining a `shared_ptr<DataType>` for kernel function signatures and type validation is rarely needed. Why pay the cost for > 95% of cases (or whatever the number is?) when the shared_ptr is only actually needed in < 5% of cases? I recall that we've already seen shared_ptr contention show up unfavorably in performance profiles. I will take the time when I can to write some benchmarks (for me, the present-and-future savings in generated code is evidence enough) that will hopefully provide some additional evidence to support this. In the meantime I hope that the rebase conflicts will not be too annoying to keep up with. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
