Re: [DISCUSS] Stream Pipelines on hot paths

David Capwell Fri, 31 May 2024 04:53:29 -0700

My concept of hot path is simply anything we can expect to be called frequently enough in normal operation that it might show up in a profiler. If it’s a library method then it’s reasonable to assume it should be able to be used in a hot path unless clearly labelled otherwise.

In fact, I’d say the default assumption should probably be that a method is “in a hot path” unless there’s good argument they aren’t

I think it’s fine to not have perfect agreement about edge cases, but if anyone in a discussion thinks something is a hot path then it should be treated as one IMO.

Works for me. If we are going to use the term “fast path” in the style guides then we should define it similarly to this; undefined terms are hard as everyone will have their own definitions

Sent from my iPhone

On May 31, 2024, at 2:08 AM, Berenguer Blasi <[email protected]> wrote:

+1 on avoiding streams in hot paths

On 31/5/24 9:48, Benedict wrote:

My concept of hot path is simply anything we can expect to be called frequently enough in normal operation that it might show up in a profiler. If it’s a library method then it’s reasonable to assume it should be able to be used in a hot path unless clearly labelled otherwise.

In my view this includes things that might normally be masked by caching but under supported workloads may not be - such as query preparation.

In fact, I’d say the default assumption should probably be that a method is “in a hot path” unless there’s good argument they aren’t - such as that the operation is likely to be run at some low frequency and the slow part is not part of any loop. Repair setup messages perhaps aren’t a hot path for instance (unless many of them are sent per repair), but validation compaction or merkle tree construction definitely is.

I think it’s fine to not have perfect agreement about edge cases, but if anyone in a discussion thinks something is a hot path then it should be treated as one IMO.

On 30 May 2024, at 18:39, David Capwell <[email protected]> wrote:

As a general statement I agree with you (same for String.format as well), but one thing to call out is that it can be hard to tell what is the hot path and what isn’t. When you are doing background work (like repair) its clear, but when touching something internal it can be hard to tell; this can also be hard with shared code as it gets authored outside the hot path then later used in the hot path…

Also, what defines hot path? Is this user facing only? What about Validation/Streaming (stuff processing a large dataset)?

On May 30, 2024, at 9:29 AM, Benedict <[email protected]> wrote:

Since it’s related to the logging discussion we’re already having, I have seen stream pipelines showing up in a lot of traces recently. I am surprised; I thought it was understood that they shouldn’t be used on hot paths as they are not typically as efficient as old skool for-each constructions done sensibly, especially for small collections that may normally take zero or one items.

I would like to propose forbidding the use of streams on hot paths without good justification that the cost:benefit is justified.

It looks like it was nominally agreed two years ago that we would include words to this effect in the code style guide, but I forgot to include them when I transferred the new contents from the Google Doc proposal. So we could just include the “Performance” section that was meant to be included at the time.

lists.apache.org

<favicon.ico>

On 30 May 2024, at 13:33, Štefan Miklošovič <[email protected]> wrote:

I see the feedback is overall positive. I will merge that and I will improve the documentation on the website along with what Benedict suggested.

On Thu, May 30, 2024 at 10:32 AM Mick Semb Wever <[email protected]> wrote:

Based on these findings, I went through the code and I have incorporated these rules and I rewrote it like this:

1) no wrapping in "if" if we are not logging more than 2 parameters.
2) rewritten log messages to not contain any string concatenation but moving it all to placeholders ({}).
3) wrap it in "if" if we need to execute a method(s) on parameter(s) which is resource-consuming.

+1

It's a shame slf4j botched it with lambdas, their 2.0 fluent api doesn't impress me.

Re: [DISCUSS] Stream Pipelines on hot paths

Reply via email to