+1 to forbidding Stream usage entirely; the convenience of using them outside of hot paths is less than the burden of figuring out whether or not a particular path is hot. Even for reviewers it can be difficult to tell whether a particular path is hot; hard-to-diagnose bugs like CASSANDRA-18110 <https://issues.apache.org/jira/browse/CASSANDRA-18110> were somewhat caused by this ambiguity.
I wonder if there are ways we can better automate tracking of performance-sensitive paths. I know FB's Infer claims to be able to do this[1], but is limited in practice. [1]: https://fbinfer.com/docs/checker-annotation-reachability > On May 31, 2024, at 10:19 AM, Benedict Elliott Smith <bened...@apache.org> > wrote: > > I think I have already proposed a simple solution to this problem on the > thread: if anyone says it’s a hot path (and cannot be persuaded otherwise), > it should be treated as such. Saves argument, but permits an easy escape > hatch if everyone agrees with minimal discussion. > > I think this is a good general principle for raising standards in the > codebase like this: if somebody says something is important, and cannot be > convinced otherwise, then it should generally be treated as important. This > is different from cases where there are simply competing approaches. > > That said, if people want to be absolutist about this I won’t mind. > > > >> On 31 May 2024, at 15:04, Benjamin Lerer <b.le...@gmail.com> wrote: >> >> For me the definition of hot path is too vague. We had arguments with >> Berenger multiple times and it is more a waste of time than anything else at >> the end. If we are truly concerned about stream efficiency then we should >> simply forbid them. That will avoid lengthy discussions about what >> constitute the hot path and what does not. >> >> Le ven. 31 mai 2024 à 11:08, Berenguer Blasi <berenguerbl...@gmail.com >> <mailto:berenguerbl...@gmail.com>> a écrit : >>> +1 on avoiding streams in hot paths >>> >>> On 31/5/24 9:48, Benedict wrote: >>>> My concept of hot path is simply anything we can expect to be called >>>> frequently enough in normal operation that it might show up in a profiler. >>>> If it’s a library method then it’s reasonable to assume it should be able >>>> to be used in a hot path unless clearly labelled otherwise. >>>> >>>> In my view this includes things that might normally be masked by caching >>>> but under supported workloads may not be - such as query preparation. >>>> >>>> In fact, I’d say the default assumption should probably be that a method >>>> is “in a hot path” unless there’s good argument they aren’t - such as that >>>> the operation is likely to be run at some low frequency and the slow part >>>> is not part of any loop. Repair setup messages perhaps aren’t a hot path >>>> for instance (unless many of them are sent per repair), but validation >>>> compaction or merkle tree construction definitely is. >>>> >>>> I think it’s fine to not have perfect agreement about edge cases, but if >>>> anyone in a discussion thinks something is a hot path then it should be >>>> treated as one IMO. >>>> >>>>> On 30 May 2024, at 18:39, David Capwell <dcapw...@apple.com> >>>>> <mailto:dcapw...@apple.com> wrote: >>>>> >>>>> As a general statement I agree with you (same for String.format as >>>>> well), but one thing to call out is that it can be hard to tell what is >>>>> the hot path and what isn’t. When you are doing background work (like >>>>> repair) its clear, but when touching something internal it can be hard to >>>>> tell; this can also be hard with shared code as it gets authored outside >>>>> the hot path then later used in the hot path… >>>>> >>>>> Also, what defines hot path? Is this user facing only? What about >>>>> Validation/Streaming (stuff processing a large dataset)? >>>>> >>>>>> On May 30, 2024, at 9:29 AM, Benedict <bened...@apache.org> >>>>>> <mailto:bened...@apache.org> wrote: >>>>>> >>>>>> Since it’s related to the logging discussion we’re already having, I >>>>>> have seen stream pipelines showing up in a lot of traces recently. I am >>>>>> surprised; I thought it was understood that they shouldn’t be used on >>>>>> hot paths as they are not typically as efficient as old skool for-each >>>>>> constructions done sensibly, especially for small collections that may >>>>>> normally take zero or one items. >>>>>> >>>>>> I would like to propose forbidding the use of streams on hot paths >>>>>> without good justification that the cost:benefit is justified. >>>>>> >>>>>> It looks like it was nominally agreed two years ago that we would >>>>>> include words to this effect in the code style guide, but I forgot to >>>>>> include them when I transferred the new contents from the Google Doc >>>>>> proposal. So we could just include the “Performance” section that was >>>>>> meant to be included at the time. >>>>>> >>>>>> lists.apache.org >>>>>> <favicon.ico> >>>>>> >>>>>> <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt>lists.apache.org >>>>>> <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt> >>>>>> <favicon.ico> >>>>>> <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt> >>>>>> >>>>>> >>>>>>> On 30 May 2024, at 13:33, Štefan Miklošovič >>>>>>> <stefan.mikloso...@gmail.com> <mailto:stefan.mikloso...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> I see the feedback is overall positive. I will merge that and I will >>>>>>> improve the documentation on the website along with what Benedict >>>>>>> suggested. >>>>>>> >>>>>>> On Thu, May 30, 2024 at 10:32 AM Mick Semb Wever <m...@apache.org >>>>>>> <mailto:m...@apache.org>> wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Based on these findings, I went through the code and I have >>>>>>>>> incorporated these rules and I rewrote it like this: >>>>>>>>> >>>>>>>>> 1) no wrapping in "if" if we are not logging more than 2 parameters. >>>>>>>>> 2) rewritten log messages to not contain any string concatenation but >>>>>>>>> moving it all to placeholders ({}). >>>>>>>>> 3) wrap it in "if" if we need to execute a method(s) on parameter(s) >>>>>>>>> which is resource-consuming. >>>>>>>> >>>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>> >>>>>>>> It's a shame slf4j botched it with lambdas, their 2.0 fluent api >>>>>>>> doesn't impress me. >>>>> >