+1 to forbidding Stream usage entirely; the convenience of using them outside 
of hot paths is less than the burden of figuring out whether or not a 
particular path is hot. Even for reviewers it can be difficult to tell whether 
a particular path is hot; hard-to-diagnose bugs like CASSANDRA-18110 
<https://issues.apache.org/jira/browse/CASSANDRA-18110> were somewhat caused by 
this ambiguity.

I wonder if there are ways we can better automate tracking of 
performance-sensitive paths. I know FB's Infer claims to be able to do this[1], 
but is limited in practice.

[1]: https://fbinfer.com/docs/checker-annotation-reachability

> On May 31, 2024, at 10:19 AM, Benedict Elliott Smith <bened...@apache.org> 
> wrote:
> 
> I think I have already proposed a simple solution to this problem on the 
> thread: if anyone says it’s a hot path (and cannot be persuaded otherwise), 
> it should be treated as such. Saves argument, but permits an easy escape 
> hatch if everyone agrees with minimal discussion.
> 
> I think this is a good general principle for raising standards in the 
> codebase like this: if somebody says something is important, and cannot be 
> convinced otherwise, then it should generally be treated as important. This 
> is different from cases where there are simply competing approaches.
> 
> That said, if people want to be absolutist about this I won’t mind.
> 
> 
> 
>> On 31 May 2024, at 15:04, Benjamin Lerer <b.le...@gmail.com> wrote:
>> 
>> For me the definition of hot path is too vague. We had arguments with 
>> Berenger multiple times and it is more a waste of time than anything else at 
>> the end. If we are truly concerned about stream efficiency then we should 
>> simply forbid them. That will avoid lengthy discussions about what 
>> constitute the hot path and what does not.
>> 
>> Le ven. 31 mai 2024 à 11:08, Berenguer Blasi <berenguerbl...@gmail.com 
>> <mailto:berenguerbl...@gmail.com>> a écrit :
>>> +1 on avoiding streams in hot paths
>>> 
>>> On 31/5/24 9:48, Benedict wrote:
>>>> My concept of hot path is simply anything we can expect to be called 
>>>> frequently enough in normal operation that it might show up in a profiler. 
>>>> If it’s a library method then it’s reasonable to assume it should be able 
>>>> to be used in a hot path unless clearly labelled otherwise. 
>>>> 
>>>> In my view this includes things that might normally be masked by caching 
>>>> but under supported workloads may not be - such as query preparation.
>>>> 
>>>> In fact, I’d say the default assumption should probably be that a method 
>>>> is “in a hot path” unless there’s good argument they aren’t - such as that 
>>>> the operation is likely to be run at some low frequency and the slow part 
>>>> is not part of any loop. Repair setup messages perhaps aren’t a hot path 
>>>> for instance (unless many of them are sent per repair), but validation 
>>>> compaction or merkle tree construction definitely is.
>>>> 
>>>> I think it’s fine to not have perfect agreement about edge cases, but if 
>>>> anyone in a discussion thinks something is a hot path then it should be 
>>>> treated as one IMO.
>>>> 
>>>>> On 30 May 2024, at 18:39, David Capwell <dcapw...@apple.com> 
>>>>> <mailto:dcapw...@apple.com> wrote:
>>>>> 
>>>>>  As a general statement I agree with you (same for String.format as 
>>>>> well), but one thing to call out is that it can be hard to tell what is 
>>>>> the hot path and what isn’t.  When you are doing background work (like 
>>>>> repair) its clear, but when touching something internal it can be hard to 
>>>>> tell; this can also be hard with shared code as it gets authored outside 
>>>>> the hot path then later used in the hot path…
>>>>> 
>>>>> Also, what defines hot path?  Is this user facing only?  What about 
>>>>> Validation/Streaming (stuff processing a large dataset)?  
>>>>> 
>>>>>> On May 30, 2024, at 9:29 AM, Benedict <bened...@apache.org> 
>>>>>> <mailto:bened...@apache.org> wrote:
>>>>>> 
>>>>>> Since it’s related to the logging discussion we’re already having, I 
>>>>>> have seen stream pipelines showing up in a lot of traces recently. I am 
>>>>>> surprised; I thought it was understood that they shouldn’t be used on 
>>>>>> hot paths as they are not typically as efficient as old skool for-each 
>>>>>> constructions done sensibly, especially for small collections that may 
>>>>>> normally take zero or one items.
>>>>>> 
>>>>>> I would like to propose forbidding the use of streams on hot paths 
>>>>>> without good justification that the cost:benefit is justified. 
>>>>>> 
>>>>>> It looks like it was nominally agreed two years ago that we would 
>>>>>> include words to this effect in the code style guide, but I forgot to 
>>>>>> include them when I transferred the new contents from the Google Doc 
>>>>>> proposal. So we could just include the “Performance” section that was 
>>>>>> meant to be included at the time.
>>>>>> 
>>>>>> lists.apache.org
>>>>>> <favicon.ico>
>>>>>>  
>>>>>> <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt>lists.apache.org
>>>>>>  <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt>   
>>>>>> <favicon.ico> 
>>>>>> <https://lists.apache.org/thread/1mt8rsg36p1mq8s8578l6js075lrmvlt>
>>>>>> 
>>>>>> 
>>>>>>> On 30 May 2024, at 13:33, Štefan Miklošovič 
>>>>>>> <stefan.mikloso...@gmail.com> <mailto:stefan.mikloso...@gmail.com> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> I see the feedback is overall positive. I will merge that and I will 
>>>>>>> improve the documentation on the website along with what Benedict 
>>>>>>> suggested.
>>>>>>> 
>>>>>>> On Thu, May 30, 2024 at 10:32 AM Mick Semb Wever <m...@apache.org 
>>>>>>> <mailto:m...@apache.org>> wrote:
>>>>>>>>   
>>>>>>>>  
>>>>>>>>> Based on these findings, I went through the code and I have 
>>>>>>>>> incorporated these rules and I rewrote it like this:
>>>>>>>>> 
>>>>>>>>> 1) no wrapping in "if" if we are not logging more than 2 parameters.
>>>>>>>>> 2) rewritten log messages to not contain any string concatenation but 
>>>>>>>>> moving it all to placeholders ({}).
>>>>>>>>> 3) wrap it in "if" if we need to execute a method(s) on parameter(s) 
>>>>>>>>> which is resource-consuming.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> 
>>>>>>>> It's a shame slf4j botched it with lambdas, their 2.0 fluent api 
>>>>>>>> doesn't impress me.
>>>>> 
> 

Reply via email to