gortiz commented on issue #12078:
URL: https://github.com/apache/pinot/issues/12078#issuecomment-1884447441

   BTW, this issue focuses on the memory impact of the dictionary, but there 
is another potential improvement here. The solution proposed in 
#[12223](https://github.com/apache/pinot/pull/12223) has the side effect that 
two equal string values that belong to the same column in different segments 
will _probably_ be resolved to the same Java String object.
   
   When working with ClickBench, I've seen that we waste a lot of time 
evaluating `equals` between String objects that are equal but not the same 
object when these Strings are used as aggregation keys. With this PR, two 
equal String values read from different segments may turn out to be the same 
Java String object, in which case `equals` can return in constant time via 
the identity check instead of in linear time (comparing all bytes).
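
   To illustrate the reasoning, here is a minimal sketch. It is not the code 
from the PR: the `InternSketch` class, its `POOL` map and its `intern` helper 
are hypothetical names, and the pool is a plain `ConcurrentHashMap`. It relies 
on the fact that OpenJDK's `String.equals` checks reference identity before 
comparing contents.

```java
import java.util.concurrent.ConcurrentHashMap;

public class InternSketch {
    // Hypothetical shared pool, used here only for illustration.
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance for the given value, registering it if absent.
    static String intern(String value) {
        String existing = POOL.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    public static void main(String[] args) {
        // Simulate the same value being decoded independently from two segments.
        String fromSegmentA = new String("https://example.com/some/long/url".toCharArray());
        String fromSegmentB = new String("https://example.com/some/long/url".toCharArray());

        // Distinct objects: equals() has to compare the contents.
        System.out.println(fromSegmentA == fromSegmentB);      // false
        System.out.println(fromSegmentA.equals(fromSegmentB)); // true

        // After interning, both references point to the same object, so
        // equals() can return right after its identity check.
        String a = intern(fromSegmentA);
        String b = intern(fromSegmentB);
        System.out.println(a == b);                            // true
    }
}
```

   Note that the interning lookup itself hashes the String and may compare it 
once, so any gain would only come from values that are compared many times 
after being interned.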
   
   We should measure whether this reasoning holds up in practice. If it does 
improve performance, we could apply the same technique in the brokers when 
data is read (interning strings sent by different servers). However, as I 
said in my previous message, I think the largest improvement would come from 
using a Str class that doesn't allocate on the heap unless it is needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

