Dandandan commented on PR #19464:
URL: https://github.com/apache/datafusion/pull/19464#issuecomment-3691207724

   > Thanks @Dandandan for addressing this, I'm planning to run TPCDS query set 
later this week. After first glance the PR makes sense , so you keep TopK 
values in map[string, usize] where the value points to correspondent TopK value 
in `store`.
   > 
   > some thoughts aloud:
   > 
   > * Do you see any reason if `map` and `store` would be out of sync?
   > * `free_index` is to track removed items? so this points to available spot 
for new entry in the `store`, do you think it can be overwritten in concurrent 
environment and effectively we losing the spot that points to overwritten id 
forever?
   
   1. I don't see why they would be out of sync, the store stays just in place 
(it can only grow) and the indices remain valid. In case of an implementation 
bug it would panic (out of bounds) or give wrong results instead of UB we had 
before.
   2. It is to track the latest removed item before it is added again (removing 
the worst from the group before adding new) . The code isn't used in a 
concurrent way I believe.
   
   Perhaps it can be refactored a bit to return the removed id to caller, or 
perhaps even store the items directly in the heap instead of separate `Vec` or 
use the hashtable API in smarter ways. The goal in this PR is however to remove 
`RawTable` without doing much changes / refactoring, so I prefer to keep 
changes minimal as possible to achieve that goal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to