goldmedal commented on issue #15420:
URL: https://github.com/apache/datafusion/issues/15420#issuecomment-2752034334

   > The array outputs true for each row that has hash % partition == 0 (and 
false if not).
   
   I don't understand why the formula is `hash % partition == 0`? IMO, `hash % 
total_partition` is the number of the portion it belongs. Maybe the formula 
should be `hash % total_partition == current_partition`?
   
   Given the following data:
   ```
   col1 | col2 | ... | hash % total_partition
   -------------------------
   data | data | ... | 2
   data | data | ... | 1
   data | data | ... | 2
   data | data | ... | 0
   ```
   
   The 0 partition will get
   ```
   col1 | col2 | ... | selection
   -------------------------
   data | data | ... | false
   data | data | ... | false
   data | data | ... | false
   data | data | ... | true
   ```
   
   The 1 partition will get
   ```
   col1 | col2 | ... | selection
   -------------------------
   data | data | ... | false
   data | data | ... | true
   data | data | ... | false
   data | data | ... | false
   ```
   
   The 2 partition will get
   ```
   col1 | col2 | ... | selection
   -------------------------
   data | data | ... | true
   data | data | ... | false
   data | data | ... | true
   data | data | ... | false
   ```
   Then, the following plan can aggregate or join the record which `selection` 
is true.
   Does it make sense?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to