baibaichen commented on issue #37559:
URL: https://github.com/apache/arrow/issues/37559#issuecomment-1711044056

   From the paper, IIUC, they copy **the selected encoded values** out and then 
decode them as if they were all of the records.
   
   The following is quoted from Section 4.1:
   
   > The framework is built upon a simple yet crucial observation: when 
performing a `filter` or `project` operation, records failing to meet prior 
predicates can be bypassed directly. While this observation is undeniably 
obvious, previous approaches have not leveraged it effectively. Indeed, in the 
case of filter operations, previous work tends to perform predicate evaluation 
on **all** values [29, 34], intentionally 
ignoring the fact that some values might have been filtered by prior filters. 
**This is primarily because the additional cost associated with the select 
operator often outweighs the potential savings in predicate evaluation**. 
However, given the fast select operator that operates on encoded values 
(Section 3), it has become more favorable to select values 
upfront, even for filter operations.
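
   To make my reading concrete, here is a rough Python sketch of the two strategies on a dictionary-encoded column. It is not the paper's implementation and not Arrow code: `decode`, `filter_all`, and `filter_selected` are hypothetical names, and plain lists stand in for the encoded buffers and the selection vector.

```python
from typing import Callable, List, Sequence


def decode(dictionary: Sequence[str], codes: Sequence[int]) -> List[str]:
    """Turn dictionary codes back into values."""
    return [dictionary[c] for c in codes]


def filter_all(dictionary: Sequence[str], codes: Sequence[int],
               selection: Sequence[int],
               predicate: Callable[[str], bool]) -> List[int]:
    """Baseline: decode and test every row, then intersect with the prior selection."""
    values = decode(dictionary, codes)
    mask = [predicate(v) for v in values]           # one call per row
    return [i for i in selection if mask[i]]


def filter_selected(dictionary: Sequence[str], codes: Sequence[int],
                    selection: Sequence[int],
                    predicate: Callable[[str], bool]) -> List[int]:
    """Select-first: copy the surviving encoded values out, then decode only those."""
    selected_codes = [codes[i] for i in selection]  # select while still encoded
    values = decode(dictionary, selected_codes)     # decoded like a (shorter) full column
    return [i for i, v in zip(selection, values) if predicate(v)]


dictionary = ["apple", "banana", "cherry"]
codes = [0, 1, 2, 1, 0, 2, 1]   # 7 rows
selection = [1, 3, 4]           # rows that passed an earlier filter

baseline_calls: List[str] = []
selected_calls: List[str] = []

rows_a = filter_all(dictionary, codes, selection,
                    lambda v: baseline_calls.append(v) or v == "banana")
rows_b = filter_selected(dictionary, codes, selection,
                         lambda v: selected_calls.append(v) or v == "banana")

# Both return rows [1, 3]; the baseline makes 7 predicate calls, select-first makes 3.
print(rows_a, rows_b, len(baseline_calls), len(selected_calls))
```

   The select step here is just a list comprehension, so it only illustrates the data flow. Whether selecting first actually wins depends on how cheap the select on encoded data is, which is the point of the quoted passage.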
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
