caiwanli opened a new issue, #44598:
URL: https://github.com/apache/arrow/issues/44598

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   My requirement: suppose my RecordBatch contains three columns, and I want to compute a per-row hash value from two of those columns. The process is illustrated in the following diagram:
   
![image](https://github.com/user-attachments/assets/d70260f3-00dd-4f53-a450-2526a565684b)
   The function interface I designed is as follows:
   `std::vector<size_t> HashRB1(const std::shared_ptr<arrow::RecordBatch>& input, const std::vector<int>& idxs, int type);`
   Here, `input` is the input batch, `idxs` lists the indices of the columns to hash, and `type` selects the hash function.
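A minimal, Arrow-free sketch of the row-wise logic behind HashRB1, under stated assumptions: columns are represented by a hypothetical `Column` variant of `std::vector`s instead of `arrow::Array`, and the meaning of `type` is also hypothetical (0 = `std::hash` per cell mixed with a boost-style combine, 1 = 64-bit FNV-1a over the cell bytes). With real Arrow arrays, the per-cell hash could instead come from `arrow::Array::GetScalar(i)` followed by `arrow::Scalar::hash()`, although that is slow row-at-a-time access and those values may not be stable across Arrow versions or processes.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-in for an Arrow column (int64 or string values).
using Column = std::variant<std::vector<int64_t>, std::vector<std::string>>;

// Boost-style combine: mixes `value` into the running row hash `seed`.
inline void HashCombine(uint64_t& seed, uint64_t value) {
  seed ^= value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
}

// 64-bit FNV-1a over `len` raw bytes, continuing from state `h`.
inline uint64_t Fnv1a(uint64_t h, const void* data, std::size_t len) {
  const auto* p = static_cast<const unsigned char*>(data);
  for (std::size_t i = 0; i < len; ++i) {
    h ^= p[i];
    h *= 1099511628211ULL;  // FNV-1a 64-bit prime
  }
  return h;
}

// Hypothetical HashRB1: one hash per row over the columns listed in `idxs`.
std::vector<std::size_t> HashRB1(const std::vector<Column>& input,
                                 const std::vector<int>& idxs, int type) {
  if (type != 0 && type != 1) throw std::invalid_argument("unknown hash type");
  if (input.empty()) return {};
  const std::size_t num_rows =
      std::visit([](const auto& v) { return v.size(); }, input.front());

  // One running hash state per row (FNV-1a starts from its offset basis).
  std::vector<uint64_t> state(num_rows, type == 0 ? 0 : 14695981039346656037ULL);
  for (int idx : idxs) {
    std::visit(
        [&](const auto& values) {
          using T = typename std::decay_t<decltype(values)>::value_type;
          for (std::size_t r = 0; r < num_rows; ++r) {
            const T& v = values.at(r);  // throws if column lengths differ
            if (type == 0) {
              HashCombine(state[r], std::hash<T>{}(v));
            } else if constexpr (std::is_same_v<T, std::string>) {
              // Length prefix keeps ("ab","c") and ("a","bc") as different byte streams.
              const uint64_t len = v.size();
              state[r] = Fnv1a(state[r], &len, sizeof len);
              state[r] = Fnv1a(state[r], v.data(), v.size());
            } else {
              // Raw integer bytes; the byte order is platform-dependent.
              state[r] = Fnv1a(state[r], &v, sizeof v);
            }
          }
        },
        input.at(static_cast<std::size_t>(idx)));
  }
  return std::vector<std::size_t>(state.begin(), state.end());
}
```

Hashing columns 0 and 1 of a three-column batch, as in the first diagram, would be `HashRB1(batch, {0, 1}, 0)`; rows with equal values in both selected columns get equal hashes. Combining per-column hashes row by row keeps the work columnar, so a vectorized version could hash one whole column at a time into the per-row state.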
   Alternatively, the hash value could be appended to the existing RecordBatch as a new column, as shown below:
   
![image](https://github.com/user-attachments/assets/18362b22-e171-4f71-8f7f-83ed18094f46)
   Function interface:
   `std::shared_ptr<arrow::RecordBatch> HashRB2(const std::shared_ptr<arrow::RecordBatch>& input, const std::vector<int>& idxs, int type);`
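A minimal sketch of the HashRB2 variant, again without Arrow: a hypothetical `Batch` struct of named `std::vector` columns stands in for `arrow::RecordBatch`, and for brevity only a hypothetical `type` 0 (`std::hash` with a boost-style combine) is implemented. Because an Arrow RecordBatch is immutable, the sketch returns a new batch rather than modifying the input; with real Arrow, the hash values could be built with an `arrow::UInt64Builder` and attached via `arrow::RecordBatch::AddColumn`, which returns a new batch.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for arrow::Array and arrow::RecordBatch.
using Column = std::variant<std::vector<int64_t>, std::vector<std::string>,
                            std::vector<uint64_t>>;
struct Batch {
  std::vector<std::string> names;
  std::vector<Column> columns;
};

// Hypothetical HashRB2: returns a copy of `input` plus a trailing "hash" column.
Batch HashRB2(const Batch& input, const std::vector<int>& idxs, int type) {
  if (type != 0) throw std::invalid_argument("only type 0 is sketched");
  std::size_t num_rows = 0;
  if (!input.columns.empty()) {
    num_rows = std::visit([](const auto& v) { return v.size(); },
                          input.columns.front());
  }

  // Per-row hash: std::hash of each selected cell, mixed in column order.
  std::vector<uint64_t> hashes(num_rows, 0);
  for (int idx : idxs) {
    std::visit(
        [&](const auto& values) {
          using T = typename std::decay_t<decltype(values)>::value_type;
          for (std::size_t r = 0; r < num_rows; ++r) {
            const uint64_t h = std::hash<T>{}(values.at(r));
            hashes[r] ^= h + 0x9e3779b97f4a7c15ULL + (hashes[r] << 6) + (hashes[r] >> 2);
          }
        },
        input.columns.at(static_cast<std::size_t>(idx)));
  }

  // Build a new batch instead of mutating the input.
  Batch output = input;
  output.names.push_back("hash");
  output.columns.emplace_back(std::move(hashes));
  return output;
}
```

Returning a new batch mirrors how Arrow treats RecordBatch as immutable; the original columns are shared or copied, and only the hash column is newly allocated.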
   In summary, how should I implement HashRB1 or HashRB2? I checked the Arrow documentation, and there does not seem to be a function that directly computes a hash for a RecordBatch.
   
   ### Component(s)
   
   C++

