kolfild26 commented on issue #44513:
URL: https://github.com/apache/arrow/issues/44513#issuecomment-2550240017

   Cardinality can refer to different things.
   In a database context, cardinality usually refers to the number of unique 
values in a relational table column relative to the total number of rows in the 
table.
   So, if we are both talking about the same thing, cardinality is reported 
in the report above as `cardinality_percentage = (unique_count / total_rows) * 100`.
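As an illustration of that formula, here is a minimal pure-Python sketch. The function name and the sample column are hypothetical. The sketch also assumes that `None` counts as one more distinct value, which other tools may handle differently, for example by skipping nulls.

```python
def cardinality_percentage(column):
    """Distinct values in a column as a percentage of its row count.

    Implements (unique_count / total_rows) * 100. Values must be hashable;
    None is treated as an ordinary distinct value (an assumption).
    """
    total_rows = len(column)
    if total_rows == 0:
        return 0.0  # avoid division by zero on an empty column
    unique_count = len(set(column))
    return unique_count / total_rows * 100


# Hypothetical column: 3 distinct values across 4 rows
print(cardinality_percentage(["a", "b", "a", "c"]))  # 75.0
```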
   
   As mentioned before, I'd better provide the source data. I didn't find a 
better way to share it than Google Drive.
   
   https://drive.google.com/file/d/1nRWTnanI3gWuumVfZHl4nvjUXwwq5qDy
   https://drive.google.com/file/d/1sbfa-i3OL5_Wr-qmBtbKzWrpfUMGvWpV
   
   The files inside the zip archives are pickled, so to restore one:
   ```
   import pickle

   # Load a table from a pickled file extracted from one of the archives
   with open('large.pkl', 'rb') as f:
       large_table = pickle.load(f)
   ```
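The following is a sketch, not part of the original instructions. It reads a pickled member straight from a zip archive without extracting it first, using only the standard `zipfile` and `pickle` modules. The helper name `load_pickled_member` is hypothetical, as is the in-memory demo archive. Only the member name `large.pkl` comes from the snippet above. To unpickle a pyarrow table this way, `pyarrow` must be importable. `pickle.load` can also execute arbitrary code, so only use it on archives from a trusted source.

```python
import io
import pickle
import zipfile


def load_pickled_member(zip_path, member):
    """Unpickle a single member of a zip archive without extracting it.

    zip_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as f:
            return pickle.load(f)


# Demo: a hypothetical in-memory archive standing in for the shared ones
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("large.pkl", pickle.dumps({"col": [1, 2, 3]}))
buf.seek(0)
print(load_pickled_member(buf, "large.pkl"))  # {'col': [1, 2, 3]}
```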


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.