kolfild26 commented on issue #44513: URL: https://github.com/apache/arrow/issues/44513#issuecomment-2550240017
Cardinality can refer to different things. In a database context, cardinality usually refers to the number of unique values in a relational table column relative to the total number of rows in the table. So, if we are both talking about the same thing, that is the cardinality presented in the report above: `cardinality_percentage = (unique_count / total_rows) * 100`

As mentioned before, I'd better provide the source data. I didn't find a better way than Google Drive:

https://drive.google.com/file/d/1nRWTnanI3gWuumVfZHl4nvjUXwwq5qDy
https://drive.google.com/file/d/1sbfa-i3OL5_Wr-qmBtbKzWrpfUMGvWpV

The files in the zip archives are pickled, so to restore them:

```python
import pickle

# Deserialize the pickled table from the extracted archive
with open('large.pkl', 'rb') as f:
    large_table = pickle.load(f)
```
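The cardinality formula above can be reproduced per column with a short plain-Python sketch. It assumes each column is available as a Python sequence inside a hypothetical column-name to values mapping; `cardinality_percentage` and `cardinality_report` are illustrative names made up for this example, not pyarrow APIs. The toy data is not taken from the shared archives.

```python
from typing import Dict, Hashable, Sequence


def cardinality_percentage(values: Sequence[Hashable]) -> float:
    """Return (unique_count / total_rows) * 100 for one column.

    In this sketch None is treated as an ordinary value, so a column
    containing nulls counts None once toward unique_count.
    """
    total_rows = len(values)
    if total_rows == 0:
        return 0.0  # assumption: an empty column is reported as 0%
    unique_count = len(set(values))
    return (unique_count / total_rows) * 100


def cardinality_report(columns: Dict[str, Sequence[Hashable]]) -> Dict[str, float]:
    """Compute the cardinality percentage of every column in a
    hypothetical column-name -> values mapping."""
    return {name: cardinality_percentage(vals) for name, vals in columns.items()}


if __name__ == "__main__":
    # Toy data for illustration only
    toy = {
        "id": [1, 2, 3, 4],            # 4 distinct out of 4 rows
        "flag": ["a", "a", "b", "a"],  # 2 distinct out of 4 rows
    }
    for name, pct in cardinality_report(toy).items():
        print(f"{name}: {pct:.1f}%")
```

For a `pyarrow.Table`, one option is to pass `{name: table.column(name).to_pylist() for name in table.column_names}`, although converting large columns to Python lists is slow; `pyarrow.compute.count_distinct` can compute `unique_count` without that conversion (it ignores nulls by default, unlike the sketch above).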
