[ https://issues.apache.org/jira/browse/ARROW-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe L. Korn updated ARROW-4501:
-------------------------------
    Fix Version/s: 0.12.1

> [C++] Unique returns non-unique strings
> ---------------------------------------
>
>                 Key: ARROW-4501
>                 URL: https://issues.apache.org/jira/browse/ARROW-4501
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Ilya Tokar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0, 0.12.1
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Calling Unique on e.g. {"some long string data", "some long string
> data", "other data"} returns a dictionary in which "some long string data"
> appears twice. This is caused by an off-by-one error in DoubleCrcHash,
> which made it read 1 byte past the end of strings whose length is greater
> than 16 and not divisible by 4. In such cases, p[0] is never hashed and
> one extra byte is always read.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
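
The sketch below is one plausible way to see how a hash window shifted by one byte can make two equal strings hash differently. It is not Arrow's DoubleCrcHash: ToyHash is a plain FNV-1a stand-in, and HashCorrect and HashShifted are hypothetical names introduced only for illustration. It assumes the string values sit back to back in one data buffer, so the byte just past the end of a value is the first byte of whatever value follows it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Toy FNV-1a hash over buf[begin, end). A stand-in for illustration only;
// this is NOT Arrow's DoubleCrcHash.
inline std::uint64_t ToyHash(const std::string& buf, std::size_t begin,
                             std::size_t end) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (std::size_t i = begin; i < end; ++i) {
    h ^= static_cast<unsigned char>(buf[i]);
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hypothetical correct hash: covers exactly the value's bytes,
// buf[offset, offset + len).
inline std::uint64_t HashCorrect(const std::string& buf, std::size_t offset,
                                 std::size_t len) {
  return ToyHash(buf, offset, offset + len);
}

// Hypothetical off-by-one hash: skips the first byte (p[0]) and reads one
// byte past the end, i.e. buf[offset + 1, offset + len + 1).
inline std::uint64_t HashShifted(const std::string& buf, std::size_t offset,
                                 std::size_t len) {
  return ToyHash(buf, offset + 1, offset + len + 1);
}
```

With the three values from the report laid out as "some long string data" + "some long string data" + "other data", the two equal values start at offsets 0 and 21 and each has length 21 (greater than 16, not divisible by 4). HashCorrect agrees for both offsets. HashShifted picks up a trailing 's' for the first copy and a trailing 'o' for the second, so the two hashes differ, and a hash table keyed on that value could then treat the copies as distinct entries.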