Sven Cattell created ARROW-18090:
------------------------------------

             Summary: Dictionary Style array for Keywords or Tags
                 Key: ARROW-18090
                 URL: https://issues.apache.org/jira/browse/ARROW-18090
             Project: Apache Arrow
          Issue Type: New Feature
            Reporter: Sven Cattell

I want to efficiently encode a list of tags for each element in my database. In my case I have 30 tags, and a few are assigned to each of my ~20 million records. Here's a simplified example of 5 records:

* pe, keylogger, cryptojack
* pe, packed
* pe, cryptojack, c2
* pe, keylogger, c2
* pe

Right now I have to store these in a List<Utf8>, which duplicates a huge amount of string data. The dictionary array looks almost perfect for this task: I would just like a dictionary's index type to be allowed to be a List<T>, not only a primitive type T.
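
To make the requested encoding concrete, here is a rough sketch in plain Python. It is not Arrow's API and not a proposed implementation; the helper names `encode_tag_lists` and `decode_tag_lists` are hypothetical, and the choice of 8-bit indices and 64-bit offsets is an assumption that fits the 30-tag case above. Each distinct tag is stored once in a dictionary, and every record becomes a slice of small integer indices into it.

```python
from array import array


def encode_tag_lists(records):
    """Encode lists of string tags as (dictionary, indices, offsets).

    Hypothetical helper for illustration only; not part of Arrow.
    """
    dictionary = []            # unique tags, in first-seen order
    positions = {}             # tag -> its index in `dictionary`
    indices = array("b")       # signed 8-bit: enough for up to 128 distinct tags
    offsets = array("q", [0])  # record i owns indices[offsets[i]:offsets[i + 1]]
    for tags in records:
        for tag in tags:
            if tag not in positions:
                positions[tag] = len(dictionary)
                dictionary.append(tag)
            indices.append(positions[tag])
        offsets.append(len(indices))
    return dictionary, indices, offsets


def decode_tag_lists(dictionary, indices, offsets):
    """Rebuild the original lists of tags from the encoded form."""
    return [
        [dictionary[i] for i in indices[start:stop]]
        for start, stop in zip(offsets, offsets[1:])
    ]


# The five example records from the description above.
records = [
    ["pe", "keylogger", "cryptojack"],
    ["pe", "packed"],
    ["pe", "cryptojack", "c2"],
    ["pe", "keylogger", "c2"],
    ["pe"],
]
dictionary, indices, offsets = encode_tag_lists(records)
```

For the five records this yields a dictionary of 5 strings, 12 one-byte indices, and 6 offsets, and `decode_tag_lists(dictionary, indices, offsets)` returns the original lists. The per-occurrence cost drops from a full string to a single byte, which is the saving the request is after.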