[ https://issues.apache.org/jira/browse/IMPALA-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anurag Mantripragada reassigned IMPALA-9433: -------------------------------------------- Assignee: (was: Anurag Mantripragada) > Change FileHandleCache from using a multimap to an unordered_map > ---------------------------------------------------------------- > > Key: IMPALA-9433 > URL: https://issues.apache.org/jira/browse/IMPALA-9433 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Affects Versions: Impala 3.4.0 > Reporter: Joe McDonnell > Priority: Minor > Labels: ramp-up > > The file handle cache can contain multiple file handles per filename. > Currently it uses a std::multimap, where the file handles for each filename > are a contiguous set of entries. A lookup will find the beginning of that > range and then iterate through it to find a free one. > A multimap is implemented as a red-black tree with O(log(N)) lookup, so we > should be able to improve this by using a hashtable-based structure such as > unordered_map/unordered_multimap with O(1) lookup. > Another optimization would be to add an intermediary structure for each > filename and hold all the file handles for that file name in a linked list. > Lookup would find this intermediary structure by looking up the filename, > then it would iterate. In the current method, the key/value pair for each > file handle must store a copy of the filename string as the key, even for > duplicates. With the intermediary structure, it would store the filename once > per unique filename. > It also looks like the LRU list would benefit from being a Boost intrusive > list ([https://www.boost.org/doc/libs/1_64_0/doc/html/intrusive.html]). Every > file handle is always in the LRU list, so a std::list has a higher memory > overhead and requires more memory accesses. It also complicates the code, > because the FileHandleEntry needs to store a LruListType::iterator to its > location in the LRU list. > These optimizations are low priority, but they provide good ramp-up for some > C++ concepts/APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org