> -----Original Message-----
> From: cmake-developers [mailto:cmake-developers-boun...@cmake.org]
> On Behalf Of Milian Wolff
> Sent: Thursday, January 21, 2016 22:31
> To: Daniel Pfeifer
> Cc: CMake Developers; Stephen Kelly
> Subject: Re: [cmake-developers] CMake daemon for user tools
>
> > What do you think about string interning? I started a cmString class
> > [2] that stores short strings inplace and puts long strings into a
> > pool. Copies never allocate extra memory and equality comparison is
> > always O(1).
> > In addition to `operator<` (which is lexicographical) there is a O(1)
> > comparison that can be used where lexicographical ordering is not
> > required (ie. for lookup tables).
> >
> > [1] https://en.wikipedia.org/wiki/String_interning
> > [2] https://github.com/purpleKarrot/CMake/commits/string-pool
>
> Imo, you should replace this custom code by a boost::flyweight of std::string.
> That said, I think this can have a significant impact on the memory footprint
> of CMake, considering how heavy it relies on strings internally. But it also
> seems to mutate strings a lot. I've seen places e.g. where a list of
> compile-time known identifiers is prepended with "CMAKE_" at runtime. This is
> slow with normal strings (reallocations), but will also be slow with a
> flyweight or interning, possibly even leading to the pollution of the internal
> pool with temporary strings.
>
> Did you measure any impact on both, runtime speed and memory footprint
> yet?

I was wondering the same. My guess is that the biggest impact would come from the in-place storage of small strings. But choosing the in-place buffer size would probably require some profiling and measurement of string sizes; otherwise it is just a wild guess.

Maybe for testing, you could swap out the string header file on your system with one that logs allocations and string sizes, and perhaps also profiles the time each allocation takes?

The interesting question is: could in-place storage cover 95% of the cases, such that fussing with string interning becomes unnecessary complexity? If so, that leaves the equality comparison you mentioned as the other issue, and the interesting question there is how much time is spent on allocations vs. comparisons.

In another application I worked on, I got a big improvement in performance by replacing std::vector in one place with a custom vector that stack-allocated the first 10 items (i.e. a fixed-size C array as a member variable of the class) and reverted to a regular vector after that. But picking the number "10" required some profiling/measurement. The remaining use of the heap was so negligible as to not be worth improving.

Best regards,

James Johnston
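
For illustration, the custom vector described above might look roughly like the sketch below. This is not the code from that application and not anything in CMake: the name SmallVector, the is_inline() helper, and the choice to keep the first N items in the inline array while only later items spill into a std::vector are all assumptions made for the example. It also assumes T is default-constructible and gives up contiguous storage once items spill over.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: the first N elements live in a fixed-size array that
// is a member of the object (so no heap allocation is needed for them);
// elements past N go into an ordinary std::vector.
// Assumes T is default-constructible and assignable.
template <typename T, std::size_t N>
class SmallVector {
public:
    void push_back(T value) {
        if (size_ < N) {
            inline_[size_] = std::move(value);      // stays inside the object
        } else {
            overflow_.push_back(std::move(value));  // heap is used only past N items
        }
        ++size_;
    }

    T& operator[](std::size_t i) {
        assert(i < size_);
        return i < N ? inline_[i] : overflow_[i - N];
    }

    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return i < N ? inline_[i] : overflow_[i - N];
    }

    std::size_t size() const { return size_; }

    // True while no element has spilled into the heap-backed vector.
    bool is_inline() const { return overflow_.empty(); }

private:
    std::array<T, N> inline_{};   // in-place storage for the first N items
    std::vector<T> overflow_;     // fallback for everything after that
    std::size_t size_ = 0;
};
```

With something like SmallVector<int, 10>, the first ten push_back calls never touch the heap. Whether 10 (or any other N) is the right size is exactly the part that needs measurement on real data.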