Hi all,

As discussed previously, I took a look at the feasibility of changing UUID references to bytestrings or short bytestrings. While doable, there are two problems with this.
The first is that we don't have performance/load benchmarks that would tell us how the overall behaviour changes after such a change. I've tried to look at the unit tests (test/hs/htest) as a proxy, but they don't give any useful measures (± a few percent). The individual benchmarks (e.g. getNodeInstances) do show significant improvement, but how relevant are these?

The second one is that there are two kinds of inefficiencies: basic costs (String costs more memory than the alternatives) and conversion costs (e.g. needing to go back to String from ByteString). While basic costs are/can be high, a conversion cost in a tight loop can be even higher (which is why getNodeInstances is very slow right now). So whatever data type change we do, we should make sure that the switch is complete, so that at runtime we don't have to pay any conversion costs (as much as possible).

Given both of these, I think that fixing stable 2.16 is risky; I think a better approach would be to forgo any current release (2.17 seems to be in beta and has a stable branch, so not feasible either), and focus on large scale changes in master:

- switch parsing from String+JSON to ByteString+Aeson; this should give nontrivial speed and memory improvements
- decide on whether to use a single data type for both UUIDs and object names (e.g. Text) or use a split model (ByteString vs. Text or ShortByteString vs. Text)
- convert all object fields according to the above
- convert all internal data paths to not use String anymore

This would be orthogonal to any algorithmic changes (e.g. hash consing or similar), which are needed for overall memory use (whereas string type changes would be useful for localised memory usage and lower CPU usage due to fewer conversions).

What do you think?

regards,
iustin
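
P.S. To make the conversion-cost point concrete, here is a minimal sketch. The types and function names (Instance, instancesOfNodeMixed, instancesOfNode) are simplified, hypothetical stand-ins and not the actual getNodeInstances code. It only contrasts a half-converted data path, where a stored ByteString is turned back into a String on every comparison, with a fully converted one that never leaves ByteString.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC

-- Hypothetical, simplified stand-ins; not the real config types.
type Uuid = BC.ByteString

data Instance = Instance
  { instName    :: String
  , instPrimary :: Uuid   -- UUID of the instance's primary node
  } deriving (Show, Eq)

-- Half-converted path: the field is a ByteString, but the caller still
-- passes a String, so each element is unpacked back to String before
-- it can be compared. The conversion cost is paid once per instance.
instancesOfNodeMixed :: String -> [Instance] -> [Instance]
instancesOfNodeMixed nodeUuid =
  filter (\i -> BC.unpack (instPrimary i) == nodeUuid)

-- Fully converted path: both sides are ByteString, so the loop does a
-- plain ByteString equality check with no conversion at all.
instancesOfNode :: Uuid -> [Instance] -> [Instance]
instancesOfNode nodeUuid = filter ((== nodeUuid) . instPrimary)

main :: IO ()
main = do
  let n1    = BC.pack "node-uuid-1"
      n2    = BC.pack "node-uuid-2"
      insts = [ Instance "inst1" n1
              , Instance "inst2" n2
              , Instance "inst3" n1 ]
  -- Both variants return the same instances; only the per-element
  -- work differs.
  print (map instName (instancesOfNodeMixed "node-uuid-1" insts))
  print (map instName (instancesOfNode n1 insts))
```

Both calls print ["inst1","inst3"]. The difference is only in how much work each comparison does, which is why a partial switch can end up slower than either extreme.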
