On 8 Říjen 2013, 6:23, Atri Sharma wrote: > Hi Tomas, > > >>> Consider the aspects associated with open addressing.Open addressing >>> can quickly lead to growth in the main table.Also, chaining is a much >>> cleaner way of collision resolution,IMHO. >> >> What do you mean by "growth in the main table"? > > Sorry, I should have been more verbose. > > AFAIK, Open addressing can be slower with a load factor approaching 1 > as compared to chaining. Also, I feel that implementation of open > addressing can be more complicated as we have to deal with deletes > etc. > > I feel we can redesign our current chaining mechanism to have skip > lists instead of singly linked lists. I experimented with it sometime > back and I feel that it gives a stable performance with higher loads. > > Regards, > > Atri
OK, thanks for the explanation. I've spent some time hacking this yesterday, the results are available in a separate branch: https://github.com/tvondra/count_distinct/tree/open-addressing The complexity of the implementation is pretty much comparable to chaining. I assume it would get much more complex with handling deletes etc., but that's not really necessary here. However I'm unable to make it at least comparable to chaining. The trick is that the hash table performs reasonably only until it's ~ 65-70% full, it gets really slow after that. So to maintain performance comparable to chaining, I'd have to keep the table below this threshold, effectively wasting ~30% of memory. And the topic of this thread was about decreasing the memory consumptions, so it seems to me open-addressing is not a good match here. I'll try a few more things but I don't think it's going to fly. I've made some significant improvements in the chaining version (in the master branch), now getting about the memory consumption I've estimated. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers