On 2 May 2012 04:24, Robert Haas <robertmh...@gmail.com> wrote:
> I think Tom's question of whether the parser or lexer is the problem
> is something that ought to be investigated. Personally, I suspect
> that our tendency to use lists everywhere, for everything, an artifact
> of our proud LISP heritage, may be costing us dearly in terms of parse
> time. However, there's a difference between suspicion and proof, and
> I certainly have none of the latter.
It's funny that you should say that, because I actually had a discussion with Greg Stark over drinks about this recently. Jon Bentley has described an experiment that undermines many traditional beliefs about the trade-offs between a linked list and an array. The test is: on a modern machine, generate N integers at random and insert each one into a sequence so that the sequence stays sorted. Then remove integers from the sequence one at a time, at random positions. What is the performance at different sizes of N when the sequence is a doubly-linked list, and when it is an array?

If you graph the two, the results are perhaps rather surprising. I think his graph went up to 100,000. The array's line sits near the bottom of the graph, while the list's line shoots up so steeply that it looks exponential (the overall cost is in fact quadratic: N insertions, each preceded by a linear scan). Even if you use memory pooling, so that the list doesn't have to allocate nodes as it goes, the array still roundly beats the list, albeit not quite so convincingly, and without the list line hitting the roof near 100,000.

I don't think that I need to point out that this is for inserting and deleting - exactly the workload for which you're supposed to use lists. The point is that on modern architectures, with many layers of cache, the cost of the linear search to find the insertion point completely dominates - and that's without letting the array avail itself of a binary search, in the interest of fairness. CPU caches happen to do a very good job of shifting, on average, half of the array to insert an element at a random point. The list also has a much larger memory footprint than the array, with two extra pointers per element (yes, I know that we use singly-linked lists, but we have other disadvantages compared to typical C lists), and that matters too. Predictable usage patterns - that is, temporal and spatial locality, and therefore good use of the memory hierarchy - matter a lot. We're not talking about a small difference, either; I think the published test showed the array to be something like 50 to 100 times faster. The list simply incurs far more cache misses than the array. A rough sketch of the test appears as a postscript at the end of this mail.

So, I'm right there with you - using lists everywhere is bad news.

As for the question of Flex/Quex, I'm not in an immediate position to sink any more time into it, but it remains on my list of things to pursue for 9.3, though it's only about number 3 on that list right now.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
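PS: For anyone who wants to try this at home, here is a rough, self-contained sketch of the test as described above. To be clear, this is not Bentley's code - the structure and the names (array_insert, list_insert and so on) are just mine - and this version mallocs each list node individually, i.e. no memory pooling:

/*
 * Sketch of the test described above: insert N random integers into a
 * sequence, keeping it sorted, then delete elements at random positions.
 * Compares a plain array against a doubly-linked list.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef struct Node {
	int val;
	struct Node *prev, *next;
} Node;

/* Sorted insert into an array: linear search, then shift the tail up. */
static void array_insert(int *a, int *n, int v)
{
	int i = 0;

	while (i < *n && a[i] < v)
		i++;
	memmove(&a[i + 1], &a[i], (*n - i) * sizeof(int));
	a[i] = v;
	(*n)++;
}

/* Delete the element at position pos: shift the tail down. */
static void array_delete(int *a, int *n, int pos)
{
	memmove(&a[pos], &a[pos + 1], (*n - pos - 1) * sizeof(int));
	(*n)--;
}

/* Sorted insert into a doubly-linked list: linear search, then link in. */
static Node *list_insert(Node *head, int v)
{
	Node *node = malloc(sizeof(Node));
	Node *cur = head, *prev = NULL;

	node->val = v;
	while (cur && cur->val < v) {
		prev = cur;
		cur = cur->next;
	}
	node->prev = prev;
	node->next = cur;
	if (cur)
		cur->prev = node;
	if (prev)
		prev->next = node;
	else
		head = node;
	return head;
}

/* Walk to position pos, then unlink and free that node. */
static Node *list_delete(Node *head, int pos)
{
	Node *cur = head;

	while (pos-- > 0)
		cur = cur->next;
	if (cur->prev)
		cur->prev->next = cur->next;
	else
		head = cur->next;
	if (cur->next)
		cur->next->prev = cur->prev;
	free(cur);
	return head;
}

int main(void)
{
	int N = 100000, n = 0;
	int *a = malloc(N * sizeof(int));
	Node *head = NULL;
	clock_t t;

	srand(42);

	t = clock();
	for (int i = 0; i < N; i++)
		array_insert(a, &n, rand());
	while (n > 0)
		array_delete(a, &n, rand() % n);
	printf("array: %.3fs\n", (double) (clock() - t) / CLOCKS_PER_SEC);

	t = clock();
	for (int i = 0; i < N; i++)
		head = list_insert(head, rand());
	for (int len = N; len > 0; len--)
		head = list_delete(head, rand() % len);
	printf("list:  %.3fs\n", (double) (clock() - t) / CLOCKS_PER_SEC);

	free(a);
	return 0;
}

Compile with something like "cc -O2 -o seqbench seqbench.c" and vary N; the gap between the two timings should widen dramatically as N grows.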