On Wed, Mar 3, 2010 at 4:59 PM, Marcin Krol <mrk...@gmail.com> wrote: > Hello everyone, > > I have inadvertently set off a furball on an unrelated ng on what is the > actual cost of SQL joins. But there's no reliable conclusion. I would like > to actually know that, that is, are JOINs truly expensive?
There's a lot of variables that go in to join performance. Are the necessary columns indexed, what is the comparison operator, size of the index, etc? Complicating the calculation, postgres has access to different algorithms it chooses depending on circumstances. In general, joins cost less than a lot of novice programmers think because they are simply unaware of the number of highly efficient ways to attack the problem of doing set operations on lists of sorted items, or perhaps have had bad experiences on databases that had a lousy selection of algorithms or poor (if any) statistics. The three biggest factors on join performance are 1: having index in appropriate places, 2: if/when there is a full cache fault and the database has to get information from disk when executing the join. And 3: there being enough working memory to do things like 'in query' sorts. Point 2 is where denormalization can reap real benefits because it can reduce the number of disk seeks to get data (compared to the cpu and memory, disk seeks take eons). Databases have a characteristic that reminds me of some of the challenges that aircraft engineers face. Jets that go supersonic have an entirely different set of operating principles because above the speed of sound all the rules changes in terms of thermals, vibration, stresses, etc. Similarly in the database world performance tend to lurch in a very unpleasant direction when you active working set of data is not able to be properly served in RAM. The 'sonic boom' you hear is your disk drives grinding as the clock ticks away...for many DBAs hearing this sound the first time is a life changing experience. This is when those little things you never bothered to think about like having the records that tend to be pulled up together by grouped together on disk become suddenly very important...you discover the CLUSTER command and become best friends :-). merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general