Jim C. Nasby" <jnasby ( at ) pervasive ( dot ) com> wrote:
> Stefan Weiss wrote:
> ... IMO it would be useful to have a way to tell
> PG that some tables were needed frequently, and should be cached if
> possible. This would allow application developers to consider joins with
> these tables as "cheap", even when querying on columns that are > not indexed.

Why do you think you'll know better than the database how frequently
something is used? At best, your guess will be correct and PostgreSQL
(or the kernel) will keep the table in memory. Or, your guess is wrong
and you end up wasting memory that could have been used for something
else.

It would probably be better if you describe why you want to force this
table (or tables) into memory, so we can point you at more appropriate
solutions.

Or perhaps we could explain why we NEED to force these tables into memory, so 
we can point you at a more appropriate implementation.  ;-)

Ok, wittiness aside, here's a concrete example.  I have an application with one 
critical index that MUST remain in memory at all times.  The index's tablespace 
is about 2 GB.  As long as it's in memory, performance is excellent - a user's 
query takes a fraction of a second.  But if it gets swapped out, the user's 
query might take up to five minutes as the index is re-read from memory.

Now here's the rub.  The only performance I care about is response to queries 
from the web application.  Everything else is low priority.  But there is other 
activity going on.  Suppose, for example, that I'm updating tables, performing 
queries, doing administration, etc., etc., for a period of an hour, during 
which no customer visits the site.  The another customer comes along and 
performs a query.

At this point, no heuristic in the world could have guessed that I DON'T CARE 
ABOUT PERFORMANCE for anything except my web application.  The performance of 
all the other stuff, the administration, the updates, etc., is utterly 
irrelevant compared to the performance of the customer's query.

What actually happens is that the other activities have swapped out the critical index, and my 
customer waits, and waits, and waits... and goes away after a minute or two.  To solve this, we've 
been forced to purchase two computers, and mirror the database on both.  All administration and 
modification happens on the "offline" database, and the web application only uses the 
"online" database.  At some point, we swap the two servers, sync the two databases, and 
carry on.  It's a very unsatisfactory solution.

There is ONLY one way to convey this sort of information to Postgres, which is 
to provide the application developer a mechanism to explicitely name the tables 
that should be locked in memory.

Look at tsearchd that Oleg is working on.  It's a direct response to this 
problem.

It's been recognized for decades that, as kernel developers (whether a Linux 
kernel or a database kernel), our ability to predict the behavior of an 
application is woefully inadequate compared with the application developer's 
knowledge of the application.  Computer Science simply isn't a match for the 
human brain yet, not even close.

To give you perspective, since I posted a question about this problem 
(regarding tsearch2/GIST indexes), half of the responses I received told me 
that they encountered this problem, and their solution was to use an external 
full-text engine.  They all confirmed that Postgres can't deal with this 
problem yet, primarily for the reasons outlined above.

Craig

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply via email to