On Mon, Nov 14, 2005 at 08:31:50PM -0500, Robert Treat wrote:
> The basic scenario is one of a function that, given input, looks up
> corresponding information in a cache table.  If it can't find the
> information, it goes through a more complicated (and slower) search
> to obtain the information, inserts that information into the cache,
> and returns the result.  Note it always returns the same result
> whether the cache contains the information or not, which means you
> really do only need to evaluate it once per scan.  The problem is
> that when you mark such functions as volatile the performance you
> get is horrendous, so you're forced to mark them as stable so the
> planner will make use of index scans and such and give decent
> performance. Now maybe that's not a convincing use-case, but it is a
> common one.
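
For concreteness, the pattern being described looks roughly like this
(a minimal sketch only - the table, the function, and slow_lookup()
are all names I've invented to stand in for the real thing):

  CREATE TABLE lookup_cache (
      key    integer PRIMARY KEY,
      result text NOT NULL
  );

  CREATE FUNCTION cached_lookup(k integer) RETURNS text AS $$
  DECLARE
      r text;
  BEGIN
      SELECT result INTO r FROM lookup_cache WHERE key = k;
      IF FOUND THEN
          RETURN r;            -- cache hit: the cheap path
      END IF;
      r := slow_lookup(k);     -- cache miss: the expensive search
      INSERT INTO lookup_cache (key, result) VALUES (k, r);
      RETURN r;
  END;
  $$ LANGUAGE plpgsql STABLE;  -- STABLE despite the INSERT, so that the
                               -- planner will consider index scans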

Isn't this the sort of case that Tom just explained as not functioning
in 8.1? A STABLE function, and every function called by the STABLE
function, uses the snapshot that was current when the calling statement
started. As in: you do the INSERT, but within the same SELECT statement
invoking this 'STABLE' function, it never sees the inserted cached
value?
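
If I understand the 8.1 behaviour correctly (and I may not), a
statement like

  SELECT cached_lookup(42), cached_lookup(42);

runs both calls under the snapshot taken when the SELECT started: the
first call inserts a cache row, the second call's lookup cannot see it
and takes the slow path again, and the second INSERT may then fail
outright on the primary key, since unique-index enforcement does not
go through the snapshot.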

Also - what happens with parallel inserts of the same cache values?
Three or four clients all require the data at the same time: they each
execute the cache table lookup, fail to find a row, all resolve the
query the slow way, and each try to insert a cache row.
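
The usual band-aid I've seen for that race (again only a sketch,
reusing the invented names from above) is to let the losers swallow
the duplicate-key error, since they all computed the same value anyway:

  CREATE FUNCTION cache_insert(k integer, r text) RETURNS void AS $$
  BEGIN
      INSERT INTO lookup_cache (key, result) VALUES (k, r);
      RETURN;
  EXCEPTION
      WHEN unique_violation THEN
          RETURN;  -- another session cached it first; same value, ignore
  END;
  $$ LANGUAGE plpgsql;

But that only papers over the duplicate-key failure - it does nothing
about the wasted slow-path work, and nothing about the staleness
questions below.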

The case seems problematic to me. Isn't it better served by a caching
daemon, such as memcached? That has similar problems - not transaction
safe, and so on - but I suspect the caching table you describe above
can never be truly transaction safe, unless you store full row
dependencies for each of the cache records, and validate against those
dependencies before returning any data. Who is to say the cache data
is up-to-date? Invalidating the cache data rows may not solve this
either.

I'd say why bother?

Personally, I'm more in favour of PostgreSQL doing cheap caching of
query results, making those very common slow queries you mention
faster where possible. For example: keep the query results in an LRU
cache, with an identifier that allows it to quickly determine whether
any dependent table has changed, so that it can return the results
as-is if all of the tables are unchanged since the last execution. To
make it faster, and to minimize caching of less frequent queries,
perhaps the first few times a query is executed it should only count
the executions, and only after some threshold has passed start to
cache the results and the dependency information. If a query rarely
keeps the same dependency information, move it to a list of queries
never to cache results or dependency information for (a rough sketch
of the bookkeeping follows). I'm sure this has been talked about at
length, before I joined this mailing list.
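
To make that concrete, the bookkeeping I'm picturing is roughly this
(written as a table purely for illustration - the real thing would
live inside the server, and every name here is invented):

  CREATE TABLE query_result_cache (
      query_key     text PRIMARY KEY,  -- hash of query string + parameters
      exec_count    integer NOT NULL DEFAULT 0,
      dep_tables    text[],            -- tables the result depends on
      dep_versions  bigint[],          -- per-table change counters, recorded
                                       -- when the result was cached
      cached_result bytea              -- NULL until exec_count passes the
                                       -- caching threshold
  );

A lookup would compare dep_versions against the current per-table
counters: any mismatch invalidates the entry; otherwise the stored
result comes back without executing the query at all.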

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]     
__________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/

