Re: [GENERAL] Query caching absent query caching

2012-11-25 Thread Kevin Grittner
Bexley Hall wrote:

 Specifically, I have several computationally expensive
 functions that derive their results from specific values of
 these base types. *Solely*. (For example, area() when
 applied to a given circle always yields the same result...
 though this is a trivial/inexpensive function, by comparison).
 
 I can define the base types to set aside space to store
 these results and cache them *in* the base type. Then, serve
 up these cached results when they are needed, again. With
 plan caching, this should (?) reduce the cost of repeated
 queries significantly without the need/benefit for caching the
 actual query results. (Is that true?)
 
 To guard against future enhancements to the server (e.g., if
 query caching is ever implemented, etc.), I assume that all
 such functions should declare themselves as IMMUTABLE? Or,
 does my update of the internal representation of the data
 values (i.e., to include the cached results of each of these
 functions) conflict with this declaration?

As long as a call to a given function with a specific set of
arguments always returns the same result, and there are no *user
visible* side effects of the internal caching, I don't see a
problem with declaring the functions immutable.

Out of curiosity, are you planning on using a process-local cache
(which would start empty for each new connection) or are you
planning to allocate shared memory somehow and coordinate access to
that?

-Kevin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query caching absent query caching

2012-11-25 Thread Bexley Hall

Hi Pavel,

On 11/24/2012 9:47 PM, Pavel Stehule wrote:

Hello

you can try use plperl as cache

http://okbob.blogspot.cz/2007/12/using-shared-as-table-cache-in-plperl.html


But how is this any different than just creating a named/shared
table manually?

And, how do further/additional accesses (by other clients or
the same client) *augment* the shared table?

In terms of my application:
- Assume client A does a query that evaluates expensive_function()
  for rows 1, 5 and 93
- Client B does a query that evaluates expensive_function() for
  rows 3, 5 and 97
- Client C does a query that evaluates expensive_function() for
  rows 93, 95 and 97
(no one alters any of the data on which expensive_function() relies
in this time interval)

Then, A should bear the cost of computing the results for 1, 5 and 93.
B should bear the cost of computing 3 and 97 -- but should be able to
benefit from A's computation of 5.  C should bear the cost of computing
95 but benefit from the previous computations of 93 and 97.

Thx,
--don


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query caching absent query caching

2012-11-25 Thread Bexley Hall

Hi Kevin,

On 11/25/2012 8:10 AM, Kevin Grittner wrote:

Bexley Hall wrote:


Specifically, I have several computationally expensive
functions that derive their results from specific values of
these base types. *Solely*. (For example, area() when
applied to a given circle always yields the same result...
though this is a trivial/inexpensive function, by comparison).

I can define the base types to set aside space to store
these results and cache them *in* the base type. Then, serve
up these cached results when they are needed, again. With
plan caching, this should (?) reduce the cost of repeated
queries significantly without the need/benefit for caching the
actual query results. (Is that true?)

To guard against future enhancements to the server (e.g., if
query caching is ever implemented, etc.), I assume that all
such functions should declare themselves as IMMUTABLE? Or,
does my update of the internal representation of the data
values (i.e., to include the cached results of each of these
functions) conflict with this declaration?


As long as a call to a given function with a specific set of
arguments always returns the same result, and there are no *user
visible* side effects of the internal caching, I don't see a
problem with declaring the functions immutable.


OK.


Out of curiosity, are you planning on using a process-local cache
(which would start empty for each new connection) or are you
planning to allocate shared memory somehow and coordinate access to
that?


I was planning on writing back the results of each successful
function evaluation into the data type's internal representation.
Ideally, back into PostgreSQL's master copy of the data
(though I would settle for hiding it in an anonymous table
behind a view, etc.)

The point is NEVER to have to RE-evaluate any of these functions
for the data on which they are evaluated once they have been
evaluated (assuming the data themselves do not change).  And,
in doing so, make the results of each evaluation available to
other clients regardless of the query which caused them to
be evaluated.

Thx,
--don


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query caching absent query caching

2012-11-25 Thread Pavel Stehule
2012/11/25 Bexley Hall bexley...@yahoo.com:
 Hi Pavel,

 On 11/24/2012 9:47 PM, Pavel Stehule wrote:

 Hello

 you can try use plperl as cache


 http://okbob.blogspot.cz/2007/12/using-shared-as-table-cache-in-plperl.html


 But how is this any different than just creating a named/shared
 table manually?

access to memory is faster than access to table - but it is limited.


 And, how do further/additional accesses (by other clients or
 the same client) *augment* the shared table?

 In terms of my application:
 - Assume client A does a query that evaluates expensive_function()
   for rows 1, 5 and 93
 - Client B does a query that evaluates expensive_function() for
   rows 3, 5 and 97
 - Client C does a query that evaluates expensive_function() for
   rows 93, 95 and 97
 (no one alters any of the data on which expensive_function() relies
 in this time interval)

 Then, A should bear the cost of computing the results for 1, 5 and 93.
 B should bear the cost of computing 3 and 97 -- but should be able to
 benefit from A's computation of 5.  C should bear the cost of computing
 95 but benefit from the previous computations of 93 and 97.


depends on implementation - probably you cannot to design a generic
solution, but for some not wide defined tasks, you can find effective
solutions.

Regards

Pavel

 Thx,
 --don


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query caching absent query caching

2012-11-24 Thread Bexley Hall

Hi,

In the absence of query caching AND NOT WANTING TO FORCE
THE APPLICATION TO DO SO EXPLICITLY, I'm looking for ideas
as to how I should future-safe the design of some custom
user base types and functions thereon.

Specifically, I have several computationally expensive
functions that derive their results from specific values of
these base types.  *Solely*.  (For example, area() when
applied to a given circle always yields the same result...
though this is a trivial/inexpensive function, by comparison).

I can define the base types to set aside space to store
these results and cache them *in* the base type.  Then, serve
up these cached results when they are needed, again.  With
plan caching, this should (?) reduce the cost of repeated
queries significantly without the need/benefit for caching the
actual query results.  (Is that true?)

To guard against future enhancements to the server (e.g., if
query caching is ever implemented, etc.), I assume that all
such functions should declare themselves as IMMUTABLE?  Or,
does my update of the internal representation of the data
values (i.e., to include the cached results of each of these
functions) conflict with this declaration?

[I am working in a fixed, constrained resource environment so
the idea of explicitly building a temporary table to hold these
results in the absence of a smart query caching strategy is not
possible -- the builder(s) of such tables would have to know
too much about each other to not exhaust the resources available!]

Thanks!
--don


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query caching absent query caching

2012-11-24 Thread Pavel Stehule
Hello

you can try use plperl as cache

http://okbob.blogspot.cz/2007/12/using-shared-as-table-cache-in-plperl.html

Regards

Pavel Stehule

2012/11/25 Bexley Hall bexley...@yahoo.com:
 Hi,

 In the absence of query caching AND NOT WANTING TO FORCE
 THE APPLICATION TO DO SO EXPLICITLY, I'm looking for ideas
 as to how I should future-safe the design of some custom
 user base types and functions thereon.

 Specifically, I have several computationally expensive
 functions that derive their results from specific values of
 these base types.  *Solely*.  (For example, area() when
 applied to a given circle always yields the same result...
 though this is a trivial/inexpensive function, by comparison).

 I can define the base types to set aside space to store
 these results and cache them *in* the base type.  Then, serve
 up these cached results when they are needed, again.  With
 plan caching, this should (?) reduce the cost of repeated
 queries significantly without the need/benefit for caching the
 actual query results.  (Is that true?)

 To guard against future enhancements to the server (e.g., if
 query caching is ever implemented, etc.), I assume that all
 such functions should declare themselves as IMMUTABLE?  Or,
 does my update of the internal representation of the data
 values (i.e., to include the cached results of each of these
 functions) conflict with this declaration?

 [I am working in a fixed, constrained resource environment so
 the idea of explicitly building a temporary table to hold these
 results in the absence of a smart query caching strategy is not
 possible -- the builder(s) of such tables would have to know
 too much about each other to not exhaust the resources available!]

 Thanks!
 --don


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query caching

2000-11-01 Thread Poul L. Christiansen

Daniel Freedman wrote:
 
 On the topic of query cache (or maybe this is just tangential and I'm
 confused):
 
 I've always heard that Oracle has the ability to essentially suck in as
 much of the database into RAM as you have memory to allow it, and can then
 just run its queries on that in-RAM database (or db subset) without doing
 disk I/O (which I would probably imagine is one of the more expensive
 parts of a given SQL command).  I've looked for references as to
 Postgresql's ability to do something like this, but I've never been
 certain if it's possible.  Can postgresql do this, please?  And, if not,
 does it have to hit the disk for every SQL instruction (I would assume
 so)?

PostgreSQL hits the disk on UPDATE/DELETE/INSERT operations. SELECT's
are cached, but the default cache is only ½MB of RAM. You can change
this to whatever you want.

I'm using Cold Fusion and it can cache queries itself, so no database
action is necessary. But I don't think PHP and others have this
possibility. But Cold Fusion costs 1300$ :(

Poul L. Christiansen



Re: [GENERAL] Query caching

2000-11-01 Thread Denis Perchine

 PostgreSQL hits the disk on UPDATE/DELETE/INSERT operations. SELECT's
 are cached, but the default cache is only ½MB of RAM. You can change
 this to whatever you want.

 I'm using Cold Fusion and it can cache queries itself, so no database
 action is necessary. But I don't think PHP and others have this
 possibility. But Cold Fusion costs 1300$ :(

No, PHP has this.

-- 
Sincerely Yours,
Denis Perchine

--
E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
--



Re: [GENERAL] Query caching

2000-11-01 Thread Frank Joerdens

On Wed, Nov 01, 2000 at 10:16:58AM +, Poul L. Christiansen wrote:
 PostgreSQL hits the disk on UPDATE/DELETE/INSERT operations. SELECT's
 are cached, but the default cache is only ½MB of RAM. You can change
 this to whatever you want.

That sound like a very cool thing to do, and the default seems awfully
conservative, given the average server´s RAM equipment nowadays. If you
have a small Linux server with 128 MB of RAM, it would be interesting to
see what happens, performance-wise, if you increase the cache for
selects to, for instance, 64 MB. Has anyone tried to benchmark this? How
would you benchmark it? Where do you change this cache size? How do you
keep the cache from being swapped out to disk (which would presumably
all but eradicate the benefits of such a measure)?

Cheers Frank

-- 
frank joerdens   

joerdens new media
urbanstr. 116
10967 berlin
germany

e: [EMAIL PROTECTED]
t: +49 (0)30 69597650
f: +49 (0)30 7864046 
h: http://www.joerdens.de

pgp public key: http://www.joerdens.de/pgp/frank_joerdens.asc



Re: [GENERAL] Query caching

2000-11-01 Thread Poul L. Christiansen

Frank Joerdens wrote:
 
 On Wed, Nov 01, 2000 at 10:16:58AM +, Poul L. Christiansen wrote:
  PostgreSQL hits the disk on UPDATE/DELETE/INSERT operations. SELECT's
  are cached, but the default cache is only ½MB of RAM. You can change
  this to whatever you want.
 
 That sound like a very cool thing to do, and the default seems awfully
 conservative, given the average server´s RAM equipment nowadays. If you
 have a small Linux server with 128 MB of RAM, it would be interesting to
 see what happens, performance-wise, if you increase the cache for
 selects to, for instance, 64 MB. Has anyone tried to benchmark this? How
 would you benchmark it? Where do you change this cache size? How do you
 keep the cache from being swapped out to disk (which would presumably
 all but eradicate the benefits of such a measure)?

I have a PostgreSQL server with 80MB of RAM running Redhat Linux 7.0 and
in my /etc/rc.d/init.d/postgresql start script I have these 2 lines that
start the postmaster.

echo 67108864  /proc/sys/kernel/shmmax

su -l postgres -c "/usr/bin/pg_ctl  -D $PGDATA -p /usr/bin/postmaster -o
'-i -B 4096 -o -F' start /dev/null 21"  /dev/null

The first line increases the maxium shared memory to 64MB.
The "-B 4096" indicates 4096 * 8kb = 32MB to each postmaster.

I haven't benchmarked it, but I know it's MUCH faster.

Poul L. Christiansen



[GENERAL] Query caching

2000-10-31 Thread Steve Wolfe


 (Incidentally,  we've toyed  around with  developping a
query-caching
  system that would sit betwen PostgreSQL and our DB libraries.

  Sounds  amazing, but  requires some  research, I  guess. However,  in
many
 cases one  would be  more than  happy with  cahced connections.  Of
course,
 cahced query results  can be naturally added to that,  but just
connections
 are OK to start with. Security

To me, it doesn't sound like it would be that difficult of a project, at
least not for the likes of the PostgreSQL developpers.  It also doesn't seem
like it would really introduce any security problems, not if it were done
inside of PostgreSQL.  Long ago, I got sidetracked from my endeavors in C,
and so I don't feel that I'm qualified to do it.  (otherwise, I would have
done it already. : ) )   If you wanted it done in Perl or Object Pascal, I
could help. : )

Here's a simple design that I was tossing back and forth.  Please
understand that I'm not saying this is the best way to do it, or even a good
way to do it.  Just a possible way to do it.  I haven't been able to give it
as much thought as I would like to.  Here goes.


Implementation

Upon starting, the PostgreSQL engine could allocate a chunk of memory,
sized according to the administrator's desire.  That chunk would be used
solely for query caching.

When a query came in that was not cached (say, the first query), the
database engine would process it as normal.  It would then return it to the
user, and add it to the cache.  "Adding it to the cache" would mean that it
would enter the query itself, the result set, and a list of which tables the
query relied upon.  The query that is stored could be either the query
coming from the user, or the query after it goes through the optimizer.
Each has pros and cons, I would probably favor using the query that comes
from the user.

 When another query comes along, the caching engine would quickly look
in the hash table, and see if it already had the cached results of the
query.  If so, it returns them, and wham.  You've just avoided all of the
work of optimizing, parsing, and executing, not to mention the disk I/O.  A
hash lookup seems extremely cheap compared to the work of actually
processing a query.

 When an update/delete/insert comes along, the engine would analyze
which tables were affected, and clear the cache entries that relied upon
those tables.

-
Cache Clearing

 Cache clearing would be achieved via an LRU-based algorithm, which
would also take into account the amount of RAM used by each query in the
cache.
-
Performance Impact

   The potential performance differences range from a miniscule decrease to
a tremendous increase.  And it's a lot cheaper to throw an extra half gig of
RAM in a machine that to upgrade processors and disk subsystems!

--
Possible Changes

 One potential drawback is that when a table is modified, the queries
that rely upon it would be discarded.  Where a table is updated frequently,
that could greatly reduce the performance benefit.  One possible alternative
is to store the query cost with each query in the cache.  When a table is
updated, those queries are marked as "dirty".  If the system load is below a
certain amount, or the system has been idle, it could then re-execute those
queries and update the cache.  Which queries it re-executed would be
determined on a factor of query cost and how frequently those cache entries
were used.
---

   The reason I would prefer it done in the PostgreSQL engine (as opposed to
in a middleware application) is that the caching engine needs to know (a)
which tables a query relies upon, and (b) which tables get changed.  It
seems that it would significantly reduce overhead to do those inside of
PostgreSQL (which is already doing the query parsing and analysis).

This could certainly give PostgreSQL a huge advantage over other
database systems, too.   It could save administrators a very large chunk of
cash that they would otherwise have to spend on large systems.  And it would
just be cool.   ; )

steve