Re: [HACKERS] query cache

2012-03-25 Thread Billy Earney
Thanks..  I'll keep those issues in mind.

On Sat, Mar 24, 2012 at 6:18 PM, Tatsuo Ishii is...@postgresql.org wrote:

  Well, you'd have to start by demonstrating the benefit of it.  The
  advantage of query caches in proxies and clients is well-known, because
 you
  can offload some of the work of the database onto other servers, this
  increasing capacity.  Adding a query cache to the database server would
  require the query identity recognition of the cache to be far cheaper
 (as
  in 10X cheaper) than planning and running the query, which seems
 unlikely
  at best.
 
  I figured I'd create the md5 digest of the sourceText of a query, and
 then
  look that up in a hash.  I don't think that will be very expensive.  I'll
  have another hash to keep track of which queries are dependent on which
  relations, so that when a relation is changed somehow (and committed),
 the
  query is then invalidated and removed from the query hash.

 From the experience of implementing query cache in pgool-II there are
 some suggestions:

 - A query result cache should not be created if the transaction
  including the SELECT is not committed.

 - Since a transaction could have many SELECTs, you need to keep those
  query results somewhere in a temporary storage. You could either
  discard or register them to the query cache storage depending on the
  transaction's fate, either aborted or committed.

 - If a SELECT has non-immutable functions, then the query result
  should not be cached.

 - If a SELECT uses temporary tables, then the query result should not
  be cached.

 - If a SELECT uses unlogged tables, then the query result should not
  be cached because their data could vanish after crash recovery. Of
  course this is only applied if you plan to use cache storage which
  does not survive after crash.
 --
 Tatsuo Ishii
 SRA OSS, Inc. Japan
 English: http://www.sraoss.co.jp/index_en.php
 Japanese: http://www.sraoss.co.jp



Re: [HACKERS] query cache

2012-03-24 Thread Billy Earney
On Sat, Mar 24, 2012 at 3:22 PM, Joshua Berkus j...@agliodbs.com wrote:

 Billy,

  I've done a brief search of the postgresql mail archives, and I've
  noticed a few projects for adding query caches to postgresql, (for
  example, Masanori Yamazaki's query cache proposal for GSOC 2011),

 ... which was completed, btw.  Take a look at the current release of
 pgPool.

 Are you proposing this for GSOC2012, or is this just a general idea?


just a general idea, but if someone wants to work on it for GSOC2012, I
wouldn't mind giving a helping hand.  I'm not a student, so GSOC probably
doesn't apply to me.


  I'm wondering if anyone would be interested in a query cache as a
  backend to postgresql? I've been playing around with the postgresql
  code, and if I'm understanding the code, I believe this is possible.

 Well, you'd have to start by demonstrating the benefit of it.  The
 advantage of query caches in proxies and clients is well-known, because you
 can offload some of the work of the database onto other servers, this
 increasing capacity.  Adding a query cache to the database server would
 require the query identity recognition of the cache to be far cheaper (as
 in 10X cheaper) than planning and running the query, which seems unlikely
 at best.

 I figured I'd create the md5 digest of the sourceText of a query, and then
look that up in a hash.  I don't think that will be very expensive.  I'll
have another hash to keep track of which queries are dependent on which
relations, so that when a relation is changed somehow (and committed), the
query is then invalidated and removed from the query hash.


Billy


[HACKERS] query cache

2012-03-23 Thread Billy Earney
Greetings!

I've done a brief search of the postgresql mail archives, and I've noticed
a few projects for adding query caches to postgresql,  (for example,
Masanori Yamazaki's query cache proposal for GSOC 2011), as well as the
query cache announced at http://www.postgresql.org/about/news/1296/
(pgc).  Both of these seem to be external solutions that act more like a
proxy between clients and servers, instead of being part of the server
processes.

I'm wondering if anyone would be interested in a query cache as a backend
to postgresql?  I've been playing around with the postgresql code, and if
I'm understanding the code, I believe this is possible. I've been writing
some code, but don't have anything working yet, (I'm receiving a hash table
corruption error), but I'm working through it.

here's my basic idea:

1.  intercept select queries in execMain.c  at ExecuteQuery and see if the
sourcetext of this query is in the query hash.  (later we could make this
more sophisticated  by  using the query plan or some type of AST) instead
of the query text since adding or removing a space would create a different
query hash key.
2.  if the query is in the cache, return the cached results of this query.
3.  if the query is not cached, run the query like normal, grabbing the
tuples as they are sent to the dest and store them in the cache. (For
now, I'm ignoring storage constraints, etc, but these details will need to
be added before going to production).

To invalidate cache entries, look at the transactions being committed (and
written to WAL log, if my memory serves me) and send a message to the
qcache process to invalidate any query which depends on the modfied
relation (ie, table, etc)


For the experts out there, does this seem reasonable, or am I
misunderstanding the source code?  Anyone aware of a project trying to
accomplish this?

Thanks!

Billy Earney


Re: [HACKERS] query cache

2012-03-23 Thread Billy Earney
On Fri, Mar 23, 2012 at 11:29 AM, Greg Stark st...@mit.edu wrote:

 On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  The complication, opportunities for bugs, and general slowdown
  associated with that would outweigh any possible gain, in the opinion
  of most hackers who have thought about this.

 I wouldn't be quite so pessimistic. I think the problem is that the
 hard part in doing this for real is all the parts the proposal glosses
 over. How much memory is it worth dedicating to the cache before the
 cost of that memory costs more than it helps? How do you invalidate
 cache entries efficiently enough that it doesn't become a bottleneck?

 Also, you need to identify the specific advantages you hope a built-in
 cache would have over one implemented in the ORM or database library.
 If there aren't any advantages then those solutions are much simpler.
 And they have other advantages as well -- one of the main reason
 people implement caches is so they can move the load away from the
 bottleneck of the database to the more easily scaled out application.

 Thanks for the input.  I've had many of these thoughts myself, and I guess
it depends on the environment the database will be used, memory settings,
and other variables,  on how valuable a query cache would be.  I'll
definitely give this more thought before sending an official proposal.

Billy


Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Billy Earney
Jay,

Good links, and I've also looked at a few others with benchmarks.  I
believe most of the benchmarks are done before PCRE implemented jit.  I
haven't found a benchmark with jit enabled, so I'm not sure if it will make
a difference.  Also I'm not sure how accurately the benchmarks will show
how they will perform in an RDBMS environment. The optimizer probably is a
very important variable in many complex queries.  I'm leaning towards
trying to implement RE2 and PCRE and running some benchmarks to see which
performs best.

Also would it be possible to set a session variable (lets say  PGREGEXTYPE)
and set it to ARE (current alg), RE2, or PCRE, that way users could choose
which implementation they want (unless we find a single implementation that
beats the others in almost all categories)?  Or is this a bad idea?

Just a thought.


On Mon, Feb 20, 2012 at 12:09 AM, Jay Levitt jay.lev...@gmail.com wrote:

 Stephen Frost wrote:

 Alright, I'll bite..  Which existing regexp implementation that's well
 written, well maintained, and which is well protected against malicious
 regexes should we be considering then?


 FWIW, there's a benchmark here that compares a number of regexp engines,
 including PCRE, TRE and Russ Cox's RE2:

 http://lh3lh3.users.**sourceforge.net/reb.shtmlhttp://lh3lh3.users.sourceforge.net/reb.shtml

 The fastest backtracking-style engine seems to be oniguruma, which is
 native to Ruby 1.9 and thus not only supports Unicode but I'd bet performs
 pretty well on it, on account of it's developed in Japan.  But it goes
 pathological on regexen containing '|'; the only safe choice among
 PCRE-style engines is RE2, but of course that doesn't support
 backreferences.

 Russ's page on re2 (http://code.google.com/p/re2/**) says:

 If you absolutely need backreferences and generalized assertions, then
 RE2 is not for you, but you might be interested in irregexp, Google
 Chrome's regular expression engine.

 That's here:

 http://blog.chromium.org/2009/**02/irregexp-google-chromes-**
 new-regexp.htmlhttp://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

 Sadly, it's in Javascript.  Seems like if you need a safe, performant
 regexp implementation, your choice is (a) finish PLv8 and support it on all
 platforms, or (b) add backreferences to RE2 and precompile it to C with
 Comeau (if that's still around), or...

 Jay


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/**mailpref/pgsql-hackershttp://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Billy Earney
Tom,

Thanks for your reply.  So is the group leaning towards just maintaining
the current regex code base, or looking into introducing a new library
(RE2, PCRE, etc)?   Or is this still open for discussion?

Thanks!

Billy

On Mon, Feb 20, 2012 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Billy Earney billy.ear...@gmail.com writes:
  Also would it be possible to set a session variable (lets say
  PGREGEXTYPE)
  and set it to ARE (current alg), RE2, or PCRE, that way users could
 choose
  which implementation they want (unless we find a single implementation
 that
  beats the others in almost all categories)?  Or is this a bad idea?

 We used to have a GUC that selected the default mode for Spencer's
 package (ARE, ERE, or BRE), and eventually gave it up on the grounds
 that it did more harm than good.  In particular, you really cannot treat
 the regex operators as immutable if their behavior varies depending on
 a GUC, which is more or less fatal from an optimization standpoint.
 So I'd say a GUC that switches engines, and thereby brings in subtler
 but no less real incompatibilities than the old one did, would be a
 pretty bad idea.

 Also, TBH I have exactly zero interest in supporting pluggable regex
 engines in Postgres.  Regex is not sufficiently central to what we do
 to justify the work of coping with N different APIs and sets of
 idiosyncrasies.  (Perl evidently sees that differently, and with some
 reason.)

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Billy Earney
Tom,

I did a google search, and found the following:
http://www.arglist.com/regex/

Which states that Tcl uses the same library from Henry.  Maybe someone
involved with that project would help explain the library?  Also I noticed
at the url above is a few ports people did from Henry's code.  I didn't
download and analyze their code, but maybe they have made some comments
that could help, or maybe have some improvements to the code..

Just a thought.. :)

Billy Earney

On Sun, Feb 19, 2012 at 5:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Brendan Jurd dire...@gmail.com writes:
  Are you far enough into the backrefs bug that you'd prefer to see it
  through, or would you like me to pick it up?

 Actually, what I've been doing today is a brain dump.  This code is
 never going to be maintainable by anybody except its original author
 without some internals documentation, so I've been trying to write
 some based on what I've managed to reverse-engineer so far.  It's
 not very complete, but I do have some words about the DFA/NFA stuff,
 which I will probably revise and fill in some more as I work on the
 backref fix, because that's where that bug lives.  I have also got
 a bunch of text about the colormap management code, which I think
 is interesting right now because that is what we are going to have
 to fix if we want decent performance for Unicode \w and related
 classes (cf the other current -hackers thread about regexes).
 I was hoping to prevail on you to pick that part up as your first
 project.  I will commit what I've got in a few minutes --- look
 for src/backend/regex/README in that commit.  I encourage you to
 add to that file as you figure stuff out.  We could stand to upgrade
 a lot of the code comments too, of course, but I think a narrative
 description is pretty useful before diving into code.

regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Billy Earney
Thanks Tom.  I looked at the code in the libraries I referred to earlier,
and it looks like the code in the regex directory is exactly the same as
Walter Waldo's version, which has at least one comment from the middle of
last decade (~ 2003).   Has people thought about migrating to the pcre
library?  It seems to have a lot of neat features, and also has a jit, and
it looks like it is being actively maintained and has decent comments.


On Sun, Feb 19, 2012 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Billy Earney billy.ear...@gmail.com writes:
  I did a google search, and found the following:
  http://www.arglist.com/regex/

 Hmm ... might be worth looking at those two pre-existing attempts at
 making a standalone library from Henry's code, just to see what choices
 they made.

  Which states that Tcl uses the same library from Henry.  Maybe someone
  involved with that project would help explain the library?

 Um ... did you see the head message in this thread?

regards, tom lane