Re: [HACKERS] query cache
Thanks. I'll keep those issues in mind.

On Sat, Mar 24, 2012 at 6:18 PM, Tatsuo Ishii is...@postgresql.org wrote:
>>> Well, you'd have to start by demonstrating the benefit of it. The
>>> advantage of query caches in proxies and clients is well-known,
>>> because you can offload some of the work of the database onto other
>>> servers, thus increasing capacity. Adding a query cache to the
>>> database server would require the query identity recognition of the
>>> cache to be far cheaper (as in 10X cheaper) than planning and running
>>> the query, which seems unlikely at best.
>>
>> I figured I'd create the md5 digest of the sourceText of a query, and
>> then look that up in a hash. I don't think that will be very
>> expensive. I'll have another hash to keep track of which queries are
>> dependent on which relations, so that when a relation is changed
>> somehow (and committed), the query is invalidated and removed from the
>> query hash.
>
> From the experience of implementing query cache in pgpool-II, there are
> some suggestions:
>
> - A query result cache should not be created if the transaction
>   including the SELECT is not committed.
> - Since a transaction could have many SELECTs, you need to keep those
>   query results somewhere in temporary storage. You can either discard
>   or register them to the query cache storage depending on the
>   transaction's fate: aborted or committed.
> - If a SELECT uses non-immutable functions, the query result should not
>   be cached.
> - If a SELECT uses temporary tables, the query result should not be
>   cached.
> - If a SELECT uses unlogged tables, the query result should not be
>   cached, because their data could vanish after crash recovery. Of
>   course this only applies if you plan to use cache storage which does
>   not survive a crash.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
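[Editorial illustration] Tatsuo's checklist can be sketched in miniature. This is a hypothetical Python sketch of the rules, not pgpool-II or backend code; every name in it (TxnAwareQueryCache, stage, and so on) is invented for the example.

```python
import hashlib

# Hypothetical sketch of the caching rules listed above; all names are
# invented for illustration and exist in neither pgpool-II nor PostgreSQL.

class TxnAwareQueryCache:
    def __init__(self):
        self.cache = {}   # query digest -> cached result rows
        self.staged = {}  # txn id -> [(digest, rows)] awaiting the txn's fate

    @staticmethod
    def digest(query_text):
        # md5 of the raw query text, as proposed in the thread.
        return hashlib.md5(query_text.encode()).hexdigest()

    @staticmethod
    def cacheable(uses_volatile_fn, uses_temp_table, uses_unlogged_table):
        # Rules from the message: skip non-immutable functions, temporary
        # tables, and (for non-crash-safe cache storage) unlogged tables.
        return not (uses_volatile_fn or uses_temp_table or uses_unlogged_table)

    def stage(self, txn, query_text, rows):
        # Results stay in temporary storage until commit/abort is known.
        self.staged.setdefault(txn, []).append((self.digest(query_text), rows))

    def commit(self, txn):
        # Register the transaction's staged results in the real cache.
        for key, rows in self.staged.pop(txn, []):
            self.cache[key] = rows

    def abort(self, txn):
        # Discard everything the aborted transaction produced.
        self.staged.pop(txn, None)
```

The point of the two-phase `stage`/`commit` split is the first rule above: nothing becomes visible in the cache until the transaction that produced it commits.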
Re: [HACKERS] query cache
On Sat, Mar 24, 2012 at 3:22 PM, Joshua Berkus j...@agliodbs.com wrote:
> Billy,
>
>> I've done a brief search of the postgresql mail archives, and I've
>> noticed a few projects for adding query caches to postgresql (for
>> example, Masanori Yamazaki's query cache proposal for GSOC 2011), ...
>
> ... which was completed, btw. Take a look at the current release of
> pgPool. Are you proposing this for GSOC2012, or is this just a general
> idea?

Just a general idea, but if someone wants to work on it for GSOC2012, I
wouldn't mind giving a helping hand. I'm not a student, so GSOC probably
doesn't apply to me.

>> I'm wondering if anyone would be interested in a query cache as a
>> backend to postgresql? I've been playing around with the postgresql
>> code, and if I'm understanding it correctly, I believe this is
>> possible.
>
> Well, you'd have to start by demonstrating the benefit of it. The
> advantage of query caches in proxies and clients is well-known, because
> you can offload some of the work of the database onto other servers,
> thus increasing capacity. Adding a query cache to the database server
> would require the query identity recognition of the cache to be far
> cheaper (as in 10X cheaper) than planning and running the query, which
> seems unlikely at best.

I figured I'd create the md5 digest of the sourceText of a query, and
then look that up in a hash. I don't think that will be very expensive.
I'll have another hash to keep track of which queries are dependent on
which relations, so that when a relation is changed somehow (and
committed), the query is invalidated and removed from the query hash.

Billy
[HACKERS] query cache
Greetings!

I've done a brief search of the postgresql mail archives, and I've
noticed a few projects for adding query caches to postgresql (for
example, Masanori Yamazaki's query cache proposal for GSOC 2011), as
well as the query cache announced at
http://www.postgresql.org/about/news/1296/ (pgc). Both of these seem to
be external solutions that act more like a proxy between clients and
servers, instead of being part of the server processes.

I'm wondering if anyone would be interested in a query cache as a
backend to postgresql? I've been playing around with the postgresql
code, and if I'm understanding it correctly, I believe this is possible.
I've been writing some code, but don't have anything working yet (I'm
receiving a hash table corruption error), but I'm working through it.

Here's my basic idea:

1. Intercept select queries in execMain.c at ExecuteQuery and see if the
   sourceText of this query is in the query hash. (Later we could make
   this more sophisticated by using the query plan or some type of AST
   instead of the query text, since adding or removing a space would
   create a different query hash key.)
2. If the query is in the cache, return the cached results of this
   query.
3. If the query is not cached, run the query like normal, grabbing the
   tuples as they are sent to the dest and storing them in the cache.
   (For now, I'm ignoring storage constraints, etc., but these details
   will need to be added before going to production.)

To invalidate cache entries, look at the transactions being committed
(and written to the WAL log, if my memory serves me) and send a message
to the qcache process to invalidate any query which depends on the
modified relation (ie, table, etc).

For the experts out there, does this seem reasonable, or am I
misunderstanding the source code? Anyone aware of a project trying to
accomplish this?

Thanks!

Billy Earney
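[Editorial illustration] The three steps and the relation-based invalidation above can be sketched as follows. This is an invented Python illustration of the idea only; a real implementation would be C inside the executor, and none of these names exist in the PostgreSQL source.

```python
import hashlib

# Illustrative sketch (not actual backend code) of the proposed flow:
# hash the query text, look it up, store results on a miss, and drop
# dependent entries when a relation is modified.

class QueryCache:
    def __init__(self):
        self.results = {}      # md5(query text) -> cached tuples
        self.by_relation = {}  # relation name -> set of dependent digests

    @staticmethod
    def key(source_text):
        # Step 1: hash the raw query text. A later refinement could hash
        # a normalized parse tree so that whitespace changes still hit.
        return hashlib.md5(source_text.encode()).hexdigest()

    def lookup(self, source_text):
        # Step 2: return cached tuples on a hit, else None.
        return self.results.get(self.key(source_text))

    def store(self, source_text, relations, tuples):
        # Step 3: after a normal run, remember the tuples and record
        # which relations the query read, for later invalidation.
        k = self.key(source_text)
        self.results[k] = tuples
        for rel in relations:
            self.by_relation.setdefault(rel, set()).add(k)

    def invalidate(self, relation):
        # On commit of a write to `relation`, drop every dependent entry.
        for k in self.by_relation.pop(relation, set()):
            self.results.pop(k, None)
```

Note how the sketch exhibits the weakness mentioned in step 1: adding a single space to the query text produces a different md5 key and misses the cache.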
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 11:29 AM, Greg Stark st...@mit.edu wrote:
> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> The complication, opportunities for bugs, and general slowdown
>> associated with that would outweigh any possible gain, in the opinion
>> of most hackers who have thought about this.
>
> I wouldn't be quite so pessimistic. I think the problem is that the
> hard part in doing this for real is all the parts the proposal glosses
> over. How much memory is it worth dedicating to the cache before the
> cost of that memory costs more than it helps? How do you invalidate
> cache entries efficiently enough that it doesn't become a bottleneck?
>
> Also, you need to identify the specific advantages you hope a built-in
> cache would have over one implemented in the ORM or database library.
> If there aren't any advantages then those solutions are much simpler.
> And they have other advantages as well -- one of the main reasons
> people implement caches is so they can move the load away from the
> bottleneck of the database to the more easily scaled-out application.

Thanks for the input. I've had many of these thoughts myself, and I
guess how valuable a query cache would be depends on the environment the
database is used in, memory settings, and other variables. I'll
definitely give this more thought before sending an official proposal.

Billy
Re: [HACKERS] Future of our regular expression code
Jay,

Good links, and I've also looked at a few others with benchmarks. I
believe most of the benchmarks were done before PCRE implemented its
JIT. I haven't found a benchmark with the JIT enabled, so I'm not sure
if it will make a difference. Also, I'm not sure how accurately the
benchmarks will show how these engines perform in an RDBMS environment;
the optimizer is probably a very important variable in many complex
queries.

I'm leaning towards trying to implement RE2 and PCRE and running some
benchmarks to see which performs best. Also, would it be possible to set
a session variable (let's say PGREGEXTYPE) and set it to ARE (the
current algorithm), RE2, or PCRE, so that users could choose which
implementation they want (unless we find a single implementation that
beats the others in almost all categories)? Or is this a bad idea? Just
a thought.

On Mon, Feb 20, 2012 at 12:09 AM, Jay Levitt jay.lev...@gmail.com wrote:
> Stephen Frost wrote:
>> Alright, I'll bite.. Which existing regexp implementation that's well
>> written, well maintained, and which is well protected against
>> malicious regexes should we be considering then?
>
> FWIW, there's a benchmark here that compares a number of regexp
> engines, including PCRE, TRE and Russ Cox's RE2:
>
> http://lh3lh3.users.sourceforge.net/reb.shtml
>
> The fastest backtracking-style engine seems to be oniguruma, which is
> native to Ruby 1.9 and thus not only supports Unicode but I'd bet
> performs pretty well on it, on account of it's developed in Japan. But
> it goes pathological on regexen containing '|'; the only safe choice
> among PCRE-style engines is RE2, but of course that doesn't support
> backreferences. Russ's page on re2 (http://code.google.com/p/re2/)
> says:
>
> "If you absolutely need backreferences and generalized assertions, then
> RE2 is not for you, but you might be interested in irregexp, Google
> Chrome's regular expression engine."
>
> That's here:
>
> http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html
>
> Sadly, it's in Javascript. Seems like if you need a safe, performant
> regexp implementation, your choice is (a) finish PLv8 and support it on
> all platforms, or (b) add backreferences to RE2 and precompile it to C
> with Comeau (if that's still around), or...
>
> Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
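[Editorial illustration] The pathological behaviour Jay mentions can be shown with a toy example. Python's `re` (a backtracking engine, standing in here for the PCRE-style engines under discussion) is used purely for illustration; this is not a benchmark of any engine named in the thread.

```python
import re

# Backtracking engines can go exponential on alternation-heavy patterns,
# while DFA-based engines such as RE2 stay linear in the input length.

PATHOLOGICAL = re.compile(r'(a|a)*$')

def try_match(n):
    # Against 'aaa...ab' a backtracking engine must try every way of
    # splitting the run of a's between the two identical branches before
    # it can report failure: roughly 2**n attempts, so keep n small.
    return PATHOLOGICAL.match('a' * n + 'b') is not None

# n = 5 fails instantly; pushing n past ~30 makes a backtracking engine
# effectively hang, while RE2 rejects the same input in linear time.
```

The same pattern against a pure run of a's (no trailing 'b') matches immediately, which is why such blowups often go unnoticed until hostile or unlucky input arrives.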
Re: [HACKERS] Future of our regular expression code
Tom,

Thanks for your reply. So is the group leaning towards just maintaining
the current regex code base, or looking into introducing a new library
(RE2, PCRE, etc.)? Or is this still open for discussion?

Thanks!

Billy

On Mon, Feb 20, 2012 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Billy Earney billy.ear...@gmail.com writes:
>> Also would it be possible to set a session variable (let's say
>> PGREGEXTYPE) and set it to ARE (current alg), RE2, or PCRE, that way
>> users could choose which implementation they want (unless we find a
>> single implementation that beats the others in almost all categories)?
>> Or is this a bad idea?
>
> We used to have a GUC that selected the default mode for Spencer's
> package (ARE, ERE, or BRE), and eventually gave it up on the grounds
> that it did more harm than good. In particular, you really cannot treat
> the regex operators as immutable if their behavior varies depending on
> a GUC, which is more or less fatal from an optimization standpoint. So
> I'd say a GUC that switches engines, and thereby brings in subtler but
> no less real incompatibilities than the old one did, would be a pretty
> bad idea.
>
> Also, TBH I have exactly zero interest in supporting pluggable regex
> engines in Postgres. Regex is not sufficiently central to what we do to
> justify the work of coping with N different APIs and sets of
> idiosyncrasies. (Perl evidently sees that differently, and with some
> reason.)
>
> regards, tom lane
Re: [HACKERS] Future of our regular expression code
Tom,

I did a google search, and found the following:

http://www.arglist.com/regex/

which states that Tcl uses the same library from Henry. Maybe someone
involved with that project would help explain the library? Also, I
noticed the URL above lists a few ports people have made from Henry's
code. I haven't downloaded and analyzed their code, but maybe they have
made some comments that could help, or maybe have some improvements to
the code. Just a thought.. :)

Billy Earney

On Sun, Feb 19, 2012 at 5:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Brendan Jurd dire...@gmail.com writes:
>> Are you far enough into the backrefs bug that you'd prefer to see it
>> through, or would you like me to pick it up?
>
> Actually, what I've been doing today is a brain dump. This code is
> never going to be maintainable by anybody except its original author
> without some internals documentation, so I've been trying to write some
> based on what I've managed to reverse-engineer so far. It's not very
> complete, but I do have some words about the DFA/NFA stuff, which I
> will probably revise and fill in some more as I work on the backref
> fix, because that's where that bug lives.
>
> I have also got a bunch of text about the colormap management code,
> which I think is interesting right now because that is what we are
> going to have to fix if we want decent performance for Unicode \w and
> related classes (cf the other current -hackers thread about regexes). I
> was hoping to prevail on you to pick that part up as your first
> project.
>
> I will commit what I've got in a few minutes --- look for
> src/backend/regex/README in that commit. I encourage you to add to that
> file as you figure stuff out. We could stand to upgrade a lot of the
> code comments too, of course, but I think a narrative description is
> pretty useful before diving into code.
>
> regards, tom lane
Re: [HACKERS] Future of our regular expression code
Thanks, Tom. I looked at the code in the libraries I referred to
earlier, and it looks like the code in the regex directory is exactly
the same as Walter Waldo's version, which has at least one comment from
the middle of last decade (~2003).

Have people thought about migrating to the PCRE library? It seems to
have a lot of neat features, has a JIT, and it looks like it is being
actively maintained and has decent comments.

On Sun, Feb 19, 2012 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Billy Earney billy.ear...@gmail.com writes:
>> I did a google search, and found the following:
>> http://www.arglist.com/regex/
>
> Hmm ... might be worth looking at those two pre-existing attempts at
> making a standalone library from Henry's code, just to see what choices
> they made.
>
>> Which states that Tcl uses the same library from Henry. Maybe someone
>> involved with that project would help explain the library?
>
> Um ... did you see the head message in this thread?
>
> regards, tom lane