[HACKERS] Postgresql on multi-core CPU's: is this old news?
Came across the following in a paper from Oct 2010. Was wondering if this is old news I missed in this group: http://pdos.csail.mit.edu/papers/linux:osdi10.pdf about Linux optimization on multi-core CPU's. The group at MIT were exploring how some Linux apps were scaling up --- sometimes badly, mostly due to hidden contention over cache-line consistency across the cores' caches. In a nutshell: if an app, or the system calls it uses, tries to modify anything in a cache line (a 32-64 byte slice of memory) that another core is using, there's a lot of fumbling in the dark to make sure there is no conflict. When I saw PostgreSQL named in the abstract, I thought, "Aha! Contention over shm". Not so. Skip to page 11 (section 5.5) for most of the PG specifics.
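For the curious, the cache-line contention the paper describes (often called false sharing) is easy to reproduce in a few lines of C. Everything below is illustrative: the 64-byte line size, the struct names, and the iteration count are my assumptions, not anything taken from the MIT paper.

```c
#include <pthread.h>
#include <stddef.h>

#define CACHE_LINE 64   /* typical x86 line size; an assumption here */

/* Both counters land in one 64-byte line: an increment on one core
 * invalidates the line in the other core's cache, over and over. */
struct hot_pair {
    volatile long a;
    volatile long b;
};

/* Padding pushes each counter onto its own line: no coherence traffic. */
struct cold_pair {
    volatile long a;
    char pad[CACHE_LINE - sizeof(long)];
    volatile long b;
};

/* 1 if two field offsets fall in the same cache line, else 0. */
static int same_line(size_t off_a, size_t off_b)
{
    return off_a / CACHE_LINE == off_b / CACHE_LINE;
}

static void *bump(void *p)
{
    volatile long *ctr = p;
    for (long i = 0; i < 10 * 1000 * 1000; i++)
        (*ctr)++;
    return NULL;
}

/* Run two threads, each hammering its own counter.  Time this for a
 * hot_pair vs. a cold_pair to see the gap the paper measures. */
static void race(volatile long *x, volatile long *y)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, bump, (void *) x);
    pthread_create(&t2, NULL, bump, (void *) y);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

Same instruction count either way; only the memory layout differs, which is exactly why the contention is "hidden".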
Re: [HACKERS] Multi-pass planner
In a federated database engine I built in the mid-90's, it more or less ran both plans in parallel, to implement fast-first and min-total cost. The index query in general started returning rows whose oids went into a filter that discarded them from the serial query once it started to crank things out. 'index' and 'serial' is not exactly what was going on; the federated engine was joining multiple tables across multiple (sometimes hundreds) of databases, with really variable transmission times, and tolerance for timed-out db servers. It had no reliable way to get cost estimates in advance, since it didn't have access to statistical metadata (no, Postgres wasn't one of the databases I had to take as input, more's the pity). 'Parallel' is also not quite accurate. It retained plans (queries were heavily repeated) and did mickey-mouse simulated annealing of past performance, so that if one of two plans was the faster 90% of the time, then there was a 10% probability it would run both plans in parallel, just to check whether something had changed. The code was part of a proprietary product, and was used by Acxiom and Cisco. But the concept was pretty simple and as described above. > -Original Message- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Greg Stark > Sent: Thursday, August 20, 2009 10:32 AM > To: Robert Haas > Cc: Kevin Grittner; decibel; Pg Hackers > Subject: Re: [HACKERS] Multi-pass planner > > On Thu, Aug 20, 2009 at 6:28 PM, Greg Stark wrote: > > I don't think it's a bad idea, I just think you have to set your > > expectations pretty low. If the estimates are bad there > isn't really > > any plan that will be guaranteed to run quickly. > > Actually this is usually Tom's point when this topic comes > up. Say you're deciding between an index scan and a > sequential scan. The sequential scan has a total cost of > 1000..1000 but the index scan has an estimated total cost of > 1..1. 
If you pick the sequential scan you might be > running 1000x slower than the index scan in the worst case. > But if you pick the index scan you might be running 10x > slower than the sequential scan in the worst case. If you > don't trust the estimate where does that leave you? Making a > mistake either way is fatal. > > > -- > greg > http://mit.edu/~gsstark/resume.pdf > > -- > Sent via pgsql-hackers mailing list > (pgsql-hackers@postgresql.org) To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
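The retained-plans-with-occasional-re-race scheme described above can be sketched as a tiny epsilon-greedy chooser. This is a from-scratch illustration of the idea, not code from the actual product; the moving-average bookkeeping, the 10% re-check probability, and all names are assumptions.

```c
#include <stdlib.h>

/* Exponentially weighted running cost per plan, plus an epsilon chance
 * of racing both plans to notice when the data has shifted. */
struct plan_stats {
    double avg_ms[2];   /* observed cost of plan 0 and plan 1 */
    double alpha;       /* weight given to the newest observation */
    double recheck_p;   /* probability of racing both plans anyway */
};

/* Fold a fresh timing into the moving average for plan `which`. */
static void plan_observe(struct plan_stats *s, int which, double ms)
{
    s->avg_ms[which] = s->alpha * ms + (1.0 - s->alpha) * s->avg_ms[which];
}

/* Returns 0 or 1 for "run just this plan", or -1 for "race both and
 * keep the first to finish".  `coin` is a uniform draw in [0,1). */
static int plan_choose(const struct plan_stats *s, double coin)
{
    if (coin < s->recheck_p)
        return -1;
    return s->avg_ms[0] <= s->avg_ms[1] ? 0 : 1;
}
```

The caller supplies `coin` from its own RNG (e.g. drand48()); feeding every completed run back through plan_observe() is what lets the 90%/10% split drift when the underlying data changes.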
Re: [HACKERS] Index-only scans
Now I'm back where I can go look at the source code :-) Thanks. > -Original Message- > From: Jaime Casanova [mailto:jcasa...@systemguards.com.ec] > Sent: Monday, July 13, 2009 8:40 PM > To: Mischa Sandberg > Cc: Heikki Linnakangas; PostgreSQL-development > Subject: Re: [HACKERS] Index-only scans > > On Mon, Jul 13, 2009 at 5:38 PM, Mischa > Sandberg wrote: > > Does PG have an intermediate execution node to sort/batch > index entries (heap tuple ptrs) by heap page prior to lookup? > Something mssql does ... > > > > it sounds a lot like a bitmap index scan > > > -- > Regards, > Jaime Casanova > PostgreSQL support and training > Systems consulting and development > Guayaquil - Ecuador > Cel. +59387171157 > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans
Does PG have an intermediate execution node to sort/batch index entries (heap tuple ptrs) by heap page prior to lookup? Something mssql does ... > -Original Message- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Bruce Momjian > Sent: Monday, July 13, 2009 6:38 AM > To: Heikki Linnakangas > Cc: PostgreSQL-development > Subject: Re: [HACKERS] Index-only scans > > Heikki Linnakangas wrote: > > Even if we don't solve the visibility > > map problem, just allowing the executor to evaluate quals > that are not > > directly indexable using data from the index, would be useful. For > > example, "SELECT * FROM foo WHERE textcol LIKE '%bar%', and > you have a > > b-tree index on textcol, the planner could choose a > full-index-scan, > > apply the '%bar%' filter on the index tuples, and only fetch those > > heap tuples that match that qual. > > Interesting, I had not considered that. You are using the > index as a single-column table that can be scanned more > quickly than the heap. > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + If your life is a hard drive, Christ can be your backup. + > > -- > Sent via pgsql-hackers mailing list > (pgsql-hackers@postgresql.org) To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
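For reference, the sort/batch step being asked about here (and what a bitmap heap scan effectively achieves) is just ordering tuple pointers by physical page before touching the heap. A standalone sketch with made-up struct names, not PG executor code:

```c
#include <stdlib.h>
#include <stdint.h>

/* A toy tuple pointer: heap page number plus slot within the page. */
struct tid {
    uint32_t page;
    uint16_t slot;
};

static int tid_cmp(const void *a, const void *b)
{
    const struct tid *x = a, *y = b;

    if (x->page != y->page)
        return x->page < y->page ? -1 : 1;
    return (int) x->slot - (int) y->slot;
}

/* Order index results by physical page, so each heap page is fetched
 * once and pages are visited in sequential order -- the effect the
 * bitmap heap scan gets from its page-keyed bitmap. */
static void batch_by_page(struct tid *tids, size_t n)
{
    qsort(tids, n, sizeof *tids, tid_cmp);
}
```

The bitmap variant trades the O(n log n) sort for a bitmap keyed by page number, but the I/O pattern it produces is the same.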
Re: [HACKERS] [ADMIN] postgresql in FreeBSD jails: proposal
Quoting Tom Lane <[EMAIL PROTECTED]>: > Mischa Sandberg <[EMAIL PROTECTED]> writes: > > I'm going to skip the kill(1,0) test and depend on nattch only, > > with a function that PGSharedMemoryIsInUse() can also use. > > (For a healthy server, nattch is never less than 2, right?) > > Oh, forgot to mention: healthy servers are not the point here. > You should make the code keep its hands off any segment with > nonzero nattch, because even one orphaned backend is enough > to cause trouble. Note taken. Worth putting a warning in the log, too? Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [ADMIN] postgresql in FreeBSD jails: proposal
Quoting Stephen Frost <[EMAIL PROTECTED]>: > * Tom Lane ([EMAIL PROTECTED]) wrote: > > "Marc G. Fournier" <[EMAIL PROTECTED]> writes: > > > Easiest fix: change the UID of the user running the postmaster > (ie. pgsql) so > > > that each runs as a distinct UID (instead of distinct PGPORT) ... > been doing > > > this since moving to FreeBSD 6.x ... no patches required ... > > > > Sure, but in the spirit of "belt and suspenders too", I'd think > that > > doing that *and* something like Mischa's proposal wouldn't be bad. > > I agree that we should try to be careful about stepping on > segments that might still be in use, but I would also discourage > jail users from using the same uid for multiple PG clusters > since the jail doesn't protect the shmem segment. > We use separate uids even w/ linux-vservers where shmem > and everything *is* separate, following the same > 'belt and suspenders too' spirit for security. Thanks for all the input. Fixing freebsd might get answered on a different channel :-) Unfortunately, different uid's are not even an option here; but serious security in this sitch is not relevant, either. We have a freebsd core guy here, and he says that there's no pressing incentive for jails to handle sysv ipc, given mmap and file locking :-( And given his other comments, I wouldn't consider jails a "secure" environment, just a modest and convenient way to emulate multiple machines with caveats. So, given Tom's comment, that it's antisocial to zap a shm seg that other processes have attached ... I'm going to skip the kill(1,0) test and depend on nattch only, with a function that PGSharedMemoryIsInUse() can also use. (For a healthy server, nattch is never less than 2, right?) If no unpleasant edge cases come out of this in our test framework, I'd like to submit that as a patch. Talked with our Linux guys about vserver, and they see no issues. Mr. Solaris here is currently a long way ooto ... opinions? 
Afaics the change in behaviour is: if a degraded server exited with some backend hanging, the second server will create a new segment after bumping the ipc key; if system shm limits do not allow for two such shm segments, the second server will bail. For production systems, this means ensuring there are no orphan shm segs is not left to heuristic clean-up at server re-start. Hope that makes sense for the generic Postgres world. If anyone is interested in creating hung backends, you can create a named pipe, and tell the server to COPY from it. --- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
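The named-pipe trick mentioned above takes one syscall; the path below is arbitrary, and the table in the COPY is whatever you like.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>     /* unlink(), for cleanup when you're done */

/* Create a fifo for the server to COPY from.  With no writer attached,
 * the backend's read blocks indefinitely -- a "hung" backend on demand.
 * Returns 0 on success, -1 on error (errno set).
 *
 * Then, in psql:   COPY some_table FROM '/tmp/pg_hang_fifo';
 * The COPY stays stuck until a process opens the write end (or you
 * unlink the fifo and kill the backend).
 */
static int make_hang_fifo(const char *path)
{
    return mkfifo(path, 0666);
}
```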
Re: [HACKERS] postgresql in FreeBSD jails: proposal
Quoting Tom Lane <[EMAIL PROTECTED]>: > Mischa Sandberg <[EMAIL PROTECTED]> writes: > > + /* In a FreeBSD jail, you can't "kill -0" a > postmaster > > +* running in a different jail, so the shm seg > might > > +* still be in use. Safer to test nattch ? > > +*/ > > + if (kill(1,0) && errno == ESRCH && > PGSharedMemoryIsInUse(0,NextShmemSegID)) > > + continue; > > Isn't the last part of that test backward? If it isn't, I don't > understand what it's for at all. Serious blush here. Yes. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] postgresql in FreeBSD jails: proposal
Here (@sophos.com) we run machine cluster tests using FreeBSD jails. A jail is halfway between a chroot and a VM. Jails blow a number of assumptions about a unix environment: sysv ipc's are global to all jails, but a process can only "see" other processes also running in the jail. In fact, the quickest way to tell whether you're running in a jail is to test for process 1. PGSharedMemoryCreate chooses/reuses an ipc key in a reasonable way to cover previous postmasters crashing and leaving a shm seg behind, possibly with some backends still running. Unfortunately, with multiple jails running PG servers and (due to app limitations) all servers having the same PGPORT, you get the situation that when the jail#2 (,jail#3,...) server comes up, it:
- detects that there is a shm seg with ipc key 5432001
- checks whether the associated postmaster process exists (with kill -0)
- overwrites the segment created and being used by jail #1
There's a workaround (there always is) other than this patch, involving NAT translation so that the postmasters listen on different ports, but the outside world sees them each listening on 5432. But that seems somewhat circuitous. I've hacked sysv_shmem.c (in PG 8.0.9) to handle this problem. Given the trouble that postmaster goes to, to stop shm seg leakage, I'd like to solicit any opinions on the wisdom of this edge case. If this patch IS useful, what would be the right level of compile-time restriction ("#ifdef __FreeBSD__" ???)

@@ -319,7 +319,8 @@
 		if (makePrivate)	/* a standalone backend shouldn't do this */
 			continue;

+		/* In a FreeBSD jail, you can't "kill -0" a postmaster
+		 * running in a different jail, so the shm seg might
+		 * still be in use.  Safer to test nattch ?
+		 */
+		if (kill(1,0) && errno == ESRCH && !PGSharedMemoryIsInUse(0, NextShmemSegID))
+			continue;

 		if ((memAddress = PGSharedMemoryAttach(NextShmemSegID, &shmid)) == NULL)
 			continue;	/* can't attach, not one of mine */

End of Patch. 
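For what it's worth, the nattch test the patch comment suggests can be read straight out of shmctl(IPC_STAT). This is a sketch of the idea only, not PG's actual PGSharedMemoryIsInUse():

```c
#include <sys/ipc.h>
#include <sys/shm.h>

/* Number of processes attached to a SysV segment, or -1 if the segment
 * does not exist / is not ours to inspect.  A healthy postmaster plus
 * one backend gives nattch >= 2; any nattch > 0 means "hands off",
 * which holds even across jails, since SysV IPC is jail-global. */
static long shm_attach_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return (long) ds.shm_nattch;
}
```

Unlike kill(-0) on the postmaster pid, this works regardless of which jail the attached processes live in, which is the whole point of the proposal.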
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Porting MSSQL to PGSQL -- triggers
Tom Lane wrote: Andrew Dunstan <[EMAIL PROTECTED]> writes: How expensive is this going to be, especially for huge numbers of rows? Certainly cheaper than firing a per-row trigger. I'm curious: I've never written a MSSQL trigger that did NOT use the INSERTED/DELETED pseudotables (aka NEW/OLD). I know STATEMENT-level triggers have existed in PG for a while ... but what do people use them for??? -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: Porting MSSQL to PGSQL (Was: [HACKERS] [OT] MySQL is bad, but
Jim C. Nasby wrote: On Sun, May 21, 2006 at 02:58:17PM -0700, Josh Berkus wrote: Actually, porting TSQL to PL/pgSQL would be very hard. I speak as an expert TSQL developer. For example, most data manipulation in TSQL is done through updatable cursors, something we don't currently support. There are plenty of direct runtime incompatibilities that have to be implemented not as language-to-language, but language-to-procedure. That also makes them stick out as the first thing to refactor and reimplement, once the basic working system has been ported :-) Hadn't thought about updatable cursors, but I solved that problem before in SimbaExpress and SimbaFusion with (effectively) arrays of OID's. That's what I love about PG: so many rich consistent facilities that you can implement things that tie other DB's in knots. Also, T-SQL uses un-ordered, callable parameters for SPs, something which we *also* don't support. Not such a big deal. Any translation has full access to the catalog (ported over to the target PG system) and the parser/translator can fill in the blanks. And TSQL doesn't fail a transaction on an error, resulting in code like UPDATE ... if @@error = 0 then UPDATE ... end if @@error = 0 then ... Yeah, transaction error behaviour always brings out the nasty incompatibilities (Jim, was that a major problem on Project Lance?). -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Porting MSSQL to PGSQL: trace and profile
Martijn van Oosterhout wrote: For high-end MSSQL shops, a high value is being able to trace and profile (EXPLAIN) every client SQL command from the server side ... with plenty of options for selective trace. This would also be highly valuable to have in PostgreSQL. Are we talking EXPLAIN (which is cheap) or EXPLAIN ANALYZE (which is less so)? No, that's the thing; server-side trace and filtering are EXPLAIN ANALYZE, and require no change to the app, catching (problem) plans in context. For example, (using PG here) we had an ETL program that occasionally went very stupid. It turned out that the problem was the interaction between autovacuum timing, and when the program wiped out and rebuilt a worktable. I had modified the app to EXPLAIN ANALYZE the update commands, but how do you modify the straight SELECTs? Taking the statements out of context and trying them offline with EXPLAIN ANALYZE sent me up the wrong tree a few times :-( With PG, where I end up is setting log_min_duration_statement=10 and grinding the logs for the information I want, hoping to convincingly replay the SQL (with EXPLAIN ANALYZE) under similar circumstances. If there are no applications that PREPARE their statements, I have a chance. BTW was logging of (execution of) prepared statements with context info ever considered? Or have I missed something? -- The MSSQL facility makes on-the-fly EXPLAIN ANALYZE possible for all statements. Its selective filter functions make it feasible in production systems, where you can turn it on for a week, to catch hard-to-reproduce issues. -- I did go digging into source to make EXPLAIN ANALYZE work for a server-side trace, but it looked like a major undertaking. Any pointers? Willing to go dive into it again. -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Porting MSSQL to PGSQL (Was: [HACKERS] [OT] MySQL is bad, but THIS bad?)
On Thursday 18 May 2006 12:38, Josh Berkus wrote: Personally, I'd go after MSSQL before I bothered with MySQL. Sure, let's make *migration* easier for those who wake up and smell the BS, but migration can (and probably should) be one-way. Somebody earlier was asking: why no automatic transformer from Transact-SQL to PLPGSQL (maybe with a bunch of glue routines)? The grammar is not a problem, though you have to wonder at all the wired-in keywords (T-SQL always felt like COBOL). The stumbling blocks are not in language, but in function. Many of those functions are rarely used, but some big ones are quite common ... T-SQL has statement-level triggers, and they get used a lot (some big apps ONLY put code in triggers). Statement-level triggers are very efficient for maintaining aggregates; the closest PG has are rewrite rules. Other issues: stored procs returning multiple result sets; "print" statements; SELECT TOP n PERCENT; COMPUTE expressions (subtotals); and some of the @@global variables that are hard to emulate, @@IDENTITY being the main problem in older T-SQL code. OpenXML is cool, but such a pig that its difficulty in emulation is probably not an issue. There are plenty of things that happily go away, or can be implemented with a client wrapper; for example, BULK INSERT and BACKUP. Other things just have no equivalent, and amount to no-ops in a PG world (partition functions). A few things require some custom metadata tables (MSSQL "RULE" != PG "RULE"). If you want to convince MSSQL users to move over to PG, statement-level triggers (with "OLD" and "NEW" rowsets) are a bottom-line requirement. ... For high-end MSSQL shops, a high value is being able to trace and profile (EXPLAIN) every client SQL command from the server side ... with plenty of options for selective trace. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] GPUSort project
[short] This probably would be an uneasy fit into generic backend code. Was hoping the GPUSort project might have fleeced/sorted out some issues. [long] Simon Riggs wrote: On Wed, 2006-04-12 at 10:00 -0700, Mischa Sandberg wrote: ... Long answer: we're shipping a server (appliance) product built on stock rackmount hardware, that includes an ATI Rage (8MB) with nothing to do. Much of what the box does is a single cpu-bound process, sorting maillog extracts. The GPU is an asset, even at 8MB; the headwork is in mapping/truncating sort keys down to dense ~32bit prefixes; and in making smooth judgements as to when to give the job to (a) the GPU (b) quicksort (c) a tiny bitonic sort in the SSE2 registers. It sounds like its possible, but it would have to give incredible gains before its worth the effort to make it happen. 8MB of video RAM doesn't score much against 256MB of normal RAM, which is pretty cheap these days. A better comparison is 8MB of video RAM vs 512K of L2 cache. GPU's (also) have faster access (>32GB/s) to RAM than the CPU, using AGP/PCI with no contention. Our product uses Xeons instead of Opterons; the 3GHz CPUs are just slogging, waiting >70% for RAM fetch. The hardware dependency would make this extremely sensitive to change, so effort in this area might not give lasting benefit. As it happens, I'm in favour of making code changes to exploit hardware, but this one is too far for me to encourage anybody to pursue it further. Fair comment. I'm using OpenGL, and looking at Glift, so it's not as hardware-specific as you might think. Other projects at gpgpu.org seem to be able to switch among GPU's. That being said, humbly admit that targetting specific hardware tends to give one tunnel vision. 
Coding "if all these conditions are true, use the fast algorithm, else do it the normal way" is also messier to extend than a nice clean interface layer :-( Any of this would apply to postgres, if tuplesort.c can tolerate a preprocessing step that looks for special cases, and degrades gracefully into the standard case. For other techniques, I think it can, depending upon the cost of the preprocessing step. But the overall improvement from improving small sorts could well be lost in the noise...so maybe not worth it. Agreed. GPU setup makes sorts <1MB not worth it. Small sorts get a boost from bitonic sort in SSE2, which wires into the bottom of a special-case quicksort, where any subrange of 9..16 elements gets done in xmm registers. I think the preprocessing to test and format keys for such sorts is useful anyway. I was trying to make radix sort usable, and that requires the same key prep. Even if the key prep hits its space limit and says the input is unsuitable for radix sort, it still makes the normal quicksort faster, since some key prefixes are shorter. -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
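The "mapping/truncating sort keys down to dense ~32bit prefixes" step mentioned in this thread boils down to packing leading key bytes big-endian, so that unsigned integer compares agree with memcmp. A minimal sketch; the 4-byte truncation rule and zero-padding are my assumptions, not the product's actual key prep.

```c
#include <stdint.h>
#include <string.h>

/* Pack the first 4 bytes of a key, big-endian, into a uint32 whose
 * unsigned compare agrees with memcmp on those bytes.  Short keys are
 * padded with zero bytes, so "ab" sorts before "aba", as it should. */
static uint32_t key_prefix(const unsigned char *key, size_t len)
{
    uint32_t p = 0;

    for (size_t i = 0; i < 4; i++) {
        p <<= 8;
        if (i < len)
            p |= key[i];
    }
    return p;
}
```

Ties on the prefix (equal uint32s) still need a fallback compare on the full key, which is why distinct short prefixes also speed up the plain quicksort path: most comparisons resolve on one integer compare instead of a memcmp.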
Re: [HACKERS] Get explain output of postgresql in Tables
Greg Sabino Mullane wrote: I wonder if it would help much just to change EXPLAIN to indent with something other than spaces? I like that. Maybe even decrease the indenting a little more, and compress some of the inner whitespace (such as the 2 spaces after the operator name) Might it be worth checking how many people (and apps) use EXPLAIN output to drive apps? Our (web) reporting has a paging system for long reports, that depends on getting the row/cost estimate from "EXPLAIN somequery" before actually executing "somequery". (Yep, we have pg_autovacuum run ANALYZE a lot :-) Anybody else out there using explain output in an automated way? -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
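Since the question is who scrapes EXPLAIN output: programs doing the paging trick above generally pull the estimate out of the "(cost=S..T rows=R width=W)" decoration on the top plan line. A hedged sketch; the format is assumed from EXPLAIN's plain-text layout, and is exactly what an indentation change would (harmlessly) leave alone and a decoration change would break.

```c
#include <stdio.h>
#include <string.h>

/* Pull startup cost, total cost, and the row estimate out of one
 * EXPLAIN line, e.g.
 *   "Seq Scan on foo  (cost=0.00..155.00 rows=10000 width=4)"
 * Returns 1 on success, 0 if the line carries no cost decoration. */
static int parse_explain_line(const char *line,
                              double *startup, double *total, double *rows)
{
    const char *p = strstr(line, "(cost=");

    if (p == NULL)
        return 0;
    return sscanf(p, "(cost=%lf..%lf rows=%lf",
                  startup, total, rows) == 3;
}
```

Feeding this the first row of "EXPLAIN somequery" output yields the planner's row estimate for the paging calculation, without executing the query.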
Re: [HACKERS] GPUSort project
Martijn van Oosterhout wrote: On Tue, Apr 11, 2006 at 04:02:07PM -0700, Mischa Sandberg wrote: Anybody on this list hear/opine anything of the GPUSort project for postgresql? I'm working on a radix-sort subcase for tuplesort, and there are similarities. http://www.andrew.cmu.edu/user/ngm/15-823/project/ I've heard it mentioned, didn't know they'd got it working. However, none of my database servers have a 3D graphics card anywhere near the power they suggest in the article. Is this of practical use for run-of-the-mill video cards? Short answer: maybe. Long answer: we're shipping a server (appliance) product built on stock rackmount hardware, that includes an ATI Rage (8MB) with nothing to do. Much of what the box does is a single cpu-bound process, sorting maillog extracts. The GPU is an asset, even at 8MB; the headwork is in mapping/truncating sort keys down to dense ~32bit prefixes; and in making smooth judgements as to when to give the job to (a) the GPU (b) quicksort (c) a tiny bitonic sort in the SSE2 registers. Any of this would apply to postgres, if tuplesort.c can tolerate a preprocessing step that looks for special cases, and degrades gracefully into the standard case. I'm guessing that there are enough internal sorts (on oid, for example) having only small, memcmp-able sort keys, that this is worth adding in. -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Get explain output of postgresql in Tables
Jim C. Nasby wrote: On Wed, Apr 12, 2006 at 04:53:20PM +0200, Thomas Hallgren wrote: Well, the downside is that such a format means explain output is now twice as long. But I'd love to see something like that as an option. I'd also still like to see an SQL-parseable version as well, since I think there's applications for that. On the plus side, a complex xml document is an easy read in a browser (IE or Firefox, either way). Hard to picture the representation in relational tables, though ... did you have some specific idea for what to do with a plan in SQL, once it was parsed? -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] GPUSort project
Anybody on this list hear/opine anything of the GPUSort project for postgresql? I'm working on a radix-sort subcase for tuplesort, and there are similarities. http://www.andrew.cmu.edu/user/ngm/15-823/project/ -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] plpgsql by default
Are there are more possibilities for some bug in the plpgsql engine to allow an exploit: actually changing the stack through a buffer overflow, or a bug in an intrinsic function, or allowing an injection that crosses some privilege boundary, via someone else's EXECUTE? It's a lot easier to verify the few places where straight SQL can interact with the outside world (NOTIFY, COPY, and trojan .so's come to mind). It is harder for someone to find an unexpected combined-effect exploit, since there's not much you can combine. Perhaps somebody in the core team has reservations about possible points of error to certify in plpgsql: is every possible weird array-overflow case covered? Further, can some innocuous side-effects in execution (INOUT parameters; function ownership; schema settings) combine to create a hole? There's just that much more to worry about. As they say, in theory, theory and practice are the same. In practice, they differ :0) I can understand someone being cautious about making guarantees (or even risk estimates) about plpgsql versus the core engine. And so, just like not INITIALLY letting the server listen on all TCP sockets, it's modest conservatism to let the default be a bit restricted. -- Engineers think that equations approximate reality. Physicists think that reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Logging explain-analyze output in pg log?
I'm stuck with a web app that periodically has truly awful query response times. The cause may be an interaction between system load, vacuum-analyze scheduling, the occasional wild variation in join selectivity, and stats collection. Logging the queries and running them later doesn't create an obvious pattern, because it's hard to catch the exact circumstances where the query takes forever, or returns zero rows. It would be difficult to change the app to run an EXPLAIN ANALYZE and record the results. I do this when I can, because it's the fastest way to figure out what went wrong in a 10-way join. I would like to add a guc variant on debug_print_plan that logs the output of ExplainOneQuery(), rather than dumping the plan node-traversal output --- perhaps it's just me, but I have trouble relating that output to the tables and columns named neatly in EXPLAIN ANALYZE output. This is different from just running EXPLAIN ANALYZE: this is asking the backend to log the EXPLAIN ANALYZE output, but ALSO return the normal result set. Some points I ponder:
(1) Could not find this on the TODO list, nor mentioned in the mail archive. Has this truly never come up? Logging detailed query execution to the profiler is something one gets accustomed to with MSSQL.
(2) src/backend/commands/explain.c:ExplainQuery gives a cautionary comment on scribbling (recording stats) in the execution tree, particularly when that tree will be executed repeatedly from a PLPGSQL proc (not a problem in my case, but ...)
(3) any suggestions on what to avoid ...
---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
Quoting Josh Berkus : > Mischa, > > > Okay, although given the track record of page-based sampling for > > n-distinct, it's a bit like looking for your keys under the > streetlight, > > rather than in the alley where you dropped them :-) > > Bad analogy, but funny. Bad analogy? Page-sampling effort versus row-sampling effort, c'est moot. It's not good enough for stats to produce good behaviour on the average. Straight random sampling, page or row, is going to cause enough untrustworthy engine behaviour, for any %ages small enough to allow sampling from scratch at any time. I'm curious what the problem is with relying on a start-up plus incremental method, when the method in the distinct-sampling paper doesn't degenerate: you can start when the table is still empty. Constructing an index requires an initial full scan plus incremental update; what's the diff? > Unless, of course, we use indexes for sampling, which seems like a > *really good* idea to me "distinct-sampling" applies for indexes, too. I started tracking the discussion of this a bit late. A smart method for this is in VLDB'92: Gennady Antoshenkov, "Random Sampling from Pseudo-ranked B+-trees". I don't think this is online anywhere, except if you have a DBLP membership. Does anybody else know better? Antoshenkov was the brains behind some of the really cool stuff in DEC Rdb (what eventually became Oracle). Compressed bitmap indices, parallel competing query plans, and smart handling of keys with hyperbolic distributions. -- Engineers think equations approximate reality. Physicists think reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
Quoting Markus Schaber <[EMAIL PROTECTED]>: > Hi, Josh, > > Josh Berkus wrote: > > > Yes, actually. We need 3 different estimation methods: > > 1 for tables where we can sample a large % of pages (say, >= 0.1) > > 1 for tables where we sample a small % of pages but are "easily > estimated" > > 1 for tables which are not easily estimated but we can't afford to > sample a > > large % of pages. > > > > If we're doing sampling-based estimation, I really don't want > people to lose > > sight of the fact that page-based random sampling is much less > expensive than > > row-based random sampling. We should really be focusing on > methods which > > are page-based. Okay, although given the track record of page-based sampling for n-distinct, it's a bit like looking for your keys under the streetlight, rather than in the alley where you dropped them :-) How about applying the distinct-sampling filter on a small extra data stream to the stats collector? -- Engineers think equations approximate reality. Physicists think reality approximates the equations. Mathematicians never make the connection. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
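To make the "distinct-sampling filter on a small extra data stream" concrete: below is a hash-threshold sampler in the spirit of (but much simpler than) the distinct-sampling paper. Keep every value whose hash falls below a threshold; distinct-in-sample divided by the sampling rate then estimates the table's distinct count, and it runs incrementally from an empty table. The hash, the fixed threshold, and the slot count are all my own choices for illustration.

```c
#include <stdint.h>
#include <string.h>

#define DS_SLOTS 4096           /* sampled-set capacity; an assumption */

struct distinct_sampler {
    uint64_t slot[DS_SLOTS];    /* open-addressed set of sampled hashes */
    int      used;
    uint64_t threshold;         /* keep hashes strictly below this */
};

static uint64_t ds_hash(uint64_t x)     /* 64-bit avalanche mixer */
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

static void ds_init(struct distinct_sampler *s, double sample_rate)
{
    memset(s, 0, sizeof *s);
    s->threshold = (uint64_t) (sample_rate * 18446744073709551615.0);
}

/* Feed every row's key.  Duplicates hash identically, so the set only
 * grows on genuinely new values that fall under the threshold. */
static void ds_add(struct distinct_sampler *s, uint64_t key)
{
    uint64_t h = ds_hash(key);

    if (h >= s->threshold)
        return;
    for (uint64_t i = h % DS_SLOTS; ; i = (i + 1) % DS_SLOTS) {
        if (s->slot[i] == h + 1)        /* already sampled (0 = empty) */
            return;
        if (s->slot[i] == 0) {
            if (s->used < DS_SLOTS - 1) {
                s->slot[i] = h + 1;
                s->used++;
            }
            return;
        }
    }
}

/* sampled-distinct / sample-rate estimates the table's distinct count */
static double ds_estimate(const struct distinct_sampler *s)
{
    return s->used / ((double) s->threshold / 18446744073709551615.0);
}
```

The part this sketch omits is what the paper handles: adaptively raising the threshold when the sampled set hits its space limit, so the memory stays bounded no matter how many distinct values arrive.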
[HACKERS] OLAP and PG and deja-vu
Just finished writing the PG rules to maintain a bunch of materialized (aggregate) views on a ROLAP cube --- yes, I've seen Jonathan Gardner's "matview" for postgres; it didn't cover what I needed :-(. PG happens to be pretty convenient for roll-your-own OLAP, thanks to RULES and ARRAY datatypes, so it hasn't been a pain to implement. As it happens, what I've done amounts to implementing covering indexes (that's another thread) and an aid for join-selectivity/result-size estimation.

And suddenly I had a case of deja-vu... A decade ago, working with Sybase 4.8/MSSQL 4.2, you could only enforce FK relationships by coding them in triggers. Then, one day, declarative FKs could be recognized by the engine itself. Suddenly, the query planner could make real use of those relations, typically dropping them out of complex joins. Hurrah!

Now, here I am, doing what every ROLAP system needs: storing and querying aggregates of the fact table, then parsing a query to determine which (smallest) aggregate can answer it. There are many app-level query generators that bear the brunt of keeping track of, and choosing from, multiple aggregates --- a task that's on a par with creating appropriate indexes, and about as much fun. Maybe one could kill a couple of birds with one stone by making materialized views visible to the query optimizer?

Discussion?
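The "which (smallest) aggregate can answer the query" step above reduces to a containment test over grouping columns. A toy sketch of that selection step (the catalog contents, names, and row counts are invented purely for illustration):

```python
# Hypothetical aggregate catalog: name -> (grouping columns, row count).
AGGREGATES = {
    "agg_day_store_product": ({"day", "store", "product"}, 10_000_000),
    "agg_day_store":         ({"day", "store"},            400_000),
    "agg_day":               ({"day"},                     366),
}

def pick_aggregate(query_dims):
    """Return the smallest materialized aggregate whose grouping columns
    cover every dimension the query groups or filters by, else None
    (meaning: fall back to the base fact table)."""
    candidates = [(rows, name)
                  for name, (dims, rows) in AGGREGATES.items()
                  if query_dims <= dims]          # set containment
    return min(candidates)[1] if candidates else None
```

This is exactly the bookkeeping the app-level query generators mentioned above do by hand, and what a planner-visible materialized view would do for free.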
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
Quoting Josh Berkus:
> > Perhaps I can save you some time (yes, I have a degree in Math). If I
> > understand correctly, you're trying to extrapolate from the correlation
> > between a tiny sample and a larger sample. Introducing the tiny sample
> > into any decision can only produce a less accurate result than just
> > taking the larger sample on its own; GIGO. Whether they are consistent
> > with one another has no relationship to whether the larger sample
> > correlates with the whole population. You can think of the tiny sample
> > as "anecdotal" evidence for wonder drugs.
>
> Actually, it's more to characterize how large a sample we need. For
> example, if we sample 0.005 of disk pages and get an estimate, and then
> sample another 0.005 of disk pages and get an estimate which is not even
> close to the first estimate, then we have an idea that this is a table
> which defies analysis based on small samples. Whereas if the two
> estimates are < 1.0 stddev apart, we can have good confidence that the
> table is easily estimated. Note that this doesn't require progressively
> larger samples; any two samples would work.

We're sort of wandering away from the area where words are a good way to describe the problem. Lacking a common scratchpad to work with, could I suggest you talk to someone you consider to have a background in stats, and have them draw for you why this doesn't work? About all you can get out of it is: if the two samples are disjunct by a stddev, yes, you've demonstrated that the union of the two populations has a larger stddev than either of them; but your two stddevs are less information than the stddev of the whole. Breaking your sample into two (or three, or four, ...) arbitrary pieces and looking at their stddevs just doesn't tell you any more than what you started with.

--
"Dreams come true, not free." -- S.Sondheim, ITW
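A toy simulation of the objection above (synthetic data; the population shape and sample sizes are invented purely for illustration): with a skewed value distribution, two independent small samples agree with each other closely while both badly misestimate the true n_distinct, so their mutual consistency proves nothing about accuracy.

```python
import random

random.seed(42)
# 900,000 rows drawn from 1,000 "hot" values, plus 99,000 values that
# occur exactly once: true n_distinct = 100,000.
population = [random.randrange(1000) for _ in range(900_000)]
population += list(range(1000, 100_000))

def naive_estimate(sample, pop_size):
    # classic scale-up: distinct-in-sample * (population / sample size)
    return len(set(sample)) * pop_size // len(sample)

n = len(population)
e1 = naive_estimate(random.sample(population, 5000), n)
e2 = naive_estimate(random.sample(population, 5000), n)
# e1 and e2 agree within a few percent of each other, yet both come out
# roughly 3x the true value of 100,000.
```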
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
Quoting Andrew Dunstan <[EMAIL PROTECTED]>:
> After some more experimentation, I'm wondering about some sort of
> adaptive algorithm, a bit along the lines suggested by Marko Ristola,
> but limited to 2 rounds.
>
> The idea would be that we take a sample (either of fixed size, or some
> small proportion of the table), see how well it fits a larger sample
> (say a few times the size of the first sample), and then adjust the
> formula accordingly to project from the larger sample the estimate for
> the full population. Math not worked out yet - I think we want to
> ensure that the result remains bounded by [d,N].

Perhaps I can save you some time (yes, I have a degree in Math). If I understand correctly, you're trying to extrapolate from the correlation between a tiny sample and a larger sample. Introducing the tiny sample into any decision can only produce a less accurate result than just taking the larger sample on its own; GIGO. Whether they are consistent with one another has no relationship to whether the larger sample correlates with the whole population. You can think of the tiny sample as "anecdotal" evidence for wonder drugs.
Re: [HACKERS] Indexed views?
Greg Stark wrote:
> Mischa Sandberg <[EMAIL PROTECTED]> writes:
> > I take it that it is a very reasonable assumption that only a small
> > proportion of index records are actually invalid (else Yurk why use
> > the index?).
>
> That's faulty logic; the percentage of tuples that are valid is entirely
> independent of the percentage of tuples that match your range criterion.
> Ie, I could be selecting 100 tuples out of a million -- even if 99% are
> invalid, it's still worthwhile to use the index.

Ummm ... perhaps I glossed over a bit of inference. If only a small proportion of the index contains invalid row references, then (in the absence of specific biases otherwise) arbitrary queries using that index will average the same proportion of invalid row references. And agreed, it would still be worthwhile to use the index in that case. Your analyze stats would be a bit queered, though.

> > Since you're using an index at all, the planner must be expecting a
> > restricted set of rows to make it up through to the root. If there is
> > any filter criteria against the values from the index rows, you won't
> > even have to check rows for tuple visibility that don't pass that filter.
>
> It's an interesting idea though. But I can't think of many queries where
> it would be interesting. The query would still have to visit every page
> containing a record used in the final result. So the only time this would
> be a significant win is if you're applying very selective restrictions to
> columns that were in the index but weren't able to be put in the index
> condition. This seems like a pretty rare situation; usually the reason
> you put columns in an index definition is because they're going to be
> useful for index conditions -- especially if it's a particularly
> selective column.

Ummm ... two situations where filters on index columns do not fit the standard index probe are:

- Filtering by restrictive join. Whether the index is the source or target of the restriction, you get a better choice of join operators/orders. For example, if the index is the restrictor, you may be able to build a hash table of (filtered) index rows, where building a hash table from a heap scan would be a bad choice.

- Filtering on non-root index fields, or filtering with inequalities. Normally, the planner will not bother with the index for these, and may do a serial scan of the table. This can instead be done with a serial scan of the index, with possible optimizations like Oracle's "skip scan".

Furthermore, what 'covering' indexes buy you is that they hold all the data you need for the query results, whether you apply predicates to all the columns or not.

At another level, people are talking about decomposition storage models (DSM) for data in disk pages, versus n-ary storage models (NSM). That's more or less a fancy way of saying: organize data by columns instead of by rows. This storage model pays you back in CPU cycles on most computers with L1/L2 cache splits. If PG were ever to consider moving to that, then the row-validity columns would be grouped together in a page, and the verification of index rows would be significantly faster: only a small portion of a large page need be read and pushed through the CPU. [For more on DSM vs NSM, google: NSM n-ary DSM]
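A back-of-envelope illustration of that last point (the row and flag sizes are assumed, purely for illustration): with validity flags decomposed into their own column, a visibility pass touches a small fraction of the bytes an NSM layout would push through the cache.

```python
from array import array

N_ROWS = 100_000
ROW_BYTES = 128    # assumed average heap-tuple size (NSM: flag travels with row)
FLAG_BYTES = 1     # assumed per-row validity flag (DSM: flags packed together)

# DSM layout: all validity flags contiguous; here every 10th row is dead.
valid = array('b', (0 if i % 10 == 0 else 1 for i in range(N_ROWS)))
visible = sum(valid)                      # one scan over N_ROWS bytes

nsm_bytes = N_ROWS * ROW_BYTES            # NSM: must touch whole tuples
dsm_bytes = N_ROWS * FLAG_BYTES           # DSM: touch only the flag column
print(visible, nsm_bytes // dsm_bytes)    # 90000 visible; 128x fewer bytes
```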
Re: [HACKERS] Indexed views?
Greg Stark wrote:
> Doug McNaught <[EMAIL PROTECTED]> writes:
> > Short answer: MVCC tuple visibility status isn't (and can't be) stored
> > in the index.
>
> Well, the "can't" part is false, or at least unproven. From prior
> discussion, the only thing that would be technically challenging would
> be avoiding deadlocks.

Rather than yank the MVCC visibility around, how about a (relatively small) change to the query plan? I take it that it is a very reasonable assumption that only a small proportion of index records are actually invalid (else Yurk why use the index?). In that case, how about tacking the heap-table row pointer onto result tuples, and letting them percolate up through the tree?

Since you're using an index at all, the planner must be expecting a restricted set of rows to make it up through to the root. If there is any filter criteria against the values from the index rows, you won't even have to check rows for tuple visibility that don't pass that filter. As soon as you rise to a point where (index) rows will either be multiplied (a join) or combined (a group-by/distinct), you can validate them against the heap file in relatively large batches, with a hash caching of which ones have been checked for visibility.

Just a thought.
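A sketch of that batching idea (the function names and batch size are invented; `check_visibility` stands in for the real heap-tuple visibility test): index rows carry their heap row pointers up the tree, and visibility is validated in batches with a memo so each heap row is probed at most once.

```python
def batch_validate(index_rows, check_visibility, batch_size=64):
    """Yield only the index rows whose heap tuples are visible.
    index_rows yields (row, ctid) pairs; visibility results are cached
    per ctid, and uncached ctids are probed in sorted batches to favor
    heap locality."""
    cache = {}
    batch = []

    def flush():
        for ctid in sorted({c for _, c in batch if c not in cache}):
            cache[ctid] = check_visibility(ctid)      # one probe per ctid
        for row, ctid in batch:
            if cache[ctid]:
                yield row
        batch.clear()

    for pair in index_rows:
        batch.append(pair)
        if len(batch) >= batch_size:
            yield from flush()
    yield from flush()

# Tiny demo: ctid 3 is dead; each distinct ctid gets probed exactly once.
probed = []
def _vis(ctid):
    probed.append(ctid)
    return ctid != 3

rows = [("a", 1), ("b", 2), ("c", 1), ("d", 3)]
alive = list(batch_validate(rows, _vis, batch_size=2))
```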
Re: [HACKERS] Installing PostgreSQL in a Unix Platform
Ummm, time to get a 'Unix in 21 Days' book, because what you're relating indicates you are battling with some beginner basics.

Try:
$ cd /usr/local
or:
$ cd /usr
$ cd local
not:
$ cd /usr
$ cd /local

And you can't execute a program in the current directory (like DOS) unless you explicitly have '.' in your $PATH. It would have executed if you'd entered:
$ ./pg_ctl ...
but I STRONGLY suggest that, until you understand why, you do not continue with what you're doing, but rather learn the Unix basics first.

Eyinagho Newton wrote:
> Dear Drake,
> Thanks for your response and suggestions on what I should do to solve
> the above problem. I have tried it, and here are the results I had:
> doing a cd /usr from the command line took me into the usr directory.
> However, when I tried to do another cd /local from the resulting command
> line, I got the following error: bash: cd: /local: no such file or
> directory. I later tried to change directory again to 'bin', and therein
> I found 'pg_ctl'. However, when I tried to click on it, I got a display
> message with the words 'couldn't find the program "pg_ctl"'. Again, I
> tried to enter shell commands as a superuser, and through them executed
> cd /usr and then cd /bin. When in the bin directory, I did an 'ls'
> command but couldn't find pg_ctl. I suspect that it truly wasn't
> installed. What should I do next, please?
> Cheers, Newton Eyinagho
>
> --- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:
> > Hello,
> > If you are compiling from source then the postgresql startup script
> > does not get installed. You can use pg_ctl in
> > /usr/local/pgsql/bin/pg_ctl to start postgresql. Alternatively you can
> > install the startup script from the contrib directory
> > start-scripts/linux.
> > Sincerely, Joshua D. Drake
> >
> > Eyinagho Newton wrote:
> > > It has been impossible to install PostgreSQL, an open source
> > > software, from my SUSE 8.0. Although it tells you it's been
> > > installed, each time I try to start it using the SysV-Init Editor, I
> > > get the following message:
> > > starting /etc/init.d/postgresql
> > > starting postgreSQL /etc/init.d/postgresql: checkproc: command not found
> > > done
> > > What exactly am I doing wrong? Or better still, what haven't I done
> > > yet? Please help me.
> > > Newton Eyinagho
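To make the '.'-not-in-$PATH point concrete, here is a throwaway demonstration (the scratch directory and stub script are purely illustrative; the stub just echoes a word and is not the real pg_ctl):

```shell
dir=$(mktemp -d)                     # scratch directory
mkdir -p "$dir/bin"
printf '#!/bin/sh\necho running\n' > "$dir/bin/pg_ctl"
chmod +x "$dir/bin/pg_ctl"
cd "$dir/bin"

# Bare name: the shell searches $PATH, not the current directory.
pg_ctl 2>/dev/null || echo 'not found via $PATH'

# Explicit path: always works, no $PATH lookup involved.
./pg_ctl
```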
[HACKERS] PG replic refs
http://www.cs.mcgill.ca/~kemme/papers/vldb00.html

The ARC buffer replacement policy supersedes LRU.
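For context on that last note: ARC (Megiddo & Modha's Adaptive Replacement Cache, FAST 2003) extends classic LRU with a second list for frequently-hit pages plus "ghost" history lists that tune the split adaptively. As a baseline, here is the plain LRU buffer-pool policy it improves on (a simplified sketch, not PostgreSQL's actual buffer manager):

```python
from collections import OrderedDict

class LRUBufferPool:
    """Plain LRU page cache: a hit moves the page to the MRU end; a miss
    at capacity evicts the LRU page. ARC's improvement is to also weigh
    frequency and to remember recently-evicted page ids."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()            # page_id -> data, LRU first

    def access(self, page_id, read_page):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # hit: mark most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)    # miss at capacity: evict LRU
        self.pages[page_id] = read_page(page_id)
        return self.pages[page_id]
```

A single large sequential scan flushes every hot page out of a pure-LRU pool like this one; ARC's frequency list is precisely what protects them.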