Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 18:05, Fujii Masao wrote:
> On Sat, Mar 24, 2012 at 1:49 AM, Robert Haas wrote:
>> On Fri, Mar 23, 2012 at 12:42 PM, Fujii Masao wrote:
>>> On Fri, Mar 23, 2012 at 9:41 PM, Robert Haas wrote:
>>>> On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote:
>>>>> On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote:
>>>>>> Might it be a good idea to put it on its own row instead of changing
>>>>>> the format of an existing row, in order not to break scripts and
>>>>>> programs that are parsing the previous output?
>>>>>
>>>>> Good idea! What row name should we use for the WAL file containing
>>>>> the REDO record? "Latest checkpoint's REDO file"?
>>>>
>>>> Sounds good to me. I like the idea, too. The status quo is an
>>>> unnecessary nuisance, so this will be a nice usability improvement.
>>>
>>> Attached patch adds a new row "Latest checkpoint's REDO WAL segment:"
>>> into the result of pg_controldata. I used the term "WAL segment" for
>>> the row name instead of "file" because "WAL segment" is used in another
>>> row, "Bytes per WAL segment:". But is there a better name?
>>
>> s/segment/file/g?
>
> Yep, "file" might be more intuitive for a user than "segment". Attached
> is the "file" version of the patch.

We're already using "file" to mean something different *internally*, aren't
we? And since pg_controldata shows fairly internal information, I'm not
sure this is the best idea.

Maybe compromise and call it "segment file" - that is both easier to
understand than "segment", and not actually using a term that means
something else...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
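Magnus's concern about breaking existing consumers suggests that scripts should match pg_controldata rows by label rather than by line position. A minimal sketch of label-based parsing in C (not part of the patch; the label string is the one proposed in the thread):

```c
#include <string.h>

/*
 * Return a pointer to the value part of a pg_controldata output line if it
 * starts with the given row label, or NULL if the label doesn't match.
 * Matching on the label means a newly added row -- e.g. the proposed
 * "Latest checkpoint's REDO WAL file:" -- doesn't shift anything for
 * consumers that parse this way.
 */
const char *
controldata_value(const char *line, const char *label)
{
    size_t len = strlen(label);

    if (strncmp(line, label, len) != 0)
        return NULL;
    line += len;
    while (*line == ' ' || *line == '\t')
        line++;                 /* skip the padding between label and value */
    return line;
}
```

A consumer written this way keeps working whether the new row is present or not.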
Re: [HACKERS] Command Triggers, v16
On Friday, March 16, 2012 10:40:46 AM Dimitri Fontaine wrote:
> > This will have the effect of calling triggers outside of alphabetic
> > order. I don't think that's a good idea even if one part is ANY and the
> > other a specific command. I don't think there is any reason anymore to
> > separate the two? The only callsite seems to look like:
>
> The idea is to have a predictable ordering of command triggers. The code
> changed in the patch v16 (you pasted code from git in between v15 and
> v16, I cleaned it up) and is now easier to read:
>
>     case CMD_TRIGGER_FIRED_BEFORE:
>         whenstr = "BEFORE";
>         procs[0] = cmd->before_any;
>         procs[1] = cmd->before;
>         break;
>
>     case CMD_TRIGGER_FIRED_AFTER:
>         whenstr = "AFTER";
>         procs[0] = cmd->after;
>         procs[1] = cmd->after_any;
>         break;
>
> So it's BEFORE ANY, then BEFORE command, then AFTER command, then AFTER
> ANY. That's an arbitrary choice I made and we can easily reconsider.
> Triggers are called in alphabetical order in each “slot” here.
>
> In my mind it makes sense to have ANY triggers around the specific
> triggers, but it's hard to explain why that feels better.

I still think this would be a mistake. I don't have a hard time imagining
use cases where a specific trigger should be called before or after an ANY
trigger because e.g. it wants to return a more specific error or doesn't
want to re-check preconditions already checked by the ANY trigger. All of
that would be precluded by enforcing a strict ordering between ANY and
specific triggers. I don't see a use case that would benefit from the
current behaviour.

Andres

--
Andres Freund
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
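To make the disagreement concrete, here is a toy model (not the patch's actual code; names are made up) of the "slot" ordering quoted above for the BEFORE case: the ANY slot runs first, then the command-specific slot, each assumed already sorted alphabetically. A single merged alphabetical ordering, which Andres argues for, would instead interleave the two lists:

```c
/*
 * Fill 'out' with trigger names in the slot order used for BEFORE in patch
 * v16: all ANY triggers first, then all command-specific triggers.  Within
 * each slot the caller is assumed to have sorted names alphabetically.
 */
static int
slot_call_order(const char **any_slot, int nany,
                const char **cmd_slot, int ncmd,
                const char **out)
{
    int n = 0;
    int i;

    for (i = 0; i < nany; i++)
        out[n++] = any_slot[i];     /* ANY slot always wins, regardless of name */
    for (i = 0; i < ncmd; i++)
        out[n++] = cmd_slot[i];
    return n;
}
```

With a specific trigger named (hypothetically) "abort_create_table" and an ANY trigger named "log_any", slot ordering calls "log_any" first even though it sorts later; a merged alphabetical order would call the specific trigger first, which is exactly the flexibility Andres wants to preserve.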
Re: [HACKERS] initdb and fsync
On Sat, 2012-03-17 at 17:48 +0100, Cédric Villemain wrote:
> I agree with Andres.
>
> I believe we should use sync_file_range (_before?) with linux.
> And we can use posix_fadvise_dontneed on other kernels.

OK, updated patch attached. sync_file_range() is preferred; posix_fadvise()
is a fallback.

Regards,
Jeff Davis

initdb-fsync-20120325.patch.gz
Description: GNU Zip compressed data
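The preference/fallback structure Jeff describes can be sketched as below. This is an illustration, not the actual initdb patch: the function name and the `HAVE_SYNC_FILE_RANGE` configure symbol are assumptions, and the real patch may choose different flags.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hint the kernel to start writing back a file range ahead of a later
 * fsync(): prefer sync_file_range() where available (Linux), fall back to
 * posix_fadvise(POSIX_FADV_DONTNEED) on other kernels, and do nothing if
 * neither is available.
 */
static int
pre_sync_file_range(int fd, off_t offset, off_t nbytes)
{
#if defined(HAVE_SYNC_FILE_RANGE)
    /* initiate writeback without waiting for it to complete */
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#elif defined(POSIX_FADV_DONTNEED)
    /* dropping the pages from cache forces them toward storage */
    return posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
#else
    (void) fd, (void) offset, (void) nbytes;
    return 0;                   /* no cheap writeback hint available */
#endif
}
```

The point of the hint is that the final fsync() pass over many files then finds most of the dirty data already on its way to disk, instead of serializing all writeback behind each fsync() call.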
Re: [HACKERS] patch for parallel pg_dump
On Fri, Mar 23, 2012 at 11:11 AM, Alvaro Herrera wrote:
> Are you going to provide a rebased version?

Rebased version attached; this patch also includes Robert's earlier
suggestions.

parallel_pg_dump_5.diff.gz
Description: GNU Zip compressed data
Re: [HACKERS] COPY / extend ExclusiveLock
Jim Nasby writes:
> On 3/22/12 2:13 PM, Stephen Frost wrote:
>> Has anyone been working on or considering how to improve the logic
>> around doing extends on relations to perhaps make larger extensions
>> for larger tables? Or make larger extensions when tables are growing
>> very quickly?

> I know that there's been discussion around this. Way back in the day we
> extended relations one page at a time. I don't remember if that was
> changed or not.

No, it's still on the to-do list. One issue is that you probably don't
want individual insertion queries taking big response-time hits to do
large extensions themselves --- so somehow the bulk of the work needs to
be pushed to some background process.

            regards, tom lane
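One common shape for "extend more as the table grows" is to scale the extension size with the current relation size, with a floor and a cap. A hypothetical sketch (not PostgreSQL code; the constants are illustrative only):

```c
/*
 * Decide how many 8 kB pages to add when a relation of 'relpages' pages
 * fills up: roughly 1/8th of the current size, but at least one page (small
 * tables keep today's behaviour) and at most 512 pages (4 MB), so a huge
 * table never pays for an enormous single extension.
 */
static int
next_extend_pages(int relpages)
{
    int n = relpages / 8;

    if (n < 1)
        n = 1;
    if (n > 512)
        n = 512;
    return n;
}
```

This only addresses the extension size; Tom's point stands that even a capped large extension is latency an individual insert shouldn't pay, which is why a background extender is the other half of the idea.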
Re: [HACKERS] COPY / extend ExclusiveLock
On 3/22/12 2:13 PM, Stephen Frost wrote:
> Greetings,
>
> I've recently become a bit annoyed and frustrated looking at this in top:
>
>  23296 postgres 20 0 3341m 304m 299m S 12 0.9 1:50.02 postgres: sfrost gis [local] COPY waiting
>  24362 postgres 20 0 3353m 298m 285m D 12 0.9 1:24.99 postgres: sfrost gis [local] COPY
>  24429 postgres 20 0 3340m 251m 247m S 11 0.8 1:13.79 postgres: sfrost gis [local] COPY waiting
>  24138 postgres 20 0 3341m 249m 244m S 10 0.8 1:28.09 postgres: sfrost gis [local] COPY waiting
>  24153 postgres 20 0 3340m 246m 241m S 10 0.8 1:24.44 postgres: sfrost gis [local] COPY waiting
>  24166 postgres 20 0 3341m 318m 313m S 10 1.0 1:40.52 postgres: sfrost gis [local] COPY waiting
>  24271 postgres 20 0 3340m 288m 283m S 10 0.9 1:34.12 postgres: sfrost gis [local] COPY waiting
>  24528 postgres 20 0 3341m 290m 285m S 10 0.9 1:21.23 postgres: sfrost gis [local] COPY waiting
>  24540 postgres 20 0 3340m 241m 236m S 10 0.7 1:15.91 postgres: sfrost gis [local] COPY waiting
>
> Has anyone been working on or considering how to improve the logic
> around doing extends on relations to perhaps make larger extensions for
> larger tables? Or make larger extensions when tables are growing very
> quickly?
>
> I haven't looked at the code, but I'm guessing we extend relations when
> they're full (that part makes sense..), but we extend them an itty-bitty
> bit at a time, which very quickly ends up being not fast enough for the
> processes that want to get data into the table. My gut feeling is that
> we could very easily and quickly improve this situation by having a way
> to make larger extensions, and then using that method when we detect
> that a table is growing very quickly.

I know that there's been discussion around this. Way back in the day we
extended relations one page at a time. I don't remember if that was changed
or not.

--
Jim C. Nasby, Database Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
Re: [HACKERS] checkpoint patches
On 3/23/12 7:38 AM, Robert Haas wrote:
> And here are the latency results for 95th-100th percentile with
> checkpoint_timeout=16min.
>
> ckpt.master.13: 1703, 1830, 2166, 17953, 192434, 43946669
> ckpt.master.14: 1728, 1858, 2169, 15596, 187943, 9619191
> ckpt.master.15: 1700, 1835, 2189, 22181, 206445, 8212125
>
> The picture looks similar here. Increasing checkpoint_timeout isn't
> *quite* as good as spreading out the fsyncs, but it's pretty darn close.
> For example, looking at the median of the three 98th percentile numbers
> for each configuration, the patch bought us a 28% improvement in 98th
> percentile latency. But increasing checkpoint_timeout by a minute bought
> us a 15% improvement in 98th percentile latency. So it's still not clear
> to me that the patch is doing anything on this test that you couldn't
> get just by increasing checkpoint_timeout by a few more minutes.
> Granted, it lets you keep your inter-checkpoint interval slightly
> smaller, but that's not that exciting.
>
> That having been said, I don't have a whole lot of trouble believing
> that there are other cases where this is more worthwhile.

I wouldn't be too quick to dismiss increasing checkpoint frequency (ie:
decreasing checkpoint_timeout). On a high-value production system you're
going to care quite a bit about recovery time. I certainly wouldn't want to
run our systems with checkpoint_timeout = '15 min' if I could avoid it.

Another $0.02: I don't recall the community using pgbench much at all to
measure latency; I believe it's something fairly new. I point this out
because I believe there are differences in the analysis you need to do for
TPS vs. latency. I think Robert's graphs support my argument: the numeric
X-percentile data might not look terribly good, but reducing peak latency
from 100ms to 60ms could be a really big deal on a lot of systems.

My intuition is that one or both of these patches actually would be
valuable in the real world; it would be a shame to throw them out because
we're not sure how to performance test them...

--
Jim C. Nasby, Database Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
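For readers reproducing figures like the 95th-100th percentile rows quoted above: one standard way to compute them from a set of per-transaction latency samples is the nearest-rank method. A small self-contained sketch (this is an assumption about the method; the thread doesn't say exactly how the numbers were derived):

```c
#include <stdlib.h>

static int
cmp_long(const void *a, const void *b)
{
    long la = *(const long *) a;
    long lb = *(const long *) b;

    return (la > lb) - (la < lb);
}

/*
 * Return the p-th percentile (1 <= p <= 100) of n latency samples using the
 * nearest-rank definition: the value at rank ceil(p * n / 100) of the
 * sorted samples.  Note: sorts 'samples' in place.
 */
long
latency_percentile(long *samples, int n, int p)
{
    int rank;

    qsort(samples, n, sizeof(long), cmp_long);
    rank = (p * n + 99) / 100;      /* integer ceil(p*n/100) */
    if (rank < 1)
        rank = 1;
    return samples[rank - 1];
}
```

With this definition the 100th percentile is simply the worst observed latency, which is why the last column in Robert's rows is so much larger than the rest: a single multi-second fsync stall dominates it.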
Re: [HACKERS] who's familiar with the GSOC application process
On 3/23/12 1:58 PM, Dave Cramer wrote:
> Someone has approached the JDBC list to do some work on the driver as
> a GSOC project.
>
> I need to know what is involved in mentoring and how to get the
> project approved

I've approved you as a mentor. I suggest also joining the pgsql-students
mailing list. You should get the student to submit an application on
Monday.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] how can i see the log..?
lekon wrote:
> i want see the log of INSERT, SELECT, DELETE... within postgresql
> v9.1.3

The pgsql-hackers list is for discussions about development of the
PostgreSQL database system; discussions about how to use it should go to
another list. The particular question would be appropriate for
pgsql-general or pgsql-admin. See this page for descriptions of the
various lists, to help decide where to post future questions:

http://www.postgresql.org/community/lists/

Regarding the question itself, see the log_statement configuration setting:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT

-Kevin
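For reference, the setting Kevin points to is a one-line change in postgresql.conf (valid values per the 9.1 documentation linked above):

```
# postgresql.conf -- log every SQL statement the server executes.
# 'mod' would log only data-modifying statements (INSERT/UPDATE/DELETE etc.).
log_statement = 'all'        # none | ddl | mod | all
```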
Re: [HACKERS] Command Triggers patch v18
On Friday, March 23, 2012 04:32:02 PM Dimitri Fontaine wrote:
> I would like to get back on code level review now if at all possible,
> and I would integrate your suggestions here into the next patch revision
> if another one is needed.

Ok, I will give it another go.

Btw, I just wanted to alert you to being careful when checking in the
expected files ;)

 NOTICE: snitch: BEFORE any DROP TRIGGER
-ERROR: unexpected name list length (3)
+NOTICE: snitch: BEFORE DROP TRIGGER foo_trigger
+NOTICE: snitch: AFTER any DROP TRIGGER
 create conversion test for 'utf8' to 'sjis' from utf8_to_sjis;

You had an apparently un-noticed error in there ;)

1.
    if (!HeapTupleIsValid(tup))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("command trigger \"%s\" does not exist, skipping",
                        trigname)));

   The "skipping" part looks like a copy/pasto...

2. In PLy_exec_command_trigger you're doing a PG_TRY() which looks
   pointless in the current incarnation. Did you intend to add something
   in the catch? I think without doing a decref of pltdata both in the
   success and the failure path you're leaking memory.

3. In plpython: Why do you pass objectId/pltobjectname/... as "NULL"
   instead of None? Using a string for it seems like a bad form of in-band
   signalling to me.

4. Not sure whether InitCommandContext is the best place to suppress
   command trigger usage for some commands. That seems rather unintuitive
   to me. But perhaps the implementation-ease is big enough...

That's everything new I found... Not bad, I think. After this somebody else
should take a look at it, I think (committer or not).

> The only point yet to address from last round from Andres is about the
> API around CommandFiresTrigger() and the Memory Context we use here.
> We're missing an explicit Reset call, and to be able to have one we need
> a more complex API, because of the way RemoveObjects() and
> RemoveRelations() work.
>
> We would need to add no-reset APIs and an entry point to manually reset
> the memory context, which currently gets disposed at the same time as
> its parent context, the current one that's been set up before entering
> standard_ProcessUtility().

Not sure if you're expecting further input from me about that?

Greetings,

Andres

--
Andres Freund
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] occasional startup failures
On Sun, Mar 25, 2012 at 18:59, Tom Lane wrote:
> Andrew Dunstan writes:
>> Every so often buildfarm animals (nightjar and raven recently, for
>> example) report failures on starting up the postmaster. It appears that
>> these failures are due to the postmaster not creating the pid file
>> within 5 seconds, and so the logic in commit
>> 0bae3bc9be4a025df089f0a0c2f547fa538a97bc kicks in. Unfortunately, when
>> this happens the postmaster has in fact sometimes started up, and the
>> end result is that subsequent buildfarm runs will fail when they detect
>> that there is already a postmaster listening on the port, and without
>> manual intervention to kill the "rogue" postmaster this continues
>> endlessly.
>
>> I can probably add some logic to the buildfarm script to try to detect
>> this condition and kill an errant postmaster so subsequent runs don't
>> get affected, but that seems to be avoiding a problem rather than
>> fixing it. I'm not sure what we can do to improve it otherwise, though.
>
> Yeah, this has been discussed before. IMO the only real fix is to
> arrange things so that the postmaster process is an immediate child of
> pg_ctl, allowing pg_ctl to know its PID directly and not have to rely
> on the pidfile appearing before it can detect whether the postmaster
> is still alive. Then there is no need for a guesstimated timeout.
> That means not using system() anymore, but rather fork/exec, which
> mainly implies having to write our own code for stdio redirection.
> So that's certainly doable if a bit tedious. I have no idea about
> the Windows side of it though.

We already do something like this on Win32 - at least one reason being
dealing with restricted tokens. Right now we just close the handles to the
child, but we could easily keep those around for doing this type of
detection.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] occasional startup failures
Andrew Dunstan writes:
> Every so often buildfarm animals (nightjar and raven recently, for
> example) report failures on starting up the postmaster. It appears that
> these failures are due to the postmaster not creating the pid file
> within 5 seconds, and so the logic in commit
> 0bae3bc9be4a025df089f0a0c2f547fa538a97bc kicks in. Unfortunately, when
> this happens the postmaster has in fact sometimes started up, and the
> end result is that subsequent buildfarm runs will fail when they detect
> that there is already a postmaster listening on the port, and without
> manual intervention to kill the "rogue" postmaster this continues
> endlessly.

> I can probably add some logic to the buildfarm script to try to detect
> this condition and kill an errant postmaster so subsequent runs don't
> get affected, but that seems to be avoiding a problem rather than fixing
> it. I'm not sure what we can do to improve it otherwise, though.

Yeah, this has been discussed before. IMO the only real fix is to arrange
things so that the postmaster process is an immediate child of pg_ctl,
allowing pg_ctl to know its PID directly and not have to rely on the
pidfile appearing before it can detect whether the postmaster is still
alive. Then there is no need for a guesstimated timeout. That means not
using system() anymore, but rather fork/exec, which mainly implies having
to write our own code for stdio redirection. So that's certainly doable if
a bit tedious. I have no idea about the Windows side of it though.

            regards, tom lane
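The fork/exec-with-redirection approach Tom describes can be sketched as follows. This is not pg_ctl source, just an illustration of the mechanism under the assumptions stated in the comments; the parent learns the child's PID directly from fork(), so no pidfile race or guessed timeout is involved:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Start 'prog' as a direct child, redirecting its stdout/stderr to
 * 'logfile' ourselves (the part system("... > log 2>&1") used to do).
 * Returns the child's PID to the parent immediately.
 */
static pid_t
start_child(const char *prog, char *const argv[], const char *logfile)
{
    pid_t pid = fork();

    if (pid == 0)
    {
        /* child: redirect stdout and stderr, then exec */
        int fd = open(logfile, O_WRONLY | O_CREAT | O_APPEND, 0600);

        if (fd >= 0)
        {
            dup2(fd, 1);
            dup2(fd, 2);
            close(fd);
        }
        execv(prog, argv);
        _exit(127);             /* only reached if exec failed */
    }
    return pid;                 /* parent: may be -1 if fork failed */
}

/*
 * Because the child is a direct descendant, the parent can check liveness
 * with kill(pid, 0) or reap it with waitpid(pid, &status, WNOHANG),
 * instead of polling for a pidfile to appear.
 */
```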
Re: [HACKERS] query cache
Thanks.. I'll keep those issues in mind.

On Sat, Mar 24, 2012 at 6:18 PM, Tatsuo Ishii wrote:
>>> Well, you'd have to start by demonstrating the benefit of it. The
>>> advantage of query caches in proxies and clients is well-known, because
>>> you can offload some of the work of the database onto other servers,
>>> thus increasing capacity. Adding a query cache to the database server
>>> would require the "query identity recognition" of the cache to be far
>>> cheaper (as in 10X cheaper) than planning and running the query, which
>>> seems unlikely at best.
>>
>> I figured I'd create the md5 digest of the sourceText of a query, and
>> then look that up in a hash. I don't think that will be very expensive.
>> I'll have another hash to keep track of which queries are dependent on
>> which relations, so that when a relation is changed somehow (and
>> committed), the query is then invalidated and removed from the query
>> hash.
>
> From the experience of implementing query cache in pgpool-II, there are
> some suggestions:
>
> - A query result cache should not be created if the transaction
>   including the SELECT is not committed.
>
> - Since a transaction could have many SELECTs, you need to keep those
>   query results somewhere in temporary storage. You could either discard
>   them or register them to the query cache storage depending on the
>   transaction's fate, either aborted or committed.
>
> - If a SELECT has non-immutable functions, then the query result should
>   not be cached.
>
> - If a SELECT uses temporary tables, then the query result should not be
>   cached.
>
> - If a SELECT uses unlogged tables, then the query result should not be
>   cached, because their data could vanish after crash recovery. Of
>   course this only applies if you plan to use cache storage which does
>   not survive a crash.
>
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
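Tatsuo's rules boil down to a cacheability predicate that must be evaluated per query (and per cache type). A toy summary in C -- the struct and field names are invented for illustration, not pgpool-II or PostgreSQL code:

```c
#include <stdbool.h>

/* Per-query properties relevant to Tatsuo's caching rules (hypothetical). */
typedef struct QueryProps
{
    bool in_uncommitted_xact;   /* result produced in a still-open transaction */
    bool has_volatile_funcs;    /* calls any non-immutable function */
    bool uses_temp_tables;
    bool uses_unlogged_tables;  /* contents can vanish after crash recovery */
} QueryProps;

/*
 * Decide whether a SELECT's result may be placed in the query cache.
 * 'cache_survives_crash' distinguishes durable cache storage (where
 * unlogged tables are unsafe) from purely in-memory storage.
 */
static bool
result_is_cacheable(const QueryProps *q, bool cache_survives_crash)
{
    if (q->in_uncommitted_xact)
        return false;           /* hold in temp storage until commit instead */
    if (q->has_volatile_funcs || q->uses_temp_tables)
        return false;
    if (q->uses_unlogged_tables && cache_survives_crash)
        return false;
    return true;
}
```

The first rule is the subtle one: results computed inside an open transaction go into per-transaction temporary storage and are only promoted to the shared cache (or discarded) once the transaction commits (or aborts).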
[HACKERS] occasional startup failures
Every so often buildfarm animals (nightjar and raven recently, for example)
report failures on starting up the postmaster. It appears that these
failures are due to the postmaster not creating the pid file within 5
seconds, and so the logic in commit 0bae3bc9be4a025df089f0a0c2f547fa538a97bc
kicks in. Unfortunately, when this happens the postmaster has in fact
sometimes started up, and the end result is that subsequent buildfarm runs
will fail when they detect that there is already a postmaster listening on
the port, and without manual intervention to kill the "rogue" postmaster
this continues endlessly.

I can probably add some logic to the buildfarm script to try to detect this
condition and kill an errant postmaster so subsequent runs don't get
affected, but that seems to be avoiding a problem rather than fixing it.
I'm not sure what we can do to improve it otherwise, though.

Thoughts?

cheers

andrew
Re: [HACKERS] [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats
Robert Haas writes:
> The bit about maybe not getting both t and f as MCVs on a Boolean does
> seem a little worrying, but I'm not sure whether it actually affects
> query planning in a materially negative way. Can you demonstrate a
> case where it matters?

If we were trying to force that to happen it would be wrong anyway.
Consider a column that contains *only* "t", or at least has so few "f"'s
that "f" appears never or only once in the selected sample. (IIRC there is
a clamp that prevents selecting anything as an MCV unless it appears at
least twice in the sample.)

Like Robert, I'm not convinced whether or not this is a reasonable change,
but arguing for it on the basis of boolean columns doesn't seem very sound.

            regards, tom lane
Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
On Sun, Mar 25, 2012 at 6:11 AM, Marc Mamin wrote:
> Hello,
>
> Here is something we'd like to have:
>
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg00650.php
>
> As we are quite busy and this issue hasn't a high priority, we haven't
> followed it until now :-(
>
> I'm only a Postgres user, not a hacker, so I don't have the knowledge to
> help on this nor to evaluate whether this might be a good GSoC project.
>
> Just an idea for the case you are looking for another topic.

Good idea. If anyone wants to pursue it, I'd strongly suggest building it
as a contrib module rather than dedicated syntax, because I'm not sure
there'd be any consensus on adding syntax for it to core.

Actually, though, I wonder how much faster it would be than CREATE TABLE
AS? Block-level copy should be faster than tuple-level copy, but I'm not
sure whether it would be a lot faster or only slightly faster.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats
On Sat, Mar 24, 2012 at 12:17 AM, Dan McGee wrote:
> This is a bit of a corner case in all honesty, but if you have a short
> table (under 20 rows), the 10% heuristic used that decides whether
> distinct values scale with the row count will result in rather odd
> values for stadistinct in pg_statistic, such as '-0.2' or '-0.67',
> rather than the expected '2'. Additionally, this can cause only one of
> {t, f} to appear in the most common values array.
>
> Does this actually affect query planning in any way? Probably not, but
> it is extremely odd to look at pg_stats for these columns, and the
> solution seems easy.

But the stats aren't there to be looked at, but rather to guide query
planning. If at execution time there are 100 rows in the table, should we
still assume that there are only 2 distinct values in the table, or that
it's gone up to about 50 distinct values? It's hard to say, but there's no
apparent reason to think that the number of distinct values will scale up
for a large table but not a small table.

The bit about maybe not getting both t and f as MCVs on a Boolean does seem
a little worrying, but I'm not sure whether it actually affects query
planning in a materially negative way. Can you demonstrate a case where it
matters?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
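Robert's "2 vs. about 50" point follows from how a negative stadistinct is consumed at plan time. In simplified form (this mirrors the documented convention for pg_statistic.stadistinct, not the exact planner code): negative values mean "a fraction of the current row count", positive values are an absolute count.

```c
/*
 * Interpret stadistinct against the table's current tuple count:
 * negative stadistinct scales with ntuples; positive is absolute.
 */
static double
effective_ndistinct(double stadistinct, double ntuples)
{
    if (stadistinct < 0.0)
        return -stadistinct * ntuples;
    return stadistinct;
}
```

So '-0.2' on a 10-row table means 2 distinct values today, but 20 if the table grows to 100 rows; forcing '2' instead freezes the estimate, which is exactly the behavioural difference the patch is arguing about.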
Re: [HACKERS] PostgreSQL optimisations on Linux machines with more than 24 cores
On Sun, Mar 25, 2012 at 7:14 AM, Constantin Teodorescu wrote:
> "This paper analyzes the scalability of seven system applications (Exim,
> memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running
> on Linux on a 48-core computer. Except for gmake, all applications
> trigger scalability bottlenecks inside a recent Linux kernel. Using
> mostly standard parallel programming techniques— this paper introduces
> one new technique, sloppy counters—these bottlenecks can be removed from
> the kernel or avoided by changing the applications slightly. Modifying
> the kernel required in total 3002 lines of code changes. A speculative
> conclusion from this analysis is that there is no scalability reason to
> give up on traditional operating system organizations just yet."
>
> There are a couple of recommendations there for PostgreSQL, small
> changes in PostgreSQL source (ex. increasing the number of mutexes from
> 16 to 1024, row and table-level locking) other in Linux kernel (modified
> lseek) that seems to be effective for scaling beyond 24 cores.

Linux 3.2 includes lseek modifications to improve scalability. PostgreSQL
9.2 will include significant lock manager changes; we did not adopt their
proposal of just bumping up the number of lock manager locks, as that
doesn't help in common cases, such as when lots of concurrent backends are
accessing the same table, and it has other downsides, such as slowing down
operations that must touch multiple lock manager partitions. Rather, we
introduced a "fast path" for common cases that bypasses the main lock
manager altogether; thus, traffic on the lock manager locks will be
dramatically reduced in PostgreSQL 9.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] PostgreSQL optimisations on Linux machines with more than 24 cores
Hello all ... again !

SHORT INTRODUCTION
==================
Remember me? A couple of years ago (maybe 10) I created the PgAccess
package for managing PostgreSQL! Since then I have done many things, all of
them related to PostgreSQL, of course! Our company is managing the National
Animal Tracking and Registering Database in Romania, a pretty big database
on a 24-core Intel Xeon machine with 32 GB RAM.

THE PROBLEM
===========
After many months of optimisation and fine tuning, getting the most out of
our machine, I just found an article regarding Linux scalability on many
cores, including an analysis of several different programs, among them
PostgreSQL:

http://pdos.csail.mit.edu/papers/linux:osdi10.pdf

"This paper analyzes the scalability of seven system applications (Exim,
memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on
Linux on a 48-core computer. Except for gmake, all applications trigger
scalability bottlenecks inside a recent Linux kernel. Using mostly standard
parallel programming techniques— this paper introduces one new technique,
sloppy counters—these bottlenecks can be removed from the kernel or avoided
by changing the applications slightly. Modifying the kernel required in
total 3002 lines of code changes. A speculative conclusion from this
analysis is that there is no scalability reason to give up on traditional
operating system organizations just yet."

There are a couple of recommendations there for PostgreSQL: small changes
in the PostgreSQL source (e.g. increasing the number of mutexes from 16 to
1024, row- and table-level locking), others in the Linux kernel (a modified
lseek) that seem to be effective for scaling beyond 24 cores.

THE QUESTION
============
The article says that the tests were done using a PostgreSQL 8.3.9 version.
Is the new 9.1.3 PostgreSQL using some of the patches recommended in that
article that would enhance multi-core scalability?

Thank you very much for your attention,

Constantin Teodorescu
[HACKERS] [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats
This is a bit of a corner case in all honesty, but if you have a short
table (under 20 rows), the 10% heuristic used that decides whether distinct
values scale with the row count will result in rather odd values for
stadistinct in pg_statistic, such as '-0.2' or '-0.67', rather than the
expected '2'. Additionally, this can cause only one of {t, f} to appear in
the most common values array.

Does this actually affect query planning in any way? Probably not, but it
is extremely odd to look at pg_stats for these columns, and the solution
seems easy.
---
The only other minor changes included here were to make it clear when we
were comparing float values, so use 0.0 instead of 0.

Example stats output from the database I noticed this on:

archweb=# SELECT c.relname, a.attname, pg_stat_get_live_tuples(c.oid) AS n_live_tup,
              stadistinct, stanullfrac, stawidth, stavalues1, stanumbers1
          FROM pg_statistic s
          JOIN pg_class c ON c.oid = s.starelid
          JOIN pg_attribute a ON c.oid = a.attrelid AND a.attnum = s.staattnum
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          JOIN pg_type t ON t.oid = a.atttypid
          WHERE NOT a.attisdropped AND nspname = 'public' AND t.typname = 'bool'
          ORDER BY stadistinct, n_live_tup;

            relname            |    attname    | n_live_tup | stadistinct | stanullfrac | stawidth | stavalues1 |      stanumbers1
-------------------------------+---------------+------------+-------------+-------------+----------+------------+-----------------------
 mirrors_mirrorprotocol        | is_download   |          3 |       -0.67 |           0 |        1 | {t}        | {0.67}
 arches                        | agnostic      |          3 |       -0.67 |           0 |        1 | {f}        | {0.67}
 repos                         | staging       |         10 |        -0.2 |           0 |        1 | {f,t}      | {0.7,0.3}
 repos                         | testing       |         10 |        -0.2 |           0 |        1 | {f,t}      | {0.7,0.3}
 devel_pgpsignature            | valid         |        264 |           1 |           0 |        1 | {t}        | {1}
 packages_flagrequest          | is_spam       |        415 |           1 |           0 |        1 | {f}        | {1}
 donors                        | visible       |        716 |           1 |           0 |        1 | {t}        | {1}
 auth_user                     | is_superuser  |         95 |           2 |           0 |        1 | {f,t}      | {0.957895,0.0421053}
 user_profiles                 | notify        |         95 |           2 |           0 |        1 | {t,f}      | {0.957895,0.0421053}
 auth_user                     | is_active     |         95 |           2 |           0 |        1 | {t,f}      | {0.621053,0.378947}
 auth_user                     | is_staff      |         95 |           2 |           0 |        1 | {f,t}      | {0.873684,0.126316}
 releng_iso                    | active        |        158 |           2 |           0 |        1 | {f,t}      | {0.89,0.106667}
 mirrors_mirror                | isos          |        180 |           2 |           0 |        1 | {t,f}      | {0.972678,0.0273224}
 mirrors_mirror                | active        |        180 |           2 |           0 |        1 | {t,f}      | {0.672131,0.327869}
 mirrors_mirror                | public        |        180 |           2 |           0 |        1 | {t,f}      | {0.978142,0.0218579}
 mirrors_mirrorurl             | has_ipv6      |        379 |           2 |           0 |        1 | {f,t}      | {0.709763,0.290237}
 mirrors_mirrorurl             | has_ipv4      |        379 |           2 |           0 |        1 | {t}        | {0.997361}
 packages_flagrequest          | is_legitimate |        415 |           2 |           0 |        1 | {t,f}      | {0.992754,0.00724638}
 packages_signoffspecification | enabled       |       1130 |           2 |           0 |        1 | {t,f}      | {0.977578,0.0224215}
 packages_signoffspecification | known_bad     |       1130 |           2 |           0 |        1 | {f,t}      | {0.993722,0.00627803}
 mirrors_mirrorlog             | is_success    |      12715 |           2 |           0 |        1 | {t,f}      | {0.953345,0.0466552}
 package_depends               | optional      |      28592 |           2 |           0 |        1 | {f,t}      | {0.880322,0.119678}
 package_files                 | is_directory  |     225084 |           2 |           0 |        1 | {f,t}      | {0.829933,0.170067}
(23 rows)

 src/backend/commands/analyze.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9cd6e67..995ed9d 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -2110,7 +2110,7 @@ compute_minimal_stats(VacAttrStatsP stats,
 	 * least 2 instances in the sample.
 	 */
 	if (track_cnt < track_max && to
[HACKERS] who's familiar with the GSOC application process
Someone has approached the JDBC list about doing some work on the driver as a GSoC project. I need to know what is involved in mentoring and how to get the project approved.

Dave Cramer
dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_upgrade incorrectly equates pg_default and database tablespace
Hi,

while working on a support case I stumbled upon a bug in pg_upgrade. Upgrade fails with "No such file or directory" when a database is moved to a non-default tablespace and contains a table that is moved to pg_default. The cause seems to be that the following test incorrectly equates empty spclocation with database tablespace:

	tblspace = PQgetvalue(res, relnum, i_spclocation);
	/* if no table tablespace, use the database tablespace */
	if (strlen(tblspace) == 0)
		tblspace = dbinfo->db_tblspace;

Patch to fix this is attached.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 36683fa..3914403 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -253,6 +253,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	char	   *nspname = NULL;
 	char	   *relname = NULL;
 	int			i_spclocation,
+				i_spcoid,
 				i_nspname,
 				i_relname,
 				i_oid,
@@ -269,7 +270,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	snprintf(query, sizeof(query),
 			 "SELECT c.oid, n.nspname, c.relname, "
-			 "	c.relfilenode, %s "
+			 "	c.relfilenode, t.oid AS spcoid, %s "
 			 "FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n "
 			 "	ON c.relnamespace = n.oid "
 			 "	LEFT OUTER JOIN pg_catalog.pg_tablespace t "
@@ -306,6 +307,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	i_nspname = PQfnumber(res, "nspname");
 	i_relname = PQfnumber(res, "relname");
 	i_relfilenode = PQfnumber(res, "relfilenode");
+	i_spcoid = PQfnumber(res, "spcoid");
 	i_spclocation = PQfnumber(res, "spclocation");

 	for (relnum = 0; relnum < ntups; relnum++)
@@ -325,7 +327,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 		tblspace = PQgetvalue(res, relnum, i_spclocation);
 		/* if no table tablespace, use the database tablespace */
-		if (strlen(tblspace) == 0)
+		if (atooid(PQgetvalue(res, relnum, i_spcoid)) == InvalidOid)
 			tblspace = dbinfo->db_tblspace;
 		strlcpy(curr->tablespace, tblspace, sizeof(curr->tablespace));
 	}

diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
index c1925cf..234ca99 100644
--- a/contrib/pg_upgrade/pg_upgrade.h
+++ b/contrib/pg_upgrade/pg_upgrade.h
@@ -109,7 +109,8 @@ typedef struct
 	char		relname[NAMEDATALEN];	/* relation name */
 	Oid			reloid;			/* relation oid */
 	Oid			relfilenode;	/* relation relfile node */
-	char		tablespace[MAXPGPATH];	/* relations tablespace path */
+	/* relations tablespace path, empty for pg_default */
+	char		tablespace[MAXPGPATH];
 } RelInfo;

 typedef struct
Re: [HACKERS] Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)
Hi,

[...]
> > Placing a sync (i.e., the strongest Power barrier) accordingly would,
> > however, still be insufficient for the second problem, as it would only
> > fix the reordering of read-read pairs by Worker 1 and the store
> > atomicity issue from Worker 0. But the writes on Worker 0 could still
> > be reordered (problem number 2). One possible fix consists of placing
> > a sync between the two writes on Worker 0, and an address dependency
> > between the two reads on Worker 1. Clearly, however, these are changes
> > that can no longer be hidden behind the ResetLatch/WaitLatch interface,
> > but rather go in the code using these.
[...]
> However, your example is enough unlike the actual code that the
> conclusion you state following the word "clearly" isn't actually clear
> to me. According to latch.h, the correct method of using a latch is
> like this:
>
> * for (;;)
> * {
> *     ResetLatch();
> *     if (work to do)
> *         Do Stuff();
> *     WaitLatch();
> * }
>
> Meanwhile, anyone who is creating additional work to do should add the
> work to the queue and then set the latch.

When writing the above statement, including the "clearly", we were
probably thinking too much of the above usage hint, which uses only
ResetLatch and WaitLatch. As you say, ...

> So it seems to me that we could potentially fix this by inserting
> barriers at the end of ResetLatch and at the beginning of SetLatch and
> WaitLatch. Then the latch has to get reset before we check whether
> there's work to do; and we've got to finish checking for work before
> we again try to wait for the latch. Similarly, any work that was in
> progress before SetLatch was called will be forced to be committed to
> memory before SetLatch does anything else. Adding that many barriers
> might not be very good for performance but it seems OK from a
> correctness point of view, unless I am missing something, which is
> definitely possible.
> I'd appreciate any thoughts you have on this, as this is clearly subtle
> and tricky to get exactly right.

... placing another barrier in SetLatch could do the trick. We will apply
our tools to actually prove this and come back with a conclusive answer.

Best,
Michael
[HACKERS] how can i see the log..?
I want to see the log of INSERT, SELECT, DELETE, etc. statements in PostgreSQL v9.1.3. (I'm sorry, my English is limited.) How can I see that? Is it the transaction log, or something else?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/how-can-i-see-the-log-tp5583083p5583083.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] Regarding column reordering project for GSoc 2012
On 20 mar 2012, at 13.08, Heikki Linnakangas wrote:
> On 20.03.2012 11:10, Claes Jakobsson wrote:
>>
>> Personally I'd love a type 2 JDBC driver for PostgreSQL.
>
> Why?

listen/notify over SSL, for example, unless that's been fixed in the JDBC driver recently. And I'm sure there are other things in libpq that would be nice to have.

As mainly a Perl dude who uses libpq via DBD::Pg, I find it odd that the Java people don't do the same instead of reimplementing everything.

Cheers,
/Claes
Re: [HACKERS] Error trying to compile a simple C trigger
On Tue, 20/03/2012 at 11.16 +, Peter Geoghegan wrote:
> On 20 March 2012 10:53, Marco Nenciarini
> wrote:
> > alert.c: In function ‘dbms_alert_defered_signal’:
> > alert.c:839:33: error: dereferencing pointer to incomplete type
> > make: *** [alert.o] Error 1
> >
> > I've also tried the example at
> >
> > http://www.postgresql.org/docs/devel/static/trigger-example.html
> >
> > and the result is exactly the same.
> >
> > trigtest.c: In function ‘trigf’:
> > trigtest.c:44:36: error: dereferencing pointer to incomplete type
> > make: *** [trigtest.o] Error 1
>
> I'd say this is an unintended consequence of a pgrminclude run. Try adding
> this:
>
> #include "access/tupdesc.h"

It doesn't work. The error is still the same.

Regards,
Marco
--
Marco Nenciarini - System manager @ Devise.IT
marco.nenciar...@devise.it | http://www.devise.it
Re: [HACKERS] Regarding column reordering project for GSoc 2012
On 19 mar 2012, at 22.45, Merlin Moncure wrote:
> one pl/java based project that IMO is just screaming to be done is a
> pl/java based FDW (wrapping JDBC) that would approximately reproduce
> dblink...maybe with some extra features geared towards ETL type jobs
> like a row based callback for transformations in java.

Personally I'd love a type 2 JDBC driver for PostgreSQL. It should be feasible as a summer project. It's somewhere deep down on my TODO list so I'd be happy to help.

Cheers,
Claes
Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
Hello,

Here is something we'd like to have:
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00650.php

As we are quite busy and this issue doesn't have a high priority, we haven't followed it up until now :-(

I'm only a Postgres user, not a hacker, so I don't have the knowledge to help on this, nor to evaluate whether this might be a good GSoC project. Just an idea, in case you are looking for another topic.

best regards,

Marc Mamin

From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Qi Huang
Sent: Saturday, 24 March 2012 05:20
To: cbbro...@gmail.com; kevin.gritt...@wicourts.gov
Cc: pgsql-hackers@postgresql.org; and...@anarazel.de; alvhe...@commandprompt.com; neil.con...@gmail.com; dan...@heroku.com; j...@agliodbs.com
Subject: Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

> Date: Thu, 22 Mar 2012 13:17:01 -0400
> Subject: Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
> From: cbbro...@gmail.com
> To: kevin.gritt...@wicourts.gov
> CC: pgsql-hackers@postgresql.org
>
> On Thu, Mar 22, 2012 at 12:38 PM, Kevin Grittner
> wrote:
> > Tom Lane wrote:
> >> Robert Haas writes:
> >>> Well, the standard syntax apparently aims to reduce the number of
> >>> returned rows, which ORDER BY does not. Maybe you could do it
> >>> with ORDER BY .. LIMIT, but the idea here I think is that we'd
> >>> like to sample the table without reading all of it first, so that
> >>> seems to miss the point.
> >>
> >> I think actually the traditional locution is more like
> >> WHERE random() < constant
> >> where the constant is the fraction of the table you want. And
> >> yeah, the presumption is that you'd like it to not actually read
> >> every row. (Though unless the sampling density is quite a bit
> >> less than 1 row per page, it's not clear how much you're really
> >> going to win.)
> >
> > It's all going to depend on the use cases, which I don't think I've
> > heard described very well yet.
> >
> > I've had to pick random rows from, for example, a table of
> > disbursements to support a financial audit. In those cases it has
> > been the sample size that mattered, and order didn't. One
> > interesting twist there is that for some of these financial audits
> > they wanted the probability of a row being selected to be
> > proportional to the dollar amount of the disbursement. I don't
> > think you can do this without a first pass across the whole data
> > set.
>
> This one was commonly called "Dollar Unit Sampling," though the
> terminology has gradually gotten internationalized.
> http://www.dummies.com/how-to/content/how-does-monetary-unit-sampling-work.html
>
> What the article doesn't mention is that some particularly large items
> might wind up covering multiple samples. In the example, they're
> looking for a sample every $3125 down the list. If there was a single
> transaction valued at $3, that (roughly) covers 10 of the desired
> samples.
>
> It isn't possible to do this without scanning across the entire table.
>
> If you want repeatability, you probably want to instantiate a copy of
> enough information to indicate the ordering chosen. That's probably
> something that needs to be captured as part of the work of the audit,
> so not only does it need to involve a pass across the data, it
> probably requires capturing a fair bit of data for posterity.
> --
> When confronted by a difficult problem, solve it by reducing it to the
> question, "How would the Lone Ranger handle this?"

The discussion till now has gone far beyond my understanding. Could anyone explain briefly what the idea is at this point? The design details are still unfamiliar to me; I can only take the time to understand them after being selected, by reading the relevant material. For now, I'm still curious why Neil's implementation is no longer working. Postgres has been patched a lot, but the general idea behind Neil's implementation should still work, shouldn't it?
Besides, whether this query feature is needed is still not decided. That seems to be another hard-to-decide point. Is this topic perhaps not yet ready for GSoC? If so, I think I still have time to switch to another topic. Any suggestions? Thanks.

Best Regards and Thanks
Huang Qi Victor
Computer Science of National University of Singapore
Re: [HACKERS] foreign key locks, 2nd attempt
On Sat, Mar 17, 2012 at 10:45 PM, Alvaro Herrera wrote:
> Here is v11. This version is mainly updated to add pg_upgrade support,
> as discussed. It also contains the README file that was posted earlier
> (plus wording fixes per Bruce), a couple of bug fixes, and some comment
> updates.

The main thing we're waiting on is the performance tests to confirm the lack of regression. You are working on those, right?

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services