Re: [HACKERS] Review: query result history in psql
I just applied the patch to a clean branch from the latest master. I couldn't
get a segfault from using the new feature. Could you provide a little more
info to reproduce the segfault? Thanks

On Thu, Jun 27, 2013 at 11:28 PM, Pavel Stehule wrote:
> Hello
>
> after patching I got a segfault:
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
> 98          for (p = prompt_string;
> Missing separate debuginfos, use: debuginfo-install glibc-2.13-2.i686
> ncurses-libs-5.7-9.20100703.fc14.i686 readline-6.1-2.fc14.i386
> (gdb) bt
> #0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
> #1  0x0805786a in MainLoop (source=0xc45440) at mainloop.c:134
> #2  0x0805a68d in main (argc=2, argv=0xbfcf2894) at startup.c:336
>
> Regards
>
> Pavel Stehule
>
> 2013/6/28 ian link :
> >> It's better to post a review as a reply to the message which contains
> >> the patch.
> >
> > Sorry about that, I did not have the email in my inbox and couldn't figure
> > out how to use the old message ID to send a reply. Here is the thread:
> > http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com
> >
> >> The 'EscapeForCopy' was meant to mean 'Escape string in a format required
> >> by the COPY TEXT format', so 'copy' in the name refers to the escaping
> >> format, not the action performed by the function.
> >
> > I see, that makes sense now. Keep it as you see fit; it's not a big deal
> > in my opinion.
> >
> >> Some mathematical toolkits, like Matlab or Mathematica, automatically set
> >> a variable called 'ans' (short for "answer") containing the result of the
> >> last operation. I was trying to emulate exactly this behaviour.
> >
> > I've actually been using Matlab lately, which must be why the name made
> > sense to me intuitively. I don't know if this is the best name, however.
> > It kind of assumes that our users use Matlab/Octave/Mathematica. Maybe
> > 'qhist' or 'hist' or something?
> >
> >> The history is not erased. The history is always stored in the client's
> >> memory.
> >
> > Ah, I did not pick up on that. Thank you for explaining it! That's
> > actually a very neat way of doing it. Sorry I did not realize that at
> > first.
> >
> >> I was considering such a behaviour. But since the feature is turned off
> >> by default, I decided that whoever is using it is aware of the cost.
> >> Instead of truncating the history automatically (which could lead to a
> >> nasty surprise), I decided to equip the user with \ansclean, a command
> >> erasing the history. I believe that it is better to let the user decide
> >> when the history should be erased, instead of doing it automatically.
> >
> > I think you are correct. However, if we turn on the feature by default
> > (at some point in the future) the discussion should probably be
> > re-visited.
> >
> >> This is my first submitted patch, so I can't really comment on the
> >> process. But if you could add the author's email to CC, the message would
> >> be much easier to spot. I replied after two days only because I missed
> >> the message in the flood of other pgsql-hackers messages. I think I need
> >> to scan the list more carefully...
> >
> > My fault, I definitely should have CC'd you.
> >
> > As for the patch, I made a new version of the latest one you provided in
> > the original thread. Let me know if anything breaks, but it compiles fine
> > on my box. Thanks for the feedback!
> >
> > Ian
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: query result history in psql
Hello

After patching I got a segfault:

Program terminated with signal 11, Segmentation fault.
#0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
98          for (p = prompt_string;
Missing separate debuginfos, use: debuginfo-install glibc-2.13-2.i686
ncurses-libs-5.7-9.20100703.fc14.i686 readline-6.1-2.fc14.i386
(gdb) bt
#0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
#1  0x0805786a in MainLoop (source=0xc45440) at mainloop.c:134
#2  0x0805a68d in main (argc=2, argv=0xbfcf2894) at startup.c:336

Regards

Pavel Stehule

2013/6/28 ian link :
>> It's better to post a review as a reply to the message which contains
>> the patch.
>
> Sorry about that, I did not have the email in my inbox and couldn't figure
> out how to use the old message ID to send a reply. Here is the thread:
> http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com
>
>> The 'EscapeForCopy' was meant to mean 'Escape string in a format required
>> by the COPY TEXT format', so 'copy' in the name refers to the escaping
>> format, not the action performed by the function.
>
> I see, that makes sense now. Keep it as you see fit; it's not a big deal in
> my opinion.
>
>> Some mathematical toolkits, like Matlab or Mathematica, automatically set
>> a variable called 'ans' (short for "answer") containing the result of the
>> last operation. I was trying to emulate exactly this behaviour.
>
> I've actually been using Matlab lately, which must be why the name made
> sense to me intuitively. I don't know if this is the best name, however. It
> kind of assumes that our users use Matlab/Octave/Mathematica. Maybe 'qhist'
> or 'hist' or something?
>
>> The history is not erased. The history is always stored in the client's
>> memory.
>
> Ah, I did not pick up on that. Thank you for explaining it! That's actually
> a very neat way of doing it. Sorry I did not realize that at first.
>
>> I was considering such a behaviour. But since the feature is turned off by
>> default, I decided that whoever is using it is aware of the cost. Instead
>> of truncating the history automatically (which could lead to a nasty
>> surprise), I decided to equip the user with \ansclean, a command erasing
>> the history. I believe that it is better to let the user decide when the
>> history should be erased, instead of doing it automatically.
>
> I think you are correct. However, if we turn on the feature by default (at
> some point in the future) the discussion should probably be re-visited.
>
>> This is my first submitted patch, so I can't really comment on the
>> process. But if you could add the author's email to CC, the message would
>> be much easier to spot. I replied after two days only because I missed the
>> message in the flood of other pgsql-hackers messages. I think I need to
>> scan the list more carefully...
>
> My fault, I definitely should have CC'd you.
>
> As for the patch, I made a new version of the latest one you provided in
> the original thread. Let me know if anything breaks, but it compiles fine
> on my box. Thanks for the feedback!
>
> Ian
Re: [HACKERS] Review: query result history in psql
> It's better to post a review as a reply to the message which contains
> the patch.

Sorry about that, I did not have the email in my inbox and couldn't figure
out how to use the old message ID to send a reply. Here is the thread:
http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com

> The 'EscapeForCopy' was meant to mean 'Escape string in a format required
> by the COPY TEXT format', so 'copy' in the name refers to the escaping
> format, not the action performed by the function.

I see, that makes sense now. Keep it as you see fit; it's not a big deal in
my opinion.

> Some mathematical toolkits, like Matlab or Mathematica, automatically set
> a variable called 'ans' (short for "answer") containing the result of the
> last operation. I was trying to emulate exactly this behaviour.

I've actually been using Matlab lately, which must be why the name made
sense to me intuitively. I don't know if this is the best name, however. It
kind of assumes that our users use Matlab/Octave/Mathematica. Maybe 'qhist'
or 'hist' or something?

> The history is not erased. The history is always stored in the client's
> memory.

Ah, I did not pick up on that. Thank you for explaining it! That's actually
a very neat way of doing it. Sorry I did not realize that at first.

> I was considering such a behaviour. But since the feature is turned off by
> default, I decided that whoever is using it is aware of the cost. Instead
> of truncating the history automatically (which could lead to a nasty
> surprise), I decided to equip the user with \ansclean, a command erasing
> the history. I believe that it is better to let the user decide when the
> history should be erased, instead of doing it automatically.

I think you are correct. However, if we turn on the feature by default (at
some point in the future) the discussion should probably be re-visited.

> This is my first submitted patch, so I can't really comment on the
> process. But if you could add the author's email to CC, the message would
> be much easier to spot. I replied after two days only because I missed the
> message in the flood of other pgsql-hackers messages. I think I need to
> scan the list more carefully...

My fault, I definitely should have CC'd you.

As for the patch, I made a new version of the latest one you provided in the
original thread. Let me know if anything breaks, but it compiles fine on my
box. Thanks for the feedback!

Ian

query-history-review1.patch
Description: Binary data
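The behaviour discussed in this thread (results kept only in the client's
memory, the feature off by default, and history erased only by an explicit
\ansclean) can be sketched as a toy model. This is a hypothetical Python
stand-in, not the patch's actual C implementation; the class and method names
are invented for illustration.

```python
class QueryHistory:
    """Toy model of the proposed psql result history (hypothetical names;
    the real patch keeps result sets in psql's memory as C structures)."""

    def __init__(self, enabled=False):
        self.enabled = enabled   # feature is off by default
        self.results = []        # all stored result sets
        self.ans = None          # last result, Matlab-style 'ans'

    def record(self, result):
        # Nothing is stored unless the user turned the feature on,
        # so whoever enables it has accepted the memory cost.
        if self.enabled:
            self.results.append(result)
            self.ans = result

    def ansclean(self):
        # \ansclean analogue: history is erased only when the user asks,
        # never truncated automatically behind their back.
        self.results.clear()
        self.ans = None

h = QueryHistory(enabled=True)
h.record([(1, 'one')])
h.record([(2, 'two')])
print(h.ans)             # → [(2, 'two')]
h.ansclean()
print(len(h.results))    # → 0
```

The design choice debated above falls out naturally: because `record` is a
no-op when the feature is disabled, users who never enable it pay nothing.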
Re: [HACKERS] proposal: enable new error fields in plpgsql (9.4)
Hello

2013/6/28 Noah Misch :
> On Tue, Jun 25, 2013 at 03:56:27PM +0200, Pavel Stehule wrote:
>> cleaned patch is in attachment
>
> Of the five options you're adding to GET STACKED DIAGNOSTICS, four of them
> appear in the SQL standard. DATATYPE_NAME does not; I think we should call
> it PG_DATATYPE_NAME in line with our other extensions in this area.

Yes, but it should be fixed in the 9.3 enhanced error fields too - it should
be consistent with the other PostgreSQL fields.

>> >> 2013/2/1 Pavel Stehule :
>> >> > 2013/2/1 Peter Eisentraut :
>> >> >> On 2/1/13 8:00 AM, Pavel Stehule wrote:
>> >> >>> 2013/2/1 Marko Tiikkaja :
>> >> Is there a compelling reason why we wouldn't provide these already in
>> >> 9.3?
>> >> >>>
>> >> >>> a time for assign to last commitfest is out.
>> >> >>>
>> >> >>> this patch is relative simple and really close to enhanced error
>> >> >>> fields feature - but depends if some from commiters will have a time
>> >> >>> for commit to 9.3 - so I am expecting primary target 9.4, but I am not
>> >> >>> be angry if it will be commited early.
>> >> >>
>> >> >> If we don't have access to those fields on PL/pgSQL, what was the point
>> >> >> of the patch to begin with? Surely, accessing them from C wasn't the
>> >> >> main use case?
>> >> >
>> >> > These fields are available for application developers now. But is a
>> >> > true, so without this patch, GET STACKED DIAGNOSTICS statement will
>> >> > not be fully consistent, because some fields are accessible and others
>> >> > not

> I am inclined to back-patch this to 9.3. The patch is fairly mechanical,
> and I think the risk of introducing bugs is less than the risk that folks
> will be confused by these new-in-9.3 error fields being accessible from
> libpq and the protocol, yet inaccessible from PL/pgSQL.

+1

> The existing protocol documentation says things like this:
>
>     Table name: if the error was associated with a specific table, the
>     name of the table. (When this field is present, the schema name field
>     provides the name of the table's schema.)
>
> The way you have defined RAISE does not enforce this; the user can set
> TABLE_NAME without setting SCHEMA_NAME at all. I see a few options here:
>
> 1. Change RAISE to check the invariants thoroughly. For example, TABLE_NAME
> would require SCHEMA_NAME, and the pair would need to name an actual table.
>
> 2. Change RAISE to check the invariants simply. For example, it could check
> that SCHEMA_NAME is present whenever TABLE_NAME is present but provide no
> validation that the pair names an actual table. (I think the protocol
> language basically allows this, though a brief note wouldn't hurt.)
>
> 3. Tweak the protocol documentation to clearly permit what the patch has
> RAISE allow, namely the free-form use of these fields. This part of the
> protocol is new in 9.3, so it won't be a big deal to change it. The libpq
> documentation has similar verbiage to update.
>
> I suppose I prefer #3. It seems fair for user code to employ these fields
> for applications slightly broader than their core use, like a constraint
> name that represents some userspace notion of a constraint rather than a
> normal, cataloged constraint. I can also imagine an application like
> replaying logs from another server, recreating the messages as that server
> emitted them; #2 or #3 would suffice for that.

I like #3 too. These fields should be usable freely in custom code - I
wouldn't impose any limits. Developers can use them in application code
however they like; that is what they were designed for.

> Looking beyond the immediate topic of PL/pgSQL, it seems fair for an FDW to
> use these error fields to name remote objects not known locally. Suppose a
> foreign INSERT fires a remote trigger, and that trigger violates a
> constraint of some other remote table. Naming the remote objects would be a
> reasonable design choice. postgres_fdw might have chosen to just copy
> fields from the remote error (it does not do this today for the fields in
> question, though). The FDW might not even have a notion of a schema, at
> which point it would legitimately choose to leave that field unpopulated.
> Once we allow any part of the system to generate such errors, we should let
> PL/pgSQL do the same.

+1

Regards

Pavel

> Thoughts on that plan?
>
> --
> Noah Misch
> EnterpriseDB                     http://www.enterprisedb.com
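The difference between options #2 and #3 above can be made concrete with a
small sketch. This is a Python stand-in for the checks RAISE could perform,
not PostgreSQL's actual C code; the only invariant enforced is the
table-implies-schema rule quoted from the protocol documentation, and the
function names are invented for illustration.

```python
def validate_simple(fields):
    """Option #2: enforce the presence invariant only -- TABLE_NAME
    requires SCHEMA_NAME -- without checking that the pair names an
    actual cataloged table (that stricter check would be option #1)."""
    if 'TABLE_NAME' in fields and 'SCHEMA_NAME' not in fields:
        raise ValueError('TABLE_NAME set without SCHEMA_NAME')
    return fields

def validate_freeform(fields):
    """Option #3: permit free-form use of the fields, e.g. a userspace
    notion of a constraint, or remote objects named by an FDW that has
    no concept of schemas. Everything passes."""
    return fields

validate_simple({'SCHEMA_NAME': 'public', 'TABLE_NAME': 'orders'})  # ok
validate_freeform({'TABLE_NAME': 'remote_tbl'})                     # ok
try:
    validate_simple({'TABLE_NAME': 'orders'})
except ValueError as e:
    print(e)   # → TABLE_NAME set without SCHEMA_NAME
```

The FDW scenario in the paragraph above is exactly the case that
`validate_simple` would reject but `validate_freeform` permits.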
Re: [HACKERS] Patch for fail-back without fresh backup
On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila wrote:
> On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote:
>> Hi,
>>
>> >> So our proposal on this problem is that we must ensure that master
>> >> should not make any file system level changes without confirming that
>> >> the corresponding WAL record is replicated to the standby.
>> >
>> > How will you take care of extra WAL on old master during recovery?
>> > If it plays the WAL which has not reached new-master, it can be a
>> > problem.
>>
>> I am trying to understand how there would be extra WAL on old master
>> that it would replay and cause inconsistency. Consider how I am
>> picturing it and correct me if I am wrong.
>>
>> 1) Master crashes. So a failback standby becomes new master, forking the
>> WAL.
>> 2) Old master is restarted as a standby (now with this patch, without
>> a new base backup).
>> 3) It would try to replay all the WAL it has available and later
>> connect to the new master, also following the timeline switch (the
>> switch might happen using archived WAL and timeline history file OR
>> the new switch-over-streaming-replication-connection as of 9.3,
>> right?)
>>
>> * In (3), when the new standby/old master is replaying WAL, from where
>> is it picking the WAL?
>
> Yes, this is the point which can lead to inconsistency. The new standby/old
> master will replay WAL after the last successful checkpoint, for which it
> gets the info from the control file. It is picking WAL from the location
> where it was logged when it was active (pg_xlog).
>
>> Does it first replay all the WAL in pg_xlog
>> before archive? Should we make it check for a timeline history file in
>> archive before it starts replaying any WAL?
>
> I have really not thought about what the best solution for the problem is.
>
>> * And, would the new master, before forking the WAL, replay all the
>> WAL that is necessary to come to the state (of the data directory) that
>> the old master was in just before it crashed?
>
> I don't think the new master has any correlation with the old master's
> data directory. Rather, it will replay the WAL it has received/flushed
> before it starts acting as master.

After the old master has failed over, the WAL on it that is ahead of the new
master might contain broken data, so a dump taken from the old master could
fail.

This is just an idea: we could extend recovery.conf with a new parameter,
say 'follow_master_force'. This parameter accepts 'on' and 'off', and is
effective only when standby_mode is set to 'on'. If both
'follow_master_force' and 'standby_mode' are set to 'on':

1. When the standby server starts recovery, it skips applying the WAL in its
own pg_xlog and instead requests WAL starting from its latest checkpoint LSN
from the master server.
2. The master server receives the standby's latest checkpoint LSN and
compares it with the LSN of its own latest checkpoint. If the LSNs match,
the master sends WAL from that checkpoint LSN onward; if not, the master
informs the standby that the request failed.
3. The standby forks its WAL at that point and continuously applies the WAL
sent from the master.

With this approach, a user who wants a dump of the old master sets
'follow_master_force' and 'standby_mode' to 'off' and takes the dump after
the old master starts, while a user who wants to force it to rejoin
replication sets both parameters to 'on'.

Please give me feedback.

Regards,

---
Sawada Masahiko
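The three-step proposal above can be sketched as a tiny state machine. This
is a hypothetical Python model of the suggested 'follow_master_force'
behaviour, not real PostgreSQL code; the function names, return tokens, and
LSN handling are all invented for illustration.

```python
def master_handle_forced_follow(master_ckpt_lsn, standby_ckpt_lsn):
    """Step 2 above: the master compares the standby's latest checkpoint
    LSN with its own; on a match it streams WAL from that LSN, otherwise
    it tells the standby the forced follow failed."""
    if standby_ckpt_lsn == master_ckpt_lsn:
        return ('STREAM_FROM', master_ckpt_lsn)
    return ('FAIL', None)

def standby_start(standby_mode, follow_master_force, standby_ckpt_lsn,
                  master_ckpt_lsn):
    """Steps 1 and 3: with both parameters on, skip replaying the local
    pg_xlog and request WAL from the latest checkpoint LSN instead."""
    if not (standby_mode and follow_master_force):
        return 'REPLAY_LOCAL_WAL'          # normal recovery path
    action, lsn = master_handle_forced_follow(master_ckpt_lsn,
                                              standby_ckpt_lsn)
    if action == 'STREAM_FROM':
        return f'FORK_AND_APPLY_FROM {lsn}'
    return 'START_FAILED'

print(standby_start(True, True, '0/3000060', '0/3000060'))
# → FORK_AND_APPLY_FROM 0/3000060
print(standby_start(True, True, '0/3000060', '0/4000000'))
# → START_FAILED
```

The point the sketch makes explicit is that the forced-follow path never
touches the old master's possibly divergent local WAL: it either forks at the
agreed checkpoint or refuses to start.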
Re: [HACKERS] Move unused buffers to freelist
On Thursday, June 27, 2013 5:54 PM Robert Haas wrote:
> On Wed, Jun 26, 2013 at 8:09 AM, Amit Kapila wrote:
> > Configuration details:
> > O/S - Suse-11
> > RAM - 128GB
> > Number of cores - 16
> > Server conf - checkpoint_segments = 300, checkpoint_timeout = 15 min,
> > synchronous_commit = off, shared_buffers = 14GB, autovacuum = off
> > Pgbench - select-only, scale factor - 1200, time - 30 mins
> >
> >            8C-8T    16C-16T    32C-32T    64C-64T
> > Head       62403    101810     99516      94707
> > Patch      62827    101404     99109      94744
> >
> > On 128GB RAM, if we use scalefactor=1200 (database = approx. 17GB) and
> > 14GB shared buffers, there is no major difference.
> > One of the reasons could be that there is not much swapping in shared
> > buffers, as most data already fits in shared buffers.
>
> I'd like to just back up a minute here and talk about the broader
> picture here. What are we trying to accomplish with this patch? Last
> year, I did some benchmarking on a big IBM POWER7 machine (16 cores,
> 64 hardware threads). Here are the results:
>
> http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html
>
> Now, if you look at these results, you see something interesting.
> When there aren't too many concurrent connections, the higher scale
> factors are only modestly slower than the lower scale factors. But as
> the number of connections increases, the performance continues to rise
> at the lower scale factors, and at the higher scale factors, this
> performance stops rising and in fact drops off. So in other words,
> there's no huge *performance* problem for a working set larger than
> shared_buffers, but there is a huge *scalability* problem. Now why is
> that?
>
> As far as I can tell, the answer is that we've got a scalability
> problem around BufFreelistLock. Contention on the buffer mapping
> locks may also be a problem, but all of my previous benchmarking (with
> LWLOCK_STATS) suggests that BufFreelistLock is, by far, the elephant
> in the room. My interest in having the background writer add buffers
> to the free list is basically around solving that problem. It's a
> pretty dramatic problem, as the graph above shows, and this patch
> doesn't solve it. There may be corner cases where this patch improves
> things (or, equally, makes them worse) but as a general point, the
> difficulty I've had reproducing your test results and the specificity
> of your instructions for reproducing them suggests to me that what we
> have here is not a clear improvement on general workloads. Yet such
> an improvement should exist, because there are other products in the
> world that have scalable buffer managers; we currently don't. Instead
> of spending a lot of time trying to figure out whether there's a small
> win in narrow cases here (and there may well be), I think we should
> back up and ask why this isn't a great big win, and what we'd need to
> do to *get* a great big win. I don't see much point in tinkering
> around the edges here if things are broken in the middle; things that
> seem like small wins or losses now may turn out otherwise in the face
> of a more comprehensive solution.
>
> One thing that occurred to me while writing this note is that the
> background writer doesn't have any compelling reason to run on a
> read-only workload. It will still run at a certain minimum rate, so
> that it cycles the buffer pool every 2 minutes, if I remember
> correctly. But it won't run anywhere near fast enough to keep up with
> the buffer allocation demands of 8, or 32, or 64 sessions all reading
> data not all of which is in shared_buffers at top speed. In fact,
> we've had reports that the background writer isn't too effective even
> on read-write workloads. The point is - if the background writer
> isn't waking up and running frequently enough, what it does when it
> does wake up isn't going to matter very much. I think we need to
> spend some energy poking at that.

Currently it wakes up based on the bgwriter_delay config parameter, which is
200ms by default. So you mean we should think about waking the bgwriter
based on allocations and the number of elements left in the freelist?

As per my understanding, a summarization of the points raised by you and
Andres which this patch should address to have a bigger win:

1. Bgwriter needs to be improved so that it can help in reducing usage
counts and finding the next victim buffer (run the clock sweep and add
buffers to the free list).
2. SetLatch for bgwriter (wake up bgwriter) when few elements are left in
the freelist.
3. Split the global lock (BufFreelistLock) in StrategyGetBuffer (a spinlock
for the freelist, and an lwlock for the clock sweep).
4. Separate processes for writing dirty buffers and moving buffers to the
freelist.
5. Bgwriter needs to be more aggressive; the logic by which it calculates
how many buffers it needs to process needs to be improved.
6. Th
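Points 1-4 above can be pictured with a toy, single-threaded model of
freelist plus clock-sweep victim selection. This is a hedged sketch in the
rough shape of StrategyGetBuffer, not PostgreSQL's actual code: the class
and method names are invented, and all locking, dirty-buffer writeback, and
the BufFreelistLock contention that the thread is really about are omitted.

```python
class BufferPool:
    """Toy freelist + clock-sweep victim selection (no locks, no I/O)."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.usage_count = [0] * nbuffers   # bumped on access in reality
        self.pinned = [False] * nbuffers
        self.freelist = []                  # assume startup allocation done
        self.next_victim = 0

    def _clock_sweep(self):
        # Walk the buffers circularly, decrementing usage counts until an
        # unpinned buffer with usage_count == 0 turns up.
        while True:
            buf = self.next_victim
            self.next_victim = (self.next_victim + 1) % self.nbuffers
            if self.pinned[buf]:
                continue
            if self.usage_count[buf] == 0:
                return buf
            self.usage_count[buf] -= 1

    def get_buffer(self):
        # Fast path: the freelist. Slow path: the backend runs the sweep
        # itself, which is where the contention comes from in real life.
        if self.freelist:
            return self.freelist.pop(0)
        return self._clock_sweep()

    def bgwriter_round(self, target):
        # Points 1-2: a background process runs the sweep ahead of demand
        # and keeps the freelist topped up, so backends rarely sweep.
        moved = 0
        while len(self.freelist) < target:
            self.freelist.append(self._clock_sweep())
            moved += 1
        return moved

pool = BufferPool(4)
pool.usage_count = [1, 0, 2, 0]
print(pool.get_buffer())        # → 1 (buffer 0's count drops to 0 en route)
print(pool.bgwriter_round(2))   # → 2
print(pool.freelist)            # → [3, 0]
```

The sketch makes the division of labour in point 3 visible: `get_buffer`
only needs the freelist (a cheap spinlock in the proposal), while the sweep
state (`next_victim`, `usage_count`) could sit behind a separate lwlock.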
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
Andres Freund wrote:
> Hm. There were some issues with the test_logical_decoding
> Makefile not cleaning up the regression installation properly.
> Which might have caused the issue.
>
> Could you try after applying the patches and executing a clean
> and then rebuild?

Tried, and the problem persists.

> Otherwise, could you try applying my git tree so we are sure we
> test the same thing?
>
> $ git remote add af git://git.postgresql.org/git/users/andresfreund/postgres.git
> $ git fetch af
> $ git checkout -b xlog-decoding af/xlog-decoding-rebasing-cf4
> $ ./configure ...
> $ make

Tried that, too, and the problem persists. The log shows the last commit on
your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Because you mention possible problems with the regression test cleanup for
test_logical_decoding, I also tried:

rm -fr contrib/test_logical_decoding/
git reset --hard HEAD
make world
make check-world

I get the same failure, with the primary key or unique index column showing
as 0 in the results.

I am off on vacation tomorrow and next week. I will dig into this with gdb
if it is not solved when I get back -- unless you have a better suggestion
for how to figure it out.

Once this is solved, I will be working on testing the final result of all
these layers, including creating a second output plugin. I want to confirm
that multiple plugins play well together. I'm glad to see other eyes also on
this patch set.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] MVCC catalog access
Robert Haas escribió:
> All right, so here's a patch that does something along those lines.
> We have to always take a new snapshot when somebody scans a catalog
> that has no syscache, because there won't be any invalidation messages
> to work off of in that case. The only catalog in that category that's
> accessed during backend startup (which is mostly what your awful test
> case is banging on) is pg_db_role_setting. We could add a syscache
> for that catalog or somehow force invalidation messages to be sent
> despite the lack of a syscache, but what I chose to do instead is
> refactor things slightly so that we use the same snapshot for all four
> scans of pg_db_role_setting, instead of taking a new one each time. I
> think that's unimpeachable on correctness grounds; it's no different
> than if we'd finished all four scans in the time it took us to finish
> the first one, and then gotten put to sleep by the OS scheduler for as
> long as it took us to scan the other three. Point being that there's
> no interlock there.

That seems perfectly acceptable to me, yeah.

> The difference is 3-4%, which is quite a lot less than what you
> measured before, although on different hardware, so results may vary.

3-4% on that synthetic benchmark sounds pretty acceptable to me, as well.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Documentation/help for materialized and recursive views
Robert Haas wrote:
> Magnus Hagander wrote:
>>> The functionality of materialized views will (over time) totally swamp
>>> that of normal views, so mixing all the corresponding documentation
>>> with the documentation for normal views probably doesn't make things
>>> easier for people that are only interested in normal views.
>>
>> That's a better point I think. That said, it would be very useful if
>> it actually showed up in "\h CREATE VIEW" in psql - I wonder if we
>> should just add the syntax to that page, and then link said future
>> information on a separate page somehow?
>
> IMHO, it's better to keep them separate; they are very different beasts.

+1. Although I wonder whether we shouldn't cross-link those pages.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] updated emacs configuration
On Thu, Jun 27, 2013 at 09:54:45PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > Should we be using entab -s 3?
>
> IIUC, that wouldn't affect leading whitespace at all. What it would
> affect I think (outside of comment blocks) is whitespace between code
> and a same-line /* ... */ comment. Personally I'd prefer that a
> tab-stop-aligned /* ... */ comment always have a tab before it, even
> if the expansion of the tab is only *one* space. That is, in
>
>     foo(bar,    /* comment */
>         bar1,   /* comment */
>         bar12,  /* comment */
>         bar123, /* comment */
>         baz);   /* comment */
>
> I think each of those comments ought to have a tab before it, not
> space(s). pgindent currently does this inconsistently --- the bar123
> line will have a space instead. Moving to -s 3 would presumably make
> this worse (even less consistent) not better, since now the bar12 line
> would also be emitted with spaces not a tab.

I think you have grasped the issue. :-)

> Inside a comment, though, probably the behavior of -s 3 would be just
> fine. So the real problem here IMO is that use of tabs ought to be
> context sensitive (behavior inside a comment different from outside),
> and I don't think entab can do that. I see though that it understands
> about C quoted strings, so maybe we could teach it about comments too?

Yes, that could be done.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
On Thu, Jun 27, 2013 at 6:18 PM, Tom Lane wrote:
> Alvaro Herrera writes:
>> I'm looking at the combined patches 0003-0005, which are essentially all
>> about adding a function to obtain relation OID from (tablespace,
>> filenode). It takes care to look through the relation mapper, and uses
>> a new syscache underneath for performance.
>
>> One question about this patch, originally, was about the usage of
>> that relfilenode syscache. It is questionable because it would be the
>> only syscache to apply on top of a non-unique index.
>
> ... which, I assume, is on top of a pg_class index that doesn't exist
> today. Exactly what is the argument that says performance of this
> function is sufficiently critical to justify adding both the maintenance
> overhead of a new pg_class index, *and* a broken-by-design syscache?
>
> Lose the cache and this probably gets a lot easier to justify. As is,
> I think I'd vote to reject altogether.

I already voted that way, and nothing's happened since to change my mind.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] updated emacs configuration
Bruce Momjian writes:
> Should we be using entab -s 3?

IIUC, that wouldn't affect leading whitespace at all. What it would
affect I think (outside of comment blocks) is whitespace between code
and a same-line /* ... */ comment. Personally I'd prefer that a
tab-stop-aligned /* ... */ comment always have a tab before it, even
if the expansion of the tab is only *one* space. That is, in

    foo(bar,    /* comment */
        bar1,   /* comment */
        bar12,  /* comment */
        bar123, /* comment */
        baz);   /* comment */

I think each of those comments ought to have a tab before it, not
space(s). pgindent currently does this inconsistently --- the bar123
line will have a space instead. Moving to -s 3 would presumably make
this worse (even less consistent) not better, since now the bar12 line
would also be emitted with spaces not a tab.

Inside a comment, though, probably the behavior of -s 3 would be just
fine. So the real problem here IMO is that use of tabs ought to be
context sensitive (behavior inside a comment different from outside),
and I don't think entab can do that. I see though that it understands
about C quoted strings, so maybe we could teach it about comments too?

No idea whether astyle is any smarter about this.

			regards, tom lane
Re: [HACKERS] updated emacs configuration
On Thu, Jun 27, 2013 at 05:31:54PM -0400, Tom Lane wrote: > Alvaro Herrera writes: > > Noah Misch wrote: > >> Note that emacs and pgindent remain at odds over interior tabs in comments. > >> When pgindent finds a double-space (typically after a sentence) ending at a > >> tab stop, it replaces the double-space with a tab. c-fill-paragraph will > >> convert that tab to a *single* space, and that can be enough to change many > >> line break positions. > > > We should really stop pgindent from converting those double-spaces to > > tabs. Those tabs are later changed to three or four spaces when wording > > of the comment is changed, and things start looking very odd. > > +1. That's probably the single most annoying bit of behavior in pgindent. > Being a two-spaces-after-a-period kind of guy, it might bite me more > often than other people, but now that somebody else has brought it up... That might be caused by entab, actually. I think there are two cases here: * six-space indented line * two spaces in text that happen to land on a tab stop entab doesn't distinguish them, but it certainly could. In fact, it already has an option for that: -s Minimum spaces needed to replace with a tab (default = 2). Should we be using entab -s 3? That would make a six-space indent be a tab and two spaces, but it would certainly handle the period-space-space case. I actually don't remember people complaining about this. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
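[Editorial illustration] The effect of entab's -s threshold is easy to see with a toy model. This is not the real src/tools/entab code — the function below and its details are assumptions for illustration only — but it shows how a minimum-run threshold decides whether a run of spaces ending on a tab stop becomes a tab:

```python
def entab(line: str, tabstop: int = 4, min_spaces: int = 2) -> str:
    """Toy model of entab: replace runs of spaces that end exactly on a
    tab stop with a tab, but only when the run is at least `min_spaces`
    wide (entab's -s option). Assumes no pre-existing tabs in the input.
    NOT the real implementation; a sketch for illustration."""
    out = []
    col = 0   # current output column (1-based after increment)
    run = 0   # width of the pending run of spaces
    for ch in line:
        if ch == ' ':
            run += 1
            col += 1
            if col % tabstop == 0:  # run ends exactly on a tab stop
                out.append('\t' if run >= min_spaces else ' ' * run)
                run = 0
        else:
            out.append(' ' * run)   # flush spaces that hit no tab stop
            run = 0
            out.append(ch)
            col += 1
    out.append(' ' * run)
    return ''.join(out)

# A sentence-ending double-space landing on a tab stop becomes a tab at
# the default -s 2, but survives as spaces with -s 3:
print(repr(entab("sentence ends.  Next", 4, 2)))  # 'sentence ends.\tNext'
print(repr(entab("sentence ends.  Next", 4, 3)))  # 'sentence ends.  Next'
# A six-space indent under -s 3 becomes "a tab and two spaces":
print(repr(entab("      code", 4, 3)))            # '\t  code'
```

Under this model, -s 3 leaves the period-space-space case alone while still tabifying wider indentation runs, which matches the behavior discussed above.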
Re: [HACKERS] Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
> The result of the current patch using lead Actually, I think I agree with you (and, FWIW, so does Oracle: http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm#autoId18). I've refactored the window function's implementation so that (e.g.) lead(5) means the 5th non-null value in front of the current row (in the previous implementation, the last non-null value was returned if the 5th row in front was null). These semantics are slower, as they require the function to scan through the tuples, skipping the null ones. I've made the implementation use a bitmap in the partition context to cache whether or not a given tuple produces a null. This seems correct (it passes the regression tests), but as it stores row offsets (which are int64s) I was careful not to use bitmap methods that use ints to refer to set members. I've added more explanation in the code's comments. Thanks Attachment: lead-lag-ignore-nulls.patch
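[Editorial illustration] The revised IGNORE NULLS semantics described above — the offset counts non-null values ahead of the current row, rather than falling back to the last non-null value — can be sketched in Python. This is an illustrative model only, not the C implementation in the patch:

```python
def lead_ignore_nulls(vals, offset):
    """For each row, return the offset-th non-null value strictly after
    the current row, or None if the partition runs out of non-null
    values. A model of the patched semantics, not the C implementation."""
    out = []
    for i in range(len(vals)):
        seen = 0
        res = None
        for v in vals[i + 1:]:
            if v is not None:       # nulls are skipped, not counted
                seen += 1
                if seen == offset:
                    res = v
                    break
        out.append(res)
    return out

rows = [1, None, 2, None, 3, 4]
# From row 0, the 2nd non-null value ahead is 3 (the Nones are skipped):
print(lead_ignore_nulls(rows, 2))  # [3, 3, 4, 4, None, None]
```

The inner scan over following rows is exactly why these semantics cost more than plain lead(), and why caching each tuple's null-ness in a bitmap helps.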
Re: [HACKERS] fixing pg_ctl with relative paths
On Fri, Jun 28, 2013 at 12:47 AM, Fujii Masao wrote: > > Another corner case is, for example, pg_ctl -D test1 -o "-D test2", > that is, multiple -D specifications appear in the command-line. The patch handles this case properly. Sorry for the noise. Regards, -- Fujii Masao
Re: [HACKERS] Hash partitioning.
On Thu, Jun 27, 2013 at 6:20 PM, Jeff Janes wrote: > On Wed, Jun 26, 2013 at 11:14 AM, Claudio Freire > wrote: >> >> >> Now I just have two indices. One that indexes only hot tuples, it's >> very heavily queried and works blazingly fast, and one that indexes by >> (hotness, key). I include the hotness value on the query, and still >> works quite fast enough. Luckily, I know things become cold after an >> update to mark them cold, so I can do that. I included hotness on the >> index to cluster updates on the hot part of the index, but I could >> have just used a regular index and paid a small price on the updates. >> >> Indeed, for a while it worked without the hotness, and there was no >> significant trouble. I later found out that WAL bandwidth was >> noticeably decreased when I added that hotness column, so I did, helps >> a bit with replication. Has worked ever since. > > > > I'm surprised that clustering updates into the hot part of the index, > without also clustering them into a hot part of the table heap, works > well enough to make a difference. Does clustering in the table just come > naturally under your usage patterns? Yes, hotness is highly correlated to age, while still not 100%. So most updates hit the tail of the table, about a week or two worth of it.
Re: [HACKERS] MemoryContextAllocHuge(): selectively bypassing MaxAllocSize
On Sat, Jun 22, 2013 at 12:46 AM, Stephen Frost wrote: > Noah, > > * Noah Misch (n...@leadboat.com) wrote: > > This patch introduces MemoryContextAllocHuge() and repalloc_huge() that > check > > a higher MaxAllocHugeSize limit of SIZE_MAX/2. > > Nice! I've complained about this limit a few different times and just > never got around to addressing it. > > > This was made easier by tuplesort growth algorithm improvements in commit > > 8ae35e91807508872cabd3b0e8db35fc78e194ac. The problem has come up before > > (TODO item "Allow sorts to use more available memory"), and Tom floated > the > > idea[1] behind the approach I've used. The next limit faced by sorts is > > INT_MAX concurrent tuples in memory, which limits helpful work_mem to > about > > 150 GiB when sorting int4. > > That's frustratingly small. :( > I've added a ToDo item to remove that limit from sorts as well. I was going to add another item to make nodeHash.c use the new huge allocator, but after looking at it just now it was not clear to me that it even has such a limitation. nbatch is limited by MaxAllocSize, but nbuckets doesn't seem to be. Cheers, Jeff
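[Editorial illustration] The "about 150 GiB" figure quoted above follows from the INT_MAX cap on in-memory tuples times the per-tuple memory cost. As a rough check — the 75-byte per-tuple figure is an assumption here (on the order of the SortTuple bookkeeping plus palloc'd tuple overhead for an int4), not a number taken from the source:

```python
# Back-of-envelope check of the ~150 GiB useful work_mem ceiling for
# int4 sorts, given at most INT_MAX tuples held in memory.
INT_MAX = 2**31 - 1
bytes_per_tuple = 75          # ASSUMED: per-tuple sort overhead, roughly
useful_bytes = INT_MAX * bytes_per_tuple
print(useful_bytes / 2**30)   # ~150 GiB
```

Any work_mem beyond that point cannot be used by such a sort, since the tuple-count limit is hit first.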
Re: [HACKERS] MD5 aggregate
On 27 June 2013 17:47, Peter Eisentraut wrote: > On 6/27/13 4:19 AM, Dean Rasheed wrote: >> I'd say there are clearly people who want it, and the nature of some >> of those answers suggests to me that we ought to have a better answer >> in core. > > It's not clear what these people wanted this functionality for. They > all wanted to analyze a table to compare with another table (or the same > table later). Either, they wanted this to detect data changes, in which > case the right tool is a checksum, not a cryptographic hash. We already > have several checksum implementations in core, so we could expose one of > them. Or they wanted this to protect their data from tampering, in > which case the right tool is a cryptographic hash, but Noah argues that > a sum of MD5 hashes is not cryptographically sound. (And in any case, > we don't put cryptographic functionality into the core.) > > The reason md5_agg is proposed here and in all those cited posts is > presumably because the md5() function was already there anyway. The > md5() function is there because the md5 code was already there anyway > because of the authentication. Let's not add higher-order > already-there-anyway code. ;-) > OK fair enough. It's looking like there are more people who don't want md5_agg() in core, or want something different, than who do want it. Also, if we're taking the view that the existing md5() function is only for hashing passwords, then it's probably not worth trying to optimise it. At this stage it's probably best to mark this as returned with feedback, and I'll consider the options for rewriting it, but not during this commitfest. Regards, Dean
Re: [HACKERS] Add more regression tests for dbcommands
Peter Eisentraut wrote: > On 6/27/13 10:20 AM, Robert Haas wrote: >> So I'd like to endorse Josh's idea: subject to appropriate review, >> let's add these test cases. Then, if it really turns out to be too >> burdensome, we can take them out, or figure out a sensible way to >> split the suite. Pushing all of Robins work into a secondary suite, >> or throwing it out altogether, both seem to me to be things which will >> not be to the long-term benefit of the project. > > I agree. If it gets too big, let's deal with it then. But let's not > make things more complicated because they *might* get too big later. +1 -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: enable new error fields in plpgsql (9.4)
On Tue, Jun 25, 2013 at 03:56:27PM +0200, Pavel Stehule wrote: > cleaned patch is in attachment Of the five options you're adding to GET STACKED DIAGNOSTICS, four of them appear in the SQL standard. DATATYPE_NAME does not; I think we should call it PG_DATATYPE_NAME in line with our other extensions in this area. > >> 2013/2/1 Pavel Stehule : > >> > 2013/2/1 Peter Eisentraut : > >> >> On 2/1/13 8:00 AM, Pavel Stehule wrote: > >> >>> 2013/2/1 Marko Tiikkaja : > >> Is there a compelling reason why we wouldn't provide these already in > >> 9.3? > >> >>> > >> >>> a time for assign to last commitfest is out. > >> >>> > >> >>> this patch is relative simple and really close to enhanced error > >> >>> fields feature - but depends if some from commiters will have a time > >> >>> for commit to 9.3 - so I am expecting primary target 9.4, but I am not > >> >>> be angry if it will be commited early. > >> >> > >> >> If we don't have access to those fields on PL/pgSQL, what was the point > >> >> of the patch to begin with? Surely, accessing them from C wasn't the > >> >> main use case? > >> >> > >> > > >> > These fields are available for application developers now. But is a > >> > true, so without this patch, GET STACKED DIAGNOSTICS statement will > >> > not be fully consistent, because some fields are accessible and others > >> > not I am inclined to back-patch this to 9.3. The patch is fairly mechanical, and I think the risk of introducing bugs is less than the risk that folks will be confused by these new-in-9.3 error fields being accessible from libpq and the protocol, yet inaccessible from PL/pgSQL. The existing protocol documentation says things like this: Table name: if the error was associated with a specific table, the name of the table. (When this field is present, the schema name field provides the name of the table's schema.) The way you have defined RAISE does not enforce this; the user can set TABLE_NAME without setting SCHEMA_NAME at all. 
I see a few options here: 1. Change RAISE to check the invariants thoroughly. For example, TABLE_NAME would require SCHEMA name, and the pair would need to name an actual table. 2. Change RAISE to check the invariants simply. For example, it could check that SCHEMA_NAME is present whenever TABLE_NAME is present but provide no validation that the pair names an actual table. (I think the protocol language basically allows this, though a brief note wouldn't hurt.) 3. Tweak the protocol documentation to clearly permit what the patch has RAISE allow, namely the free-form use of these fields. This part of the protocol is new in 9.3, so it won't be a big deal to change it. The libpq documentation has similar verbiage to update. I suppose I prefer #3. It seems fair for user code to employ these fields for applications slightly broader than their core use, like a constraint name that represents some userspace notion of a constraint rather than normal, cataloged constraint. I can also imagine an application like replaying logs from another server, recreating the messages as that server emitted them; #2 or #3 would suffice for that. Looking beyond the immediate topic of PL/pgSQL, it seems fair for a FDW to use these error fields to name remote objects not known locally. Suppose a foreign INSERT fires a remote trigger, and that trigger violates a constraint of some other remote table. Naming the remote objects would be a reasonable design choice. postgres_fdw might have chosen to just copy fields from the remote error (it does not do this today for the fields in question, though). The FDW might not even have a notion of a schema, at which point it would legitimately choose to leave that field unpopulated. Once we allow any part of the system to generate such errors, we should let PL/pgSQL do the same. Thoughts on that plan? 
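[Editorial illustration] Option 2's presence-only invariant check could look roughly like the sketch below. This is hypothetical — the field names follow the protocol documentation, but the COLUMN_NAME prerequisite is an added assumption, not something the quoted documentation states:

```python
def check_diag_fields(fields):
    """Option 2 from the message: enforce simple presence invariants
    among the extended error fields, with no catalog lookups.
    Hypothetical sketch; prerequisites beyond TABLE_NAME -> SCHEMA_NAME
    are assumptions."""
    requires = {
        "TABLE_NAME": "SCHEMA_NAME",    # per the quoted protocol text
        "COLUMN_NAME": "TABLE_NAME",    # ASSUMED analogous prerequisite
        "CONSTRAINT_NAME": None,        # may name a userspace constraint
    }
    problems = []
    for field, prereq in requires.items():
        if field in fields and prereq and prereq not in fields:
            problems.append(f"{field} set without {prereq}")
    return problems

print(check_diag_fields({"TABLE_NAME": "t"}))
# ['TABLE_NAME set without SCHEMA_NAME']
print(check_diag_fields({"SCHEMA_NAME": "s", "TABLE_NAME": "t"}))  # []
```

Under option 3, this check would simply be dropped and the fields treated as free-form, which is what the patch currently allows.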
-- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Group Commits Vs WAL Writes
On Thu, Jun 27, 2013 at 9:51 AM, Atri Sharma wrote: > > > > commit_delay exists to artificially increase the window in which the > > leader backend waits for more group commit followers. At higher client > > counts, that isn't terribly useful because you'll naturally have > > enough clients anyway, but at lower client counts particularly where > > fsyncs have high latency, it can help quite a bit. I mention this > > because clearly commit_delay is intended to trade off latency for > > throughput. Although having said that, when I worked on commit_delay, > > the average and worse-case latencies actually *improved* for the > > workload in question, which consisted of lots of small write > > transactions. Though I wouldn't be surprised if you could produce a > > reasonable case where latency was hurt a bit, but throughput improved. > Throughput and average latency are strictly reciprocal, aren't they? I think when people talk about improving latency, they must mean something like "improve 95% latency", not average latency. Otherwise, it doesn't seem to make much sense to me, they are the same thing. > > Thanks for your reply. > > The logic says that latency will be hit when commit_delay is applied, > but I am really interested in why we get an improvement instead. > There is a spot on the disk to which the current WAL is destined to go. That spot on the disk is not going to be under the write-head for (say) another 6 milliseconds. Without commit_delay, I try to commit my record, but find that someone else is already on the lock (and on the fsync as well). I have to wait for 6 milliseconds before that person gets their commit done and releases the lock, then I can start mine, and have to wait another 8 milliseconds (7500 rpm disk) for the spot to come around again, for a total of 14 milliseconds of latency. 
With commit_delay, I get my record in under the nose of the person who is already doing the delay, and they wake up and flush it for me in time to make the 6 millisecond cutoff. Total 6 milliseconds latency for me. One thing I tried a while ago (before the recent group-commit changes were made) was to record in shared memory when the last fsync finished, and then the next time someone needed to fsync, they would sleep until just before the write spot was predicted to be under the write head again (previous_finish + rotation_time - safety_margin, where rotation_time - safety_margin were represented by a single guc). It worked pretty well on the system in which I wrote it, but seemed too brittle to be a general solution. Another thing I tried was to drop the WALWriteLock after the WAL write finished but before calling fsync. The theory was that process 1 could write its WAL and then block on the fsync, and then process 2 could also write its WAL and also block directly on the fsync, and the kernel/disk controller would be smart enough to realize that it could merge the two pending fsync requests into one. This did not work at all, possibly because my disk controller was very cheap and not very smart. Cheers, Jeff
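[Editorial illustration] The latency arithmetic in the scenario above is easy to verify: at 7,500 rpm one rotation takes 60,000/7,500 = 8 ms, so a backend that misses the target sector pays a full extra rotation on top of the wait for the lock holder's flush:

```python
# Rotational-latency arithmetic from the group-commit scenario above.
rpm = 7500
rotation_ms = 60_000 / rpm        # 8 ms per full platter rotation
to_spot_ms = 6.0                  # target sector is 6 ms away (as stated)

# Without commit_delay: wait 6 ms for the holder's flush, miss the spot,
# then wait a full rotation for it to come around again.
without_delay = to_spot_ms + rotation_ms   # 14 ms
# With commit_delay: the record piggybacks on the pending flush.
with_delay = to_spot_ms                    # 6 ms
print(rotation_ms, without_delay, with_delay)  # 8.0 14.0 6.0
```

The same numbers explain why the win depends on fsync latency: on storage with no rotational penalty, missing the "spot" costs nothing, and commit_delay only adds latency.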
Re: [HACKERS] updated emacs configuration
Tom Lane wrote: > Andres Freund writes: > > > Being at least one of the persons having mentioned astyle to Alvaro, I > > had tested that once and I thought the results were resembling something > > reasonable after an hour of fiddling or so. But there were certain > > things that I could not be make it do during that. The only thing I > > remember now was reducing the indentation of parameters to the left if > > the line length got to long. Now, I personally think that's an > > anti-feature, but I am not sure if others think differently. > > I never particularly cared for that behavior either. It probably made > sense back in the video-terminal days, when your view of a program was > 80 columns period. I've never liked that either; I am a fan of keeping things to 80 columns, but when things get longer I prefer my editor to wrap them to the next line without the silly de-indent (or not wrap, if I tell it not to.) Another benefit of more modern tools is that there's no need for a typedef file, which is great when you're trying to indent after a patch which adds some more typedefs that are not listed in the file. > These days I think most people can use a wider > window at need --- not that I want to adopt wider lines as standard, but > the readability tradeoff between not having lines wrap versus messing up > the indentation seems like it's probably different now. Agreed. I would be sad if we adopted a policy of sloppiness on width. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
Alvaro Herrera writes: > I'm looking at the combined patches 0003-0005, which are essentially all > about adding a function to obtain relation OID from (tablespace, > filenode). It takes care to look through the relation mapper, and uses > a new syscache underneath for performance. > One question about this patch, originally, was about the usage of > that relfilenode syscache. It is questionable because it would be the > only syscache to apply on top of a non-unique index. ... which, I assume, is on top of a pg_class index that doesn't exist today. Exactly what is the argument that says performance of this function is sufficiently critical to justify adding both the maintenance overhead of a new pg_class index, *and* a broken-by-design syscache? Lose the cache and this probably gets a lot easier to justify. As is, I think I'd vote to reject altogether. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
Andres Freund wrote: > On 2013-06-27 17:33:04 -0400, Alvaro Herrera wrote: > > One question about this patch, originally, was about the usage of > > that relfilenode syscache. It is questionable because it would be the > > only syscache to apply on top of a non-unique index. It is said that > > this doesn't matter because the only non-unique values that can exist > > would reference entries that have relfilenode = 0; and in turn this > > doesn't matter because those values would be queried through the > > relation mapper anyway, not from the syscache. (This is implemented in > > the higher-level function.) > > Well, you can even query the syscache without hurt for mapped relations, > you just won't get an answer. The only thing you may not do because it > would yield multiple results is to query the syscache with > (tablespace, InvalidOid/0). Which is still not nice although it doesn't > make much sense to query with InvalidOid. Yeah, I agree that it doesn't make sense to query for that. The problem is that something could reasonably be developed that uses the syscache directly without checking whether the relfilenode is 0. > > (I would much prefer for there to be a way to define partial > > indexes in BKI.) > > I don't think that's the hard part, it's that we don't use the full > machinery for updating indexes but rather the relatively simplistic > CatalogUpdateIndexes(). I am not sure we can guarantee that the required > infrastructure is available in all the cases to support doing generic > predicate evaluation. You're right, CatalogIndexInsert() doesn't allow for predicates, so fixing BKI would not help. I still wonder about having a separate cache. Right now pg_class has two indexes; adding this new one would mean a rather large decrease in insert performance (50% more indexes to update than previously), which is not good considering that it's inserted into for each and every temp table creation -- that would become slower. 
This would be a net loss for every user, even those that don't want logical replication. On the other hand, table creation also has to add tuples to pg_attribute, pg_depend, pg_shdepend and maybe other catalogs, so perhaps the difference is negligible. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] updated emacs configuration
Andres Freund writes: > I think before doing any serious testing we would need to lay out how > many changes and what changes in formatting we would allow and what kind > of enforced formatting rules we think are required. Well, we certainly never applied any such analysis to pgindent. I personally don't mind leaving corner cases to the judgment of the tool author ... of course, what's a corner case and what's important may be in the eye of the beholder. But it's surely a bug, for instance, that pgindent is so clueless about function pointers. > Being at least one of the persons having mentioned astyle to Alvaro, I > had tested that once and I thought the results were resembling something > reasonable after an hour of fiddling or so. But there were certain > things that I could not be make it do during that. The only thing I > remember now was reducing the indentation of parameters to the left if > the line length got to long. Now, I personally think that's an > anti-feature, but I am not sure if others think differently. I never particularly cared for that behavior either. It probably made sense back in the video-terminal days, when your view of a program was 80 columns period. These days I think most people can use a wider window at need --- not that I want to adopt wider lines as standard, but the readability tradeoff between not having lines wrap versus messing up the indentation seems like it's probably different now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
Hi, On 2013-06-27 17:33:04 -0400, Alvaro Herrera wrote: > One question about this patch, originally, was about the usage of > that relfilenode syscache. It is questionable because it would be the > only syscache to apply on top of a non-unique index. It is said that > this doesn't matter because the only non-unique values that can exist > would reference entries that have relfilenode = 0; and in turn this > doesn't matter because those values would be queried through the > relation mapper anyway, not from the syscache. (This is implemented in > the higher-level function.) Well, you can even query the syscache without hurt for mapped relations, you just won't get an answer. The only thing you may not do because it would yield multiple results is to query the syscache with (tablespace, InvalidOid/0). Which is still not nice although it doesn't make much sense to query with InvalidOid. > I'm not sure about the placing of the new SQL-callable function in > dbsize.c either. It is certainly not a function that has anything to do > with object sizes. Not happy with that myself. I only placed the function there because pg_relation_filenode() already was in it. Happy to change if somebody has a good idea. > (I would much prefer for there to be a way to define partial > indexes in BKI.) I don't think that's the hard part, it's that we don't use the full machinery for updating indexes but rather the relatively simplistic CatalogUpdateIndexes(). I am not sure we can guarantee that the required infrastructure is available in all the cases to support doing generic predicate evaluation. Should bki really be the problem we probably could create the index after bki based bootstrapping finished. Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash partitioning.
On 06/27/2013 11:13 PM, Jeff Janes wrote: > Wouldn't any IO system being used on a high-end system be fairly good > about making this work through interleaved read-ahead algorithms? To some extent, certainly. It cannot possibly get better than a fully sequential load, though. > That sounds like it would be much more susceptible to lock contention, > and harder to get bug-free, than dividing into bigger chunks, like whole > 1 gig segments. Maybe, yes. Splitting a known amount of work into equal pieces sounds like a pretty easy parallelization strategy. In case you don't know the total amount of work or the size of each piece in advance, it gets a bit harder. Choosing chunks that turn out to be too big certainly hurts. Regards Markus Wanner
Re: [HACKERS] updated emacs configuration
On 2013-06-27 17:31:54 -0400, Tom Lane wrote: > > Really, we should get out of patched BSD indent entirely already. How > > about astyle, for instance? I'm told that with some sensible options, > > the diff to what we have now is not very large, and several things > > actually become sensible (such as function pointer decls, which are > > messed up rather badly by pgindent) > > AFAIR, no one has ever done a serious comparison to anything except GNU > indent, and (at least at the time) it seemed to have bugs as bad as > pgindent's, just different ones. I'm certainly open to another choice > as long as we don't lose on portability of the tool. But who will do > the legwork to test something else? I think before doing any serious testing we would need to lay out how many changes and what changes in formatting we would allow and what kind of enforced formatting rules we think are required. Being at least one of the persons having mentioned astyle to Alvaro, I had tested that once and I thought the results were resembling something reasonable after an hour of fiddling or so. But there were certain things that I could not make it do during that. The only thing I remember now was reducing the indentation of parameters to the left if the line length got too long. Now, I personally think that's an anti-feature, but I am not sure if others think differently. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hash partitioning.
On Thu, Jun 27, 2013 at 9:35 AM, Nicolas Barbier wrote: > > My reasoning was: To determine which index block to update (typically > one in both the partitioned and non-partitioned cases), one needs to > walk the index first, which supposedly causes one additional (read) > I/O in the non-partitioned case on average, because there is one extra > level and the lower part of the index is not cached (because of the > size of the index). But the "extra level" is up at the top where it is well cached, not at the bottom where it is not. Cheers, Jeff
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
I'm looking at the combined patches 0003-0005, which are essentially all about adding a function to obtain relation OID from (tablespace, filenode). It takes care to look through the relation mapper, and uses a new syscache underneath for performance. One question about this patch, originally, was about the usage of that relfilenode syscache. It is questionable because it would be the only syscache to apply on top of a non-unique index. It is said that this doesn't matter because the only non-unique values that can exist would reference entries that have relfilenode = 0; and in turn this doesn't matter because those values would be queried through the relation mapper anyway, not from the syscache. (This is implemented in the higher-level function.) This means that there would be one syscache that is damn easy to misuse .. and we've set up things so that syscaches are very easy to use in the first place. From that perspective, this doesn't look good. However, it's an easy mistake to notice and fix, so perhaps this is not a serious problem. (I would much prefer for there to be a way to define partial indexes in BKI.) I'm not sure about the placing of the new SQL-callable function in dbsize.c either. It is certainly not a function that has anything to do with object sizes. The insides of it would belong more in lsyscache.c, I think, except then that file does not otherwise concern itself with the relation mapper so its scope would have to expand a bit. But this is no place for the SQL-callable portion, so that would have to find a different home as well. The other option, of course, is to provide a separate caching layer for these objects altogether, but given how concise this implementation is, it doesn't sound too palatable. Thoughts? 
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] updated emacs configuration
Alvaro Herrera writes: > Noah Misch wrote: >> Note that emacs and pgindent remain at odds over interior tabs in comments. >> When pgindent finds a double-space (typically after a sentence) ending at a >> tab stop, it replaces the double-space with a tab. c-fill-paragraph will >> convert that tab to a *single* space, and that can be enough to change many >> line break positions. > We should really stop pgindent from converting those double-spaces to > tabs. Those tabs are later changed to three or four spaces when wording > of the comment is changed, and things start looking very odd. +1. That's probably the single most annoying bit of behavior in pgindent. Being a two-spaces-after-a-period kind of guy, it might bite me more often than other people, but now that somebody else has brought it up... > Really, we should get out of patched BSD indent entirely already. How > about astyle, for instance? I'm told that with some sensible options, > the diff to what we have now is not very large, and several things > actually become sensible (such as function pointer decls, which are > messed up rather badly by pgindent) AFAIR, no one has ever done a serious comparison to anything except GNU indent, and (at least at the time) it seemed to have bugs as bad as pgindent's, just different ones. I'm certainly open to another choice as long as we don't lose on portability of the tool. But who will do the legwork to test something else? Probably the principal argument against switching to a different tool has been that whitespace changes would complicate back-patching, but I don't see why we couldn't solve that by re-indenting all the live back branches at the time of the switch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash partitioning.
On Thu, Jun 27, 2013 at 2:12 AM, Nicolas Barbier wrote: > 2013/6/26 Heikki Linnakangas : > > > On 26.06.2013 16:41, Yuri Levinsky wrote: > > > >> Heikki, > >> As far as I understand the height of the btree will affect the number of > >> I/Os necessary. The height of the tree does not increase linearly with > >> the number of records. > > > > Now let's compare that with a hash partitioned table, with 1000 > partitions > > and a b-tree index on every partition. [..] This is almost equivalent to > > just having a b-tree that's one level taller [..] There certainly isn't > > any difference in the number of actual I/O performed. > > Imagine that there are a lot of indexes, e.g., 50. Although a lookup > (walking one index) is equally fast, an insertion must update al 50 > indexes. When each index requires one extra I/O (because each index is > one level taller), that is 50 extra I/Os. Except for pathological conditions like indexing the longest values that can be indexed, a btree insertion rarely needs to split even the lowest internal page, much less all pages up to the root. ... > Additionally: Imagine that the data can be partitioned along some > column that makes sense for performance reasons (e.g., some “date” > where most accesses are concentrated on rows with more recent dates). > The other indexes will probably not have such a performance > distribution. Using those other indexes (both for look-ups and > updates) in the non-partitioned case, will therefore pull a huge > portion of each index into cache (because of the “random distribution” > of the non-date data). In the partitioned case, more cache can be > spent on the indexes that correspond to the “hot partitions.” > This could well be true for range partitioning on date. It is hard to see how this would work well for hash partitioning on the date, unless you carefully arrange for the number of hash partitions to be about the same as the number of distinct dates in the table. Cheers, Jeff
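Heikki's equivalence argument above can be checked with a back-of-envelope model. This is an illustrative sketch only; the fanout of 1000 downlinks per page is an assumed round number, and real B-tree fanout depends on key width:

```python
def btree_levels(nkeys, fanout=1000):
    """Number of B-tree levels needed to address nkeys entries,
    assuming each page holds `fanout` keys/downlinks."""
    levels, capacity = 1, fanout
    while capacity < nkeys:
        capacity *= fanout
        levels += 1
    return levels

# One big index over 10^9 rows needs one more level than each of
# 1000 per-partition indexes over 10^6 rows -- but picking the hash
# partition replaces exactly that removed level, so a lookup touches
# about the same number of pages either way.
big = btree_levels(10**9)            # 3
per_partition = btree_levels(10**6)  # 2
```

Nicolas's counterpoint is about the write side: with 50 such indexes, one extra cached-out level per index can mean 50 extra read I/Os per insertion, independent of this lookup equivalence.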
Re: [HACKERS] checking variadic "any" argument in parser - should be array
Hello 2013/6/27 Jeevan Chalke : > Hi Pavel, > > I had a look over your new patch and it looks good to me. > > My review comments on patch: > > 1. It cleanly applies with patch -p1 command. > 2. make/make install/make check were smooth. > 3. My own testing didn't find any issue. > 4. I had a code walk-through and I am little bit worried or confused on > following changes: > > if (PG_ARGISNULL(argidx)) > return NULL; > > ! /* > ! * Non-null argument had better be an array. The parser doesn't > ! * enforce this for VARIADIC ANY functions (maybe it should?), so > that > ! * check uses ereport not just elog. > ! */ > ! arr_typid = get_fn_expr_argtype(fcinfo->flinfo, argidx); > ! if (!OidIsValid(arr_typid)) > ! elog(ERROR, "could not determine data type of concat() > input"); > ! > ! if (!OidIsValid(get_element_type(arr_typid))) > ! ereport(ERROR, > ! (errcode(ERRCODE_DATATYPE_MISMATCH), > ! errmsg("VARIADIC argument must be an array"))); > > - /* OK, safe to fetch the array value */ > arr = PG_GETARG_ARRAYTYPE_P(argidx); > > /* > --- 3820,3828 > if (PG_ARGISNULL(argidx)) > return NULL; > > ! /* Non-null argument had better be an array */ > ! > Assert(OidIsValid(get_element_type(get_fn_expr_argtype(fcinfo->flinfo, > argidx; > > arr = PG_GETARG_ARRAYTYPE_P(argidx); > > We have moved checking of array type in parser (ParseFuncOrColumn()) which > basically verifies that argument type is indeed an array. Which exactly same > as that of second if statement in earlier code, which you replaced by an > Assert. > > However, what about first if statement ? Is it NO more required now? What if > get_fn_expr_argtype() returns InvalidOid, don't you think we need to throw > an error saying "could not determine data type of concat() input"? yes, If I understand well to question, a main differences is in stage of checking. When I do a check in parser stage, then I can expect so "actual_arg_types" array holds a valid values. 
> > Well, I tried hard to trigger that code, but didn't able to get any test > which fails with that error in earlier version and with your version. And > thus I believe it is a dead code, which you removed ? Is it so ? It is removed in this version :), and it is not a bug, so there is no reason to patch previous versions. There should probably be an Assert instead of an error. This code is relatively mature, so I don't expect an issue from the SQL level now. On the other hand, these functions can be called via the DirectFunctionCall API from custom C server-side routines, and there a developer can easily introduce a bug by not filling the necessary structs correctly. So the Asserts can stay. > > Moreover, if in any case get_fn_expr_argtype() returns an InvalidOid, we > will hit an Assert rather than an error, is this what you expect ? > At this moment, yes; a small change can help with tracking down the source of possible issues. So instead of the line Assert(OidIsValid(get_element_type(get_fn_expr_argtype(fcinfo->flinfo, argidx)))); use the two lines Assert(OidIsValid(get_fn_expr_argtype(fcinfo->flinfo, argidx))); Assert(OidIsValid(get_element_type(get_fn_expr_argtype(fcinfo->flinfo, argidx)))); What do you think? > 5. This patch has user visibility, i.e. now we are throwing an error when > user only says "VARIADIC NULL" like: > > select concat(variadic NULL) is NULL; > > Previously it was working but now we are throwing an error. Well we are now > more stricter than earlier with using VARIADIC + ANY, so I have no issue as > such. But I guess we need to document this user visibility change. I don't > know exactly where though. I searched for VARIADIC and all related > documentation says it needs an array, so nothing harmful as such, so you can > ignore this review comment but I thought it worth mentioning it. Yes, that is a point for the compatibility section of the release notes, I think.
Regards Pavel > > Thanks > > > > On Thu, Jun 27, 2013 at 12:35 AM, Pavel Stehule > wrote: >> >> Hello >> >> remastered version >> >> Regards >> >> Pavel >> >> 2013/6/26 Jeevan Chalke : >> > Hi Pavel >> > >> > >> > On Sat, Jan 26, 2013 at 9:22 AM, Pavel Stehule >> > wrote: >> >> >> >> Hello Tom >> >> >> >> you did comment >> >> >> >> ! <><--><--> * Non-null argument had better be an array. >> >> The parser doesn't >> >> ! <><--><--> * enforce this for VARIADIC ANY functions >> >> (maybe it should?), so >> >> ! <><--><--> * that check uses ereport not just elog. >> >> ! <><--><--> */ >> >> >> >> So I moved this check to parser. >> >> >> >> It is little bit stricter - requests typed NULL instead unknown NULL, >> >> but it can mark error better and early >> > >> > >> > Tom suggested that this check should be better done by parser. >> > This patch tries to accomplish t
Re: [HACKERS] Hash partitioning.
On Wed, Jun 26, 2013 at 11:14 AM, Claudio Freire wrote: > > Now I just have two indices. One that indexes only hot tuples, it's > very heavily queried and works blazingly fast, and one that indexes by > (hotness, key). I include the hotness value on the query, and still > works quite fast enough. Luckily, I know things become cold after an > update to mark them cold, so I can do that. I included hotness on the > index to cluster updates on the hot part of the index, but I could > have just used a regular index and paid a small price on the updates. > Indeed, for a while it worked without the hotness, and there was no > significant trouble. I later found out that WAL bandwidth was > noticeably decreased when I added that hotness column, so I did, helps > a bit with replication. Has worked ever since. > I'm surprised that clustering updates into the hot part of the index, without also clustering them it into a hot part of the table heap, works well enough to make a difference. Does clustering in the table just come naturally under your usage patterns? Cheers, Jeff
Re: [HACKERS] Hash partitioning.
On Wed, Jun 26, 2013 at 8:55 AM, Markus Wanner wrote: > On 06/26/2013 05:46 PM, Heikki Linnakangas wrote: > > We could also allow a large query to search a single table in parallel. > > A seqscan would be easy to divide into N equally-sized parts that can be > > scanned in parallel. It's more difficult for index scans, but even then > > it might be possible at least in some limited cases. > > So far reading sequentially is still faster than hopping between > different locations. Purely from the I/O perspective, that is. > Wouldn't any IO system being used on a high-end system be fairly good about making this work through interleaved read-ahead algorithms? Also, hopefully the planner would be able to predict when parallelization has nothing to add and avoid using it, although surely that is easier said than done. > > For queries where the single CPU core turns into a bottle-neck and which > we want to parallelize, we should ideally still do a normal, fully > sequential scan and only fan out after the scan and distribute the > incoming pages (or even tuples) to the multiple cores to process. > That sounds like it would be much more susceptible to lock contention, and harder to get bug-free, than dividing into bigger chunks, like whole 1 gig segments. Fanning out line by line (according to line_number % number_processes) was my favorite parallelization method in Perl, but those files were read only and so had no concurrency issues. Cheers, Jeff
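The two distribution strategies discussed above can be contrasted with a small sketch (illustrative only; function names are made up, and real parallel-scan chunking would be done in blocks, not lines):

```python
def contiguous_chunks(nblocks, nworkers):
    """Split [0, nblocks) into nworkers contiguous ranges whose sizes
    differ by at most one block -- each worker still reads sequentially."""
    base, rem = divmod(nblocks, nworkers)
    chunks, start = [], 0
    for i in range(nworkers):
        end = start + base + (1 if i < rem else 0)
        chunks.append((start, end))
        start = end
    return chunks

def round_robin_owner(block, nworkers):
    """Modulo fan-out (block % nworkers): simple and fine for
    independent items, but it interleaves every worker's reads
    across the whole file instead of keeping them sequential."""
    return block % nworkers

contiguous_chunks(10, 3)   # [(0, 4), (4, 7), (7, 10)]
```

Chunking by whole 1 GiB segments, as suggested above, is just `contiguous_chunks` with segment-aligned boundaries; it keeps each worker's I/O sequential and avoids the per-tuple lock traffic of fine-grained fan-out.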
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 06/27/2013 11:11 AM, Tom Lane wrote:
> AFAICS, the threshold question here is whether the patch helps usefully
> for tables with typical numbers of columns (ie, not 800 ;-)), and

I wouldn't expect it to help at all for < 50 columns, and probably not measurably below 200.

> doesn't hurt materially in any common scenario. If it does, I think

Yeah, I think that's the bigger question. Ok, I'll start working on a new test case. Here's my thinking on performance tests:

1. Run pgbench 10 times both with and without the patch. See if there's any measurable difference. There should not be.

2. Run with/without comparisons for the following scenarios. Each table would have a SERIAL pk, a timestamp, and:

   Chains of booleans:
     # attributes    NULL probability
     16              0%  50%  90%
     128             0%  50%  90%
     512             0%  50%  90%
   Chains of text:
     16              0%  50%  90%
     256             0%  50%  90%

   ... for 100,000 rows each.

Comparisons would include:
1) table size before and after testing
2) time required to read 1000 rows, by key
3) time required to read rows with each of 100 random column positions = some value
4) time required to insert 1000 rows
5) time required to update 1000 rows

Geez, I feel tired just thinking about it. Jamie, can your team do any of this?

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
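The test matrix above can be generated mechanically. A rough sketch (table and column names here are invented for illustration, not taken from any actual test harness):

```python
import random

def make_ddl(table, ncols, coltype="boolean"):
    """DDL for one test table: SERIAL pk, timestamp, ncols payload columns."""
    cols = ", ".join(f"c{i} {coltype}" for i in range(ncols))
    return (f"CREATE TABLE {table} (id serial PRIMARY KEY, "
            f"ts timestamptz DEFAULT now(), {cols});")

def make_row(ncols, null_prob, rng):
    # Trailing NULLs are what the patch truncates, so the NULL
    # probability is what drives the expected row-size savings.
    return [None if rng.random() < null_prob else True
            for _ in range(ncols)]

ddl = make_ddl("bool_128", 128)
rng = random.Random(42)
row = make_row(128, 0.5, rng)
```

Looping this over the (16, 128, 512) x (0%, 50%, 90%) grid, inserting 100,000 rows per cell, and recording table size plus the five timings listed above would cover the proposed matrix.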
Re: [HACKERS] Kudos for Reviewers -- straw poll
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 Josh Berkus wrote: > I wasn't thinking about doing it every year -- just for 9.3, in order to > encourage more reviewers, and encourage reviewers to do more reviews. - -1. It's not cool to set it up and then stop it the next go round. You want more reviewers? Start by streamlining the process as much as possible. I pretended I was new to the project and tried to figure out how to review something. The homepage has no mention of reviewers, not even if you drill down on some subpages. A Google search does lead one to: http://wiki.postgresql.org/wiki/Reviewing_a_Patch It has some good "you can do it" wordage. However, there is no clear path on how to actually start reviewing. There is this paragraph with two links in it: "The current commitfest is here[1] and has plenty of room for you to help. You can sign up to become a Round Robin Reviewer here[2]. Once you have, write a mail to the list introducing yourself." [1] Leads to the commitfest, with a nice summary, but no way for new people to know what to do. [2] This link is even worse (http://www.postgresql.org/list/pgsql-rrreviewers/) It's an archive list for pgsql-rrreviewers, with no way to subscribe and certainly no indication on it or the previous page that "sign up" means (one might guess) join the mailing list. Anyway, just food for thought as far as attracting new people. It should be much easier and more intuitive. As far as "rewarding" current reviewers, put the names in the release notes, after each item. Full stop. 
-- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201306271636 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] updated emacs configuration
On Thu, Jun 27, 2013 at 03:51:15PM -0400, Alvaro Herrera wrote: > Noah Misch wrote: > > > Note that emacs and pgindent remain at odds over interior tabs in comments. > > When pgindent finds a double-space (typically after a sentence) ending at a > > tab stop, it replaces the double-space with a tab. c-fill-paragraph will > > convert that tab to a *single* space, and that can be enough to change many > > line break positions. > > We should really stop pgindent from converting those double-spaces to > tabs. Those tabs are later changed to three or four spaces when wording > of the comment is changed, and things start looking very odd. > > Really, we should get out of patched BSD indent entirely already. How > about astyle, for instance? I'm told that with some sensible options, > the diff to what we have now is not very large, and several things > actually become sensible (such as function pointer decls, which are > messed up rather badly by pgindent) Sounds good to me. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ASYNC Privileges proposal
On 06/27/2013 02:49 AM, Chris Farmiloe wrote: > So I would think that if this was to go further then "channels" would need > to be more of a first class citizen and created explicitly, with CREATE > CHANNEL, DROP CHANNEL etc: > > CREATE CHANNEL channame; > GRANT LISTEN ON CHANNEL channame TO rolename; > GRANT NOTIFY ON CHANNEL channame TO rolename; > LISTEN channame; -- OK > NOTIFY channame, 'hi'; -- OK > LISTEN ; -- exception: no channel named "" > NOTIFY , 'hi'; -- exception: no channel named "" > > Personally I think explicitly creating channels would be beneficial; I have > hit issues where an typo in a channel name has caused a bug to go unnoticed > for a while. > But of course this would break backwards compatibility with the current > model (with implicit channel names) so unless we wanted to force everyone > to add "CREATE CHANNEL" statements during their upgrade then, maybe there > would need to be some kind of system to workaround this I agree, but that would make this a MUCH bigger patch than it is now. So the question is whether you're willing to go there. If nobody wants to work on that, I don't find it unreasonable to have some permissions on LISTEN/NOTIFY period. However, I'd like to see separate permissions on LISTEN and NOTIFY; I can easily imagine wanting to grant a user one without the other. Also, someone would need to do performance tests to see how much the permissions check affects performance. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
Noah Misch wrote: > Note that emacs and pgindent remain at odds over interior tabs in comments. > When pgindent finds a double-space (typically after a sentence) ending at a > tab stop, it replaces the double-space with a tab. c-fill-paragraph will > convert that tab to a *single* space, and that can be enough to change many > line break positions. We should really stop pgindent from converting those double-spaces to tabs. Those tabs are later changed to three or four spaces when wording of the comment is changed, and things start looking very odd. Really, we should get out of patched BSD indent entirely already. How about astyle, for instance? I'm told that with some sensible options, the diff to what we have now is not very large, and several things actually become sensible (such as function pointer decls, which are messed up rather badly by pgindent) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash partitioning.
On 06/27/2013 06:35 PM, Nicolas Barbier wrote: > I am assuming that this (comparatively very small and super-hot) index > is cached all the time, while for the other indexes (that are > supposedly super-huge) only the top part stays cached. > > I am mostly just trying to find out where Yuri’s “partitioning is > needed for super-huge tables” experience might come from, and noting > that Heikki’s argument might not be 100% valid. I think the OP made that clear by stating that his index has relatively low selectivity. That seems to be a case that Postgres doesn't handle very well. > I think that the > “PostgreSQL-answer” to this problem is to somehow cluster the data on > the “hotness column” (so that all hot rows are close to one another, > thereby improving the efficiency of the caching of relation blocks) + > partial indexes for the hot rows (as first mentioned by Claudio; to > improve the efficiency of the caching of index blocks). Agreed, sounds like a sane strategy. > My reasoning was: To determine which index block to update (typically > one in both the partitioned and non-partitioned cases), one needs to > walk the index first, which supposedly causes one additional (read) > I/O in the non-partitioned case on average, because there is one extra > level and the lower part of the index is not cached (because of the > size of the index). I think that pokes a hole in Heikki’s argument of > “it really doesn’t matter, partitioning == using one big table with > big non-partial indexes.” Heikki's argument holds for the general case, where you cannot assume a well defined hot partition. In that case, the lowest levels of all the b-trees of the partitions don't fit in the cache, either. A single index performs better in that case, because it has lower overhead. I take your point that in case you *can* define a hot partition and apply partitioning, the hot(test) index(es) are more likely to be cached and thus require less disk I/O. 
Regards Markus Wanner
Re: [HACKERS] MemoryContextAllocHuge(): selectively bypassing MaxAllocSize
On Wed, Jun 26, 2013 at 03:48:23PM -0700, Jeff Janes wrote: > On Mon, May 13, 2013 at 7:26 AM, Noah Misch wrote: > > This patch introduces MemoryContextAllocHuge() and repalloc_huge() that > > check > > a higher MaxAllocHugeSize limit of SIZE_MAX/2. Chunks don't bother > > recording > > whether they were allocated as huge; one can start with palloc() and then > > repalloc_huge() to grow the value. > > > Since it doesn't record the size, I assume the non-use as a varlena is > enforced only by coder discipline and not by the system? We will rely on coder discipline, yes. The allocator actually does record a size. I was referring to the fact that it can't distinguish the result of repalloc(p, 7) from the result of repalloc_huge(p, 7). > What is likely to happen if I accidentally let a pointer to huge memory > escape to someone who then passes it to varlena constructor without me > knowing it? (I tried sabotaging the code to make this happen, but I could > not figure out how to). Is there a place we can put an Assert to catch > this mistake under enable-cassert builds? Passing a too-large value gives a modulo effect. We could inject an AssertMacro() into SET_VARSIZE(). But it's a hot path, and I don't think this mistake is too likely. > The only danger I can think of is that it could sometimes make some sorts > slower, as using more memory than is necessary can sometimes slow down an > "external" sort (because the heap is then too big for the fastest CPU > cache). If you use more tapes, but not enough more to reduce the number of > passes needed, then you can get a slowdown. Interesting point, though I don't fully understand it. The fastest CPU cache will be a tiny L1 data cache; surely that's not the relevant parameter here? 
> I can't imagine that it would make things worse on average, though, as the > benefit of doing more sorts as quicksorts rather than merge sorts, or doing > mergesort with fewer number of passes, would outweigh sometimes doing a > slower mergesort. If someone has a pathological use pattern for which the > averages don't work out favorably for them, they could probably play with > work_mem to correct the problem. Whereas without the patch, people who > want more memory have no options. Agreed. > People have mentioned additional things that could be done in this area, > but I don't think that applying this patch will make those things harder, > or back us into a corner. Taking an incremental approach seems suitable. Committed with some cosmetic tweaks discussed upthread. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
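The "modulo effect" Noah mentions can be illustrated with a simplified model of the 4-byte varlena header, where (on little-endian builds) the length occupies the upper 30 bits and the low 2 bits carry flags. This is a sketch of the layout described in the thread, not the actual postgres.h macros:

```python
def set_varsize(length):
    """Model of SET_VARSIZE on a 4-byte header: length shifted into
    the upper 30 bits of a uint32 (low 2 bits reserved for flags)."""
    return (length << 2) & 0xFFFFFFFF

def varsize(header):
    """Model of VARSIZE: recover the 30-bit length."""
    return header >> 2

one_gb = 1 << 30
assert varsize(set_varsize(one_gb - 1)) == one_gb - 1   # fits in 30 bits
assert varsize(set_varsize(one_gb + 7)) == 7            # wraps: length mod 2^30
```

This is why passing a huge allocation to a varlena constructor silently truncates the recorded size rather than failing outright, and why an AssertMacro in SET_VARSIZE was floated as the only cheap way to catch it.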
Re: [HACKERS] pg_resetxlog -m documentation not up to date
Peter Eisentraut wrote: > It was introduced in 0ac5ad5134f2769ccbaefec73844f8504c4d6182 "Improve > concurrency of foreign key locking", but there is also no explanation > there. It is apparently needed in pg_upgrade. > > This should be documented, and I think the help line should be changed > to something like > > -m XID,XID set next and oldest multitransaction ID I have come up with the attached patch. Does this match your expectations? Also, I'm a bit annoyed that there's no easy "append N zeroes to a number" recipe for multixact members such as there is for pg_clog and multixact offsets. Any suggestions on how to improve the current recommendation? (Also, I found out that if the last multixact/offsets file is not exactly 32 pages long, the server will fail to create further mxacts after following the recipe in docs; IIRC because it looks up the mxact previous to the next one.) *** a/doc/src/sgml/ref/pg_resetxlog.sgml --- b/doc/src/sgml/ref/pg_resetxlog.sgml *** *** 27,33 PostgreSQL documentation -o oid -x xid -e xid_epoch !-m mxid -O mxoff -l xlogfile datadir --- 27,33 -o oid -x xid -e xid_epoch !-m mxid,mxid -O mxoff -l xlogfile datadir *** *** 81,87 PostgreSQL documentation -m, -O, and -l options allow the next OID, next transaction ID, next transaction ID's !epoch, next multitransaction ID, next multitransaction offset, and WAL starting address values to be set manually. These are only needed when pg_resetxlog is unable to determine appropriate values by reading pg_control. Safe values can be determined as --- 81,87 -m, -O, and -l options allow the next OID, next transaction ID, next transaction ID's !epoch, next and oldest multitransaction ID, next multitransaction offset, and WAL starting address values to be set manually. These are only needed when pg_resetxlog is unable to determine appropriate values by reading pg_control. Safe values can be determined as *** *** 104,115 PostgreSQL documentation ! 
A safe value for the next multitransaction ID (-m) can be determined by looking for the numerically largest file name in the directory pg_multixact/offsets under the ! data directory, adding one, and then multiplying by 65536. As above, ! the file names are in hexadecimal, so the easiest way to do this is to ! specify the option value in hexadecimal and add four zeroes. --- 104,119 ! A safe value for the next multitransaction ID (first part of -m) can be determined by looking for the numerically largest file name in the directory pg_multixact/offsets under the ! data directory, adding one, and then multiplying by 65536. ! Conversely, a safe value for the oldest multitransaction ID (second part of ! -m) ! can be determined by looking for the numerically smallest ! file name in the same directory and multiplying by 65536. ! As above, the file names are in hexadecimal, so the easiest way to do ! this is to specify the option value in hexadecimal and append four zeroes. *** *** 118,126 PostgreSQL documentation A safe value for the next multitransaction offset (-O) can be determined by looking for the numerically largest file name in the directory pg_multixact/members under the ! data directory, adding one, and then multiplying by 65536. As above, ! the file names are in hexadecimal, so the easiest way to do this is to ! specify the option value in hexadecimal and add four zeroes. --- 122,130 A safe value for the next multitransaction offset (-O) can be determined by looking for the numerically largest file name in the directory pg_multixact/members under the ! data directory, adding one, and then multiplying by 52352. As above, ! the file names are in hexadecimal. There is no simple recipe such as ! the ones above of appending zeroes. 
*** a/src/bin/pg_resetxlog/pg_resetxlog.c --- b/src/bin/pg_resetxlog/pg_resetxlog.c *** *** 1036,1042 usage(void) printf(_(" -e XIDEPOCH set next transaction ID epoch\n")); printf(_(" -f force update to be done\n")); printf(_(" -l XLOGFILE force minimum WAL starting location for new transaction log\n")); ! printf(_(" -m XID,OLDESTset next multitransaction ID and oldest value\n")); printf(_(" -n no update, just show extracted control values (for testing)\n")); printf(_(" -o OID set next OID\n")); printf(_(" -O OFFSETset next multitransaction offset\n")); --- 1036,1042 printf(_(" -e XI
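The recipes in the documentation patch above boil down to simple hex arithmetic on the segment file names. A sketch (the helper names are invented; the multipliers 65536 and 52352 are the ones stated in the patched text):

```python
def next_mxid(largest_offsets_segment):
    """Safe next multixact ID: largest pg_multixact/offsets filename
    plus one, times 65536 -- equivalently, the incremented hex name
    with four zeroes appended."""
    return (int(largest_offsets_segment, 16) + 1) * 0x10000

def oldest_mxid(smallest_offsets_segment):
    """Safe oldest multixact ID: smallest offsets filename times 65536."""
    return int(smallest_offsets_segment, 16) * 0x10000

def next_mxoffset(largest_members_segment):
    # Members segments hold 52352 entries per the patch, which is not
    # a power of 16 -- hence no append-zeroes shortcut exists here.
    return (int(largest_members_segment, 16) + 1) * 52352

assert next_mxid("0052") == 0x530000   # "53" with four zeroes appended
```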
Re: [HACKERS] Computer VARSIZE_ANY(PTR) during debugging
Amit Langote escribió: > Looking at the attlen == -1 value in tupDescriptor of the > ResultTupleSlot, VARSIZE_ANY() is used to calculate the length and > added to offset, but I find no way to calculate that while I am > debugging. I guess you could just add a few "macro define" lines to your .gdbinit, containing definitions equivalent to those in postgres.h. Haven't tried this for the varlena macros, though I do have a couple of others in there and they ease work at times. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
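For concreteness, such .gdbinit lines might look like the following. This is a hypothetical sketch adapted from the little-endian postgres.h definitions; copy the exact macros from your own build's postgres.h, since the bit layout differs on big-endian machines:

```gdb
# Hypothetical ~/.gdbinit additions mirroring postgres.h (little-endian
# variants) -- verify against your build before relying on them.
macro define VARATT_IS_1B(PTR) ((((varattrib_1b *) (PTR))->va_header & 0x01) == 0x01)
macro define VARSIZE_1B(PTR) ((((varattrib_1b *) (PTR))->va_header >> 1) & 0x7F)
macro define VARSIZE_4B(PTR) ((((varattrib_4b *) (PTR))->va_4byte.va_header >> 2) & 0x3FFFFFFF)
```

With these in place, expressions like `print VARSIZE_4B(ptr)` work at the gdb prompt, provided the varattrib types are present in the debug info.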
Re: [HACKERS] Kudos for Reviewers -- straw poll
Andrew Dunstan escribió: > > On 06/27/2013 02:38 PM, Josh Berkus wrote: > >>If we're going to try harder to ensure that reviewers are credited, > >>it'd probably be better to take both the commit log and the release > >>notes out of that loop. > >I'd pull the reviewers out of the CF app. Even in its current > >implementation, that's the easiest route. > > I have not honestly found these to be terribly accurate. Yeah, they're not. I do take care to review past threads when crediting reviewers in commit messages, and I don't even bother to look at the CF app. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Kudos for Reviewers -- straw poll
On Thu, Jun 27, 2013 at 02:17:25PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Thu, Jun 27, 2013 at 10:10:23AM -0700, Josh Berkus wrote: > >> What I would be opposed to is continuing to list the original authors in > >> the release notes and putting reviewers, testers, co-authors, etc. on a > >> separate web page. If we're gonna move people, let's move *all* of > >> them. Also, it needs to be on something more trustworthy than the wiki, > >> like maybe putting it at postgresql.org/developers/9.3/ > > > I think you will have trouble keeping those two lists synchronized. I > > think you will need to create a release note document that includes all > > names, then copy that to a website and remove the names just before the > > release is packaged. > > Unless Bruce is doing more work than I think he is, the attribution data > going into the release notes is just coming from whatever the committer > said in the commit log message. I believe that we've generally been Yes, that's all I do. "Bruce is doing more work than I think he is" gave me a chuckle. ;-) > careful to credit the patch author, but I'm less confident that everyone > who merited a "review credit" always gets mentioned --- that would > require going through the entire review thread at commit time, and I for > one can't say that I do that. > > If we're going to try harder to ensure that reviewers are credited, > it'd probably be better to take both the commit log and the release > notes out of that loop. I assume you are suggesting we _not_ use the commit log and release notes for reviewer credit. Good point; we might be able to pull that from the commit-fest app, though you then need to match that with the release note text. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Kudos for Reviewers -- straw poll
On 06/27/2013 02:38 PM, Josh Berkus wrote:
>> If we're going to try harder to ensure that reviewers are credited,
>> it'd probably be better to take both the commit log and the release
>> notes out of that loop.
> I'd pull the reviewers out of the CF app. Even in its current
> implementation, that's the easiest route.

I have not honestly found these to be terribly accurate.

cheers

andrew
Re: [HACKERS] Kudos for Reviewers -- straw poll
> If we're going to try harder to ensure that reviewers are credited, > it'd probably be better to take both the commit log and the release > notes out of that loop. I'd pull the reviewers out of the CF app. Even in its current implementation, that's the easiest route. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Group Commits Vs WAL Writes
On Thu, Jun 27, 2013 at 12:54 PM, Atri Sharma wrote: >> Well, it does take longer to fsync a larger byte range to disk than a >> smaller byte range, in some cases. But it's generally more efficient >> to write one larger range than many smaller ranges, so you come out >> ahead on the whole. > > Right, that does make sense. > > So, the overhead of writing a lot of WAL buffers is mitigated because > one large write is better than lots of smaller rights? Yep. To take a degenerate case, suppose that you had many small WAL records, say 64 bytes each, so more than 100 per 8K block. If you flush those one by one, you're going to rewrite that block 100 times. If you flush them all at once, you write that block once. But even when the range is more than the minimum write size (8K for WAL), there are still wins. Writing 16K or 24K or 32K submitted as a single request can likely be done in a single revolution of the disk head. But if you write 8K and wait until it's done, and then write another 8K and wait until that's done, the second request may not arrive until after the disk head has passed the position where the second block needs to go. Now you have to wait for the drive to spin back around to the right position. The details of course vary with the hardware in use, but there are very few I/O operations where batching smaller requests into larger chunks doesn't help to some degree. Of course, the optimal transfer size does vary considerably based on the type of I/O and the specific hardware in use. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
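Robert's degenerate case can be counted directly. A simplified model that charges one 8K block write per flush and assumes records do not straddle block boundaries (as in his 64-byte example):

```python
BLKSZ = 8192  # WAL block size

def block_writes(nrecords, recsz, group_commit=False):
    """Count 8K block writes needed to flush nrecords WAL records
    of recsz bytes each."""
    total = nrecords * recsz
    if group_commit:
        # One flush of the whole range: each block is written once.
        return -(-total // BLKSZ)   # ceiling division
    # Per-record flush: every fsync rewrites the current partial block.
    return nrecords

# 128 records of 64 bytes fill exactly one 8K block:
assert block_writes(128, 64) == 128                    # same block rewritten 128 times
assert block_writes(128, 64, group_commit=True) == 1   # written once
```

The model ignores the rotational-latency effect Robert describes next, which penalizes the one-by-one case even further once the range spans multiple blocks.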
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Please find attached a v12, which under timer_exceeded interrupts clients which are being throttled instead of waiting for the end of the transaction, as the transaction is not started yet. Please find attached a v13 which fixes conflicts introduced by the long options patch committed by Robert Haas. -- Fabien.diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 80203d6..da88bd7 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -137,6 +137,12 @@ int unlogged_tables = 0; double sample_rate = 0.0; /* + * When threads are throttled to a given rate limit, this is the target delay + * to reach that rate in usec. 0 is the default and means no throttling. + */ +int64 throttle_delay = 0; + +/* * tablespace selection */ char *tablespace = NULL; @@ -200,11 +206,13 @@ typedef struct int listen; /* 0 indicates that an async query has been * sent */ int sleeping; /* 1 indicates that the client is napping */ +boolthrottling; /* whether nap is for throttling */ int64 until; /* napping until (usec) */ Variable *variables; /* array of variable definitions */ int nvariables; instr_time txn_begin; /* used for measuring transaction latencies */ instr_time stmt_begin; /* used for measuring statement latencies */ + bool throttled; /* whether current transaction was throttled */ int use_file; /* index in sql_files for this client */ bool prepared[MAX_FILES]; } CState; @@ -222,6 +230,10 @@ typedef struct instr_time *exec_elapsed; /* time spent executing cmds (per Command) */ int *exec_count; /* number of cmd executions (per Command) */ unsigned short random_state[3]; /* separate randomness for each thread */ +int64 throttle_trigger; /* previous/next throttling (us) */ + int64 throttle_lag; /* total transaction lag behind throttling */ + int64 throttle_lag_max; /* max transaction lag */ + } TState; #define INVALID_THREAD ((pthread_t) 0) @@ -230,6 +242,8 @@ typedef struct { instr_time conn_time; int xacts; + int64 throttle_lag; + int64 
throttle_lag_max; } TResult; /* @@ -353,6 +367,7 @@ usage(void) " -n, --no-vacuum do not run VACUUM before tests\n" " -N, --skip-some-updates skip updates of pgbench_tellers and pgbench_branches\n" " -r, --report-latencies report average latency per command\n" + " -R, --rate SPEC target rate in transactions per second\n" " -s, --scale=NUM report this scale factor in output\n" " -S, --select-onlyperform SELECT-only transactions\n" " -t, --transactions number of transactions each client runs " @@ -895,17 +910,58 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa { PGresult *res; Command **commands; + booldo_throttle = false; top: commands = sql_files[st->use_file]; + /* handle throttling once per transaction by inserting a sleep. + * this is simpler than doing it at the end. + */ + if (throttle_delay && ! st->throttled) + { + /* compute delay to approximate a Poisson distribution + * 100 => 13.8 .. 0 multiplier + * 10 => 11.5 .. 0 + * 1 => 9.2 .. 0 + *1000 => 6.9 .. 0 + * if transactions are too slow or a given wait shorter than + * a transaction, the next transaction will start right away. + */ + int64 wait = (int64) + throttle_delay * -log(getrand(thread, 1, 1000)/1000.0); + + thread->throttle_trigger += wait; + + st->until = thread->throttle_trigger; + st->sleeping = 1; + st->throttling = true; + st->throttled = true; + if (debug) + fprintf(stderr, "client %d throttling "INT64_FORMAT" us\n", + st->id, wait); + } + if (st->sleeping) { /* are we sleeping? 
*/ instr_time now; + int64 now_us; INSTR_TIME_SET_CURRENT(now); - if (st->until <= INSTR_TIME_GET_MICROSEC(now)) + now_us = INSTR_TIME_GET_MICROSEC(now); + if (st->until <= now_us) + { st->sleeping = 0; /* Done sleeping, go ahead with next command */ + if (st->throttling) + { +/* measure lag of throttled transaction */ +int64 lag = now_us - st->until; +thread->throttle_lag += lag; +if (lag > thread->throttle_lag_max) + thread->throttle_lag_max = lag; +st->throttling = false; + } + } else return true; /* Still sleeping, nothing to do here */ } @@ -1034,7 +1090,7 @@ top: * This is more than we really ought to know about * instr_time */ - fprintf(logfile, "%d %d %.0f %d %ld %ld\n", + fprintf(logfile, "%d %d %.0f %d %ld.%06ld\n", st->id, st->cnt, usec, st->use_file, (long) now.tv_sec, (long) now.tv_usec); #else @@ -1043,7 +1099,7 @@ top: * On Windows, instr_time doesn't provide a timestamp * anyway */ - fprintf(logfile, "%d %d %.0f %d 0 0\n", + fprintf(logfile, "%d %d %.0f %d 0.0\n", st->id, st->cnt, u
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Jun 17, 2013 at 7:48 AM, Simon Riggs wrote: >> I am told, one of the very popular setups for DR is to have one >> local sync standby and one async (may be cascaded by the local sync). Since >> this new feature is more useful for DR because taking a fresh backup on a >> slower link is even more challenging, IMHO we should support such setups. > > ...which still doesn't make sense to me. Let's look at that in detail. > > Take 3 servers, A, B, C with A and B being linked by sync rep, and C > being safety standby at a distance. > > Either A or B is master, except in disaster. So if A is master, then B > would be the failover target. If A fails, then you want to failover to > B. Once B is the target, you want to failback to A as the master. C > needs to follow the new master, whichever it is. > > If you set up sync rep between A and B and this new mode between A and > C. When B becomes the master, you need to failback from B to A, but > you can't because the new mode applied between A and C only, so you > have to failback from C to A. So having the new mode not match with > sync rep means you are forcing people to failback using the slow link > in the common case. It's true that in this scenario that doesn't really make sense, but I still think they are separate properties. You could certainly want synchronous replication without this new property, if you like the data-loss guarantees that sync rep provides but don't care about failback. You could also want this new property without synchronous replication, if you don't need the data-loss guarantees that sync rep provides but you do care about fast failback. I admit it seems unlikely that you would use both features but not target them at the same machines, although maybe: perhaps you have a sync standby and an async standby and want this new property with respect to both of them. In my admittedly limited experience, the use case for a lot of this technology is in the cloud. 
The general strategy seems to be: at the first sign of trouble, kill the offending instance and fail over. This can result in failing over pretty frequently, and needing it to be fast. There may be no real hardware problem; indeed, the failover may be precipitated by network conditions or overload of the physical host backing the virtual machine or any number of other nonphysical problems. I can see this being useful in that environment, even for async standbys. People can apparently tolerate a brief interruption while their primary gets killed off and connections are re-established with the new master, but they need the failover to be fast. The problem with the status quo is that, even if the first failover is fast, the second one isn't, because it has to wait behind rebuilding the original master. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Kudos for Reviewers -- straw poll
Bruce Momjian writes: > On Thu, Jun 27, 2013 at 10:10:23AM -0700, Josh Berkus wrote: >> What I would be opposed to is continuing to list the original authors in >> the release notes and putting reviewers, testers, co-authors, etc. on a >> separate web page. If we're gonna move people, let's move *all* of >> them. Also, it needs to be on something more trustworthy than the wiki, >> like maybe putting it at postgresql.org/developers/9.3/ > I think you will have trouble keeping those two lists synchronized. I > think you will need to create a release note document that includes all > names, then copy that to a website and remove the names just before the > release is packaged. Unless Bruce is doing more work than I think he is, the attribution data going into the release notes is just coming from whatever the committer said in the commit log message. I believe that we've generally been careful to credit the patch author, but I'm less confident that everyone who merited a "review credit" always gets mentioned --- that would require going through the entire review thread at commit time, and I for one can't say that I do that. If we're going to try harder to ensure that reviewers are credited, it'd probably be better to take both the commit log and the release notes out of that loop. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)
Otherwise, the simplest possible adaptation, if the progress feature is required to work under fork emulation for the patch to pass, is that under "fork emulation" each process reports its current progress instead of a collective sum. Perhaps that's worth doing. I agree with Fabien that full support of this feature in the process model is more trouble than it's worth, though, and I wouldn't scream loudly if we just didn't support it. --disable-thread-safety doesn't have to be entirely penalty-free. Attached is patch version 5. It includes this solution for fork emulation, one report per thread instead of a global report. Some code duplication for that. It also solves conflicts introduced by the long options patch. Finally, I've added a latency measure as advocated by Mitsumasa. However the formula must be updated for the throttling patch. Maybe I should have submitted a bunch of changes to pgbench in one patch. I thought that separating orthogonal things made reviewing simpler so the patches were more likely to pass, but I'm not so sure that the other strategy would have been that bad. -- Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 80203d6..84b969a 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -74,7 +74,7 @@ static int pthread_join(pthread_t th, void **thread_return); #include #else /* Use emulation with fork. 
Rename pthread identifiers to avoid conflicts */ - +#define PTHREAD_FORK_EMULATION #include #define pthread_tpg_pthread_t @@ -164,6 +164,8 @@ bool use_log; /* log transaction latencies to a file */ bool use_quiet; /* quiet logging onto stderr */ int agg_interval; /* log aggregates instead of individual * transactions */ +int progress = 0; /* thread progress report every this seconds */ +int progress_nclients = 0; /* number of clients for progress report */ bool is_connect; /* establish connection for each transaction */ bool is_latencies; /* report per-command latencies */ int main_pid; /* main process id used in log filename */ @@ -352,6 +354,7 @@ usage(void) "(default: simple)\n" " -n, --no-vacuum do not run VACUUM before tests\n" " -N, --skip-some-updates skip updates of pgbench_tellers and pgbench_branches\n" + " -P, --progress SEC show thread progress report every SEC seconds\n" " -r, --report-latencies report average latency per command\n" " -s, --scale=NUM report this scale factor in output\n" " -S, --select-onlyperform SELECT-only transactions\n" @@ -2136,6 +2139,7 @@ main(int argc, char **argv) {"unlogged-tables", no_argument, &unlogged_tables, 1}, {"sampling-rate", required_argument, NULL, 4}, {"aggregate-interval", required_argument, NULL, 5}, + {"progress", required_argument, NULL, 'P'}, {NULL, 0, NULL, 0} }; @@ -2202,7 +2206,7 @@ main(int argc, char **argv) state = (CState *) pg_malloc(sizeof(CState)); memset(state, 0, sizeof(CState)); - while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1) + while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:", long_options, &optindex)) != -1) { switch (c) { @@ -2357,6 +2361,16 @@ main(int argc, char **argv) exit(1); } break; + case 'P': +progress = atoi(optarg); +if (progress <= 0) +{ + fprintf(stderr, + "thread progress delay (-P) must not be negative (%s)\n", + optarg); + exit(1); +} +break; case 0: /* This covers long options which take no 
argument. */ break; @@ -2482,6 +2496,7 @@ main(int argc, char **argv) * changed after fork. */ main_pid = (int) getpid(); + progress_nclients = nclients; if (nclients > 1) { @@ -2733,6 +2748,11 @@ threadRun(void *arg) int nstate = thread->nstate; int remains = nstate; /* number of remaining clients */ int i; + /* for reporting progress: */ + int64 thread_start = INSTR_TIME_GET_MICROSEC(thread->start_time); + int64 last_report = thread_start; + int64 next_report = last_report + progress * 100; + int64 last_count = 0; AggVals aggs; @@ -2896,6 +2916,68 @@ threadRun(void *arg) st->con = NULL; } } + +#ifdef PTHREAD_FORK_EMULATION + /* each process reports its own progression */ + if (progress) + { + instr_time now_time; + int64 now; + INSTR_TIME_SET_CURRENT(now_time); + now = INSTR_TIME_GET_MICROSEC(now_time); + if (now >= next_report) + { +/* generate and show report */ +int64 count = 0; +int64 run = now - last_report; +float tps, total_run, latency; + +for (i = 0 ; i < nstate ; i++) + count += state[i].cnt; + +total_run = (now - thread_start) / 100.0; +tps = 100.0 * (count-last_count) / run; +latency = 1000.0 * nstate / tps; + +fprintf(stderr, "progress %d
Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)
Fabien COELHO escribió: > For the text formatting, I tried to keep the screen width under 80 > characters, because if the line is too wide it is harder to read as > the eye may lose the alignment. But being homogeneous with other > commands is fine with me as well. The format chosen by Robert fits in 80 columns and looks very good to me. Thanks to both. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Josh Berkus writes: > So, is this patch currently depending on performance testing, or not? > Like I said, it'll be a chunk of time to set up what I believe is a > realistic performance test, so I don't want to do it if the patch is > likely to be bounced for other reasons. AFAICS, the threshold question here is whether the patch helps usefully for tables with typical numbers of columns (ie, not 800 ;-)), and doesn't hurt materially in any common scenario. If it does, I think we'd take it. I've not read the patch, so I don't know if there are minor cosmetic or correctness issues, but at bottom this seems to be a very straightforward performance tradeoff. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: query result history in psql
Maciej Gajewski escribió: > > Those issues aside - I think it's a great feature! I can add the > > grammatical fixes I made whenever the final patch is ready. Or earlier, > > whatever works for you. Also, this is my first time reviewing a patch, so > > please let me know if I can improve on anything. Thanks! > > This is my first submitted patch, so I can't really comment on the > process. But if you could add the author's email to CC, the message would > be much easier to spot. I replied after two days only because I missed the > message in the flood of other pgsql-hackers messages. I think I need to scan > the list more carefully... It's better to post a review as a reply to the message which contains the patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XLogInsert scaling, revisited
On 2013-06-26 18:52:30 +0300, Heikki Linnakangas wrote: > There's this in the comment near the top of the file: Btw, I find the 'you' used in the comment somewhat irritating. Not badly so, but reading something like: * When you are about to write * out WAL, it is important to call WaitXLogInsertionsToFinish() *before* * acquiring WALWriteLock, to avoid deadlocks. Otherwise you might get stuck * waiting for an insertion to finish (or at least advance to next * uninitialized page), while you're holding WALWriteLock. just seems strange to me. If this is directed at plugin authors, maybe, but it really isn't... But that's probably a question for a native speaker... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Kudos for Reviewers -- straw poll
On Thu, Jun 27, 2013 at 10:10:23AM -0700, Josh Berkus wrote: > On 06/27/2013 08:56 AM, Bruce Momjian wrote:> Adding the names to each > release note item is not a problem; the > > problem is the volume of names that overwhelms the release note text. If > > we went that direction, I predict we would just remove _all_ names from > > the release notes. > > That's not a realistic objection. We'd be talking about one or two more > names per patch, with a handful of exceptions. It is realistic because I did this for 9.2 beta and people didn't like it; how much more realistic do you want? > If you're going to make an objection, object to the amount of extra work > it would be, which is significant with the current state of our > technology. Realistically, it'll take the help of additional people. No extra work required, it is all copy/paste for me. > On 06/27/2013 09:19 AM, Tom Lane wrote: > > Yeah. Keep in mind that the overwhelming majority of the audience for > > the release notes doesn't actually give a darn who implemented what. > > There is some argument to be made about moving the whole "list of names" > to some other resource, like a web page. In fact, if the mythical land > where we have a new CF application, that other resource could even be > generated by the CF app with some hand-tweaking (this would also require > some tweaks to community profiles etc.). Yes, I am fine with moving all names. However, I predict you will do more to demotivate patch submitters than you will to motivate patch reviewers. That is fine with me, as motivating developers/reviewers is not our top priority. > What I would be opposed to is continuing to list the original authors in > the release notes and putting reviewers, testers, co-authors, etc. on a > separate web page. If we're gonna move people, let's move *all* of > them. 
Also, it needs to be on something more trustworthy than the wiki, > like maybe putting it at postgresql.org/developers/9.3/ I think you will have trouble keeping those two lists synchronized. I think you will need to create a release note document that includes all names, then copy that to a website and remove the names just before the release is packaged. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
All, So, is this patch currently depending on performance testing, or not? Like I said, it'll be a chunk of time to set up what I believe is a realistic performance test, so I don't want to do it if the patch is likely to be bounced for other reasons. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Kudos for Reviewers -- straw poll
On 06/27/2013 08:56 AM, Bruce Momjian wrote: > Adding the names to each release note item is not a problem; the > problem is the volume of names that overwhelms the release note text. If > we went that direction, I predict we would just remove _all_ names from > the release notes. That's not a realistic objection. We'd be talking about one or two more names per patch, with a handful of exceptions. If you're going to make an objection, object to the amount of extra work it would be, which is significant with the current state of our technology. Realistically, it'll take the help of additional people. On 06/27/2013 09:19 AM, Tom Lane wrote: > Yeah. Keep in mind that the overwhelming majority of the audience for > the release notes doesn't actually give a darn who implemented what. There is some argument to be made about moving the whole "list of names" to some other resource, like a web page. In fact, in the mythical land where we have a new CF application, that other resource could even be generated by the CF app with some hand-tweaking (this would also require some tweaks to community profiles etc.). What I would be opposed to is continuing to list the original authors in the release notes and putting reviewers, testers, co-authors, etc. on a separate web page. If we're gonna move people, let's move *all* of them. Also, it needs to be on something more trustworthy than the wiki, like maybe putting it at postgresql.org/developers/9.3/ On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan wrote: > Encouraging new people is good, but recognizing sustained, long-term > contributions is good, too. I think we should do more of that, too. I wasn't thinking about doing it every year -- just for 9.3, in order to encourage more reviewers, and encourage reviewers to do more reviews. But that would only work if we're also giving reviewers credit; as several people have said, they care more about credit than they do about prizes. -- Josh Berkus PostgreSQL Experts Inc. 
http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)
Fabien COELHO writes: >>> Here is a v4 that takes into account most of your points: The report is >>> performed for all threads by thread 0, however --progress is not supported >>> under thread fork emulation if there is more than one thread. The report >>> time does not slip anymore. >> I don't believe that to be an acceptable restriction. > My first proposal is to remove the fork emulation altogether, which would > remove many artificial limitations to pgbench and simplify the code > significantly. That would be an improvement. I would object strongly to that, as it would represent a significant movement of the goalposts on what is required to build Postgres at all, ie platforms on which --enable-thread-safety is unavailable or expensive would be out in the cold. Perhaps that set is approaching empty, but a project that's still standardized on C89 has little business making such a choice IMO. > Otherwise, the simplest possible adaptation, if the progress feature is > required to work under fork emulation for the patch to pass, is that under > "fork emulation" each process reports its current progress instead of a > collective sum. Perhaps that's worth doing. I agree with Fabien that full support of this feature in the process model is more trouble than it's worth, though, and I wouldn't scream loudly if we just didn't support it. --disable-thread-safety doesn't have to be entirely penalty-free. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extensible external toast tuple support & snappy prototype
On 2013-06-27 09:17:10 -0700, Hitoshi Harada wrote: > On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas wrote: > > > On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund > > wrote: > > > There will be a newer version of the patch coming today or tomorrow, so > > > there's probably no point in looking at the one linked above before > > > that... > > > > This patch is marked as "Ready for Committer" in the CommitFest app. > > But it is not clear to me where the patch is that is being proposed > > for commit. > > > > > > > > No, this conversation is about patch #1153, pluggable toast compression, > which is in Needs Review, and you may be confused with #1127, extensible > external toast tuple support. Well, actually this is the correct thread, and pluggable toast compression developed out of it. You had marked #1127 as ready for committer although you had noticed an omission in heap_tuple_fetch_attr... Answered with the new patch version toplevel in the thread, to make this hopefully less confusing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)
[...] So I changed that, and committed this, with some further cosmetic changes. [...] Thanks for the commit & the style improvements. For the text formatting, I tried to keep the screen width under 80 characters, because if the line is too wide it is harder to read as the eye may lose the alignment. But being homogeneous with other commands is fine with me as well. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extensible external toast tuple support
Hi, Please find attached the next version of the extensible toast support. There basically are two changes: * handle indirect toast tuples properly in heap_tuple_fetch_attr and related places * minor comment adjustments Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 5c7079067f870a65d65ee09cccbf743c88b64c82 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Tue, 11 Jun 2013 23:25:26 +0200 Subject: [PATCH] Add support for multiple kinds of external toast datums There are several usecases where our current representation of external toast datums is limiting: * adding new compression schemes * avoidance of repeated detoasting * externally decoded toast tuples For that support 'tags' on external (varattrib_1b_e) varlenas which recoin the current va_len_1be field to store the tag (or type) of a varlena. To determine the actual length a macro VARTAG_SIZE(tag) is added which can be used to map from a tag to the actual length. This patch adds support for 'indirect' tuples which point to some externally allocated memory containing a toast tuple. It also implements the stub for a different compression algorithm. --- src/backend/access/heap/tuptoaster.c | 142 +++ src/include/c.h | 2 + src/include/postgres.h | 84 +++-- 3 files changed, 191 insertions(+), 37 deletions(-) diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c index fc37ceb..5ee5e33 100644 --- a/src/backend/access/heap/tuptoaster.c +++ b/src/backend/access/heap/tuptoaster.c @@ -87,11 +87,11 @@ static struct varlena *toast_fetch_datum_slice(struct varlena * attr, * heap_tuple_fetch_attr - * * Public entry point to get back a toasted value from - * external storage (possibly still in compressed format). + * external source (possibly still in compressed format). 
* * This will return a datum that contains all the data internally, ie, not - * relying on external storage, but it can still be compressed or have a short - * header. + * relying on external storage or memory, but it can still be compressed or + * have a short header. -- */ struct varlena * @@ -99,13 +99,35 @@ heap_tuple_fetch_attr(struct varlena * attr) { struct varlena *result; - if (VARATT_IS_EXTERNAL(attr)) + if (VARATT_IS_EXTERNAL_ONDISK(attr)) { /* * This is an external stored plain value */ result = toast_fetch_datum(attr); } + else if (VARATT_IS_EXTERNAL_INDIRECT(attr)) + { + /* + * copy into the caller's memory context. That's not required in all + * cases but sufficient for now since this is mainly used when we need + * to persist a Datum for unusually long time, like in a HOLD cursor. + */ + struct varatt_indirect redirect; + VARATT_EXTERNAL_GET_POINTER(redirect, attr); + attr = (struct varlena *)redirect.pointer; + + /* nested indirect Datums aren't allowed */ + Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr)); + + /* doesn't make much sense, but better handle it */ + if (VARATT_IS_EXTERNAL_ONDISK(attr)) + return heap_tuple_fetch_attr(attr); + + /* copy datum verbatim */ + result = (struct varlena *) palloc(VARSIZE_ANY(attr)); + memcpy(result, attr, VARSIZE_ANY(attr)); + } else { /* @@ -128,7 +150,7 @@ heap_tuple_fetch_attr(struct varlena * attr) struct varlena * heap_tuple_untoast_attr(struct varlena * attr) { - if (VARATT_IS_EXTERNAL(attr)) + if (VARATT_IS_EXTERNAL_ONDISK(attr)) { /* * This is an externally stored datum --- fetch it back from there @@ -145,6 +167,17 @@ heap_tuple_untoast_attr(struct varlena * attr) pfree(tmp); } } + else if (VARATT_IS_EXTERNAL_INDIRECT(attr)) + { + struct varatt_indirect redirect; + VARATT_EXTERNAL_GET_POINTER(redirect, attr); + attr = (struct varlena *)redirect.pointer; + + /* nested indirect Datums aren't allowed */ + Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr)); + + attr = heap_tuple_untoast_attr(attr); + } else if 
(VARATT_IS_COMPRESSED(attr)) { /* @@ -191,7 +224,7 @@ heap_tuple_untoast_attr_slice(struct varlena * attr, char *attrdata; int32 attrsize; - if (VARATT_IS_EXTERNAL(attr)) + if (VARATT_IS_EXTERNAL_ONDISK(attr)) { struct varatt_external toast_pointer; @@ -204,6 +237,17 @@ heap_tuple_untoast_attr_slice(struct varlena * attr, /* fetch it back (compressed marker will get set automatically) */ preslice = toast_fetch_datum(attr); } + else if (VARATT_IS_EXTERNAL_INDIRECT(attr)) + { + struct varatt_indirect redirect; + VARATT_EXTERNAL_GET_POINTER(redirect, attr); + + /* nested indirect Datums aren't allowed */ + Assert(!VARATT_IS_EXTERNAL_INDIRECT(redirect.pointer)); + + return heap_tuple_untoast_attr_slice(redirect.pointer, + sliceoffset, slicelength); + } else preslice = attr; @@ -267,7 +311,7 @@ toast_raw_datum_size(Datum value) struct varlena *attr = (str
Re: [HACKERS] Kudos for Reviewers -- straw poll
On 6/25/13 2:44 PM, Josh Berkus wrote: On the other hand, I will point out that we currently have a shortage of reviewers, and we do NOT have a shortage of patch submitters. That's because reviewing is harder than initial development. The only people who think otherwise are developers who don't do enough review. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group Commits Vs WAL Writes
> Well, it does take longer to fsync a larger byte range to disk than a > smaller byte range, in some cases. But it's generally more efficient > to write one larger range than many smaller ranges, so you come out > ahead on the whole. Right, that does make sense. So, the overhead of writing a lot of WAL buffers is mitigated because one large write is better than lots of smaller writes? Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group Commits Vs WAL Writes
> > commit_delay exists to artificially increase the window in which the > leader backend waits for more group commit followers. At higher client > counts, that isn't terribly useful because you'll naturally have > enough clients anyway, but at lower client counts particularly where > fsyncs have high latency, it can help quite a bit. I mention this > because clearly commit_delay is intended to trade off latency for > throughput. Although having said that, when I worked on commit_delay, > the average and worst-case latencies actually *improved* for the > workload in question, which consisted of lots of small write > transactions. Though I wouldn't be surprised if you could produce a > reasonable case where latency was hurt a bit, but throughput improved. Thanks for your reply. The logic says that latency will take a hit when commit_delay is applied, but I am really interested in why we get an improvement instead. Can small writes be the reason? Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 aggregate
On 6/27/13 4:19 AM, Dean Rasheed wrote: > I'd say there are clearly people who want it, and the nature of some > of those answers suggests to me that we ought to have a better answer > in core. It's not clear what these people wanted this functionality for. They all wanted to analyze a table to compare with another table (or the same table later). Either they wanted this to detect data changes, in which case the right tool is a checksum, not a cryptographic hash. We already have several checksum implementations in core, so we could expose one of them. Or they wanted this to protect their data from tampering, in which case the right tool is a cryptographic hash, but Noah argues that a sum of MD5 hashes is not cryptographically sound. (And in any case, we don't put cryptographic functionality into the core.) The reason md5_agg is proposed here and in all those cited posts is presumably because the md5() function was already there anyway. The md5() function is there because the md5 code was already there anyway, because of authentication. Let's not add higher-order already-there-anyway code. ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Andres Freund writes: > On 2013-06-26 20:07:40 -0400, Tom Lane wrote: >> I still want to delete the test for SOCK_ERRNO == 0. I traced that back >> to commit da9501bddb4dc33c031b1db6ce2133bcee7b, but I can't find >> anything in the mailing list archives to explain that. I have an >> inquiry in to Jan to see if he can remember the reason ... > That looks strange, yes. Committed that way. We can always undo it if Jan can defend the logic. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)
Dear Robert, Here is a v4 that takes into account most of your points: The report is performed for all threads by thread 0; however, --progress is not supported under thread fork emulation if there is more than one thread. The report time does not slip anymore. I don't believe that to be an acceptable restriction. The "pthread fork emulation" is just an ugly hack to run pgbench on a host that does not have pthreads (portable threads). I'm not sure that it applies to any significant system, but I can assure you that it imposes severe limitations on how to do things properly in pgbench: as there are no threads, there is no shared memory, no locking mechanism, nothing really. So it is hard to generate a shared report under such conditions. My first proposal is to remove the fork emulation altogether, which would remove many artificial limitations from pgbench and simplify the code significantly. That would be an improvement. Otherwise, the simplest possible adaptation, if the progress feature is required to work under fork emulation, is that under "fork emulation" each process reports its current progress instead of having a collective summing. Note that it is possible to implement the feature with interprocess communication, but generating many pipes would really add a lot of complexity to the code, and I do not think that either the code or this simple feature deserves that. Another option is to have each thread report its progress independently in all implementations, which is what I did in the first instance. It is much less interesting, but it would be homogeneous, although poor, for every version. We generally require features to work on all platforms we support. We have made occasional compromises there, but generally only when the restriction is fundamental to the platform rather than for developer convenience. 
I agree with this kind of "generally", but please consider that "pthread fork emulation" really means "processes", so that simple things with threads become significantly more complex to implement. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash partitioning.
2013/6/27 Markus Wanner : > On 06/27/2013 11:12 AM, Nicolas Barbier wrote: > >> Imagine that there are a lot of indexes, e.g., 50. Although a lookup >> (walking one index) is equally fast, an insertion must update all 50 >> indexes. When each index requires one extra I/O (because each index is >> one level taller), that is 50 extra I/Os. In the partitioned case, >> each index would require the normal smaller amount of I/Os. Choosing >> which partition to use must only be done once: The result “counts” for >> all indexes that are to be updated. > > I think you're underestimating the cost of partitioning. After all, the > lookup of what index to update for a given partition is a lookup in > pg_index via pg_index_indrelid_index - a btree index. I am assuming that this (comparatively very small and super-hot) index is cached all the time, while for the other indexes (that are supposedly super-huge) only the top part stays cached. I am mostly just trying to find out where Yuri’s “partitioning is needed for super-huge tables” experience might come from, and noting that Heikki’s argument might not be 100% valid. I think that the “PostgreSQL-answer” to this problem is to somehow cluster the data on the “hotness column” (so that all hot rows are close to one another, thereby improving the efficiency of the caching of relation blocks) + partial indexes for the hot rows (as first mentioned by Claudio; to improve the efficiency of the caching of index blocks). > Additionally, the depth of an index doesn't directly translate to the > number of I/O writes per insert (or delete). I'd rather expect the avg. > number of I/O writes per insert into a b-tree to be reasonably close to > one - depending mostly on the number of keys per page, not depth. 
My reasoning was: To determine which index block to update (typically one in both the partitioned and non-partitioned cases), one needs to walk the index first, which supposedly causes one additional (read) I/O in the non-partitioned case on average, because there is one extra level and the lower part of the index is not cached (because of the size of the index). I think that pokes a hole in Heikki’s argument of “it really doesn’t matter, partitioning == using one big table with big non-partial indexes.” Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
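Nicolas's arithmetic above can be sketched quickly (a hedged back-of-the-envelope in Python, not a measurement; the fanout of 256 and the row counts are assumed for illustration): one extra level on the huge index means one extra uncached read per index walked, multiplied by the 50 indexes that have to be updated.

```python
import math

# Back-of-the-envelope only: the fanout and table sizes are assumptions,
# not PostgreSQL measurements.
def btree_height(nkeys, fanout=256):
    """Approximate number of B-tree levels needed to index nkeys entries."""
    return max(1, math.ceil(math.log(nkeys, fanout)))

unpartitioned = btree_height(10**9)        # one huge index: 4 levels
partitioned = btree_height(10**9 // 100)   # 100 partitions: 3 levels each
extra_ios_per_insert = (unpartitioned - partitioned) * 50  # 50 indexes to update
assert extra_ios_per_insert == 50
```

This is exactly the "50 extra I/Os" figure from the quoted text; it only holds if the extra bottom level is the part of the index that does not fit in cache.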
Re: [HACKERS] Group Commits Vs WAL Writes
On Thu, Jun 27, 2013 at 12:56 AM, Atri Sharma wrote: > Now, with group commits, do we see a spike in that disk write latency, > especially in the cases where the user has set wal_buffers to a high > value? commit_delay exists to artificially increase the window in which the leader backend waits for more group commit followers. At higher client counts, that isn't terribly useful because you'll naturally have enough clients anyway, but at lower client counts particularly where fsyncs have high latency, it can help quite a bit. I mention this because clearly commit_delay is intended to trade off latency for throughput. Although having said that, when I worked on commit_delay, the average and worst-case latencies actually *improved* for the workload in question, which consisted of lots of small write transactions. Though I wouldn't be surprised if you could produce a reasonable case where latency was hurt a bit, but throughput improved. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
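Peter's description of the commit_delay tradeoff can be illustrated with a toy throughput model (a hedged Python sketch; the arrival rates and latencies are invented numbers, and it says nothing about the latency improvement he observed): the leader gathers roughly arrival_rate × delay followers, and one fsync then covers the whole group.

```python
# Toy model only -- not pgbench or PostgreSQL code; all numbers are assumptions.
def commits_per_second(arrival_rate, fsync_latency, delay):
    """Group size is the leader plus followers arriving during `delay`;
    each group costs one delay window plus one fsync."""
    group = 1 + arrival_rate * delay
    return group / (delay + fsync_latency)

# High-latency fsync (10 ms) and a busy system: the delay helps a lot.
assert commits_per_second(1000.0, 0.01, 0.005) > commits_per_second(1000.0, 0.01, 0.0)

# Fast fsync (1 ms) and few clients: the same delay only adds latency.
assert commits_per_second(10.0, 0.001, 0.005) < commits_per_second(10.0, 0.001, 0.0)
```

The second case matches Peter's point that the delay matters most where fsyncs are slow; with fast fsyncs and few clients it is pure overhead.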
Re: [HACKERS] Kudos for Reviewers -- straw poll
Bruce Momjian writes: > On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote: >> It could be pretty satisfactory to have a simple listing, in the >> release notes, of the set of reviewers. That's a lot less >> bookkeeping than tracking this for each and every change. > Adding the names to each release note item is not a problem; the > problem is the volume of names that overwhelms the release note text. If > we went that direction, I predict we would just remove _all_ names from > the release notes. Yeah. Keep in mind that the overwhelming majority of the audience for the release notes doesn't actually give a darn who implemented what. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Kudos for Reviewers -- straw poll
On 06/27/2013 12:12 PM, Tom Lane wrote: Bruce Momjian writes: On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote: It could be pretty satisfactory to have a simple listing, in the release notes, of the set of reviewers. That's a lot less bookkeeping than tracking this for each and every change. Adding the names to each release note item is not a problem; the problem is the volume of names that overwhelms the release note text. If we went that direction, I predict we would just remove _all_ names from the release notes. Yeah. Keep in mind that the overwhelming majority of the audience for the release notes doesn't actually give a darn who implemented what. Maybe we should have a Kudos / Bragging rights wiki page, with a table something like this: Release Feature Name Principal Author(s) Contributing Author(s) Code Reviewer(s) Tester(s) Constructing it going backwards would be an interesting task :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extensible external toast tuple support & snappy prototype
On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas wrote: > On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund > wrote: > > There will be a newer version of the patch coming today or tomorrow, so > > there's probably no point in looking at the one linked above before > > that... > > This patch is marked as "Ready for Committer" in the CommitFest app. > But it is not clear to me where the patch is that is being proposed > for commit. No, this conversation is about patch #1153, pluggable toast compression, which is in Needs Review; you may be confusing it with #1127, extensible external toast tuple support. -- Hitoshi Harada
Re: [HACKERS] Min value for port
On Thu, Jun 27, 2013 at 9:22 AM, Andres Freund wrote: > On 2013-06-27 15:11:26 +0200, Magnus Hagander wrote: > > On Thu, Jun 27, 2013 at 2:16 PM, Peter Eisentraut > wrote: > > > On 6/27/13 6:34 AM, Magnus Hagander wrote: > > >> Is there a reason why we have set the min allowed value for port to 1, > > >> not 1024? Given that you can't actually start postgres with a value of > > >> <1024, shouldn't the entry in pg_settings reference that as well? > > > > > > Are you thinking of the restriction that you need to be root to use > > > ports <1024? That restriction is not necessarily universal. We can let > > > the kernel tell us at run time if it doesn't like our port. > > > > Yes, that's the restriction I was talking about. It's just a bit > > annoying that if you look at pg_settings.min_value it doesn't actually > > tell you the truth. But yeah, I believe Windows actually lets you use > > a lower port number, so it'd at least have to be #ifdef'ed for that if > > we wanted to change it. > > You can easily change the setting on linux as well. And you can grant > specific binaries the permission to bind to restricted ports without > being root. > I don't think the additional complexity to get a sensible value in there > is warranted. > With that large a set of local policies that can change the "usual < 1024" policy, yep, I agree that it's not worth trying too hard on this one. And supposing something like SE-Linux can grant bindings for a particular user/binary to access a *specific* port, that represents a model that is pretty incompatible with the notion of a "minimum value." On the one hand, the idea of having to add a lot of platform-specific code (which may further be specific to a framework like SE-Linux) is not terribly appealing. Further, if the result is something that doesn't really fit with a "minimum," is it much worth fighting with the platform localities? Indeed, I begin to question whether indicating a "minimum" is actually meaningful. 
-- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Re: [HACKERS] Kudos for Reviewers -- straw poll
On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote: > b) It would be a pretty good thing to mention reviewers within commit notes; > that provides some direct trace-back as to who it was that either validated > that the change was good, or that let a bad one slip through. > > c) The release notes indicate authors of changes; to have a list of reviewers > would be a fine thing. > > If it requires inordinate effort to get the reviewers directly attached to > each > and every change, perhaps it isn't worthwhile to go to extreme efforts to that > end. > > It could be pretty satisfactory to have a simple listing, in the release > notes, > of the set of reviewers. That's a lot less bookkeeping than tracking this for > each and every change. Adding the names to each release note item is not a problem; the problem is the volume of names that overwhelms the release note text. If we went that direction, I predict we would just remove _all_ names from the release notes. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Kudos for Reviewers -- straw poll
On Thu, Jun 27, 2013 at 10:37 AM, Robert Haas wrote: > On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan > wrote: > > I'd like to see prizes each release for "best contribution" and "best > > reviewer" - I've thought for years something like this would be worth > > trying. Committers and core members should not be eligible - this is > about > > encouraging new people. > > Encouraging new people is good, but recognizing sustained, long-term > contributions is good, too. I think we should do more of that, too. > Conforming with David Fetter's pointer to the notion that sometimes attempts to reward can backfire, I'm not sure that it will be super-helpful to create "special" rewards. On the other hand, to recognize reviewer contributions in places relevant to where they take place seems pretty apropos, which could include: a) Obviously we already capture this in the CommitFest web site (but it's worth mentioning when trying to do a "census") b) It would be a pretty good thing to mention reviewers within commit notes; that provides some direct trace-back as to who it was that either validated that the change was good, or that let a bad one slip through. c) The release notes indicate authors of changes; to have a list of reviewers would be a fine thing. If it requires inordinate effort to get the reviewers directly attached to each and every change, perhaps it isn't worthwhile to go to extreme efforts to that end. It could be pretty satisfactory to have a simple listing, in the release notes, of the set of reviewers. That's a lot less bookkeeping than tracking this for each and every change. The statement of such a list is a public acknowledgement of those that help assure that the quality of PostgreSQL code remains excellent. (And that may represent a good way to sell this "kudo".) 
This allows organizations that are sponsoring PostgreSQL development to have an extra metric by which *they* can recognize that their staff that do such work are being recognized as contributors. It seems to me that this is way more useful than a free t-shirt or the like.
Re: [HACKERS] fixing pg_ctl with relative paths
On Thu, Jun 27, 2013 at 10:36 AM, Josh Kupershmidt wrote: > On Wed, Jun 26, 2013 at 12:22 PM, Fujii Masao wrote: >> On Wed, Jun 26, 2013 at 2:36 PM, Hari Babu wrote: >>> On June 26, 2013 5:02 AM Josh Kupershmidt wrote: Thanks for the feedback. Attached is a rebased version of the patch with >>> the two small issues noted fixed. >> >> The following description in the document of pg_ctl needs to be modified? >> >> restart might fail if relative paths specified were specified on >> the command-line during server start. > > Right, that caveat could go away. > >> +#define DATADIR_SPEC "\"-D\" \"" >> + >> + datadir = strstr(post_opts, DATADIR_SPEC); >> >> Though this is a corner case, the patch doesn't seem to handle properly the >> case >> where "-D" appears as other option value, e.g., -k option value, in >> postmaster.opts >> file. > > Could I see a command-line example of what you mean? postmaster -k "-D", for example. Of course, it's really a corner case :) Another corner case is, for example, pg_ctl -D test1 -o "-D test2", that is, multiple -D specifications appearing in the command-line. Can we overlook these cases? >> Just idea to work around that problem is to just append the specified -D >> option >> and value to post_opts. IOW, -D option and value appear twice in post_opts. >> In this case, posteriorly-located ones are used in the end. Thought? > > Hrm, I think we'd have to be careful that postmaster.opts doesn't > accumulate an additional -D specification with every restart. Yes. Oh, I was thinking that postmaster writes only the -D specification it actually uses in the opts file, so that accumulation would not happen. But that's not true: postmaster writes all the specified arguments in the opts file. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
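The corner cases Fujii points out are easy to reproduce with a small sketch (illustrative Python, not pg_ctl's actual C code; `datadirs` is a hypothetical helper): even after proper tokenization, a scanner that treats every `-D` token as the data-directory flag is fooled when `-D` appears as another option's value, and with multiple `-D` specifications only the last one matters.

```python
import shlex

# Hypothetical helper, NOT pg_ctl code: a naive scan of a postmaster.opts line.
def datadirs(opts_line):
    """Return every apparent -D value in order; the server uses the last one."""
    args = shlex.split(opts_line)
    found = []
    i = 0
    while i < len(args):
        if args[i] == "-D" and i + 1 < len(args):
            found.append(args[i + 1])
            i += 2
        else:
            i += 1
    return found

# Multiple -D specifications: the posteriorly-located one wins.
assert datadirs('postmaster -D test1 -D test2')[-1] == 'test2'

# -k "-D": the option *value* is mistaken for the flag, so the real data
# directory (test2) is missed and a bogus one is reported instead.
assert datadirs('postmaster -k -D -D test2') == ['-D']
```

The second assertion is the failure mode of matching on the raw `"-D" "` substring: fixing it requires knowing which options take arguments, not just scanning for the flag.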
Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)
On Thu, Jun 27, 2013 at 5:49 AM, Dimitri Fontaine wrote: > I think that's a limitation of the old model and we don't want to turn > templates for extensions into being shared catalogs. At least that's my > understanding of the design consensus. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 aggregate
On Thu, Jun 27, 2013 at 7:29 AM, Marko Kreen wrote: > On Thu, Jun 27, 2013 at 11:28 AM, Dean Rasheed > wrote: >> On 26 June 2013 21:46, Peter Eisentraut wrote: >>> On 6/26/13 4:04 PM, Dean Rasheed wrote: A quick google search reveals several people asking for something like this, and people recommending md5(string_agg(...)) or md5(string_agg(md5(...))) based solutions, which are doomed to failure on larger tables. >>> >>> The thread discussed several other options of checksumming tables that >>> did not have the air of a crytographic offering, as Noah put it. >>> >> >> True but md5 has the advantage of being directly comparable with the >> output of Unix md5sum, which would be useful if you loaded data from >> external files and wanted to confirm that your import process didn't >> mangle it. > > The problem with md5_agg() is that it's only useful in toy scenarios. > > It's more useful to give people a script that does the same sum(hash(row)) > on the dump file than to try to run MD5 on ordered rows. > > Also, I don't think anybody actually cares about MD5(table-as-bytes); instead > people want a way to check whether 2 tables, or a table and a dump, are the same. I think you're trying to tell Dean to write the patch that you want instead of the patch that he wants. There are certainly other things that could be done that some people might sometimes prefer, but that doesn't mean what he did isn't useful. That having been said, I basically agree with Noah: I think this would be a useful extension (perhaps even in contrib?) but I don't think we need to install it by default. It's useful, but it's also narrow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
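Marko's sum(hash(row)) suggestion can be sketched as an order-independent digest (a hedged Python illustration, not a proposed patch; the row encoding and the 2**128 modulus are arbitrary choices of this sketch): summing per-row MD5 values commutes, so the same check works on a dump file without sorting the rows first — though, as discussed above, this makes a change-detection checksum, not a cryptographically sound hash of the table.

```python
import hashlib

# Illustration only: the row format and the 2**128 modulus are arbitrary choices.
def table_digest(rows):
    """Order-independent digest: sum of per-row MD5 values mod 2**128."""
    total = 0
    for row in rows:
        h = hashlib.md5(row.encode()).digest()
        total = (total + int.from_bytes(h, "big")) % (1 << 128)
    return total

# The same rows in a different physical order produce the same digest...
assert table_digest(["alice|1", "bob|2"]) == table_digest(["bob|2", "alice|1"])
# ...while changing a row changes it.
assert table_digest(["alice|1", "bob|2"]) != table_digest(["alice|2", "bob|2"])
```

The order independence is exactly why Noah considers a sum of MD5 hashes unsound as a cryptographic construct: an attacker can rearrange contributions, which a sequential hash would detect.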
Re: [HACKERS] Reduce maximum error in tuples estimation after vacuum.
On Thu, Jun 27, 2013 at 7:27 AM, Amit Kapila wrote: > Now I can look into it further, I have still not gone through in detail > about your new approach to calculate the reltuples, but I am wondering > whether there can be anyway with which estimates can be improved with > different calculation in vac_estimate_reltuples(). I think this is getting at the threshold question for this patch, which is whether it's really making things better or just moving the problems around. I mean, I have no problem accepting that the new algorithm is (1) reasonably cheap and (2) better in some cases. But if it's worse in other cases, which AFAICS hasn't been discussed, then it's no good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for dbcommands
On 6/27/13 10:57 AM, Tom Lane wrote: > Peter Eisentraut writes: >> On 6/26/13 12:17 PM, Tom Lane wrote: >>> (I like to >>> point at mysql's regression tests, which take well over an hour even on >>> fast machines. If lots of tests are so helpful, why is their bug rate >>> no better than ours?) > >> Tests are not (primarily) there to prevent bugs. > > Really? Pray tell, what do you think they're for? Particularly code > coverage tests, which is what these are? Tests document the existing functionality of the code, to facilitate refactoring. (paraphrased, Uncle Bob) Case in point, the existing ALTER DDL code could arguably use even more refactoring. But no one wants to do it because it's unclear what will break. With the proposed set of tests (which I have not read to completion), this could become quite a bit easier, both for the coder and the reviewer. But these tests probably won't detect, say, locking bugs in such new code. That can only be prevented by careful code review and a clean architecture. Perhaps, MySQL has neither. ;-) Code coverage is not an end itself, it's a tool. In this sense, tests prevent existing functionality being broken, which might be classified as a bug. But it's wrong to infer that because system X has a lot of bugs and a lot of tests, having a lot of tests must be useless. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Documentation/help for materialized and recursive views
On Thu, Jun 27, 2013 at 5:29 AM, Magnus Hagander wrote: >> The functionality of materialized views will (over time) totally swamp >> that of normal views, so mixing all the corresponding documentation >> with the documentation for normal views probably doesn’t make things >> easier for people that are only interested in normal views. > > That's a better point I think. That said, it would be very useful if > it actually showed up in "\h CREATE VIEW" in psql - I wonder if we > should just add the syntax to that page, and then link said future > information on a separate page somehow? IMHO, it's better to keep them separate; they are very different beasts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group Commits Vs WAL Writes
On Thu, Jun 27, 2013 at 3:56 AM, Atri Sharma wrote: > When we do a commit, WAL buffers are written to the disk. This has a > disk latency for the required I/O. Check. > Now, with group commits, do we see a spike in that disk write latency, > especially in the cases where the user has set wal_buffers to a high > value? Well, it does take longer to fsync a larger byte range to disk than a smaller byte range, in some cases. But it's generally more efficient to write one larger range than many smaller ranges, so you come out ahead on the whole. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)
On Wed, Jun 26, 2013 at 7:16 AM, Fabien COELHO wrote: > Here is a v4 that takes into account most of your points: The report is > performed for all threads by thread 0, however --progress is not supported > under thread fork emulation if there are more than one thread. The report > time does not slip anymore. I don't believe that to be an acceptable restriction. We generally require features to work on all platforms we support. We have made occasional compromises there, but generally only when the restriction is fundamental to the platform rather than for developer convenience. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error code returned by lock_timeout
On 2013-06-27 17:03, Fujii Masao wrote: On Thu, Jun 27, 2013 at 2:22 PM, Boszormenyi Zoltan wrote: Hi, I just realized that in the original incarnation of lock_timeout, I used ERRCODE_LOCK_NOT_AVAILABLE (to be consistent with NOWAIT) but the patch that was accepted into 9.3 contained ERRCODE_QUERY_CANCELED which is the same as for statement_timeout. Which would be more correct? ISTM that ERRCODE_LOCK_NOT_AVAILABLE is more suitable... I think so, too. Can someone commit this one-liner to master and 9.3? Best regards, Zoltán Böszörményi -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/

--- src/backend/tcop/postgres.c~	2013-06-27 07:05:08.0 +0200
+++ src/backend/tcop/postgres.c	2013-06-27 17:10:28.040972642 +0200
@@ -2900,7 +2900,7 @@
 		DisableNotifyInterrupt();
 		DisableCatchupInterrupt();
 		ereport(ERROR,
-				(errcode(ERRCODE_QUERY_CANCELED),
+				(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
 				 errmsg("canceling statement due to lock timeout")));
 	}
 	if (get_timeout_indicator(STATEMENT_TIMEOUT, true))

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
On Tue, Jun 25, 2013 at 4:28 PM, Heikki Linnakangas wrote: >> The only feedback we have on how bad things are is how long it took >> the last fsync to complete, so I actually think that's a much better >> way to go than any fixed sleep - which will often be unnecessarily >> long on a well-behaved system, and which will often be far too short >> on one that's having trouble. I'm inclined to think think Kondo-san >> has got it right. > > Quite possible, I really don't know. I'm inclined to first try the simplest > thing possible, and only make it more complicated if that's not good enough. > Kondo-san's patch wasn't very complicated, but nevertheless a fixed sleep > between every fsync, unless you're behind the schedule, is even simpler. I'm pretty sure Greg Smith tried the fixed-sleep thing before and it didn't work that well. I have also tried it and the resulting behavior was unimpressive. It makes checkpoints take a long time to complete even when there's very little data to flush out to the OS, which is annoying; and when things actually do get ugly, the sleeps aren't long enough to matter. See the timings Kondo-san posted downthread: 100 ms delays aren't going to let the system recover in any useful way when the fsync can take 13 s for one file. On a system that's badly weighed down by I/O, the fsync times are often *extremely* long - 13 s is far from the worst you can see. You have to give the system a meaningful time to recover from that, allowing other processes to make meaningful progress before you hit it again, or system performance just goes down the tubes. Greg's test, IIRC, used 3 s sleeps rather than your proposal of 100 ms, but it still wasn't enough. > In > particular, it's easier to tie that into the checkpoint scheduler - I'm not > sure how you'd measure progress or determine how long to sleep unless you > assume that every fsync is the same. 
I think the thing to do is assume that the fsync phase will take 10% or so of the total checkpoint time, but then be prepared to let the checkpoint run a bit longer if the fsyncs end up being slow. As Greg has pointed out during prior discussions of this, the normal scenario when things get bad here is that there is no way in hell you're going to fit the checkpoint into the originally planned time. Once all of the write caches between PostgreSQL and the spinning rust are full, the system is in trouble and things are going to suck. The hope is that we can stop beating the horse while it is merely in intensive care rather than continuing until the corpse is fully skeletonized. Fixed delays don't work because - to push an already-overdone metaphor a bit further - we have no idea how much of a beating the horse can take; we need something adaptive so that we respond to what actually happens rather than making predictions that will almost certainly be wrong a large fraction of the time. To put this another way, when we start the fsync() phase, it often consumes 100% of the available I/O on the machine, completely starving every other process that might need any. This is certainly a deficiency in the Linux I/O scheduler, but as they seem in no hurry to fix it we'll have to cope with it as best we can. If you do the fsyncs in fast succession (and 100ms separation might as well be no separation at all), then the I/O starvation of the entire system persists through the entire fsync phase. If, on the other hand, you sleep for the same amount of time the previous fsync took, then on the average, 50% of the machine's I/O capacity will be available for all other system activity throughout the fsync phase, rather than 0%. Now, unfortunately, this is still not that good, because it's often the case that all of the fsyncs except one are reasonably fast, and there's one monster one that is very slow. 
ext3 has a known bad behavior that dumps all dirty data for the entire *filesystem* when you fsync, which tends to create these kinds of effects. But even on better-behaved filesystems, like ext4, it's fairly common to have one fsync that is painfully longer than all the others. So even with this patch, there are still going to be cases where the whole system becomes unresponsive. I don't see any way to do better without a better kernel API, or a better I/O scheduler, but that doesn't mean we shouldn't do at least this much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.3 doc fix] ECPG VAR command does not describe the actual specification
> Looking around the 9.3 doc, I found a small, but not-insignificant > error in the documentation. Thanks for finding and fixing. Patch committed. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] Error code returned by lock_timeout
On Thu, Jun 27, 2013 at 2:22 PM, Boszormenyi Zoltan wrote: > Hi, > > I just realized that in the original incarnation of lock_timeout, > I used ERRCODE_LOCK_NOT_AVAILABLE (to be consistent with NOWAIT) > but the patch that was accepted into 9.3 contained ERRCODE_QUERY_CANCELED > which is the same as for statement_timeout. > > Which would be more correct? ISTM that ERRCODE_LOCK_NOT_AVAILABLE is more suitable... Regards, -- Fujii Masao
Re: [HACKERS] Add more regression tests for dbcommands
Peter Eisentraut writes: > On 6/26/13 12:17 PM, Tom Lane wrote: >> (I like to >> point at mysql's regression tests, which take well over an hour even on >> fast machines. If lots of tests are so helpful, why is their bug rate >> no better than ours?) > Tests are not (primarily) there to prevent bugs. Really? Pray tell, what do you think they're for? Particularly code coverage tests, which is what these are? regards, tom lane
Re: [HACKERS] Move unused buffers to freelist
Andres Freund wrote: > I don't think I actually found any workload where the bgwriter > actually wrote out a relevant percentage of the necessary pages. I had one at Wisconsin Courts. The database we targeted with logical replication from the 72 circuit court databases (plus a few others) saw, on a six-connection pool, about 20 to (at peaks) hundreds of transactions per second modifying the database (the average transaction involving about 20 modifying statements with potentially hundreds of affected rows), with maybe 2000 to 3000 queries per second on a 30-connection pool. It wrote about one-third each of the dirty buffers with checkpoints, the background writer, and backends needing to read a page. I shared my numbers with Greg, who I believe used them as one of his examples for how to tune memory, checkpoints, and the background writer, so you might want to check with him if you want more detail. Of course, we set bgwriter_lru_maxpages = 1000 and bgwriter_lru_multiplier = 4, and kept shared_buffers to 2GB to hit that. Without the reduced shared_buffers and the more aggressive bgwriter we hit the problem with writes overwhelming the RAID controller's cache and causing everything in the database to "freeze" until it cleared some cache space. I'm not saying this invalidates your general argument; just that such cases do exist. Hopefully this data point is useful. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
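For reference, the aggressive-bgwriter configuration Kevin describes would look something like the following postgresql.conf fragment. Values are taken from the message above; treat them as workload-specific settings that worked for this one system, not as general recommendations:

```
shared_buffers = 2GB            # deliberately kept small to limit the
                                # dirty-buffer backlog hitting the RAID cache
bgwriter_lru_maxpages = 1000    # max buffers written per bgwriter round
bgwriter_lru_multiplier = 4.0   # write well ahead of projected demand
```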
Re: [HACKERS] Add more regression tests for dbcommands
On 6/26/13 12:17 PM, Tom Lane wrote: > (I like to > point at mysql's regression tests, which take well over an hour even on > fast machines. If lots of tests are so helpful, why is their bug rate > no better than ours?) Tests are not (primarily) there to prevent bugs.
Re: [HACKERS] Add more regression tests for dbcommands
On 6/27/13 10:20 AM, Robert Haas wrote: > So I'd like to endorse Josh's idea: subject to appropriate review, > let's add these test cases. Then, if it really turns out to be too > burdensome, we can take them out, or figure out a sensible way to > split the suite. Pushing all of Robins' work into a secondary suite, > or throwing it out altogether, both seem to me to be things which will > not be to the long-term benefit of the project. I agree. If it gets too big, let's deal with it then. But let's not make things more complicated because they *might* get too big later.
Re: [HACKERS] Add more regression tests for CREATE OPERATOR
Robins Tharakan writes: > 2. Should I assume that all database objects that get created, need to be > dropped explicitly? Or is this point specifically about ROLES? It's about any global objects (that wouldn't get dropped by dropping the regression database). As far as local objects go, there are benefits to leaving them around, particularly if they present interesting test cases for pg_dump/pg_restore. regards, tom lane
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 6/23/13 10:50 PM, Tom Lane wrote: > It'd sure be interesting to know what the SQL committee's target parsing > algorithm is. It's whatever Oracle and IBM implement. > Or maybe they really don't give a damn about breaking > applications every time they invent a new reserved word? Well, yes, I think that policy was built into the language at the very beginning.
Re: [HACKERS] Kudos for Reviewers -- straw poll
On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan wrote: > I'd like to see prizes each release for "best contribution" and "best > reviewer" - I've thought for years something like this would be worth > trying. Committers and core members should not be eligible - this is about > encouraging new people. Encouraging new people is good, but recognizing sustained, long-term contributions is good, too. I think we should do more of that as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)
On Thu, Jun 27, 2013 at 10:24 AM, Fujii Masao wrote: > On Thu, Jun 27, 2013 at 10:02 PM, Robert Haas wrote: >> On Tue, Jun 25, 2013 at 3:09 PM, Fabien COELHO wrote: I think --quiet-log should be spelled --quiet. >>> >>> ISTM that --quiet usually means "not verbose on stdout", so I added log >>> because this was specific to the log output, and that there may be a need >>> for a --quiet option for stdout at some time. >> >> The output that is quieted by -q is not the log output produced by >> --log; it's the regular progress output on stdout/stderr. >> >> So I changed that, and committed this, with some further cosmetic >> changes. I made the formatting of the help message more like psql's >> help message, including adjusting pgbench to start the description of >> each option in the same column that psql does. This got rid of a lot >> of line breaks and IMHO makes the output of pgbench --help quite a bit >> more readable. I made stylistic adjustments to the documentation >> portion of the patch as well, again to match the markup used for psql. > > In help messages: > > + " -s NUM, --scale NUM scaling factor\n" > > This should be "-s, --scale=NUM" for the sake of consistency with other > options. Woops, missed that one. Fixed, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
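For anyone following along, pairing short and long options as this patch does is the standard getopt_long() pattern. A minimal, self-contained sketch - the option table and function name here are illustrative, not pgbench's actual code:

```c
#include <getopt.h>
#include <stdlib.h>

/*
 * Sketch: parse "-s N" / "--scale=N" the way the committed patch pairs
 * short and long spellings of each pgbench option.  Returns the scale,
 * defaulting to 1 when no option is given.
 */
static int
parse_scale(int argc, char **argv)
{
    static const struct option long_opts[] = {
        {"scale", required_argument, NULL, 's'},
        {"quiet", no_argument, NULL, 'q'},
        {NULL, 0, NULL, 0}
    };
    int c;
    int scale = 1;

    optind = 1;                 /* reset state for repeated calls */
    while ((c = getopt_long(argc, argv, "s:q", long_opts, NULL)) != -1)
    {
        if (c == 's')
            scale = atoi(optarg);
    }
    return scale;
}
```

Because the fourth struct member of each entry is the short-option letter, "--scale=10" and "-s 10" fall into the same switch arm, which keeps the two spellings from drifting apart.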
Re: [HACKERS] Spin Lock sleep resolution
On Wed, Jun 26, 2013 at 5:49 PM, Jeff Janes wrote: > On Tue, Jun 18, 2013 at 12:09 AM, Heikki Linnakangas > wrote: >> Jeff's patch seems to somewhat alleviate the huge fall in performance I'm >> otherwise seeing without the nonlocked-test patch. With the nonlocked-test >> patch, if you squint you can see a minuscule benefit. >> >> I wasn't expecting much of a gain from this, just wanted to verify that >> it's not making things worse. So looks good to me. > > Hi Heikki, > > Thanks for trying out the patch. > > I see in the commitfest app it is set to "Waiting on Author" (but I don't > know who "maiku41" is). > > Based on the comments so far, I don't know what I should be doing on it at > the moment, and I thought perhaps your comment above meant it should be > "ready for committer". > > If we think the patch has a risk of introducing subtle errors, then it > probably can't be justified based on the small performance gains you saw. > > But if we think it has little risk, then I think it is justified simply > based on cleaner code, and less confusion for people who are tracing a > problematic process. If we want it to do a random escalation, it should > probably look like a random escalation to the interested observer. I think it has little risk. If it turns out to be worse for performance, we can always revert it, but I expect it'll be better or a wash, and the results so far seem to bear that out. Another interesting question is whether we should fool with the actual values for minimum and maximum delays, but that's a separate and much harder question, so I think we should just do this for now and call it good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
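The "random escalation" under discussion is, in essence, multiplicative backoff with jitter. A hedged sketch of the idea follows - the constants and function name are invented for illustration (the real logic lives in PostgreSQL's s_lock.c), and the random fraction is passed in explicitly so the growth step is easy to reason about:

```c
#define MIN_DELAY_USEC   1000     /* 1 ms  -- assumed floor   */
#define MAX_DELAY_USEC   1000000  /* 1 s   -- assumed ceiling */

/*
 * Randomly escalate the spin-sleep delay: grow the current delay by a
 * random 0-100% each time we have to sleep again, so that concurrent
 * waiters fall out of lockstep, but never exceed the ceiling.
 * frac is a uniform random value in [0, 1).
 */
static int
next_spin_delay(int cur_delay, double frac)
{
    if (cur_delay < MIN_DELAY_USEC)
        return MIN_DELAY_USEC;            /* first sleep: start at floor */
    cur_delay += (int) (cur_delay * frac + 0.5);
    if (cur_delay > MAX_DELAY_USEC)
        cur_delay = MAX_DELAY_USEC;
    return cur_delay;
}
```

The point Jeff raises - that a tracer watching the process should see something that *looks* like random escalation - is served by the jitter term: two backends that collide on the same lock at the same instant draw different fractions and separate on the next sleep.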
Re: [HACKERS] Move unused buffers to freelist
On 2013-06-27 09:50:32 -0400, Robert Haas wrote: > On Thu, Jun 27, 2013 at 9:01 AM, Andres Freund wrote: > > Contention-wise I agree. What I have seen is that we have a huge > > amount of cacheline bouncing around the buffer header spinlocks. > > How did you measure that? perf record -e cache-misses. If you want it more detailed, looking at {L1,LLC}-{load,store}{s,misses} can sometimes be helpful too. Also, running perf stat -vvv postgres -D ... for a whole benchmark can be useful to compare how much a change influences cache misses and such. For very detailed analysis, running something under valgrind/cachegrind can be helpful too, but I usually find perf to be sufficient. > > I have previously added some ad hoc instrumentation that printed the > > amount of buffers that were required (by other backends) during a > > bgwriter cycle and the amount of buffers that the buffer manager could > > actually write out. > > I think you can see how many are needed from buffers_alloc. No? Not easily correlated with bgwriter activity. If we cannot keep up because it's 100% busy writing out buffers I don't have many problems with that. But I don't think we often are. > > Problems with the current code: > > > > * doesn't manipulate the usage_count and never does anything to used > > pages. Which means it will just about never find a victim buffer in a > > busy database. > > Right. I was thinking that was part of this patch, but it isn't. I > think we should definitely add that. In other words, the background > writer's job should be to run the clock sweep and add buffers to the > free list. We might need to split it into two for that. One process to write out dirty pages, one to populate the freelist. Otherwise we will probably regularly hit the current scalability issues because we're currently I/O contended. Say during a busy or even immediate checkpoint. > I think we should also split the lock: a spinlock for the > freelist, and an lwlock for the clock sweep. 
Yea, thought about that when writing the thing about the exclusive lock during the clocksweep. > > * by far not aggressive enough, touches only a few buffers ahead of the > > clock sweep. > > Check. Fixing this might be a separate patch, but then again maybe > not. The changes we're talking about here provide a natural feedback > mechanism: if we observe that the freelist is empty (or less than some > length, like 32 buffers?) set the background writer's latch, because > we know it's not keeping up. Yes, that makes sense. Also provides adaptability to bursty workloads which means we don't have too complex logic in the bgwriter for that. > > There's another thing we could do to noticeably improve scalability of > > buffer acquisition. Currently we do a huge amount of work under the > > freelist lock. > ... > > So, we perform the entire clock sweep until we found a single buffer we > > can use inside a *global* lock. At times we need to iterate over the > > whole shared buffers BM_MAX_USAGE_COUNT (5) times till we pushed down all > > the usage counts enough (if the database is busy it can take even > > longer...). > > In a busy database where usually all the usagecounts are high the next > > backend will touch a lot of those buffers again which causes massive > > cache eviction & bouncing. > > > > It seems far more sensible to only protect the clock sweep's > > nextVictimBuffer with a spinlock. With some care all the rest can happen > > without any global interlock. > > That's a lot more spinlock acquire/release cycles, but it might work > out to a win anyway. Or it might lead to the system suffering a > horrible spinlock-induced death spiral on eviction-heavy workloads. I can't imagine it to be worse than what we have today. Also, nobody requires us to only advance the clocksweep by one page, we can easily do it say 29 pages at a time or so if we detect the lock is contended. 
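The low-water-mark feedback being agreed on above - wake the background writer when the freelist runs short - could look roughly like this. Everything here (the threshold, the array-backed list, the function names) is invented for illustration; it is a sketch of the proposed mechanism, not PostgreSQL code:

```c
#include <stdbool.h>

#define FREELIST_LOW_WATER 32   /* the "less than some length, like 32
                                 * buffers?" threshold from the thread */

/*
 * Pop a buffer id from a simple array-backed freelist.  Returns -1 if
 * the list is empty.  Reports through *wake_bgwriter whether the
 * background writer's latch should be set because the list is running
 * dry -- the adaptive signal that tells it it's not keeping up.
 */
static int
freelist_pop(int *list, int *len, bool *wake_bgwriter)
{
    int buf = -1;

    if (*len > 0)
        buf = list[--(*len)];
    *wake_bgwriter = (*len < FREELIST_LOW_WATER);
    return buf;
}
```

Because the wake-up decision piggybacks on an operation every allocating backend already performs, bursty workloads drain the list faster, trigger the latch sooner, and the bgwriter needs no burst-detection logic of its own.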
Alternatively it shouldn't be too hard to make it into an atomic increment, although that requires some trickery to handle the wraparound sanely. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
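One way the atomic-increment variant could handle wraparound sanely is to let the counter grow monotonically and reduce it modulo the buffer count only when reading, so no backend ever stores a wrapped value back. This is an illustrative sketch under stated assumptions (C11 atomics, a power-of-two buffer count so the 32-bit counter wrap stays seamless), not a proposed patch:

```c
#include <stdatomic.h>

#define NBUFFERS 16384          /* illustrative shared_buffers count; a
                                 * power of two, so unsigned overflow of
                                 * the raw counter doesn't skew the modulo */

static atomic_uint next_victim; /* grows monotonically, never reset */

/*
 * Lock-free clock-sweep advance: fetch-and-add the raw counter and map
 * it onto the buffer array with a modulo.  Since the wrapped value is
 * never written back, concurrent backends cannot race on the
 * wraparound -- that is the "trickery" the plain spinlocked version
 * of nextVictimBuffer would otherwise need.
 */
static unsigned
clock_sweep_next(void)
{
    unsigned raw = atomic_fetch_add(&next_victim, 1);

    return raw % NBUFFERS;
}
```

If the buffer count is not a power of two, the sweep position jumps when the 32-bit counter itself wraps; real code would need to account for that, which is presumably part of the trickery Andres alludes to.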