Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Sat, Jun 7, 2014 at 1:28 AM, Andres Freund wrote:
> On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> > I figured it'd be easy enough to get a better estimate by adding another
> > counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
> > assuming that in-progress inserts and deletes will both commit). I did
> > that, and found that it helped Tim's test case not at all :-(. A bit of
> > sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> > INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
> > whether the transaction has since marked it for deletion:
> >
> > /*
> >  * It'd be possible to discern between INSERT/DELETE in progress
> >  * here by looking at xmax - but that doesn't seem beneficial for
> >  * the majority of callers and even detrimental for some. We'd
> >  * rather have callers look at/wait for xmin than xmax. It's
> >  * always correct to return INSERT_IN_PROGRESS because that's
> >  * what's happening from the view of other backends.
> >  */
> > return HEAPTUPLE_INSERT_IN_PROGRESS;
>
> That's only been the case for a couple of days. I really wasn't sure
> whether to go that way or discern the two cases. That changed in the wake
> of:
> http://www.postgresql.org/message-id/20140530143150.GA11051@localhost

Won't this change impact the calculation of the number of live rows for
analyze? (acquire_sample_rows() considers HEAPTUPLE_DELETE_IN_PROGRESS
tuples as live rows when they were updated by transactions other than the
current transaction.)

Even if we think that the estimates are okay, the comment below from
acquire_sample_rows() doesn't seem to suggest it.

/*
 * We count delete-in-progress rows as still live, using
 * the same reasoning given above; but we don't bother to
 * include them in the sample.
 * ..
 */

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
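[Editor's illustration: a minimal sketch of the tuple state under discussion.
The table t and the row count are invented for the example, and exactly how
ANALYZE counts these in-progress tuples is the version-dependent point being
debated here.]

CREATE TABLE t (id int);   -- setup, any session

-- Session 1: insert rows and then delete them in one still-open transaction.
BEGIN;
INSERT INTO t SELECT g FROM generate_series(1, 100000) g;
DELETE FROM t;
-- (leave the transaction open)

-- Session 2: these tuples have an uncommitted xmin *and* a set xmax, yet
-- HeapTupleSatisfiesVacuum now reports them as INSERT_IN_PROGRESS rather
-- than DELETE_IN_PROGRESS; the reported reltuples shows how they were
-- counted.
ANALYZE t;
SELECT reltuples FROM pg_class WHERE relname = 't';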
Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension
On 09/06/14 14:47, David G Johnston wrote:
> Ian Barwick wrote
>> Hi,
>>
>> The JDBC API provides the getGeneratedKeys() method as a way of retrieving
>> primary key values without the need to explicitly specify the primary key
>> column(s). This is a widely-used feature, however the implementation has
>> significant performance drawbacks.
>>
>> Currently this feature is implemented in the JDBC driver by appending
>> "RETURNING *" to the supplied statement. However this means all columns of
>> affected rows will be returned to the client, which causes significant
>> performance problems, particularly on wide tables. To mitigate this, it
>> would be desirable to enable the JDBC driver to request only the primary
>> key value(s).
>
> Seems like a good idea.
>
>> ERROR: Relation does not have any primary key(s)
>
> "Relation does not have a primary key."
> or
> "Relation has no primary key." (preferred)
>
> By definition it cannot have more than one so it must have none.

Ah yes, amazing what a fresh pair of eyes does :). The plural is the vestige
of an earlier iteration which said something about the relation not having
any primary key column(s). Will fix, thanks.

Regards

Ian Barwick

--
Ian Barwick    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension
David G Johnston wrote
> Ian Barwick wrote
>> Hi,
>>
>> The JDBC API provides the getGeneratedKeys() method as a way of retrieving
>> primary key values without the need to explicitly specify the primary key
>> column(s). This is a widely-used feature, however the implementation has
>> significant performance drawbacks.
>>
>> Currently this feature is implemented in the JDBC driver by appending
>> "RETURNING *" to the supplied statement. However this means all columns of
>> affected rows will be returned to the client, which causes significant
>> performance problems, particularly on wide tables. To mitigate this, it
>> would be desirable to enable the JDBC driver to request only the primary
>> key value(s).
>
> Seems like a good idea.
>
>> ERROR: Relation does not have any primary key(s)
>
> "Relation does not have a primary key."
> or
> "Relation has no primary key." (preferred)
>
> By definition it cannot have more than one so it must have none.
>
> It could have multiple unique constraints but I do not believe they are
> considered if not tagged as primary.

Also, I did see where you account for auto-updatable views, but what about
complex views with INSTEAD OF triggers? These can still be the target of DML
queries but are not guaranteed (though it is possible) to return a
well-defined primary key. At worst, an explicit error about the view itself,
not the apparent lack of a primary key, should be emitted.

David J.
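[Editor's illustration: a minimal hypothetical sketch of the kind of view
David is raising. The schema is invented, and what RETURNING PRIMARY KEY
should do on the final statement is exactly the open question.]

-- A join view made writable via an INSTEAD OF trigger. The view itself
-- has no primary key in the catalogs, so it is unclear which columns
-- RETURNING PRIMARY KEY should resolve to.
CREATE TABLE orders    (id serial PRIMARY KEY, customer_id int NOT NULL);
CREATE TABLE customers (id serial PRIMARY KEY, name text UNIQUE);

CREATE VIEW order_summary AS
    SELECT o.id AS order_id, c.name AS customer
    FROM orders o JOIN customers c ON c.id = o.customer_id;

CREATE FUNCTION order_summary_insert() RETURNS trigger AS $$
BEGIN
    -- Route the view insert to the underlying orders table.
    INSERT INTO orders (customer_id)
    SELECT id FROM customers WHERE name = NEW.customer;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_summary_insert
    INSTEAD OF INSERT ON order_summary
    FOR EACH ROW EXECUTE PROCEDURE order_summary_insert();

-- Valid DML against the view -- but which columns are "the primary key"?
INSERT INTO order_summary (customer) VALUES ('acme')
    RETURNING PRIMARY KEY;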
Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension
Ian Barwick wrote
> Hi,
>
> The JDBC API provides the getGeneratedKeys() method as a way of retrieving
> primary key values without the need to explicitly specify the primary key
> column(s). This is a widely-used feature, however the implementation has
> significant performance drawbacks.
>
> Currently this feature is implemented in the JDBC driver by appending
> "RETURNING *" to the supplied statement. However this means all columns of
> affected rows will be returned to the client, which causes significant
> performance problems, particularly on wide tables. To mitigate this, it
> would be desirable to enable the JDBC driver to request only the primary
> key value(s).

Seems like a good idea.

> ERROR: Relation does not have any primary key(s)

"Relation does not have a primary key."
or
"Relation has no primary key." (preferred)

By definition it cannot have more than one so it must have none.

It could have multiple unique constraints but I do not believe they are
considered if not tagged as primary.

David J.
[HACKERS] "RETURNING PRIMARY KEY" syntax extension
Hi,

The JDBC API provides the getGeneratedKeys() method as a way of retrieving
primary key values without the need to explicitly specify the primary key
column(s). This is a widely-used feature, however the implementation has
significant performance drawbacks.

Currently this feature is implemented in the JDBC driver by appending
"RETURNING *" to the supplied statement. However this means all columns of
affected rows will be returned to the client, which causes significant
performance problems, particularly on wide tables. To mitigate this, it
would be desirable to enable the JDBC driver to request only the primary
key value(s).

One possible solution would be to have the driver request the primary key
for a table, but this could cause a race condition where the primary key
could change, and even if it does not, it would entail extra overhead.

A more elegant and universal solution, which would allow the JDBC driver to
request the primary key in a single request, would be to extend the
RETURNING clause syntax with the option PRIMARY KEY. This resolves during
parse analysis into the columns of the primary key, which can be done
unambiguously because the table is already locked by that point and the
primary key cannot change.

A patch is attached which implements this, and will be added to the next
commitfest. A separate patch will be submitted to the JDBC project.
Example usage shown below.

Regards

Ian Barwick

/* ------------------------------------------------------------------ */

postgres=# CREATE TABLE foo (id SERIAL PRIMARY KEY);
CREATE TABLE
postgres=# INSERT INTO foo VALUES(DEFAULT) RETURNING PRIMARY KEY;
 id
----
  1
(1 row)

INSERT 0 1

postgres=# CREATE TABLE bar (id1 INT NOT NULL, id2 INT NOT NULL, PRIMARY KEY(id1, id2));
CREATE TABLE
postgres=# INSERT INTO bar VALUES(1,2) RETURNING PRIMARY KEY;
 id1 | id2
-----+-----
   1 |   2
(1 row)

INSERT 0 1

postgres=# INSERT INTO bar VALUES(2,1),(2,2) RETURNING PRIMARY KEY;
 id1 | id2
-----+-----
   2 |   1
   2 |   2
(2 rows)

INSERT 0 2

postgres=# CREATE TABLE no_pkey (id SERIAL NOT NULL);
CREATE TABLE
postgres=# INSERT INTO no_pkey VALUES(DEFAULT) RETURNING id;
 id
----
  1
(1 row)

INSERT 0 1

postgres=# INSERT INTO no_pkey VALUES(DEFAULT) RETURNING PRIMARY KEY;
ERROR: Relation does not have any primary key(s)

/* ------------------------------------------------------------------ */

--
Ian Barwick    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml
index 74ea907..45295d1 100644
--- a/doc/src/sgml/ref/delete.sgml
+++ b/doc/src/sgml/ref/delete.sgml
@@ -25,7 +25,7 @@ PostgreSQL documentation
 DELETE FROM [ ONLY ] table_name [ * ] [ [ AS ] alias ]
 [ USING using_list ]
 [ WHERE condition | WHERE CURRENT OF cursor_name ]
-[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
+[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] | PRIMARY KEY ]
@@ -182,6 +182,17 @@ DELETE FROM [ ONLY ] table_name [ *
+
+
+PRIMARY KEY
+
+
+ Returns the table's primary key column(s) after each row is deleted.
+ Cannot be combined with an output_expression.
+
+
+
@@ -208,7 +219,9 @@ DELETE count
 clause, the result will be similar to that of a SELECT
 statement containing the columns and values defined in the
 RETURNING list, computed over the row(s) deleted by the
- command.
+ command. PRIMARY KEY can be specified to return the
+ primary key value(s) for each deleted row. An error will be raised
+ if the table does not have a primary key.

diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index a3cccb9..9fbd859 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -24,7 +24,7 @@ PostgreSQL documentation
 [ WITH [ RECURSIVE ] with_query [, ...] ]
 INSERT INTO table_name [ ( column_name [, ...] ) ]
 { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
-[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
+[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] | PRIMARY KEY ]
@@ -65,7 +65,9 @@ INSERT INTO table_name [ (
 RETURNING list is identical to that of the output list
- of SELECT.
+ of SELECT. Alternatively, PRIMARY KEY will
+ return the primary key value(s) for each inserted row. An error will
+ be raised if the table does not have a primary key.
@@ -186,6 +188,17 @@ INSERT INTO table_name [ (
+
+
+PRIMARY KEY
+
+
+ Returns the table's primary key column(s) after each row is inserted.
+ Cannot be combined with an output_expression.
+
+
+
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
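[Editor's illustration: for comparison, the race-prone two-step approach the
proposal avoids would have the driver look up the key columns itself, roughly
like this -- a sketch, not the JDBC driver's actual query.]

-- Step 1: look up the primary key columns of the target table.
-- By the time step 2 runs, the primary key may already have changed.
SELECT a.attname
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indrelid
                   AND a.attnum = ANY (i.indkey)
WHERE i.indrelid = 'foo'::regclass
  AND i.indisprimary;

-- Step 2: issue the statement with an explicit RETURNING list.
INSERT INTO foo VALUES (DEFAULT) RETURNING id;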
Re: [HACKERS] Scaling shared buffer eviction
On Sun, Jun 8, 2014 at 7:21 PM, Kevin Grittner wrote:
> Amit Kapila wrote:
>
> > I have improved the patch by making the following changes:
> > a. Improved the bgwriter logic to log xl_running_xacts info and
> >    removed the hibernate logic, as the bgwriter will now work only when
> >    there is a scarcity of buffers in the freelist. The basic idea is:
> >    when the number of buffers on the freelist drops below the low
> >    threshold, the allocating backend sets the latch; the bgwriter wakes
> >    up and begins adding buffers to the freelist until it reaches the
> >    high threshold, and then goes back to sleep.
>
> The numbers from your benchmarks are very exciting, but the above
> concerns me. My tuning of the bgwriter in production has generally
> *not* been aimed at keeping pages on the freelist, but toward
> preventing shared_buffers from accumulating a lot of dirty pages,
> which were leading to cascades of writes between caches and thus to
> write stalls. By pushing dirty pages into the (*much* larger) OS
> cache, and letting write combining happen there, where the OS could
> pace based on the total number of dirty pages instead of having some
> hidden away and appearing rather suddenly, latency spikes were avoided
> while not causing any noticeable increase in the number of OS writes
> to the RAID controller's cache.
>
> Essentially I was able to tune the bgwriter so that a dirty page
> was always pushed out to the OS cache within three seconds, which led
> to a healthy balance of writes between the checkpoint process and
> the bgwriter.

I think it would be better if the bgwriter based its writes on the number of
buffers that get dirtied, in order to achieve that balance of writes.

> Backend processes related to user connections still
> performed about 30% of the writes, and this work shows promise
> toward bringing that down, which would be great; but please don't
> eliminate the ability to prevent write stalls in the process.

I agree that for some cases, as you explain, the current bgwriter logic does
satisfy the need. However, there are other cases where it doesn't help much:
one such case is what I am trying to improve (easing backend buffer
allocations), and another may be constant write activity, for which I am not
sure how much it really helps.

Part of the reason for trying to make the bgwriter respond mainly to ease
backend allocations is the previous discussion on the same topic; refer to
the link below:
http://www.postgresql.org/message-id/ca+tgmoz7dvhc4h-ffjmzcff6vwynfoeapz021vxw61uh46r...@mail.gmail.com

However, if we want to retain the current property of the bgwriter, we can
do so in one of the following ways:
a. Have separate processes for writing dirty buffers and for moving buffers
   to the freelist.
b. In the current bgwriter, separate the two tasks based on need. The need
   can be decided by whether the bgwriter was woken due to a shortage of
   buffers on the freelist or due to BgWriterDelay.

As populating the freelist and balancing writes by writing dirty buffers are
two separate responsibilities, I am not sure that doing both in one process
is a good idea.

I am planning to take some more performance data, part of which will include
a write load as well, but I am not sure whether it can show the need you
mention.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] wrapping in extended mode doesn't work well with default pager
On Fri, May 23, 2014 at 10:10:23AM -0400, Alvaro Herrera wrote:
> Sergey Muraviov wrote:
> > I found some new bugs and fix them.
> > And I had to make many changes.
>
> This version fixes some bugs I had noticed in expanded mode too. For
> instance, the original looked like this (five lines plus header):
>
> -[ RECORD 49 ]-+-
> pg_identify_object | (rule,,,"""_RETURN"" on
> pg_catalog.pg_available_extension_versions")
>
> pg_identify_object |
> (view,pg_catalog,pg_available_extension_versions,pg_catalog.pg_availabl
> e.
> |._extension_versions)
>
> whereas it's correctly only three lines plus header with this patch
> applied.

I had noticed a similar-looking behavior change with aligned format and
expanded display, so I gave psql-wrapped-expanded-fix-v4.patch a spin with
this test case:

\pset expanded on
\pset format aligned
select repeat('a', 2) union all select repeat('a', 500);

The patch did not restore 9.3 behavior for that one. Starting with commit
6513633, the first line of letters is space-padded on the right to the width
of the second line of letters. To illustrate, I have attached raw psql
output from both commit 6513633 and its predecessor. Also note that
psql-wrapped-expanded-fix-v4.patch expands each "[ RECORD x ]" header from
509 bytes to 511 bytes; 509 is the longstanding width.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

Expanded display (expanded) is on.
Output format (format) is aligned.
-[ RECORD 1 ]
repeat | aa
-[ RECORD 2 ]
repeat |

Expanded display (expanded) is on.
Output format (format) is aligned.
-[ RECORD 1 ]
repeat | aa
-[ RECORD 2 ]
repeat | aaa
Re: [HACKERS] updated emacs configuration
On Wed, Aug 07, 2013 at 07:57:53AM -0400, Peter Eisentraut wrote:
> On 7/2/13 8:42 PM, Peter Eisentraut wrote:
> > Updated files with changes:
> >
> > - adjusted fill-column to 78, per Noah
> > - added c-file-style, per Andrew
> > - support both "postgresql" and "postgres" directory names
> > - use defun instead of lambda, per Dimitri
> > - put Perl configuration back into emacs.samples, for Tom
> >
> > I also added configuration of c-auto-align-backslashes as well as label
> > and statement-case-open to c-offsets-alist. With those changes, the
> > result of indent-region is now very very close to pgindent, with the
> > main exception of the end-of-line de-indenting that pgindent does, which
> > nobody likes anyway.
>
> Did anyone have any outstanding concerns about this latest version? I
> thought it looked ready to commit.

After upgrading to GNU Emacs 23.4.1 from a version predating directory-local
variables, I saw switch/case indentation go on the fritz. My hooks were
issuing (c-set-style "postgresql"), but ".dir-locals.el" set it back to
plain "bsd" style. The most-reasonable fix I found was to locally add
c-file-style to ignored-local-variables. c-file-style is the only setting
appearing in both emacs.samples and .dir-locals.el with non-identical
values, so it alone calls for this treatment.

Behavior looks sane in both Emacs 21.4.1 and Emacs 24.3.1, the version
extrema I have at hand. Emacs 24.3.1 is fine without this due to recent
c-before-hack-hook changes, but this does no harm.

Shall I add this workaround to emacs.samples?

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

diff --git a/src/tools/editors/emacs.samples b/src/tools/editors/emacs.samples
index e469d55..a510afd 100644
--- a/src/tools/editors/emacs.samples
+++ b/src/tools/editors/emacs.samples
@@ -30,7 +30,12 @@
 (add-hook 'c-mode-hook
           (defun postgresql-c-mode-hook ()
             (when (string-match "/postgres\\(ql\\)?/" buffer-file-name)
-              (c-set-style "postgresql"
+              (c-set-style "postgresql")
+              ;; Don't override the style we just set with the style in
+              ;; `dir-locals-file'.  Emacs 23.4.1 needs this; it is obsolete,
+              ;; albeit harmless, by Emacs 24.3.1.
+              (set (make-local-variable 'ignored-local-variables)
+                   (append '(c-file-style) ignored-local-variables)

 ;;; Perl files
Re: [HACKERS] backup_label revisited
On Wed, Jun 04, 2014 at 06:17:29PM +0100, Greg Stark wrote:
> On Mon, Jun 2, 2014 at 2:10 PM, Fujii Masao wrote:
> > What about the case where we restore the backup to another server and
> > start the recovery? In this case, ISTM inode can be the same. No?
>
> Hm, I was about to comment that that seems very unlikely. Even on the
> same server if you delete the old database root and then unpack a
> backup it could possibly reuse the exact same inode again. But it's
> really not likely to happen.
>
> However in the brave new world of filesystem snapshots there is the
> possibility someone has "restored" a backup by opening one of their
> snapshots read-write. In which case the backup-label would have the
> same inode number. That would still be fine if the snapshot is
> consistent but if they have tablespaces and those tablespaces were
> snapshotted separately then it wouldn't be ok.
>
> I think that's a fatal flaw unless anyone can see a way to distinguish
> a filesystem from a snapshot of the filesystem.

Implementations of the "dump" program likely have that property of
preserving inode metadata while not promising a consistent snapshot. If we
keep such backup methods fully supported, I agree with your conclusion.
PostgreSQL can't, in general, distinguish an almost-snapshot from a master
that crashed during a backup.

But the administrator can track the difference. If you use
pg_start_backup(), your init.d script that gets control after a crash can
have 'rm -f "$PGDATA"/backup_label'. Use a different script to recover hot
backups. Perhaps what ought to change is the documentation and
contrib/start-scripts? Maybe even add a --definitely-not-a-backup postmaster
option, so scripts need not hardcode knowledge of a semi-internal fact like
the name of the file to delete.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
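[Editor's illustration: for readers who haven't used the low-level backup
API, the file at issue is created by the exclusive-mode backup calls, roughly
as below. This is a sketch; the label string is arbitrary.]

-- Writes backup_label into $PGDATA, where the file copy will pick it up.
SELECT pg_start_backup('nightly base backup');
-- ... copy the data directory (tar, rsync, dump, filesystem snapshot) ...
-- Removes backup_label from the live cluster; the copy keeps the one it
-- captured, which is what recovery of the backup relies on.
SELECT pg_stop_backup();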
Re: [HACKERS] MinGW/Cygwin build snags
On Sun, Jun 08, 2014 at 06:04:46PM -0400, Tom Lane wrote:
> Noah Misch writes:
> > Since src/makefiles/Makefile.win32 does not set BUILDING_DLL for src/port,
> > PGDLLIMPORT is set improperly for code to be linked directly into the
> > backend. Makefile.win32 does set BUILDING_DLL for src/common.
> > (Makefile.cygwin has the same discrepancy, though I haven't checked
> > whether it causes an actual build failure there. The MSVC build system
> > sets BUILDING_DLL for both src/port and src/common files.) This affects
> > any reference to a data symbol from src/port. The fix is straightforward
> > enough: cause Makefile.{win32,cygwin} to treat src/port like src/common.
>
> I wonder whether these cases shouldn't distinguish between the "frontend"
> and "backend" builds of src/port/ and src/common/. In particular, it
> seems odd that we're getting this type of failure in the backend build.

Good question; they need not distinguish. !BUILDING_DLL means that the code
being compiled will access backend symbols through dynamic linking to
postgres.exe. Server modules, such as plpgsql, build with !BUILDING_DLL, and
normal backend code builds with BUILDING_DLL. The setting is irrelevant for
"frontend"/non-server code, which does not link to backend symbols at all.

> > Second, src/template/{win32,cygwin} completely replaces LDFLAGS, so
> > overriding LDFLAGS on the "configure" command line is ineffective. Those
> > scripts should instead add to the existing LDFLAGS, like other templates
> > do for CPPFLAGS. Several other templates completely override CFLAGS;
> > that's undesirable for the same reason. I don't have ready access to
> > those affected configurations, so I'm leaving the CFLAGS damage alone.
>
> +1 for doing something about CFLAGS while we're at it.

Scratch that; configure.in has logic to discard template script CFLAGS
changes if the user had specified CFLAGS.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
[HACKERS] NUMA packaging and patch
I ran into a situation where a machine with 4 NUMA memory nodes and 40 cores
had performance problems due to NUMA. The problems were worst right after
they rebooted the OS and warmed the cache by running a script of queries to
read all tables. These were all run on a single connection. As it turned
out, the size of the database was just over one-quarter of the size of RAM,
and with default NUMA policies both the OS cache for the database and the
PostgreSQL shared memory allocation were placed on a single NUMA segment, so
access to the CPU package managing that segment became a bottleneck. On top
of that, processes which happened to run on the CPU package which had all
the cached data had to allocate memory for local use on more distant memory
because there was none left in the more local memory. Through normal
operations, things eventually tended to shift around and get better (after
several hours of heavy use with substandard performance).

I ran some benchmarks and found that even in long-running tests, spreading
these allocations among the memory segments showed about a 2% benefit in a
read-only load. The biggest difference I saw in a long-running read-write
load was about a 20% hit for unbalanced allocations, but I only saw that
once. I talked to someone at PGCon who managed to engineer much worse
performance hits for an unbalanced load, although the circumstances were
fairly artificial. Still, fixing this seems like something worth doing if
further benchmarks confirm benefits at this level.

By default, the OS cache and buffers are allocated in the memory node with
the shortest "distance" from the CPU a process is running on. This is
determined by the "cpuset" associated with the process which reads or writes
the disk page. Typically a NUMA machine starts with a single cpuset with a
policy specifying this behavior. Fixing this aspect of things seems like an
issue for packagers, although we should probably document it for those
running from their own source builds.

To set an alternate policy for PostgreSQL, you first need to find or create
the location for cpuset specification, which uses a filesystem in a way
similar to the /proc directory. On a machine with more than one memory node,
the appropriate filesystem is probably already mounted, although different
distributions use different filesystem names and mount locations. I will
illustrate the process on my Ubuntu machine. Even though it has only one
memory node (and so, this makes no difference), I have it handy at the
moment to confirm the commands as I put them into the email.

# Sysadmin must create the root cpuset if not already done. (On a
# system with NUMA memory, this will probably already be mounted.)
# Location and options can vary by distro.
sudo mkdir /dev/cpuset
sudo mount -t cpuset none /dev/cpuset

# Sysadmin must create a cpuset for postgres and configure
# resources. This will normally be all cores and all RAM. This is
# where we specify that this cpuset will spread pages among its
# memory nodes.
sudo mkdir /dev/cpuset/postgres
sudo /bin/bash -c "echo 0-3 >/dev/cpuset/postgres/cpus"
sudo /bin/bash -c "echo 0 >/dev/cpuset/postgres/mems"
sudo /bin/bash -c "echo 1 >/dev/cpuset/postgres/memory_spread_page"

# Sysadmin must grant permissions to the desired setting(s).
# This could be by user or group.
sudo chown postgres /dev/cpuset/postgres/tasks

# The pid of postmaster or an ancestor process must be written to
# the tasks "file" of the cpuset. This can be a shell from which
# pg_ctl is run, at least for bash shells. It could also be
# written by the postmaster itself, essentially as an extra pid
# file. Possible snippet from a service script:
echo $$ >/dev/cpuset/postgres/tasks
pg_ctl start ...

Where the OS cache is larger than shared_buffers, the above is probably more
important than the attached patch, which causes the main shared memory
segment to be spread among all available memory nodes. This patch only
compiles in the relevant code if configure is run using the --with-libnuma
option, in which case a dependency on the numa library is created. It is v3
to avoid confusion with earlier versions I have shared with a few people
off-list. (The only difference from v2 is fixing bitrot.) I'll add it to the
next CF.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/configure b/configure
index ed1ff0a..79a0dea 100755
--- a/configure
+++ b/configure
@@ -702,6 +702,7 @@
 EGREP
 GREP
 with_zlib
 with_system_tzdata
+with_libnuma
 with_libxslt
 with_libxml
 XML2_CONFIG
@@ -831,6 +832,7 @@
 with_uuid
 with_ossp_uuid
 with_libxml
 with_libxslt
+with_libnuma
 with_system_tzdata
 with_zlib
 with_gnu_ld
@@ -1518,6 +1520,7 @@ Optional Packages:
   --with-ossp-uuid        obsolete spelling of --with-uuid=ossp
   --with-libxml           build with XML support
   --with-libxslt          use XSLT support when building contrib/xml2
+  --with-libnuma          use libnuma f
Re: [HACKERS] MinGW/Cygwin build snags
Noah Misch writes:
> First, when I tried to add an Assert() call to a file in src/port, a
> MinGW-w64 build failed like this:
>
> Creating library file: libpostgres.a
> ../../src/port/libpgport_srv.a(fls_srv.o): In function `fls':
> /home/nm/src/pg/mingw-postgresql/src/port/fls.c:63: undefined reference to
> `__imp_assert_enabled'

FWIW, I think we had consensus to remove the assert_enabled variable
entirely. Not that it wouldn't be good to fix this, but Assert per se won't
need it after we do that.

> Since src/makefiles/Makefile.win32 does not set BUILDING_DLL for src/port,
> PGDLLIMPORT is set improperly for code to be linked directly into the
> backend. Makefile.win32 does set BUILDING_DLL for src/common.
> (Makefile.cygwin has the same discrepancy, though I haven't checked
> whether it causes an actual build failure there. The MSVC build system
> sets BUILDING_DLL for both src/port and src/common files.) This affects
> any reference to a data symbol from src/port. The fix is straightforward
> enough: cause Makefile.{win32,cygwin} to treat src/port like src/common.

I wonder whether these cases shouldn't distinguish between the "frontend"
and "backend" builds of src/port/ and src/common/. In particular, it seems
odd that we're getting this type of failure in the backend build.

> Second, src/template/{win32,cygwin} completely replaces LDFLAGS, so
> overriding LDFLAGS on the "configure" command line is ineffective. Those
> scripts should instead add to the existing LDFLAGS, like other templates
> do for CPPFLAGS. Several other templates completely override CFLAGS;
> that's undesirable for the same reason. I don't have ready access to
> those affected configurations, so I'm leaving the CFLAGS damage alone.

+1 for doing something about CFLAGS while we're at it.

> Both of these changes fix bugs, but I plan not to back-patch.

Agreed; the lack of complaints to date suggests that we should leave well
enough alone in the back branches.

regards, tom lane
[HACKERS] MinGW/Cygwin build snags
First, when I tried to add an Assert() call to a file in src/port, a
MinGW-w64 build failed like this:

Creating library file: libpostgres.a
../../src/port/libpgport_srv.a(fls_srv.o): In function `fls':
/home/nm/src/pg/mingw-postgresql/src/port/fls.c:63: undefined reference to `__imp_assert_enabled'

Since src/makefiles/Makefile.win32 does not set BUILDING_DLL for src/port,
PGDLLIMPORT is set improperly for code to be linked directly into the
backend. Makefile.win32 does set BUILDING_DLL for src/common.
(Makefile.cygwin has the same discrepancy, though I haven't checked whether
it causes an actual build failure there. The MSVC build system sets
BUILDING_DLL for both src/port and src/common files.) This affects any
reference to a data symbol from src/port. The fix is straightforward enough:
cause Makefile.{win32,cygwin} to treat src/port like src/common.

Second, src/template/{win32,cygwin} completely replaces LDFLAGS, so
overriding LDFLAGS on the "configure" command line is ineffective. Those
scripts should instead add to the existing LDFLAGS, like other templates do
for CPPFLAGS. Several other templates completely override CFLAGS; that's
undesirable for the same reason. I don't have ready access to those affected
configurations, so I'm leaving the CFLAGS damage alone.

A fix for the Makefile.win32 half of the original problem appeared as part
of a larger patch:
http://www.postgresql.org/message-id/flat/86sk8845pl@ds4.des.no

Both of these changes fix bugs, but I plan not to back-patch. Builds that
work today won't see any change, and I found no other user reports.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

diff --git a/src/makefiles/Makefile.cygwin b/src/makefiles/Makefile.cygwin
index bd83e5f..bb2efed 100644
--- a/src/makefiles/Makefile.cygwin
+++ b/src/makefiles/Makefile.cygwin
@@ -28,6 +28,10 @@ ifneq (,$(findstring src/common,$(subdir)))
 override CPPFLAGS+= -DBUILDING_DLL
 endif

+ifneq (,$(findstring src/port,$(subdir)))
+override CPPFLAGS+= -DBUILDING_DLL
+endif
+
 ifneq (,$(findstring timezone,$(subdir)))
 override CPPFLAGS+= -DBUILDING_DLL
 endif

diff --git a/src/makefiles/Makefile.win32 b/src/makefiles/Makefile.win32
index b18621b..9cb84f2 100644
--- a/src/makefiles/Makefile.win32
+++ b/src/makefiles/Makefile.win32
@@ -27,6 +27,10 @@ ifneq (,$(findstring src/common,$(subdir)))
 override CPPFLAGS+= -DBUILDING_DLL
 endif

+ifneq (,$(findstring src/port,$(subdir)))
+override CPPFLAGS+= -DBUILDING_DLL
+endif
+
 ifneq (,$(findstring timezone,$(subdir)))
 override CPPFLAGS+= -DBUILDING_DLL
 endif

diff --git a/src/template/cygwin b/src/template/cygwin
index e484fe6..b6ea0de 100644
--- a/src/template/cygwin
+++ b/src/template/cygwin
@@ -6,4 +6,4 @@ SRCH_LIB="/usr/local/lib"
 # pg_toupper() etc. in both libpq and pgport
 # we'd prefer to use --disable-auto-import to match MSVC linking behavior,
 # but support for it in Cygwin is too haphazard
-LDFLAGS="-Wl,--allow-multiple-definition -Wl,--enable-auto-import"
+LDFLAGS="$LDFLAGS -Wl,--allow-multiple-definition -Wl,--enable-auto-import"

diff --git a/src/template/win32 b/src/template/win32
index dc5b77e..7da9719 100644
--- a/src/template/win32
+++ b/src/template/win32
@@ -3,4 +3,4 @@
 # --allow-multiple-definition is required to link pg_dump because it finds
 # pg_toupper() etc. in both libpq and pgport
 # --disable-auto-import is to ensure we get MSVC-like linking behavior
-LDFLAGS="-Wl,--allow-multiple-definition -Wl,--disable-auto-import"
+LDFLAGS="$LDFLAGS -Wl,--allow-multiple-definition -Wl,--disable-auto-import"
Re: [HACKERS] configure does not check for bison or flex
Eric L wrote
> I am installing postgresql from source on 64 bit Ubuntu 14.04 and when I
> run the ./configure script, it is successful, but when I run make it fails
> with an error:
>
> "ERROR: `flex' is missing on your system. It is needed to create the file
> `bootscanner.c'. You can either get flex from a GNU mirror site or
> download an official distribution of PostgreSQL, which contains
> pre-packaged flex output"
>
> I get this same error for "bison". I can install it with "sudo apt-get
> install bison" and "sudo apt-get install flex". However shouldn't the
> ./configure script be checking for these first?
>
> Eric

Google: postgresql bison
http://postgresql.1045698.n5.nabble.com/bison-flex-and-configure-td5789215.html

David J.
[HACKERS] configure does not check for bison or flex
I am installing postgresql from source on 64 bit Ubuntu 14.04 and when I run
the ./configure script, it is successful, but when I run make it fails with
an error:

"ERROR: `flex' is missing on your system. It is needed to create the file
`bootscanner.c'. You can either get flex from a GNU mirror site or download
an official distribution of PostgreSQL, which contains pre-packaged flex
output"

I get this same error for "bison". I can install it with "sudo apt-get
install bison" and "sudo apt-get install flex". However shouldn't the
./configure script be checking for these first?

Eric
Re: [HACKERS] Securing "make check" (CVE-2014-0067)
On Sun, Mar 23, 2014 at 07:04:20PM -0400, Noah Misch wrote:
> On Thu, Mar 06, 2014 at 11:52:22PM -0500, Noah Misch wrote:
> > On Thu, Mar 06, 2014 at 12:44:34PM -0500, Tom Lane wrote:
> > > I'm inclined to suggest that we should put the socket under $CWD by
> > > default, but provide some way for the user to override that choice.
> > > If they want to put it in /tmp, it's on their head as to how secure
> > > that is. On most modern platforms it'd be fine.
> >
> > I am skeptical about the value of protecting systems with non-sticky
> > /tmp, but long $CWD isn't of great importance, either. I'm fine with
> > your suggestion. Though the $CWD or one of its parents could be
> > world-writable, that would typically mean an attacker could just
> > replace the test cases directly.
>
> Here's the patch. The temporary data directory makes for a convenient
> socket directory; initdb already gives it mode 0700, and we have existing
> arrangements to purge it when finished. One can override the socket
> directory by defining PG_REGRESS_SOCK_DIR in the environment.

Socket path length limitations thwarted that patch:
http://www.postgresql.org/message-id/flat/e1wtnv2-00047s...@gemulon.postgresql.org

Here's an update that places the socket in a temporary subdirectory of /tmp.
The first attached patch adds NetBSD mkdtemp() to libpgport. The second,
principal, patch uses mkdtemp() to implement this design in pg_regress. The
corresponding change to contrib/pg_upgrade/test.sh is based on the
"configure" script's arrangements for its temporary directory.

NetBSD's mkdtemp() has assertions, and I initially mapped its assertion
macro to our Assert(). However, a bug in our MinGW build process causes
build failures if an Assert() call appears in libpgport. I will post about
that in a new thread. The affected assertions were uncompelling, so I
dropped them.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

diff --git a/configure b/configure
index ed1ff0a..f8232db 100755
--- a/configure
+++ b/configure
@@ -11650,6 +11650,19 @@ esac
 fi

+ac_fn_c_check_func "$LINENO" "mkdtemp" "ac_cv_func_mkdtemp"
+if test "x$ac_cv_func_mkdtemp" = xyes; then :
+  $as_echo "#define HAVE_MKDTEMP 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" mkdtemp.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS mkdtemp.$ac_objext"
+ ;;
+esac
+
+fi
+
 ac_fn_c_check_func "$LINENO" "random" "ac_cv_func_random"
 if test "x$ac_cv_func_random" = xyes; then :
   $as_echo "#define HAVE_RANDOM 1" >>confdefs.h

diff --git a/configure.in b/configure.in
index 80df1d7..c95e2cd 100644
--- a/configure.in
+++ b/configure.in
@@ -1357,7 +1357,7 @@
 else
   AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
 fi

-AC_REPLACE_FUNCS([crypt fls getopt getrusage inet_aton random rint srandom strerror strlcat strlcpy])
+AC_REPLACE_FUNCS([crypt fls getopt getrusage inet_aton mkdtemp random rint srandom strerror strlcat strlcpy])

 case $host_os in

diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 5ff9e41..4fb7288 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -330,6 +330,9 @@
 /* Define to 1 if the system has the type `MINIDUMP_TYPE'. */
 #undef HAVE_MINIDUMP_TYPE

+/* Define to 1 if you have the `mkdtemp' function. */
+#undef HAVE_MKDTEMP
+
 /* Define to 1 if you have the <netinet/in.h> header file. */
 #undef HAVE_NETINET_IN_H

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index e6e3c8d..58777ca 100644
--- a/src/include/pg_config.h.win32
+++ b/src/include/pg_config.h.win32
@@ -249,6 +249,9 @@
 /* Define to 1 if the system has the type `MINIDUMP_TYPE'. */
 #define HAVE_MINIDUMP_TYPE 1

+/* Define to 1 if you have the `mkdtemp' function. */
+/* #undef HAVE_MKDTEMP */
+
 /* Define to 1 if you have the <netinet/in.h> header file. */
 #define HAVE_NETINET_IN_H 1

diff --git a/src/include/port.h b/src/include/port.h
index c9226f3..3d97481 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -462,6 +462,9 @@
 extern int pg_check_dir(const char *dir);

 /* port/pgmkdirp.c */
 extern int pg_mkdir_p(char *path, int omode);

+/* port/mkdtemp.c */
+extern char *mkdtemp(char *path);
+
 /* port/pqsignal.c */
 typedef void (*pqsigfunc) (int signo);
 extern pqsigfunc pqsignal(int signo, pqsigfunc func);

diff --git a/src/port/mkdtemp.c b/src/port/mkdtemp.c
new file mode 100644
index 000..a5e991f
--- /dev/null
+++ b/src/port/mkdtemp.c
@@ -0,0 +1,293 @@
+/*-
+ *
+ * mkdtemp.c
+ *    create a mode-0700 temporary directory
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *    src/port/mkdtemp.c
+ *
+ * This code was taken from NetBSD to provide an implementation for platforms
+ * that lack it. (Among compatibly-licensed implementations, the OpenBSD
+ * version better resists denial-of-service attacks. However, it has a
+ * cryptograp
Re: [HACKERS] Scaling shared buffer eviction
Amit Kapila wrote:

> I have improved the patch by making the following changes:
> a. Improved the bgwriter logic to log xl_running_xacts info and
>    removed the hibernate logic, as the bgwriter will now work only when
>    there is a scarcity of buffers in the freelist. The basic idea is:
>    when the number of buffers on the freelist drops below the low
>    threshold, the allocating backend sets the latch; the bgwriter wakes
>    up and begins adding buffers to the freelist until it reaches the
>    high threshold, and then goes back to sleep.

The numbers from your benchmarks are very exciting, but the above concerns
me. My tuning of the bgwriter in production has generally *not* been aimed
at keeping pages on the freelist, but toward preventing shared_buffers from
accumulating a lot of dirty pages, which were leading to cascades of writes
between caches and thus to write stalls. By pushing dirty pages into the
(*much* larger) OS cache, and letting write combining happen there, where
the OS could pace based on the total number of dirty pages instead of having
some hidden away and appearing rather suddenly, latency spikes were avoided
while not causing any noticeable increase in the number of OS writes to the
RAID controller's cache.

Essentially I was able to tune the bgwriter so that a dirty page was always
pushed out to the OS cache within three seconds, which led to a healthy
balance of writes between the checkpoint process and the bgwriter. Backend
processes related to user connections still performed about 30% of the
writes, and this work shows promise toward bringing that down, which would
be great; but please don't eliminate the ability to prevent write stalls in
the process.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
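[Editor's illustration: tuning of the kind Kevin describes is done through
the bgwriter GUCs. The values below are arbitrary examples, not Kevin's
production settings; ALTER SYSTEM (new in 9.4) is shown for brevity, and
editing postgresql.conf works the same way.]

-- bgwriter_delay sets how often the bgwriter's scan runs;
-- bgwriter_lru_maxpages caps how many buffers it may write per round;
-- bgwriter_lru_multiplier scales its estimate of upcoming buffer demand.
ALTER SYSTEM SET bgwriter_delay = '100ms';
ALTER SYSTEM SET bgwriter_lru_maxpages = 500;
ALTER SYSTEM SET bgwriter_lru_multiplier = 4.0;
SELECT pg_reload_conf();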
[HACKERS] Allowing NOT IN to use ANTI joins
Currently pull_up_sublinks_qual_recurse only changes the plan for NOT EXISTS
queries and leaves NOT IN alone. The reason for this is that the values
returned by a subquery in the IN clause could have NULLs. A simple example
of this (without a subquery) is:

select 1 where 3 not in (1, 2, null);

which returns 0 rows because 3 <> NULL is unknown.

The attached patch allows an ANTI-join plan to be generated in cases like:

CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL);
CREATE TABLE b (id INT NOT NULL);

SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b);

to generate a plan like:

                           QUERY PLAN
-----------------------------------------------------------------
 Hash Anti Join  (cost=64.00..137.13 rows=1070 width=8)
   Hash Cond: (a.b_id = b.id)
   ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=8)
   ->  Hash  (cost=34.00..34.00 rows=2400 width=4)
         ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)

But if we then do:

ALTER TABLE b ALTER COLUMN id DROP NOT NULL;

the plan will go back to the current behaviour of:

                         QUERY PLAN
-----------------------------------------------------------
 Seq Scan on a  (cost=40.00..76.75 rows=1070 width=8)
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)

Comments are welcome

Regards

David Rowley

not_in_anti_join_v0.4.patch
Description: Binary data
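[Editor's illustration: a small demonstration, using the tables above, of
why the NOT NULL constraint on b.id is what makes the transformation safe.]

-- Once b.id can contain NULLs, a single NULL forces NOT IN to return
-- no rows at all: b_id <> NULL is unknown for every row of a, so no row
-- can pass the filter. A plain anti join would instead return the
-- non-matching rows, which is why the rewrite becomes incorrect.
INSERT INTO a VALUES (1, 10);
INSERT INTO b VALUES (NULL);   -- legal only after the NOT NULL is dropped

SELECT * FROM a WHERE b_id NOT IN (SELECT id FROM b);
-- (0 rows)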
Re: [HACKERS] Proposing pg_hibernate
On Fri, Jun 6, 2014 at 5:31 PM, Gurjeet Singh wrote:
> On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila wrote:
> > Buffer saver process itself can crash while saving or restoring
> > buffers.
>
> True. That may lead to a partial list of buffers being saved. And the
> code in the Reader process tries hard to read only valid data, and punts
> at the first sight of data that doesn't make sense or on ERROR raised
> from a Postgres API call.

In spite of the Reader process trying hard, I think we should ensure by some
other means that the file saved by the buffer saver is valid (perhaps by
first writing to a temporary file and then renaming it, or something else).

> > IIUC on shutdown request, postmaster will send signal to BG Saver
> > and BG Saver will save the buffers and then postmaster will send
> > signal to checkpointer to shutdown. So before writing Checkpoint
> > record, BG Saver can crash (it might have saved half the buffers)
>
> Case handled as described above.
>
> > or may BG saver saves buffers, but checkpointer crashes (due to
> > power outage or any such thing).
>
> Checkpointer process' crash seems to be irrelevant to Postgres
> Hibernator's workings.

Yes, but if it crashes before writing the checkpoint record, it will lead to
recovery, which is the case we are considering.

> I think you are trying to argue the wording in my claim "save-files
> are created only on clean shutdowns; not on a crash or immediate
> shutdown", by implying that a crash may occur at any time during and
> after the BufferSaver processing. I agree the wording can be improved.

Not only the wording: in your above mail, cases 2 and 1b would need to load
buffers and perform recovery as well, so we need to decide which one to give
preference to.

So if you agree that we should take the recovery data into consideration
along with the saved-file data, then I think we have the following options
to consider:

1. Have a provision for the user to specify which data (recovery or
   previously cached blocks) should be considered more important, and then
   load buffers before or after recovery based on that input.
2. Always perform the load before recovery, and mention in the docs that
   users can expect servers to take more time to start when they enable this
   extension, along with the advantages of doing so.
3. Always perform the load after recovery, and mention in the docs that
   enabling this extension might discard data cached by recovery or by the
   first few operations users perform.
4. Have the buffer-manager (BufMgr) module expose an API such that the
   buffer loader considers only buffers on the freelist when loading
   buffers.

Based on the opinions of others, I think we can decide on one of these, or
on any other better way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com