Re: [HACKERS] Extension Templates S03E11
On Tue, 2013-12-03 at 08:44 -0500, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > On 3 December 2013 02:02, Dimitri Fontaine wrote:
> > > ISTM that the real solution to this particular problem is to decouple
> > > the extensions that are currently in contrib from a specific postgres
> > > version.
> >
> > "Problem"? It's not a bug that you get hstore 1.2 when you dump from 9.2
> > and reload into 9.3; that's a feature. You wanted an upgrade, presumably,
>
> I don't buy this argument at *all* and it's not going to fly when we've
> got multiple versions of an extension available concurrently.

It seems there's a use case for both behaviors; perhaps we should include
it in the control information?

    preserve_version_on_dump (boolean)

FWIW, I find the current behavior surprising when in the mindset of a SQL
extension. But it makes sense for things more closely tied to the backend.

Regards,
	Jeff Davis
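For illustration, the suggestion might look like this in an extension's
control file (preserve_version_on_dump is the proposed, hypothetical
parameter; the other fields follow the existing control-file format):

    # hstore.control -- sketch only; preserve_version_on_dump is a
    # proposed parameter, not an existing one
    comment = 'data type for storing sets of (key, value) pairs'
    default_version = '1.2'
    module_pathname = '$libdir/hstore'
    relocatable = true
    preserve_version_on_dump = true

Presumably, with it set to false, pg_dump would emit a plain CREATE
EXTENSION hstore (yielding whatever version the target server considers
current); with it set to true, it would emit CREATE EXTENSION hstore
VERSION '1.2'.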
Re: [HACKERS] ruleutils vs. empty targetlists
On 3 December 2013 23:37, Tom Lane wrote:
> Thinking some more about bug #8648, it occurred to me that ruleutils.c
> isn't exactly prepared for the case either:
>
> regression=# create table nocols();
> CREATE TABLE
> regression=# create view vv1 as select exists (select * from nocols);
> CREATE VIEW
> regression=# \d+ vv1
>                    View "public.vv1"
>  Column |  Type   | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
>  exists | boolean |           | plain   |
> View definition:
>  SELECT (EXISTS ( SELECT
>            FROM nocols)) AS "exists";
>
> which of course is illegal syntax. I thought at first that this could be
> fixed trivially by emitting "*" if get_target_list found no columns.
> However, that doesn't quite work; consider
>
> create view vv2 as select exists (select nocols.* from nocols, somecols);
>
> If we regurgitate this as "exists (select * from nocols, somecols)", it
> wouldn't mean the same thing.
>
> But on the third hand, at least in the context of an EXISTS() subquery,
> it doesn't really matter because the tlist is irrelevant, at least as long
> as it only contains Vars. So you could argue that emitting "*" is plenty
> good enough even in the above example. I'm not certain that that applies
> everywhere that a tlist could appear, though.
>
> I experimented with code that would attempt to regurgitate "nocols.*"
> in the above example; see attached draft patch. I don't like this patch
> much: it's ugly, it'd be a pain to backport (because it's digging into
> data structures that have changed repeatedly), and I'm not sure how much
> I trust it anyway. So I'm leaning towards just doing
>
> +       if (colno == 0)
> +               appendStringInfoString(buf, " *");
>
> at least till such time as somebody shows a case where this isn't good
> enough.

Well here's a contrived example with grouping:

create table nocols();
create table somecols(a int, b int);
create view v as select exists(select nocols.* from nocols, somecols
group by somecols.a);

Simply turning that tlist into "*" makes the query invalid because
somecols.b doesn't appear in the group by clause. So in this case, it
needs to try to output a tlist that contains no columns.

Regards,
Dean

> Note that I wouldn't be thinking of this as something to back-patch
> at all, except that if someone did have a view that looked like this,
> they'd find that pg_dump output wouldn't restore. You could construct
> scenarios where that could seem like a denial of service, particularly
> if the DBA wasn't smart enough to figure out what was wrong with his
> dump. (And who wants to deal with such a thing at 4AM anyway ...)
>
> Comments?
>
>                         regards, tom lane
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
> I've cleaned this up - revision attached - and marked it "ready for
> committer".

Thank you for this.

I did the basic hygiene test. The patch applies correctly and compiles
with no warnings, and I did not find anything broken in basic
functionality.

In the documentation I have one minor suggestion: replace the phrase
"might judge to be a non-distinct" with "may judge to be non-distinct".

regards
Sameer
Re: [HACKERS] logical changeset generation v6.7
Hello, this is the continuation of my comments.

> 0008 and after to come later..

I had nothing to comment on for patch 0008.

= 0009:

- In repl_scanner.l, you omitted double-doublequote handling for
  replication, but it should be implemented. A zero-length identifier
  check might be needed, depending on the upper layer.

- In walsender.c, the log messages "Initiating logical rep.." and
  "Starting logical replication.." should be INFO or LOG in log level,
  not WARNING. And 'rep' in the former message would be better left
  unabbreviated, since it is not abbreviated in the latter.

- In walsender.c, StartLogicalReplication seems to abort itself on a
  timeline change. But timeline changes in 9.3+ don't need such an aid.
  You'd better consult StartReplication in current master for details.
  There might be other differences.

- In walsender.c, the typedef name WalSndSendData doesn't suggest a
  function pointer. I suppose passing a bare function pointer to
  WalSndLoop and WalSndDone is not good practice. It'd be better to wrap
  it in a struct for callbacks, say, LogicalDecodingContext. It'd be even
  better if it could be a struct shared with 'physical' replication.
  (A rough sketch follows below.)

- In walsender.c, I wonder whether the differences between logical and
  physical replication in fetching the latest WAL, in the construction
  of the WAL-sending loop, and so on are really necessary. The logical
  walsender seems to be implemented in a somewhat ad-hoc way on the
  whole. I believe its base structure could be unified further with the
  physical one.

- In procarray.c, the two added includes, which are not accompanied by
  any other modification, are needless. make emits no error or warning
  without them.

...Time's up. To be continued later, from 0010..

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
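As a rough illustration of the callback-struct suggestion above, here is a
minimal sketch; none of these names are taken from the patch, and the real
LogicalDecodingContext would carry far more state:

    /* Hypothetical sketch only -- names are illustrative, not from the
     * patch under review. */
    typedef struct WalSndSendContext
    {
        /* assembles and sends the next chunk of data; returns false
         * when there is nothing further to send */
        bool        (*send_data) (struct WalSndSendContext *ctx,
                                  bool *caughtup);
        void       *private_data;   /* e.g. decoding state */
    } WalSndSendContext;

    static void
    WalSndLoopSketch(WalSndSendContext *ctx)
    {
        bool        caughtup = false;

        for (;;)
        {
            if (!ctx->send_data(ctx, &caughtup))
                break;          /* done, or connection lost */
            /* ... wait for new WAL, check for shutdown requests ... */
        }
    }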
Re: [HACKERS] Extension Templates S03E11
On Mon, 2013-12-02 at 15:44 -0500, Stephen Frost wrote:
> How are we going to handle new keywords
> being added in new major versions? A pg_dump of the extension template
> script is then going to be loaded into the new major version but will
> not actually be able to be run because it'll error out...

Elsewhere in the thread you argued that the version of an extension
should be preserved across dump/reload. Surely a given version of the
extension corresponds to a specific set of SQL commands (specifically,
the SQL text blob on PGXN), so it *should* error out.

Otherwise you end up with a weird situation where upgrading a 9.4
install to 9.5 allows you to keep version 1.2 of some extension, but 1.2
won't install directly to 9.5. (By the way, I think this is a problem
with pg_upgrade currently.)

You're fighting pretty hard against text blobs, but at the same time
saying that we should be able to fully make use of existing PGXN
extensions, which contain text blobs of SQL. And extension authors are
versioning their SQL blobs, not some abstract concepts internal to
postgres and the catalogs.

Just because we start with blobs from PGXN doesn't mean we need to use
blobs everywhere; but I think you're too quick to rule them out.

Regards,
	Jeff Davis
Re: [HACKERS] Extension Templates S03E11
On Tue, 2013-12-03 at 10:23 -0500, Robert Haas wrote:
> In more normal cases, however, the system can (and probably should)
> figure out what was intended by choosing the *shortest* path to get to
> the intended version. For example, if someone ships 1.0, 1.0--1.1,
> 1.1, and 1.1--1.2, the system should choose to run 1.1 and then
> 1.1--1.2, not 1.0 and then 1.0--1.1 and then 1.1--1.2. But that can
> be automatic: only if there are two paths of equal length (as in the
> example in the previous paragraph) do we need help from the user to
> figure out what to do.

Why do we need help from the user? Just pick a path.

For an extension update, I understand why someone wouldn't want to
accidentally downgrade 5 versions (dropping all of their dependent
objects) before updating to the latest. But this doesn't apply to
creation.

And it just seems really awkward to document, and it's a constant
maintenance burden on extension authors to specify their upgrade paths
every time they release a new version.

> Putting all that together, I'm inclined to suggest that what we really
> need is a LIST of version numbers, rather than just one. If one path
> to the version we're installing is shorter than any other, we
> choose that, period. If there are multiple paths of equal length, we
> break the tie by choosing which version number appears first in the
> aforementioned list. If that still doesn't break the tie, either
> because none of the starting points are mentioned in that list or
> because there are multiple equal-length paths starting in the same
> place, we give up and emit an error.

That seems like extreme overkill, and still doesn't give users full
control over upgrade paths.

Regards,
	Jeff Davis
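To make Robert's example concrete, the shipped files would look like this
on disk (the extension name is hypothetical; the file names follow the
existing extension script naming convention):

    $ ls $SHAREDIR/extension/myext*
    myext--1.0.sql         # full install script for 1.0
    myext--1.0--1.1.sql    # upgrade script 1.0 -> 1.1
    myext--1.1.sql         # full install script for 1.1
    myext--1.1--1.2.sql    # upgrade script 1.1 -> 1.2

With this layout, CREATE EXTENSION myext VERSION '1.2' can be satisfied
either by running 1.0, then 1.0--1.1, then 1.1--1.2 (three steps), or by
running 1.1, then 1.1--1.2 (two steps); the shortest-path rule picks the
latter automatically.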
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/04/2013 01:08 AM, Tom Lane wrote:
> Magnus Hagander writes:
>> On Tue, Dec 3, 2013 at 11:44 PM, Josh Berkus wrote:
>>> Would certainly be nice. Realistically, getting good automated
>>> performance tests will require paying someone like Greg S., Mark or
>>> me for 6 solid months to develop them, since worthwhile open source
>>> performance test platforms currently don't exist. That money has
>>> never been available; maybe I should do a kickstarter.
>>
>> So in order to get *testing* we need to pay somebody. But to build a
>> great database server, we can rely on volunteer efforts or sponsorship
>> from companies who are interested in moving the project forward?
>
> And even more to the point, volunteers to reinvent the kernel I/O stack
> can be found on every street corner?

Actually, yes, I think so. That's a lot more exciting to work on than a
regression test suite.

> And those volunteers won't need any test scaffolding to be sure that
> *their* version never has performance regressions? (Well, no, they
> won't, because no such thing will ever be built. But we do need better
> test scaffolding for real problems.)
>
> Maybe we should lie, and *say* that we want direct I/O, but require
> that all submissions come with a test suite to prove that it's a gain.
> Then someone might actually write one, as a sidekick of a direct I/O
> patch. Then we could toss out the direct I/O stuff and take only the
> test framework.

FWIW, I also think that it'd be folly to reimplement the I/O stack. The
kernel does a lot of things for us. It might not do a great job, but it's
good enough.

As one data point, before my time, the VMware vPostgres team actually did
use direct I/O in vPostgres. We shipped that in a few releases. It was a
lot of effort to get the code right, and for DBAs, it made correct tuning
of shared_buffers a lot more important - set it too low and you won't
take full advantage of your RAM, set it too high and you won't have
memory available for other things. To be a good VM citizen, they also had
to implement a memory ballooning module inside Postgres, to release
shared buffers if the system hosting the VM is under memory pressure.
What did we gain by doing all that, compared to just letting the kernel
handle it? Some extra performance in some use cases, and a loss in
others. Not worth the trouble.

- Heikki
Re: [HACKERS] logical changeset generation v6.7
On 2013-12-04 17:31:50 +0900, Kyotaro HORIGUCHI wrote:
> = 0009:
>
> - In repl_scanner.l, you omitted double-doublequote handling for
>   replication, but it should be implemented. A zero-length identifier
>   check might be needed, depending on the upper layer.

I am not sure what you mean here. IDENT can be double quoted, and so can
the option names?

> - In walsender.c, the log messages "Initiating logical rep.."
>   and "Starting logical replication.." should be INFO or LOG in
>   log level, not WARNING. And 'rep' in the former message would
>   be better left unabbreviated, since it is not abbreviated in
>   the latter.

Agreed.

> - In walsender.c, StartLogicalReplication seems to abort itself
>   on a timeline change. But timeline changes in 9.3+ don't need
>   such an aid. You'd better consult StartReplication in current
>   master for details. There might be other differences.

Timeline increases currently need work, yes; that error message is the
smallest part...

> - In walsender.c, the typedef name WalSndSendData doesn't suggest
>   a function pointer. I suppose passing a bare function pointer
>   to WalSndLoop and WalSndDone is not good practice. It'd be
>   better to wrap it in a struct for callbacks, say,
>   LogicalDecodingContext. It'd be even better if it could be a
>   struct shared with 'physical' replication.

I don't see that as being realistic/advantageous. Wrapping a function
pointer in a struct doesn't improve anything in itself. I was thinking we
might want to just decouple the entire event loop and not reuse that
code, but that's ugly as well.

> - In walsender.c, I wonder whether the differences between logical
>   and physical replication in fetching the latest WAL, in the
>   construction of the WAL-sending loop, and so on are really
>   necessary. The logical walsender seems to be implemented in a
>   somewhat ad-hoc way on the whole. I believe its base structure
>   could be unified further with the physical one.

That's because the xlogreader.h interface - over my loud protests -
doesn't support chunk-wise reading of the WAL stream and necessitates
blocking inside the reader callback. So the event loop needs to be in
several individual functions (WalSndLoop, WalSndWaitForWal,
WalSndWriteData) instead of once in WalSndLoop…

> - In procarray.c, the two added includes, which are not
>   accompanied by any other modification, are needless. make emits
>   no error or warning without them.

Right. Will remove.

Thanks,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
On 2 December 2013 04:55, Pavel Stehule wrote:
> Hello
>
> it looks well, thank you
>
> Regards
>
> Pavel

I've been thinking about this some more, and there's another case that
concerns me slightly. We're now making some of the DROP...IF EXISTS
commands tolerate non-existent types as well as non-existent schemas ---
functions, aggregates, casts and operators all have type names in their
specifications. Of course it's possible that the type is missing because
it was in a schema that was dropped, so this change seems to be in the
spirit of what was discussed, but it seems like a change that might catch
some people out.

I think that, on balance, it is a sensible change, since if the type
doesn't exist, the dependent object can't exist either, so DROP...IF
EXISTS shouldn't be raising an error. However, I wonder if we should be
issuing a more specific NOTICE in this case too --- i.e., check for
non-existent types in the same way as we check for non-existent parent
objects --- type_does_not_exist_skipping() and
type_list_does_not_exist_skipping().

Regards,
Dean
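For illustration, the case in question looks something like this (the
NOTICE wording shown is hypothetical, mirroring the existing
"schema ... does not exist, skipping" messages):

    -- The argument type no longer exists, e.g. because its schema was
    -- dropped.  With the patch this succeeds rather than erroring out:
    DROP FUNCTION IF EXISTS my_func(dropped_schema.my_type);

    -- A more specific NOTICE might read something like:
    -- NOTICE:  type "dropped_schema.my_type" does not exist, skipping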
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote:
> > 4) Start the slave and connect to it using psql and in another
> > session I can see all archive recovery log
>
> Hmm... I had thought I'd misread your email, but it reproduces again.
> When I set a small recovery_time_delay (=3), it seems to work
> correctly. However, when I set a long recovery_time_delay (=300), it
> didn't work.
>
> My reproduced operation log is as follows.
>
> >[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
> >starting vacuum...end.
> >transaction type: TPC-B (sort of)
> >scaling factor: 10
> >query mode: simple
> >number of clients: 8
> >number of threads: 4
> >duration: 30 s
> >number of transactions actually processed: 68704
> >latency average: 3.493 ms
> >tps = 2289.196747 (including connections establishing)
> >tps = 2290.175129 (excluding connections establishing)
> >[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
> >[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
> >server starting
> >[mitsu-ko@localhost postgresql]$ LOG:  database system was shut down in recovery at 2013-12-03 10:26:41 JST
> >LOG:  entering standby mode
> >LOG:  consistent recovery state reached at 0/5C4D8668
> >LOG:  redo starts at 0/5C4000D8
> >[mitsu-ko@localhost postgresql]$ FATAL:  the database system is starting up
> >FATAL:  the database system is starting up
> >FATAL:  the database system is starting up
> >FATAL:  the database system is starting up
> >FATAL:  the database system is starting up
> >[mitsu-ko@localhost postgresql]$ bin/psql -p6543
> >psql: FATAL:  the database system is starting up
> >[mitsu-ko@localhost postgresql]$ bin/psql -p6543
> >psql: FATAL:  the database system is starting up
>
> I attached my postgresql.conf and recovery.conf. It will be
> reproducible.

So, you brought up a standby and it took more time to become consistent
because it waited on commits? That's the problem? If so, I don't think
that's a bug?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On Mon, Dec 2, 2013 at 10:42 PM, Peter Eisentraut wrote:
> On 11/19/13, 11:30 PM, Peter Geoghegan wrote:
>>>> +1 from me.
>>>
>>> That's +1 for *not* including this?
>>
>> Right.
>
> I agree with not including this.

If you're looking for more of those, here's another +1 for not including
it. (And of course also for Peter's comment that we need to figure out
whether something is actually missing for building the higher-level tool
that could provide this information and more.)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Time-Delayed Standbys
Hi,

On 04/12/13 11:13, KONDO Mitsumasa wrote:
> > 1) Clusters
> > - build master
> > - build slave, attach it to the master using SR, and configure
> >   recovery_time_delay to 1min.
> >
> > 2) Stop the slave
> >
> > 3) Run some transactions on the master using pgbench to generate a
> >   lot of archives
> >
> > 4) Start the slave and connect to it using psql and in another
> >   session I can see all archive recovery log
>
> Hmm... I had thought I'd misread your email, but it reproduces again.
> When I set a small recovery_time_delay (=3), it seems to work
> correctly. However, when I set a long recovery_time_delay (=300), it
> didn't work.
> […]

I'm not sure if I understand your problem correctly. Let me try to
summarize; please correct me if I'm wrong: you created a master node and
a hot standby with 300 delay. Then you stopped the standby, ran the
pgbench and started the hot standby again. It did not get in line with
the master. Is this correct?

I don't see a problem here… the standby should not be in sync with the
master, it should be delayed. I did step by step what you did, and after
50 minutes (300ms) the standby was at the same level the master was at.

Did I misunderstand you?

Regards,
Christian Kruse

--
 Christian Kruse                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Problem with displaying "wide" tables in psql
On Wed, Dec 4, 2013 at 11:20 AM, Sergey Muraviov
<sergey.k.murav...@gmail.com> wrote:
> Thank you for this trick.
> It would be nice if this trick was documented.
>
> However, with the pager I can't see a wide value on one screen, or
> select and copy it entirely. And I have to press many keys to find the
> necessary part of the value. There are no such problems with the patch.

I think the solution is to provide the proposed behaviour as an option.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
From: "Tom Lane" I think the reason why it was coded like that was that we hadn't written postmaster_is_alive() yet, or maybe we had but didn't want to trust it. However, with the coding you have here, we're fully exposed to any failure modes postmaster_is_alive() may have; so there's not a lot of value in accepting those and get_pgpid's failure modes too. Thank you for reviewing the patch so quickly. You are right, I don't think get_pgpid() here is no longer necessary. If the pid changes in one second, i.e. the original postgres terminates and "pg_ctl start" starts another one, "pg_ctl stop" can terminate successfully because the original postgres it was waiting for actually terminated. I'll submit the revised patch tomorrow. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
From: "MauMau" In addition, I'll remove libpq.dll from lib folder unless somebody objects. Currently, libpq.dll is placed in both bin and lib. I guess libpq.dll was left in lib because it was considered necessary for ECPG DLLs. The attached patch also removes libpq.dll from lib folder. I don't mind whether this patch or yesterday's one will be adopted. I'll add this to 2014-1 commitfest this weekend if the patch is not committed until then. Regards MauMau ECPG_DLL.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Select query performance and shared buffers
> I think all of this data cannot fit in shared_buffers, you might want
> to increase shared_buffers to larger size (not 30GB but close to your
> data size) to see how it behaves.

When I use shared_buffers larger than my data size, such as 10 GB,
results scale nearly as expected, at least for this instance type.

> You queries have Aggregation, ORDER/GROUP BY, so there is a chance
> that I/O can happen for those operations if PG doesn't have sufficient
> memory (work_mem) to perform such operations.

I used a work_mem of 32 MB, which should be enough for these queries. I
also tested with higher values of work_mem and didn't observe any
difference.

> Can you simplify your queries (simple scan or in other words no
> aggregation or other things) to see how they behave in your env.; once
> you are able to see simple queries scaling as per your expectation, you
> can try with complex ones.

Actually, we observe the problem when queries get simpler, such as select
count(*). Here is the results table in a more compact format:

                       select count(*)  TPC-H Simple (#6)  TPC-H Complex (#1)
 1 Table  / 1 query    1.5 s            2.5 s              8.4 s
 2 Tables / 2 queries  1.5 s            2.5 s              8.4 s
 4 Tables / 4 queries  2.0 s            2.9 s              8.8 s
 8 Tables / 8 queries  3.3 s            4.0 s              9.6 s

> Can we have the explain analyze of those queries, postgres
> configuration, perhaps vmstat output during execution?

postgres=# explain analyze SELECT count(*) from lineitem_1;
                                  QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=199645.01..199645.02 rows=1 width=0) (actual time=11317.391..11317.393 rows=1 loops=1)
   ->  Seq Scan on lineitem_1  (cost=0.00..184641.81 rows=6001281 width=0) (actual time=0.011..5805.255 rows=6001215 loops=1)
 Total runtime: 11317.440 ms
(3 rows)

postgres=# explain analyze SELECT
postgres-#     sum(l_extendedprice * l_discount) as revenue
postgres-# FROM
postgres-#     lineitem_1
postgres-# WHERE
postgres-#     l_shipdate >= date '1994-01-01'
postgres-#     AND l_shipdate < date '1994-01-01' + interval '1' year
postgres-#     AND l_discount between 0.06 - 0.01 AND 0.06 + 0.01
postgres-#     AND l_quantity < 24;
                                  QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=260215.36..260215.37 rows=1 width=16) (actual time=1751.775..1751.776 rows=1 loops=1)
   ->  Seq Scan on lineitem_1  (cost=0.00..259657.82 rows=111508 width=16) (actual time=0.031..1630.449 rows=114160 loops=1)
         Filter: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.05::double precision) AND (l_discount <= 0.07::double precision) AND (l_quantity < 24::double precision))
         Rows Removed by Filter: 5887055
 Total runtime: 1751.830 ms
(5 rows)

postgres=# explain analyze SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
FROM
    lineitem_1
WHERE
    l_shipdate <= date '1998-12-01' - interval '90' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
                                  QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=436342.68..436342.69 rows=6 width=36) (actual time=18720.932..18720.936 rows=4 loops=1)
   Sort Key: l_returnflag, l_linestatus
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=436342.49..436342.60 rows=6 width=36) (actual time=18720.887..18720.892 rows=4 loops=1)
         ->  Seq Scan on lineitem_1  (cost=0.00..199645.01 rows=5917437 width=36) (actual time=0.011..6754.619 rows=5916591 loops=1)
               Filter: (l_shipdate <= '1998-09-02 00:00:00'::timestamp without time zone)
               Rows Removed by Filter: 84624
 Total runtime: 18721.021 ms
(8 rows)

Here are the results of "vmstat 1" while running 8 parallel TPC-H Simple
(#6) queries. Although there is no need for I/O, "wa" fluctuates between
0 and 1:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd     free   buff    cache   si   so  bi  bo    in    cs us sy id wa st
 0  0      0 30093568  84892 38723896    0    0   0   0    22    14  0  0 100 0  0
 8  1      0 30043056  84892 38723896    0    0   0   0 27080 52708 16
[HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows
Hello,

I've found a bug that makes psql's \conninfo display incorrect
information on Windows. Please find the patch attached, and please commit
it.

[Problem]
When I run "psql postgres" on Windows and execute \conninfo, it outputs
the text below. It reports that psql connected to the server via a
Unix-domain socket, but the actual connection is of course via a TCP/IP
socket to localhost.

    You are connected to database "postgres" as user "Administrator" via
    socket in "/tmp" at port "5432".

It should output:

    You are connected to database "postgres" as user "Administrator" on
    host "localhost" at port "5432".

[Cause]
\conninfo calls PQhost(conn) to get the destination info. PQhost() in
this case returns NULL because conn->pghost and conn->pghostaddr are
NULL. When \conninfo receives NULL from PQhost(), it assumes /tmp.

[Fix]
PQhost() should return the actual destination.

Regards
MauMau
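For illustration, the fix amounts to making PQhost() fall back to
something meaningful instead of NULL. A minimal sketch only, not the
attached patch (DefaultHost is assumed here to be libpq's "localhost"
default):

    /* Sketch only -- not the actual patch. */
    char *
    PQhost(const PGconn *conn)
    {
        if (conn == NULL)
            return NULL;
        if (conn->pghost != NULL && conn->pghost[0] != '\0')
            return conn->pghost;
        if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0')
            return conn->pghostaddr;
    #ifdef WIN32
        /* no Unix-domain sockets on Windows: the connection used TCP/IP */
        return DefaultHost;
    #else
        return NULL;        /* caller assumes the default socket dir */
    #endif
    }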
Re: [HACKERS] [PERFORM] Parallel Select query performance and shared buffers
> Maybe you could help test this patch:
> http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de

To which tree should I apply these patches? I tried the main repository
(master), the 9.3 stable branch, and the source code of 9.3.1, and in
each of my attempts at least one of the patches failed to apply. What
patch command should I use?
Re: [HACKERS] ruleutils vs. empty targetlists
Dean Rasheed writes:
> On 3 December 2013 23:37, Tom Lane wrote:
>> Thinking some more about bug #8648, it occurred to me that ruleutils.c
>> isn't exactly prepared for the case either:
>> ... So I'm leaning towards just doing
>>
>> +       if (colno == 0)
>> +               appendStringInfoString(buf, " *");
>>
>> at least till such time as somebody shows a case where this isn't good
>> enough.

> Well here's a contrived example with grouping:

> create table nocols();
> create table somecols(a int, b int);
> create view v as select exists(select nocols.* from nocols, somecols
> group by somecols.a);

Hm, that's cute :-(. And I realized last night that my patch's approach
of seizing on some random zero-column RTE isn't too bulletproof against
ALTER TABLE ADD COLUMN operations, either; ie, the referenced table might
have some columns by now. Which is more or less the entire reason we
expand "*" at parse time in the first place.

What I'm thinking about this today is that really the *right* solution is
to allow syntactically-empty SELECT lists; once we've bought into the
notion of zero-column tables, the notion that you can't have an empty
select list is just fundamentally at odds with that. And since you can
already have semantically-empty SELECT lists, this should in theory not
create much risk of new bugs. If we did that, the existing ruleutils code
is just fine, as are any existing dump files containing this sort of
query.

That change might still be thought too aggressive for a back-patch,
though. Comments?

                        regards, tom lane
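Concretely, the proposal would make statements like the following legal
(today the parser rejects the empty target list with a syntax error):

    -- Syntactically-empty SELECT lists, mirroring the already-legal
    -- semantically-empty case:
    SELECT FROM nocols;
    SELECT EXISTS (SELECT FROM nocols, somecols GROUP BY somecols.a);

which is exactly the shape of output that ruleutils.c currently
regurgitates for such views.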
Re: [HACKERS] Time-Delayed Standbys
2013/12/4 Andres Freund
> On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote:
> > > 4) Start the slave and connect to it using psql and in another
> > > session I can see all archive recovery log
> >
> > Hmm... I had thought I'd misread your email, but it reproduces again.
> > When I set a small recovery_time_delay (=3), it seems to work
> > correctly. However, when I set a long recovery_time_delay (=300), it
> > didn't work.
> >
> > My reproduced operation log is as follows.
> >
> > >[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
> > >starting vacuum...end.
> > >transaction type: TPC-B (sort of)
> > >scaling factor: 10
> > >query mode: simple
> > >number of clients: 8
> > >number of threads: 4
> > >duration: 30 s
> > >number of transactions actually processed: 68704
> > >latency average: 3.493 ms
> > >tps = 2289.196747 (including connections establishing)
> > >tps = 2290.175129 (excluding connections establishing)
> > >[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
> > >[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
> > >server starting
> > >[mitsu-ko@localhost postgresql]$ LOG:  database system was shut down in recovery at 2013-12-03 10:26:41 JST
> > >LOG:  entering standby mode
> > >LOG:  consistent recovery state reached at 0/5C4D8668
> > >LOG:  redo starts at 0/5C4000D8
> > >[mitsu-ko@localhost postgresql]$ FATAL:  the database system is starting up
> > >FATAL:  the database system is starting up
> > >FATAL:  the database system is starting up
> > >FATAL:  the database system is starting up
> > >FATAL:  the database system is starting up
> > >[mitsu-ko@localhost postgresql]$ bin/psql -p6543
> > >psql: FATAL:  the database system is starting up
> > >[mitsu-ko@localhost postgresql]$ bin/psql -p6543
> > >psql: FATAL:  the database system is starting up
> >
> > I attached my postgresql.conf and recovery.conf. It will be
> > reproducible.
>
> So, you brought up a standby and it took more time to become consistent
> because it waited on commits? That's the problem? If so, I don't think
> that's a bug?

When it happened, psql could not connect to the standby server at all. I
think this behavior is not good. It should only delay the recovery
position, so that the old, delayed table data can still be seen. Not
being able to connect to the server at all is not the desired behavior.

If you think the current behavior is best, I will mark the patch ready
for committer, and the committer can improve it.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote:
> 2013/12/4 Andres Freund
> When it happened, psql could not connect to the standby server at all.
> I think this behavior is not good. It should only delay the recovery
> position, so that the old, delayed table data can still be seen.

That doesn't sound like a good plan - even if the clients cannot connect
yet, you can still promote the server. Just not taking the delay into
consideration at that point seems like it would possibly surprise users
rather badly, in situations where they really cannot use such surprises.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] ruleutils vs. empty targetlists
Tom Lane escribió:

> What I'm thinking about this today is that really the *right* solution
> is to allow syntactically-empty SELECT lists; once we've bought into
> the notion of zero-column tables, the notion that you can't have an
> empty select list is just fundamentally at odds with that. And since
> you can already have semantically-empty SELECT lists, this should in
> theory not create much risk of new bugs. If we did that, the existing
> ruleutils code is just fine, as are any existing dump files containing
> this sort of query.

Wow, as strange-sounding as that is, you're probably correct. This might
be seen as a deviation from the standard, but then so are zero-column
tables. Of course, syntactically-empty select lists would also work with
(standard-conforming) tables containing columns, but it's hard to see
that that would be a problem in practice.

> That change might still be thought too aggressive for a back-patch,
> though. Comments?

Well, no correct query will start failing due to this change; the only
visible change would be that queries which previously threw errors would
start working. It's hard to see that as a backward incompatibility.

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
2013/12/4 Christian Kruse
> You created a master node and a hot standby with 300 delay. Then
> you stopped the standby, did the pgbench and started the hot standby
> again. It did not get in line with the master. Is this correct?

No. First, I start the master and execute pgbench. Second, I start the
standby with a 300ms (50min) delay. Then psql cannot connect to the
standby server at all. I'm not sure why the standby did not start; it
might be because the delay feature interferes with the REDO loop during
the standby's initial startup.

> I don't see a problem here… the standby should not be in sync with the
> master, it should be delayed. I did step by step what you did and
> after 50 minutes (300ms) the standby was at the same level the
> master was.

I think we should be able to connect to the standby server at any time,
regardless of the delay option.

> Did I misunderstand you?

I'm not sure... You might be right, or there might be a better way.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
Hi,

On 2013-12-03 19:33:16 +0000, Simon Riggs wrote:
> > - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
> > XLOG_XACT_COMMIT_COMPACT checks
>
> Why just those? Why not aborts and restore points also?

What would the advantage of waiting on anything but commits be? If it's
not a commit, the action won't change the state of the database (yes,
yes, there are exceptions, but those don't have a timestamp)...

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
2013/12/4 Andres Freund
> On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote:
> > When it happened, psql could not connect to the standby server at
> > all. I think this behavior is not good. It should only delay the
> > recovery position, so that the old, delayed table data can still be
> > seen.
>
> That doesn't sound like a good plan - even if the clients cannot
> connect yet, you can still promote the server.

I'm not sure I follow your argument - doesn't that lose sight of the
purpose of this patch?

> Just not taking the delay into consideration at that point seems like
> it would possibly surprise users rather badly, in situations where
> they really cannot use such surprises.

Hmm... I agree that users would be surprised... I think it is easy to
fix this behavior using a recovery flag, so we had better wait for other
comments.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] Changes in Trigger Firing
Sameer Kumar wrote:
> CreateTrigStmt is passed to the CreateTrigger function as an argument.
> I am struggling to understand how the values for various members of
> trigger are set and where [which file] calls CreateTrigStmt.
>
> Can someone provide some help on this?

I think you need better tools to guide you in exploring the source code.
For example, you can use cscope to tell you where CreateTrigStmt is used,
and you would find gram.y; and use it to tell you where CreateTrigger is
used, and you would find utility.c.

Any half-decent code editor should be able to generate a "database" of
symbols and let you frolic around the various files quickly. Without
that, anyone would be completely lost in developing new features of even
the lowest complexity.

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
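For example, a minimal cscope workflow over the PostgreSQL tree might
look like this (ctags/etags or an IDE symbol index work just as well):

    # Build a cscope database over the source tree
    cd postgresql/src
    cscope -Rbq        # -R recurse, -b build only, -q fast index

    # Then, in the cscope UI (or an editor integration), look up the
    # symbol CreateTrigStmt to land in gram.y, and CreateTrigger to
    # land in utility.c.  A plain grep gets you most of the way, too:
    grep -rn "CreateTrigger(" backend/ include/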
[HACKERS] [bug fix] pg_ctl fails with config-only directory
Hello,

I've found a bug and would like to fix it, but I cannot figure out how to
do that well. Could you give me any advice? I encountered this on PG 9.2,
but it will probably exist in later versions too.

[Problem]
On Windows, a user with Administrator privileges can start the database
server. However, when he uses a config-only directory, the database
server cannot be started. "pg_ctl start" fails with the following
messages:

    Execution of PostgreSQL by a user with administrative permissions is
    not permitted.
    The server must be started under an unprivileged user ID to prevent
    possible system security compromises. See the documentation for more
    information on how to properly start the server.

[Cause]
pg_ctl runs "postgres -C data_directory" to find the data directory. But
postgres cannot be run by a user with Administrator privileges, so it
displays the above messages.

[Fix]
Ideally, users with administrative privileges should be able to start
postgres with the Administrator privileges removed. Currently, initdb and
pg_ctl take the trouble to invoke postgres in a process with restricted
privileges. I understand this improvement was made in 8.2 or 8.3 for
convenience. The same convenience should be available when running
postgres directly, at least for "postgres -C",
"postgres --describe-config", and "postgres --single".

Then, how can we do this? Which approach should we take?

* Approach 1
When postgres starts, it removes Administrator privileges from its own
process. But is this possible at all? The Windows security API is complex
and provides many functions; it seems difficult to understand them. I'm
afraid it would take a long time to figure out the solution. Is there any
good web page to look at?

* Approach 2
Do not call check_root() on Windows when -C, --describe-config, or
--single is specified when running postgres. This would be easy, and it
should not be dangerous in terms of security, because attackers cannot
get into the server process via the network in those modes.

I'll try to find a solution based on approach 1, but I doubt there's one.
If that's okay, I want to take approach 2. Could you give me your
thoughts?

Regards
MauMau
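A minimal sketch of what Approach 2 might look like in main.c (the
option-detection variable here is hypothetical, for illustration only):

    /* Sketch of Approach 2 -- not an actual patch. */
    bool        limited_mode = false;   /* set while scanning argv when
                                         * -C, --describe-config or
                                         * --single is given */
    ...
    #ifdef WIN32
        if (!limited_mode)
            check_root(progname);   /* rejects Administrator on Windows */
    #else
        check_root(progname);       /* always reject root on Unix */
    #endif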
Re: [HACKERS] Extension Templates S03E11
* Jeff Davis (pg...@j-davis.com) wrote:
> On Tue, 2013-12-03 at 14:31 -0500, Tom Lane wrote:
> > Stephen Frost writes:
> > > When it comes to dump/reload, I'd much rather see a mechanism which
> > > uses our deep understanding of the extension's objects (as database
> > > objects) to implement the dump/reload than a text blob which is
> > > carried forward from major version to major version and may even
> > > fail to run.
> >
> > Note that we're already doing that in the binary_upgrade code path.
> > I agree that generalizing that approach sounds like a better idea
> > than keeping a text blob around.
>
> So does this take us fully back to Inline Extensions, or is there a
> distinction that I'm missing?

I've not really looked at the inline extensions patch/proposal in depth,
but I do think that's a lot closer than this. As I understand it, Dimitri
had a patch for this, though what I've found is the blog post. Also,
there were a lot of discussions about the idea a year or so ago,
including folks who haven't spoken up in this discussion.

> I still don't see that Extension Templates are all bad:
>
>   * They preserve the fact that two instances of the same extension
>     (e.g. in different databases) were created from the same template.

This is only true if we change the extension templates to be shared
catalogs, which they aren't today..

>   * They mirror the file-based templates, so it seems easier to get
>     consistent behavior.

While it might seem easier and perhaps simpler, I'm not sure that I
really buy into the idea that we'd actually be more consistent. Even if
we are, I'm not convinced that's what we want here.. The only thing
driving us in that direction is that we're calling these 'extensions'
too. While I don't want five different extension-like things, I'd rather
use a different name from 'extensions' if we feel that the differences
between catalog-only extensions and filesystem extensions will cause
'extensions' overall to have terribly inconsistent behavior.

Thanks,

Stephen
Re: [HACKERS] FDW: possible resjunk columns in AddForeignUpdateTargets
2013/11/8 Tom Lane:
> Albe Laurenz writes:
>> What I would like to do is add a custom resjunk column
>> (e.g. a bytea) in AddForeignUpdateTargets that carries a row
>> identifier from the scan state to the modify state.
>> Would that be possible? Can I have anything else than a Var
>> in a resjunk column?
>
> [ thinks for awhile... ] Hm. In principle you can put any expression
> you want into the tlist during AddForeignUpdateTargets. However, if
> it's not a Var then the planner won't understand that it's something
> that needs to be supplied by the table scan, so things won't work right
> in any but the most trivial cases (maybe not even then :-().
>
> What I'd try is creating a Var that has the attno of ctid
> (ie, SelfItemPointerAttributeNumber) but the datatype you want, ie
> bytea. This won't match what the catalogs say your table's ctid is, but
> I think that nothing will care much about that.

Apologies for reinvigorating this thread, but I'm running into a similar
wall myself and would like to clarify whether this approach will work at
all.

My foreign data source returns a fixed-length string as a unique row
identifier; in AddForeignUpdateTargets() I can create a Var like this:

    var = makeVar(parsetree->resultRelation,
                  SelfItemPointerAttributeNumber,
                  BPCHAROID,
                  32,
                  InvalidOid,
                  0);

but is it possible to store something other than a TIDOID here, and if
so, how?

Regards
Ian Barwick
Re: [HACKERS] Problem with displaying "wide" tables in psql
Hello

postgres=# \pset format wrapped
Output format (format) is wrapped.
postgres=# select 'afadsafasd fasdf asdfasd fsad fas df sadf sad f sadf sadf sa df sadfsadfasd fsad fsa df sadf asd fa sfd sadfsadf asdf sad f sadf sad fadsf';
                            ?column?
-----------------------------------------------------------------
 afadsafasd fasdf asdfasd fsad fas df sadf sad f sadf sadf sa df
 sadfsadfasd fsad fsa df sadf asd fa sfd sadfsadf a.
.sdf sad f sadf sad fadsf
(1 row)

It works as expected, but it is not supported in expanded (row) view, so
any fix of that mode would be nice.

Regards
Pavel

2013/12/4 Sergey Muraviov
> Thank you for this trick.
> It would be nice if this trick was documented.
>
> However, with the pager I can't see a wide value on one screen, or
> select and copy it entirely. And I have to press many keys to find the
> necessary part of the value. There are no such problems with the patch.
>
> 2013/12/3 Pavel Stehule
>> Hello
>>
>> do you know the pager "less" trick?
>>
>> http://merlinmoncure.blogspot.cz/2007/10/better-psql-with-less.html
>>
>> Regards
>> Pavel Stehule
>>
>> 2013/12/3 Sergey Muraviov
>>> Hi.
>>>
>>> Psql definitely has a problem with displaying "wide" tables. Even in
>>> expanded mode, they look horrible. So I tried to solve this problem.
>>>
>>> Before the patch:
>>> postgres=# \x 1
>>> Expanded display (expanded) is on.
>>> postgres=# \pset border 2
>>> Border style (border) is 2.
>>> postgres=# select * from pg_stats;
>>> [extremely wide expanded-mode output snipped]
Re: [HACKERS] Extension Templates S03E11
* Jeff Davis (pg...@j-davis.com) wrote:
> On Mon, 2013-12-02 at 15:44 -0500, Stephen Frost wrote:
> > How are we going to handle new keywords
> > being added in new major versions? A pg_dump of the extension
> > template script is then going to be loaded into the new major version
> > but will not actually be able to be run because it'll error out...
>
> Elsewhere in the thread you argued that the version of an extension
> should be preserved across dump/reload. Surely a given version of the
> extension corresponds to a specific set of SQL commands (specifically,
> the SQL text blob on PGXN), so it *should* error out.

I *do* think the version should be preserved (though I admit that the
argument about things released with PG does make some sense). My point
above is that if we dump the text blob out and then just rerun it, it
might not work; but if we use pg_dump and dump the extension out as
database objects (using the newer version of pg_dump, as we always
recommend..), then it's certainly more likely to work, even in the face
of new keywords and the like which change between releases.

> Otherwise you end up with a weird situation where upgrading a 9.4
> install to 9.5 allows you to keep version 1.2 of some extension, but
> 1.2 won't install directly to 9.5. (By the way, I think this is a
> problem with pg_upgrade currently.)

Hmm. I'll grant, that's an interesting situation to consider, but I'm
trying to figure out why it's better to make it always break, both on
initial installation and when restoring from a backup, than to only have
it (most likely, anyway) break on initial installation (to a newer
version that may not have existed originally).

> You're fighting pretty hard against text blobs, but at the same time
> saying that we should be able to fully make use of existing PGXN
> extensions, which contain text blobs of SQL. And extension authors are
> versioning their SQL blobs, not some abstract concepts internal to
> postgres and the catalogs.

I don't want text blobs in the backend catalogs. I'm not arguing against
text blobs in general (that'd be kinda hard to do..).

> Just because we start with blobs from PGXN doesn't mean we need to use
> blobs everywhere; but I think you're too quick to rule them out.

Perhaps, but I really don't see the point of putting a text blob into the
database for a set of objects that we're just going to create in the next
moment. That would be, from my point of view anyway, akin to storing
'CREATE TABLE' statements in the catalog next to the actual definition of
the table in pg_class/pg_attribute.

Thanks,

Stephen
Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
2013/12/4 Dean Rasheed
> On 2 December 2013 04:55, Pavel Stehule wrote:
> > Hello
> >
> > it looks well, thank you
> >
> > Regards
> >
> > Pavel
>
> I've been thinking about this some more, and there's another case that
> concerns me slightly. We're now making some of the DROP...IF EXISTS
> commands tolerate non-existent types as well as non-existent schemas
> --- functions, aggregates, casts and operators all have type names in
> their specifications. Of course it's possible that the type is missing
> because it was in a schema that was dropped, so this change seems to be
> in the spirit of what was discussed, but it seems like a change that
> might catch some people out.
>
> I think that, on balance, it is a sensible change, since if the type
> doesn't exist, the dependent object can't exist either, so DROP...IF
> EXISTS shouldn't be raising an error. However, I wonder if we should
> be issuing a more specific NOTICE in this case too --- i.e., check for
> non-existent types in the same way as we check for non-existent parent
> objects --- type_does_not_exist_skipping() and
> type_list_does_not_exist_skipping().

+1

Pavel
Re: [HACKERS] [patch] Adding EXTRA_REGRESS_OPTS to all pg_regress invocations
On Mon, May 6, 2013 at 11:51:47PM -0700, Christoph Berg wrote:
> "make check" supports EXTRA_REGRESS_OPTS to pass extra options to
> pg_regress, but all the other places where pg_regress is used do not
> allow this. The attached patch adds EXTRA_REGRESS_OPTS to
> Makefile.global.in (for contrib modules) and two more special
> Makefiles (isolation and pg_upgrade).
>
> The use case here is that Debian needs to be able to redirect the unix
> socket directory used to /tmp, because /var/run/postgresql isn't
> writable for the buildd user. The matching part for this inside
> pg_regress is still in discussion here, but the addition of
> EXTRA_REGRESS_OPTS is an independent step that is also useful for
> others, so I'd like to propose it for inclusion.

Thanks, patch applied. This will appear in PG 9.4. I suppose we could
backpatch this but I would need community feedback on that.

--
 Bruce Momjian        http://momjian.us
 EnterpriseDB         http://enterprisedb.com

 + Everyone has their own god. +
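For example, with the patch the same extra options now reach pg_regress
everywhere (the --host redirection shown here is illustrative; the exact
option for moving the socket directory depends on the companion
pg_regress patch still under discussion):

    # Main regression tests, contrib tests, and isolation tests, all
    # picking up the same extra pg_regress options:
    make check EXTRA_REGRESS_OPTS="--host=/tmp"
    make -C contrib installcheck EXTRA_REGRESS_OPTS="--host=/tmp"
    make -C src/test/isolation check EXTRA_REGRESS_OPTS="--host=/tmp"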
Re: [HACKERS] Time-Delayed Standbys
Robert Haas wrote:
> So, I proposed this patch previously and I still think it's a
> good idea, but it got voted down on the grounds that it didn't
> deal with clock drift. I view that as insufficient reason to
> reject the feature, but others disagreed. Unless some of those
> people have changed their minds, I don't think this patch has
> much future here.

There are many things that a system admin can get wrong. Failing to
supply this feature because the sysadmin might not be running ntpd (or
equivalent) correctly seems to me to be like not having the software do
fsync because the sysadmin might not have turned off write-back
buffering on drives without persistent storage. Either way, poor system
management can defeat the feature. Either way, I see no reason to
withhold the feature from those who manage their systems in a sane
fashion.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hanada-san,

Thanks for your review.

2013/12/4 Shigeru Hanada:
> I first reviewed the postgres_fdw portion of the patches to learn the
> outline of Custom Plan. The wiki page is also a good textbook for the
> feature. I have some random comments about the basic design of Custom
> Plan:
>
> (1) IIUC, add_join_path and add_scan_path are added to allow extensions
> to plug their code into the planner.

Almost, yes. More precisely, these hooks allow extensions to plug the
paths they can provide into a particular join or scan. The planner will
then choose the cheapest one according to the cost values.

> (2) The FDW framework has executor callbacks based on existing executor
> nodes. Is there any plan to integrate them into one, or wrap one with
> the other? I'm not sure that we should have two similar frameworks
> side by side.
> # I'm sorry if I've missed the past discussion about this issue.

Probably not; FDWs have a different role from the CustomScan API. As the
name implies, an FDW acts as a bridge between the relation form and an
opaque external data source, mediating between two different worlds on
behalf of a foreign table. CustomScan, on the other hand, allows
alternative logic to scan or join particular relations, in addition to
the built-in ones, but does not act on behalf of foreign tables.

The existing FDW machinery is designed to implement a scan on an
intangible relation, so it can assume certain things: that a tuple
returned from an FDW has a TupleDesc equivalent to the table definition,
or that it can always use ExecScan() in all cases. So, I don't think
these two frameworks should be consolidated, because that would confuse
existing extensions which assume FDW callbacks always have a particular
foreign table definition.

> (3) Internal routines such as is_self_managed_relation and
> has_wholerow_reference seem to be useful for other FDWs. Is it
> possible to move them into core?

Probably; src/backend/foreign/foreign.c would be a good home for them.

> (4) postgres_fdw estimates the costs of a join by calculating local
> numbers. How about supporting remote estimation by issuing an EXPLAIN
> query when use_remote_estimates = true?

I'm uncertain whether the cost value from a remote EXPLAIN represents the
right cost on the local side: it represents the difficulty of joining two
relations on the remote side, but not the local work, which just fetches
tuples from the result set of the remote query with the tables already
joined. What is your opinion? Is the remote cost estimation value
comparable with the local one?

Thanks,
--
KaiGai Kohei
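For reference, remote estimation can already be toggled per server or per
table in postgres_fdw with the existing use_remote_estimate option (the
server and table names below are illustrative):

    -- Issue EXPLAIN on the remote server for cost estimates instead of
    -- computing them locally:
    ALTER SERVER remote_pg OPTIONS (ADD use_remote_estimate 'true');
    ALTER FOREIGN TABLE remote_orders
        OPTIONS (ADD use_remote_estimate 'true');

The open question above is whether the same mechanism gives meaningful
numbers for remote *joins*, where the remote plan's cost does not include
the local cost of fetching the joined result set.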
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Thanks for fixing so many instances of my carelessness. I didn't know
"seek" was an irregular verb...

Best regards,

2013/12/4 Shigeru Hanada:
> 2013/11/29 Kohei KaiGai:
>> I merged all the propositions from Jim. Thanks, it made the
>> documentation quality better. Also, I fixed up cosmetic stuff around
>> whitespace <-> tab.
>
> I found some typos in documents and comments. Please see the attached
> patch for details.
>
> --
> Shigeru HANADA

--
KaiGai Kohei
Re: [HACKERS] Time-Delayed Standbys
Hi,

On 04/12/13 07:22, Kevin Grittner wrote:
> There are many things that a system admin can get wrong. Failing
> to supply this feature because the sysadmin might not be running
> ntpd (or equivalent) correctly seems to me to be like not having
> the software do fsync because the sysadmin might not have turned
> off write-back buffering on drives without persistent storage.
> Either way, poor system management can defeat the feature. Either
> way, I see no reason to withhold the feature from those who manage
> their systems in a sane fashion.

I agree. But maybe we should add a warning in the documentation about
time syncing?

Greetings,
CK

--
 Christian Kruse                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/4/13, 2:14 AM, Stefan Kaltenbrunner wrote: > running a > few kvm instances that get bootstrapped automatically is something that > is a solved problem. Is it sound to run performance tests on kvm? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On Tue, 03 Dec 2013 10:44:15 -0800 Josh Berkus wrote: > It seems clear that Kernel.org, since 2.6, has been in the business of > pushing major, hackish, changes to the IO stack without testing them or > even thinking too hard about what the side-effects might be. This is > perhaps unsurprising given that two of the largest sponsors of the > Kernel -- who, incidentally, do 100% of the performance testing -- don't > use the IO stack. > > This says to me that Linux will clearly be an undependable platform in > the future with the potential to destroy PostgreSQL performance without > warning, leaving us scrambling for workarounds. Too bad the > alternatives are so unpopular. Wow, Josh, I'm surprised to hear this from you. The active/inactive list mechanism works great for the vast majority of users. The second-use algorithm prevents a lot of pathological behavior, like wiping out your entire cache by copying a big file or running a backup. We *need* that kind of logic in the kernel. Now, back in 2012, Johannes (working for one of those big contributors) hit upon an issue where second-use falls down. So he set out to fix it: https://lwn.net/Articles/495543/ This code has been a bit slow getting into the mainline for a few reasons, but one of the chief ones is this: nobody is saying from the sidelines that they need it! If somebody were saying "Postgres would work a lot better with this code in place" and had some numbers to demonstrate that, we'd be far more likely to see it get into an upcoming release. In the end, Linux is quite responsive to the people who participate in its development, even as testers and bug reporters. It responds rather less well to people who find problems in enterprise kernels years later, granted. The amount of automated testing, including performance testing, has increased markedly in the last couple of years. I bet that it would not be hard at all to get somebody like Fengguang Wu to add some Postgres-oriented I/O tests to his automatic suite: https://lwn.net/Articles/571991/ Then we would all have a much better idea of how kernel releases are affecting one of our most important applications; developers would pay attention to that information. Or you could go off and do your own thing, but I believe that would leave us all poorer. Thanks, jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/04/2013 04:30 PM, Peter Eisentraut wrote: On 12/4/13, 2:14 AM, Stefan Kaltenbrunner wrote: running a few kvm instances that get bootstrapped automatically is something that is a solved problem. Is it sound to run performance tests on kvm? As sound as on any other platform, imho. The performance characteristics will differ between bare metal and other virtualisation platforms, but the future is virtual, and that is what a lot of stuff runs on... Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: programmable file format for postgresql.conf
On 12/4/13, 1:42 AM, Álvaro Hernández Tortosa wrote: > IMHO, a data structure like the above would be completely > self-contained and allow any autoconfiguring tool or GUI tool to be > easily created, if the syntax is programmable. It would certainly make > the config file more verbose, but at the same time would help a lot of > users to configure postgres providing much more information. What you are describing appears to be isomorphic to XML and XML Schema. Note that you are not required to maintain your configuration data in a postgresql.conf-formatted file. You can keep it anywhere you like, GUI around in it, and convert it back to the required format. Most of the metadata is available through postgres --describe-config, which is the result of a previous attempt in this area, which never really went anywhere. It's not like there are a bunch of GUI and autotuning tools that people are dying to use or developers are dying to create, but couldn't because editing configuration files programmatically is hard. Let's also not forget the two main use cases (arguably) of the configuration files: hand editing, and generation by configuration management tools. Anything that makes these two harder is not going to be well-received. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor patch for the uuid-ossp extension
On 11/23/13, 7:12 AM, Mario Weilguni wrote: > Well, in that case and since this is a rarely used extension (I guess > so), maybe it would be the best to simply rename that extension to > uuidossp (or whatever) and don't make any special treatment for it? Why? This is a solved problem, and renaming the extension would only cause unnecessary work. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FDW: possible resjunk columns in AddForeignUpdateTargets
Ian Lawrence Barwick wrote: > 2013/11/8 Tom Lane : >> [ thinks for awhile... ] Hm. In principle you can put any expression >> you want into the tlist during AddForeignUpdateTargets. However, if it's >> not a Var then the planner won't understand that it's something that needs >> to be supplied by the table scan, so things won't work right in any but >> the most trivial cases (maybe not even then :-(). >> >> What I'd try is creating a Var that has the attno of ctid >> (ie, SelfItemPointerAttributeNumber) but the datatype you want, ie bytea. >> This won't match what the catalogs say your table's ctid is, but I think >> that nothing will care much about that. > > Apologies for reinvigorating this thread, but I'm running into a similar wall > myself and would like to clarify if this approach will work at all. > > My foreign data source is returning a fixed-length string as a unique row > identifier; in AddForeignUpdateTargets() I can create a Var like this: > > var = makeVar(parsetree->resultRelation, >SelfItemPointerAttributeNumber, >BPCHAROID, >32, >InvalidOid, >0); > > but is it possible to store something other than a TIDOID here, and if so how? Subsequent analysis showed that this won't work as you have no way to populate such a resjunk column. resjunk columns seem to get filled with the values from the column of the same name, so currently there is no way to invent your own column, fill it and pass it on. See thread 8b848b463a71b7a905bc5ef18b95528e.squir...@sq.gransy.com What I ended up doing is introduce a column option that identifies a primary key column. I add a resjunk entry for each of those and use them to identify the correct row during an UPDATE or DELETE. That only works for foreign data sources that have a concept of a primary key, but maybe you can do something similar. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
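For anyone else hitting this wall, a minimal sketch of the approach Laurenz describes (the column_is_remote_key() helper is an assumption standing in for whatever per-column FDW option lookup you implement; the makeVar/makeTargetEntry pattern is the same one postgres_fdw uses for its ctid junk column):

static void
myAddForeignUpdateTargets(Query *parsetree,
                          RangeTblEntry *target_rte,
                          Relation target_relation)
{
    TupleDesc   tupdesc = RelationGetDescr(target_relation);
    int         i;

    for (i = 0; i < tupdesc->natts; i++)
    {
        Form_pg_attribute attr = tupdesc->attrs[i];
        Var        *var;
        TargetEntry *tle;

        /* Assumption: a helper that checks the FDW column option
         * marking this column as part of the remote primary key. */
        if (!column_is_remote_key(target_rte->relid, attr->attnum))
            continue;

        /* A plain Var referencing the key column itself ... */
        var = makeVar(parsetree->resultRelation,
                      attr->attnum,
                      attr->atttypid,
                      attr->atttypmod,
                      attr->attcollation,
                      0);

        /* ... added as a resjunk tlist entry, so its value is carried
         * through the scan and is available to identify the correct
         * row during the UPDATE or DELETE. */
        tle = makeTargetEntry((Expr *) var,
                              list_length(parsetree->targetList) + 1,
                              pstrdup(NameStr(attr->attname)),
                              true);
        parsetree->targetList = lappend(parsetree->targetList, tle);
    }
}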
Re: [HACKERS] RFC: programmable file format for postgresql.conf
On 04/12/13 16:51, Peter Eisentraut wrote: On 12/4/13, 1:42 AM, Álvaro Hernández Tortosa wrote: IMHO, a data structure like the above would be completely self-contained and allow any autoconfiguring tool or GUI tool to be easily created, if the syntax is programmable. It would certainly make the config file more verbose, but at the same time would help a lot of users to configure postgres providing much more information. What you are describing appears to be isomorphic to XML and XML Schema. I don't think XML would be a good idea. Even if it is both programmatically and humanly editable (two of the features I was suggesting), it is messy and very verbose for this purpose. Note that you are not required to maintain your configuration data in a postgresql.conf-formatted file. You can keep it anywhere you like, GUI around in it, and convert it back to the required format. Most of the I think it is not a very good idea to encourage GUI tools or auto-configuration tools to use a separate configuration file and then convert it to postgresql.conf. That introduces duplication, with nasty problems if either source of data is modified in unexpected ways. That's why I'm suggesting a config file that is usable, at the same time, by both postgres and external tools. That also enables other features, such as editing the config file persistently from a SQL session. metadata is available through postgres --describe-config, which is the result of a previous attempt in this area, which never really went anywhere. It's not like there are a bunch of GUI and autotuning tools that people are dying to use or developers are dying to create, but couldn't because editing configuration files programmatically is hard. It might be a chicken-and-egg problem. Maybe it's hard and futile to write such config tools precisely because postgresql.conf doesn't support the required features. I don't know how to measure the "interest of people", but I have seen many comments on this mailing list about features like this. IMHO it would be a great addition :) Let's also not forget the two main use cases (arguably) of the configuration files: hand editing, and generation by configuration management tools. Anything that makes these two harder is not going to be well-received. 100% agreed :) That's why I suggested that the format of the config file should adhere to requisites a) to e) mentioned in my original email (http://www.postgresql.org/message-id/529b8d01.6060...@nosys.es). Would a new file format be well received if it kept both hand editing and generation of the configuration simple, and at the same time offered the features I have mentioned? Thanks for your comments, aht -- Álvaro Hernández Tortosa --- NOSYS Networked Open SYStems -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
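As a purely illustrative sketch of what such a self-describing entry could look like (the attribute names here are assumptions for illustration, not part of the formal proposal):

shared_buffers = 2GB {
    type: memory;
    context: postmaster;
    min: 128kB;
    comment: "raised from the default for the reporting workload";
}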
Re: [HACKERS] Proposed feature: Selective Foreign Keys
On Tue, Dec 3, 2013 at 5:57 PM, Tom Dunstan wrote: > On 4 December 2013 01:24, Robert Haas wrote: >> Yeah, more or less, but the key is ensuring that it wouldn't let you >> create the constraint in the first place if the partial index >> specified *didn't* match the WHERE clause. For example, suppose the >> partial index says WHERE parent_entity = 'event' but the constraint >> definition is WHERE parent_event = 'somethingelse'. That ought to >> fail, just as creating a regular foreign constraint will fail if >> there's no matching unique index. > > The where clause only applies to queries against the FK table, and we > don’t currently fail if there isn’t a matching index on the fk column > when creating a FK (I’ve been bitten by that before). > > We fail if there isn’t a unique index on the referenced > table/column(s), but queries against that table on insert/update not > the FK table are unchanged (save that we don’t bother with them at all > if the where clause expression fails for the given tuple). Oh. I misinterpreted what this feature was about, then. I thought it was about restricting the reference to a subset of the *referenced* table, but it seems to be about restricting the constraint to a subset of the *referencing* table. I guess they're both useful, but the syntax... REFERENCES tab(col) WHERE (stuff) ...sure looks like the WHERE clause is syntactically associated with the table being referenced. What would we do if we eventually wanted to support both variants? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Select query performance and shared buffers
On Wed, Dec 4, 2013 at 9:19 AM, Metin Doslu wrote: > Here are the results of "vmstat 1" while running 8 parallel TPC-H Simple > (#6) queries: Although there is no need for I/O, "wa" fluctuates between 0 > and 1. >
> procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu-----
>  r  b   swpd     free  buff    cache   si   so    bi    bo    in     cs us sy  id wa st
>  0  0      0 30093568 84892 38723896    0    0     0     0    22     14  0  0 100  0  0
>  8  1      0 30043056 84892 38723896    0    0     0     0 27080  52708 16 14  70  0  0
>  8  1      0 30006600 84892 38723896    0    0     0     0 44952 118286 43 44  12  1  0
>  8  0      0 29986264 84900 38723896    0    0     0    20 28043  95934 49 42   8  1  0
>  7  0      0 29991976 84900 38723896    0    0     0     0  8308  73641 52 42   6  0  0
>  0  0      0 30091828 84900 38723896    0    0     0     0  3996  30978 23 24  53  0  0
>  0  0      0 30091968 84900 38723896    0    0     0     0    17     23  0  0 100  0  0

Notice the huge %sy. What kind of VM are you using? HVM or paravirtual? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-04 14:27:10 -0200, Claudio Freire wrote: > On Wed, Dec 4, 2013 at 9:19 AM, Metin Doslu wrote: > > Here are the results of "vmstat 1" while running 8 parallel TPC-H Simple > > (#6) queries: Although there is no need for I/O, "wa" fluctuates between 0 > > and 1. > > > > [vmstat output trimmed; see upthread] > > Notice the huge %sy My bet is on transparent hugepage defragmentation. Alternatively it's scheduler overhead, due to superfluous context switches around the buffer mapping locks. I'd strongly suggest doing a "perf record -g -a ; perf report" run to check what's eating up the time. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Select query performance and shared buffers
>Notice the huge %sy >What kind of VM are you using? HVM or paravirtual? This instance is paravirtual.
Re: [HACKERS] Parallel Select query performance and shared buffers
> I'd strongly suggest doing a "perf record -g -a ; > perf report" run to check what's eating up the time. Here is one example: + 38.87% swapper [kernel.kallsyms] [k] hypercall_page + 9.32% postgres [kernel.kallsyms] [k] hypercall_page + 6.80% postgres [kernel.kallsyms] [k] xen_set_pte_at + 5.83% postgres [kernel.kallsyms] [k] copy_user_generic_string + 2.06% postgres [kernel.kallsyms] [k] file_read_actor + 1.89% postgres postgres [.] heapgettup_pagemode + 1.83% postgres postgres [.] hash_search_with_hash_value + 1.33% postgres [kernel.kallsyms] [k] get_phys_to_machine + 1.25% postgres [kernel.kallsyms] [k] find_get_page + 1.00% postgres postgres [.] heapgetpage + 0.99% postgres [kernel.kallsyms] [k] radix_tree_lookup_element + 0.98% postgres postgres [.] advance_aggregates + 0.96% postgres postgres [.] ExecProject + 0.94% postgres postgres [.] advance_transition_function + 0.88% postgres postgres [.] ExecScan + 0.87% postgres postgres [.] HeapTupleSatisfiesMVCC + 0.86% postgres postgres [.] LWLockAcquire + 0.82% postgres [kernel.kallsyms] [k] put_page + 0.82% postgres postgres [.] MemoryContextReset + 0.80% postgres postgres [.] SeqNext + 0.78% postgres [kernel.kallsyms] [k] pte_mfn_to_pfn + 0.69% postgres postgres [.] ExecClearTuple + 0.57% postgres postgres [.] ExecProcNode + 0.54% postgres postgres [.] heap_getnext + 0.53% postgres postgres [.] LWLockRelease + 0.53% postgres postgres [.] ExecStoreTuple + 0.51% postgres libc-2.12.so [.] __GI___libc_read + 0.42% postgres [kernel.kallsyms] [k] xen_spin_lock + 0.40% postgres postgres [.] ReadBuffer_common + 0.38% postgres [kernel.kallsyms] [k] __do_fault + 0.37% postgres [kernel.kallsyms] [k] shmem_fault + 0.37% postgres [kernel.kallsyms] [k] unmap_single_vma + 0.35% postgres [kernel.kallsyms] [k] __wake_up_bit + 0.33% postgres postgres [.] StrategyGetBuffer + 0.33% postgres [kernel.kallsyms] [k] set_page_dirty + 0.33% postgres [kernel.kallsyms] [k] handle_pte_fault + 0.33% postgres postgres [.] ExecAgg + 0.31% postgres postgres [.] XidInMVCCSnapshot + 0.31% postgres [kernel.kallsyms] [k] __audit_syscall_entry + 0.31% postgres postgres [.] CheckForSerializableConflictOut + 0.29% postgres [kernel.kallsyms] [k] handle_mm_fault + 0.25% postgres [kernel.kallsyms] [k] shmem_getpage_gfp On Wed, Dec 4, 2013 at 6:33 PM, Andres Freund wrote: > On 2013-12-04 14:27:10 -0200, Claudio Freire wrote: > > On Wed, Dec 4, 2013 at 9:19 AM, Metin Doslu wrote: > > > Here are the results of "vmstat 1" while running 8 parallel TPC-H Simple > > > (#6) queries: Although there is no need for I/O, "wa" fluctuates between 0 > > > and 1. > > > > > > [vmstat output trimmed; see upthread] > > > > Notice the huge %sy > > My bet is on transparent hugepage defragmentation. Alternatively it's > scheduler overhead, due to superfluous context switches around the buffer > mapping locks. > > I'd strongly suggest doing a "perf record -g -a ; > perf report" run to check what's eating up the time. 
> > Greetings, > > Andres Freund > > -- > Andres Freund http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] Proposed feature: Selective Foreign Keys
On 12/04/2013 11:25 AM, Robert Haas wrote: On Tue, Dec 3, 2013 at 5:57 PM, Tom Dunstan wrote: On 4 December 2013 01:24, Robert Haas wrote: Yeah, more or less, but the key is ensuring that it wouldn't let you create the constraint in the first place if the partial index specified *didn't* match the WHERE clause. For example, suppose the partial index says WHERE parent_entity = 'event' but the constraint definition is WHERE parent_event = 'somethingelse'. That ought to fail, just as creating a regular foreign constraint will fail if there's no matching unique index. The where clause only applies to queries against the FK table, and we don’t currently fail if there isn’t a matching index on the fk column when creating a FK (I’ve been bitten by that before). We fail if there isn’t a unique index on the referenced table/column(s), but queries against that table on insert/update not the FK table are unchanged (save that we don’t bother with them at all if the where clause expression fails for the given tuple). Oh. I misinterpreted what this feature was about, then. I thought it was about restricting the reference to a subset of the *referenced* table, but it seems to be about restricting the constraint to a subset of the *referencing* table. I guess they're both useful, but the syntax... REFERENCES tab(col) WHERE (stuff) ...sure looks like the WHERE clause is syntactically associated with the table being referenced. What would we do if we eventually wanted to support both variants? Well I guess we could say something like: FOREIGN KEY (a-col) WHERE (a-condition) REFERENCES b(b-col) WHERE (b-condition) But it's somewhat ugly. The case of restricting the allowed referent rows does look slightly like a solution in search of a problem, but maybe that's just because I haven't thought of a use for it yet. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-04 18:43:35 +0200, Metin Doslu wrote: > > I'd strongly suggest doing a "perf record -g -a ; > > perf report" run to check what's eating up the time. > > Here is one example: > > + 38.87% swapper [kernel.kallsyms] [k] hypercall_page > + 9.32% postgres [kernel.kallsyms] [k] hypercall_page > + 6.80% postgres [kernel.kallsyms] [k] xen_set_pte_at All that time is spent in your virtualization solution. One thing to try is to look on the host system, sometimes profiles there can be more meaningful. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed feature: Selective Foreign Keys
On Wed, Dec 4, 2013 at 11:44 AM, Andrew Dunstan wrote: >> Oh. I misinterpreted what this feature was about, then. I thought it >> was about restricting the reference to a subset of the *referenced* >> table, but it seems to be about restricting the constraint to a subset >> of the *referencing* table. I guess they're both useful, but the >> syntax... >> >> REFERENCES tab(col) WHERE (stuff) >> >> ...sure looks like the WHERE clause is syntactically associated with >> the table being referenced. What would we do if we eventually wanted >> to support both variants? >> > > Well I guess we could say something like: > >FOREIGN KEY (a-col) WHERE (a-condition) REFERENCES b(b-col) WHERE >(b-condition) > > But it's somewhat ugly. > > The case of restricting the allowed referent rows does look slightly like a > solution in search of a problem, but maybe that's just because I haven't > thought of a use for it yet. Interestingly, the variant for which you can't think of a use case is the one I've missed most. Typical examples in my experience are things like project.project_manager_id references person (id) where person.is_project_manager, or alert (device_id) references device (id) where not device.deleted. Different strokes for different folks, I guess. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
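To spell the two variants out in the syntax under discussion (proposed syntax only -- neither form exists in any released PostgreSQL):

-- Variant in Tom Dunstan's patch: constrain the *referencing* rows;
-- the constraint is checked only for comments whose parent is an event.
CREATE TABLE comment (
    id            serial PRIMARY KEY,
    parent_entity text NOT NULL,
    parent_id     int NOT NULL,
    FOREIGN KEY (parent_id) REFERENCES event (id)
        WHERE (parent_entity = 'event')
);

-- Variant described above: constrain the *referenced* rows; every
-- reference must point at a person who is a project manager.
CREATE TABLE project (
    id                 serial PRIMARY KEY,
    project_manager_id int REFERENCES person (id)
        WHERE (person.is_project_manager)
);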
Re: [HACKERS] Time-Delayed Standbys
src/backend/access/transam/xlog.c:5889: trailing whitespace. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed feature: Selective Foreign Keys
On 12/04/2013 12:00 PM, Robert Haas wrote: On Wed, Dec 4, 2013 at 11:44 AM, Andrew Dunstan wrote: Oh. I misinterpreted what this feature was about, then. I thought it was about restricting the reference to a subset of the *referenced* table, but it seems to be about restricting the constraint to a subset of the *referencing* table. I guess they're both useful, but the syntax... REFERENCES tab(col) WHERE (stuff) ...sure looks like the WHERE clause is syntactically associated with the table being referenced. What would we do if we eventually wanted to support both variants? Well I guess we could say something like: FOREIGN KEY (a-col) WHERE (a-condition) REFERENCES b(b-col) WHERE (b-condition) But it's somewhat ugly. The case of restricting the allowed referent rows does look slightly like a solution in search of a problem, but maybe that's just because I haven't thought of a use for it yet. Interestingly, the variant for which you can't think of a use case is the one I've missed most. Typical examples in my experience are things like project.project_manager_id references person (id) where person.is_project_manager, or alert (device_id) references device (id) where not device.deleted. OK, those make sense. I wonder whether this should be done via a USING clause on the constraint that pointed to the partial unique index. Or would that be too obscure? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane wrote: > Magnus Hagander writes: > > On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane wrote: > >> Maybe we should just bite the bullet and change the WAL format for > >> heap_freeze (inventing an all-new record type, not repurposing the old > >> one, and allowing WAL replay to continue to accept the old one). The > >> implication for users would be that they'd have to update slave servers > >> before the master when installing the update; which is unpleasant, but > >> better than living with a known data corruption case. > > > Agreed. It may suck, but it sucks less. > > > How badly will it break if they do the upgrade in the wrong order though. > > Will the slaves just stop (I assume this?) or is there a risk of a > > wrong-order upgrade causing extra breakage? > > I assume what would happen is the slave would PANIC upon seeing a WAL > record code it didn't recognize. Installing the updated version should > allow it to resume functioning. Would be good to test this, but if it > doesn't work like that, that'd be another bug to fix IMO. We've always > foreseen the possible need to do something like this, so it ought to > work reasonably cleanly. > > I wonder if, for the future, we should have the START_REPLICATION command (or IDENTIFY_SYSTEM, which would probably make more sense - or even a new command like IDENTIFY_CLIENT; the point is, something in the replication protocol) have the walreceiver send its version to the master. That way we could have the walsender identify a walreceiver that's too old and disconnect it right away - with a much nicer error message than a PANIC. Right now, the walreceiver knows the version of the walsender (through PQserverVersion()), but AFAICT there is no way for the walsender to know which version of the receiver is connected. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Extension Templates S03E11
On Wed, 2013-12-04 at 09:50 -0500, Stephen Frost wrote: > > I still don't see that Extension Templates are all bad: > > * They preserve the fact that two instances of the same extension > > (e.g. in different databases) were created from the same template. > > This is only true if we change the extension templates to be shared > catalogs, which they aren't today.. I agree with you about that -- I don't like per-DB templates. I guess the challenge is that we might want to use namespaces to support user-installable extensions, and namespaces reside within a DB. But I think we can find some other solution there (e.g. user names rather than schemas), and per-DB templates are just not a good solution anyway. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/04/2013 04:33 PM, Jonathan Corbet wrote: > On Tue, 03 Dec 2013 10:44:15 -0800 > Josh Berkus wrote: > >> It seems clear that Kernel.org, since 2.6, has been in the business of >> pushing major, hackish, changes to the IO stack without testing them or >> even thinking too hard about what the side-effects might be. This is >> perhaps unsurprising given that two of the largest sponsors of the >> Kernel -- who, incidentally, do 100% of the performance testing -- don't >> use the IO stack. >> >> This says to me that Linux will clearly be an undependable platform in >> the future with the potential to destroy PostgreSQL performance without >> warning, leaving us scrambling for workarounds. Too bad the >> alternatives are so unpopular. > > Wow, Josh, I'm surprised to hear this from you. > > The active/inactive list mechanism works great for the vast majority of > users. The second-use algorithm prevents a lot of pathological behavior, > like wiping out your entire cache by copying a big file or running a > backup. We *need* that kind of logic in the kernel. > > Now, back in 2012, Johannes (working for one of those big contributors) > hit upon an issue where second-use falls down. So he set out to fix it: > > https://lwn.net/Articles/495543/ > > This code has been a bit slow getting into the mainline for a few reasons, > but one of the chief ones is this: nobody is saying from the sidelines > that they need it! If somebody were saying "Postgres would work a lot > better with this code in place" and had some numbers to demonstrate that, > we'd be far more likely to see it get into an upcoming release. > > In the end, Linux is quite responsive to the people who participate in its > development, even as testers and bug reporters. It responds rather less > well to people who find problems in enterprise kernels years later, > granted. > > The amount of automated testing, including performance testing, has > increased markedly in the last couple of years. I bet that it would not > be hard at all to get somebody like Fengguang Wu to add some > Postgres-oriented I/O tests to his automatic suite: > > https://lwn.net/Articles/571991/ > > Then we would all have a much better idea of how kernel releases are > affecting one of our most important applications; developers would pay > attention to that information. Hmm, interesting tool - I can see how that would be very useful for "early warning" style detection on the kernel development side, using a small set of postgresql "benchmarks". That would basically help with part of what Josh complained about - that it takes ages for regressions to be detected. From PostgreSQL's POV we would also need additional long-term and more complex testing, spanning different postgresql versions on various distribution platforms (because that is what people deploy in production; hand-built git-fetched kernels are rare), using tests that might have extended runtimes and/or require external infrastructure. > > Or you could go off and do your own thing, but I believe that would leave > us all poorer. fully agreed Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
On Tue, 2013-12-03 at 14:31 -0500, Tom Lane wrote: > Stephen Frost writes: > > When it comes to dump/reload, I'd much rather see a mechanism which uses > > our deep understanding of the extension's objects (as database objects) > > to implement the dump/reload than a text blob which is carried forward > > from major version to major version and may even fail to run. > > Note that we're already doing that in the binary_upgrade code path. > I agree that generalizing that approach sounds like a better idea > than keeping a text blob around. The reason for doing it that way in pg_upgrade was to preserve OIDs for types, etc.: http://www.postgresql.org/message-id/20783.1297184...@sss.pgh.pa.us That doesn't seem to apply to ordinary dump/reload. Do you think it's good for other reasons, as well? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed feature: Selective Foreign Keys
On Wed, Dec 4, 2013 at 12:18 PM, Andrew Dunstan wrote: >> Interestingly, the variant for which you can't think of a use case is >> the one I've missed most. Typical examples in my experience are >> things like project.project_manager_id references person (id) where >> person.is_project_manager, or alert (device_id) references device (id) >> where not device.deleted. >> > OK, those make sense. I wonder whether this should be done via a USING > clause on the constraint that pointed to the partial unique index. Or would > that be too obscure? I wondered that, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changes in Trigger Firing
> > > > > > CreateTrigStmt is passed to the CreateTrigger function as an argument. I am > > struggling to understand how the values for the various members of the trigger are > > set and where [which file] CreateTrigStmt is created. > > > > > > Can someone provide some help on this? > > I think you need better tools to guide you in exploring the source code. > For example, you can use cscope to tell you where CreateTrigStmt is > used, and you would find gram.y; and use it to tell you where > CreateTrigger is used, and you would find utility.c. > > Thanks for the advice. I was relying on the PostgreSQL documentation, which has been quite helpful so far. Let me try some development tools. > Any half-decent code editor should be able to generate a "database" of > symbols and let you frolic around the various files quickly. Without > that, anyone would be completely lost in developing new features of even > the lowest complexity. > Got the point!
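For example, from the top of a source checkout, either of these locates the relevant spots quickly (a minimal illustration using standard tools, nothing PostgreSQL-specific):

$ git grep -n "CreateTrigStmt" -- src/ | head
$ cscope -b -R                    # build the cross-reference database once
$ cscope -d -L -0 CreateTrigger   # then list references to the symbol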
Re: [HACKERS] Parallel Select query performance and shared buffers
On Wed, Dec 4, 2013 at 1:54 PM, Andres Freund wrote: > On 2013-12-04 18:43:35 +0200, Metin Doslu wrote: >> > I'd strongly suggest doing a "perf record -g -a ; >> > perf report" run to check what's eating up the time. >> >> Here is one example: >> >> + 38.87% swapper [kernel.kallsyms] [k] hypercall_page >> + 9.32% postgres [kernel.kallsyms] [k] hypercall_page >> + 6.80% postgres [kernel.kallsyms] [k] xen_set_pte_at > > All that time is spent in your virtualization solution. One thing to try > is to look on the host system, sometimes profiles there can be more > meaningful. You cannot profile the host on EC2. You could try HVM. I've noticed it fares better under heavy CPU load, and it's not fully HVM (it still uses paravirtualized network and I/O). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Select query performance and shared buffers
> You could try HVM. I've noticed it fares better under heavy CPU load, > and it's not fully HVM (it still uses paravirtualized network and > I/O). I already tried HVM (a cc2.8xlarge instance on Amazon EC2) and observed the same problem.
Re: [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-04 16:00:40 -0200, Claudio Freire wrote: > On Wed, Dec 4, 2013 at 1:54 PM, Andres Freund wrote: > > All that time is spent in your virtualization solution. One thing to try > > is to look on the host system, sometimes profiles there can be more > > meaningful. > > You cannot profile the host on EC2. Didn't follow the thread from the start. So, this is EC2? Have you checked, with a recent enough version of top or whatever, how much time is reported as "stolen"? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Select query performance and shared buffers
> Didn't follow the thread from the start. So, this is EC2? Have you > checked, with a recent enough version of top or whatever, how much time > is reported as "stolen"? Yes, this is EC2. "stolen" is occasionally reported as 1, but mostly as 0.
Re: [HACKERS] Parallel Select query performance and shared buffers
Here is some extra information: - When we increased NUM_BUFFER_PARTITIONS to 1024, this problem disappeared on 8-core machines but came back on 16-core machines on Amazon EC2. Could it be related to the PostgreSQL locking mechanism? - I tried this test on 4-core machines, including my personal computer and some other instances on Amazon EC2, and I didn't see the problem there. I start to see it in PostgreSQL when the core count is 8 or more. - Here are the results of "vmstat 1" while running 8 parallel select count(*) queries. Normally I would expect zero idle time.

procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu-----
 r  b   swpd     free  buff    cache   si   so    bi    bo    in     cs us sy  id wa st
 0  0      0 29838640 94000 38954740    0    0     0     0    22     21  0  0 100  0  0
 7  2      0 29788416 94000 38954740    0    0     0     0 53922 108490 14 24  60  1  1
 5  0      0 29747248 94000 38954740    0    0     0     0 68008 164571 22 48  27  2  1
 8  0      0 29725796 94000 38954740    0    0     0     0 43587 150574 28 54  16  1  1
 0  0      0 29838328 94000 38954740    0    0     0     0 15584 100459 26 55  18  1  0
 0  0      0 29838328 94000 38954740    0    0     0     0    42     15  0  0 100  0  0

- When I run 8 parallel wc commands or other scripts, they scale out as expected and utilize all CPUs. This leads me to think that the problem is related to PostgreSQL rather than the OS.
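(For reference, the constant mentioned above is a compile-time setting; as of 9.3 it lives in src/include/storage/lwlock.h, so raising it means rebuilding the server:)

/* src/include/storage/lwlock.h */
/* Number of partitions of the shared buffer mapping hashtable */
#define NUM_BUFFER_PARTITIONS  16   /* stock value; the experiment above
                                     * used a rebuild with 1024 */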
Re: [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-04 20:19:55 +0200, Metin Doslu wrote: > - When we increased NUM_BUFFER_PARTITIONS to 1024, this problem is > disappeared for 8 core machines and come back with 16 core machines on > Amazon EC2. Would it be related with PostgreSQL locking mechanism? You could try my lwlock-scalability improvement patches - for some workloads here, the improvements have been rather noticeable. Which version are you testing? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Select query performance and shared buffers
> You could try my lwlock-scalability improvement patches - for some > workloads here, the improvements have been rather noticeable. Which > version are you testing? I'm testing with PostgreSQL 9.3.1.
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/04/2013 07:32 AM, Stefan Kaltenbrunner wrote: On 12/04/2013 04:30 PM, Peter Eisentraut wrote: On 12/4/13, 2:14 AM, Stefan Kaltenbrunner wrote: running a few kvm instances that get bootstrapped automatically is something that is a solved problem. Is it sound to run performance tests on kvm? as sounds as on any other platform imho, the performance characteristics will differ between bare metal or other virtualisation platforms but the future is virtual and that is what a lot of stuff runs on... In actuality you need both. We need to know what the kernel is going to do on bare metal. For example, 3.2 to 3.8 are total crap for random IO access. We will only catch that properly from bare metal tests or at least, we will only catch it easily on bare metal tests. If we know the standard bare metal tests are working then the next step up would be to test virtual. BTW: Virtualization is only one future and it is still a long way off from serving the needs that bare metal serves at the same level (speaking PostgreSQL specifically). JD Stefan -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/04/2013 07:30 PM, Joshua D. Drake wrote: > > On 12/04/2013 07:32 AM, Stefan Kaltenbrunner wrote: >> >> On 12/04/2013 04:30 PM, Peter Eisentraut wrote: >>> On 12/4/13, 2:14 AM, Stefan Kaltenbrunner wrote: running a few kvm instances that get bootstrapped automatically is something that is a solved problem. >>> >>> Is it sound to run performance tests on kvm? >> >> As sound as on any other platform, imho. The performance characteristics >> will differ between bare metal and other virtualisation platforms, but the >> future is virtual, and that is what a lot of stuff runs on... > > In actuality you need both. We need to know what the kernel is going to > do on bare metal. For example, 3.2 to 3.8 are total crap for random IO > access. We will only catch that properly from bare metal tests or at > least, we will only catch it easily on bare metal tests. > > If we know the standard bare metal tests are working then the next step > up would be to test virtual. > > BTW: Virtualization is only one future and it is still a long way off > from serving the needs that bare metal serves at the same level > (speaking PostgreSQL specifically). We need to get that off the ground - and whatever makes it easier to get off the ground will help. And if we solve the automation for virtualisation, bare metal is just a small step away (or the other way round). Getting comparable performance levels across different postgresql versions (or patches), or across different operating systems with various workloads, is probably more valuable right now than getting absolute peak performance levels under specific tests long term. Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Mon, Dec 2, 2013 at 10:26 PM, David Fetter wrote: > On Tue, Dec 03, 2013 at 11:15:36AM +0800, Craig Ringer wrote: >> On 11/28/2013 03:24 AM, David Fetter wrote: >> > WITH, or SRF, or whatever, the point is that we need to be able to >> > specify what we're sending--probably single opaque strings delimited >> > just as we do other strings--and what we might get back--errors only, >> > rows, [sets of] refcursors are the ones I can think of offhand. >> >> So, you're thinking of something like: >> >> WITH FOREIGN somecte AS $$... foreign query ...$$ >> SELECT ... >> FROM somecte; > > I was picturing something a little more like an SRF which would take > one opaque string, the remote command, some descriptor, perhaps an > enum, of what if anything might come back. Long ago, I implemented a > similar thing in DBI-Link. It was called > > remote_exec_dbh(data_source_id integer, query text, returns_rows bool) A couple of thoughts: *) Any 'pass through' API should support parameterization (the FDW may not support it, but many will, and the API should allow for it). Lack of parameterization is a major downside of dblink. The function could be set up to be variadic for the parameters. *) For connectivity APIs of this style, the dblink-ish mechanic of separating command execution from data-returning commands is likely the right way to go. Also, it's probably better to stick with SRF mechanics if we go the 'function route'. So basically we would be making dblink for FDW, adding parameterization and some concept of utilizing the foreign server. All this assumes we are adding a special remote execution function ('fdwlink'). While that would be great, it's a significant deviation from the standard into PostgreSQL-specific SRF syntax. If some of the qual pushdown deparsing functionality could be put inside the internal FDW API, then you'd get the best of both worlds. Maybe you'd still want a dblink-style extension anyway, but it wouldn't be as critical. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
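To illustrate (everything here is hypothetical -- the function name and signature are assumptions sketching what a parameterized, FDW-aware dblink could look like, not an existing API):

-- A dblink-style passthrough that targets a foreign server and takes
-- parameters variadically, as suggested above.
CREATE FUNCTION fdwlink(server_name text,
                        query text,
                        VARIADIC params "any")
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'fdwlink'
LANGUAGE C;

-- Usage would mirror dblink, with $n-style parameter references:
SELECT *
  FROM fdwlink('my_pg_server',
               'SELECT id, name FROM emp WHERE dept = $1', 42)
       AS t(id int, name text);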
Re: [HACKERS] RFC: programmable file format for postgresql.conf
On 12/4/13, 11:22 AM, Álvaro Hernández Tortosa wrote: > Would a new file format be well received if it kept both hand editing > and generation of the configuration simple, and at the same time > offered the features I have mentioned? I don't see how that would work exactly: You want to add various kinds of complex metadata to the configuration file, but make that metadata optional at the same time. The immediate result will be that almost no one will supply the optional metadata, and no tools will be able to rely on its presence. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: programmable file format for postgresql.conf
On 04/12/13 19:49, Peter Eisentraut wrote: On 12/4/13, 11:22 AM, Álvaro Hernández Tortosa wrote: Would a new file format be well received if it kept both hand editing and generation of the configuration simple, and at the same time offered the features I have mentioned? I don't see how that would work exactly: You want to add various kinds of complex metadata to the configuration file, but make that metadata optional at the same time. The immediate result will be that almost no one will supply the optional metadata, and no tools will be able to rely on its presence. I wouldn't say the metadata is "complex". It looks quite similar to what pg_settings exposes (and besides, it was just brainstorming, not a formal proposal). The optional fields are basically nullable attributes, as in pg_settings. That is, they only make sense depending on other values (in this case, the parameter name). All of the attributes that are required for tools to work are marked as non-optional. So optional fields are either purely optional (i.e., only for tools that want to use them; everyone else may ignore, but preserve, them) or just nullable, depending on the parameter. In any case, my idea is just to open up the question and search for the best possible set of data to be represented, and then the best possible syntax / file format for it. aht -- Álvaro Hernández Tortosa --- NOSYS Networked Open SYStems -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/04/2013 07:33 AM, Jonathan Corbet wrote: > Wow, Josh, I'm surprised to hear this from you. Well, I figured it was too angry to propose for an LWN article. ;-) > The active/inactive list mechanism works great for the vast majority of > users. The second-use algorithm prevents a lot of pathological behavior, > like wiping out your entire cache by copying a big file or running a > backup. We *need* that kind of logic in the kernel. There's a large body of research on 2Q algorithms going back to the 80s, which is what this is. As far as I can tell, the modification was performed without any reading of this research, since that would have easily shown that 50/50 was unlikely to be a good division, and that in fact there is nothing which would work except a tunable setting, because workloads are different. Certainly the "what happens if a single file is larger than the entire recency bucket" question is addressed and debated. As an example, PostgreSQL would want to shrink the frequency list to 0%, because we already implement our own frequency list, and we already demonstrated back in version 8.1 that a 3-list system was ineffective. I can save Johannes some time: don't implement ARC. Not only is it under IBM patent, it's not effective in real-world situations. Both Postgres and Apache tried it in the early aughts. However, this particular issue concerns me less than the general attitude that it's OK to push experimental IO changes which can't be disabled by users into release kernels, as exemplified by several problematic and inadequately tested IO changes in the 3.X kernels -- most notably the pdflush bug. It speaks of a policy that the Linux IO stack is not production software, and it's OK to tinker with it in ways that break things for many users. I also wasn't exaggerating the reception I got when I tried to talk about IO and PostgreSQL at LinuxCon and other events. The majority of Linux hackers I've talked to simply don't want to be bothered with PostgreSQL's performance needs, and I've heard similar things from my colleagues at the MySQL variants. Greg KH was the only real exception. Heck, I went to a meeting of filesystem geeks at LinuxCon and the main feedback I received, from Linux FS developers (Chris and Ted), was "PostgreSQL should implement its own storage and use DirectIO, we don't know why you're even trying to use the Linux IO stack." That's why I gave up on working through community channels; I face enough uphill battles in *this* project. > This code has been a bit slow getting into the mainline for a few reasons, > but one of the chief ones is this: nobody is saying from the sidelines > that they need it! If somebody were saying "Postgres would work a lot > better with this code in place" and had some numbers to demonstrate that, > we'd be far more likely to see it get into an upcoming release. Well, Citus did that; do you need more evidence? > In the end, Linux is quite responsive to the people who participate in its > development, even as testers and bug reporters. It responds rather less > well to people who find problems in enterprise kernels years later, > granted. All infrastructure software, including Postgres, has the issue that most enterprise users are using a version which was released years ago. As a result, some performance issues simply aren't going to be found until that version has been out for a couple of years. 
This leads to a Catch-22: enterprise users are reluctant to upgrade because of potential performance regressions, and as a result the median "enterprise" version gets further and further behind current development, and as a result the performance regressions are never fixed. We encounter this in PostgreSQL (I have customers who are still on 8.4 or 9.1 because of specific regressions), and it's even worse in the Linux world, where RHEL is still on 2.6. We work really hard to avoid performance regressions in Postgres versions, because we know we can't test for them adequately, and often can't fix them in release versions after the fact. But you know what? 2.6, overall, still performs better than any kernel in the 3.X series, at least for Postgres. > The amount of automated testing, including performance testing, has > increased markedly in the last couple of years. I bet that it would not > be hard at all to get somebody like Fengguang Wu to add some > Postgres-oriented I/O tests to his automatic suite: > > https://lwn.net/Articles/571991/ > > Then we would all have a much better idea of how kernel releases are > affecting one of our most important applications; developers would pay > attention to that information. Oh, good! I was working with Greg on having an automated pgBench run, but doing it on Wu's testing platform would be even better. I still need to get some automated stats digestion, since I want to at least make sure that the tests would show the three major issues which we encountered in recent Linux kernels
Re: [HACKERS] WITHIN GROUP patch
> "Tom" == Tom Lane writes: Tom> Well, okay, but you've not said anything that wouldn't be Tom> handled just as well by some logic that adds a fixed Tom> integer-constant-zero flag column to the rows going into the Tom> tuplesort. Adding such a column unconditionally even for non-hypothetical functions would break the optimization for sorting a single column (which is a big deal, something like 3x speed difference, for by-value types). Adding it only for hypothetical set functions is making a distinction in how functions are executed that I don't think is warranted - imagine for example a function that calculates some measure over a frequency distribution by adding a known set of boundary values to the sort; this would not be a hypothetical set function in terms of argument processing, but it would still benefit from the extra sort column. I did not want to unnecessarily restrict such possibilities. >> It would still be overloaded in some sense because a non-hypothetical >> ordered set function could still take an arbitrary number of args >> (using variadic "any") - there aren't any provided, but there's no >> good reason to disallow user-defined functions doing that - so you'd >> still need a special value like -1 for aggordnargs to handle that. Tom> Sure. But a -1 to indicate "not applicable" doesn't seem like it's Tom> too much of a stretch. It's the -2 business that's bothering me. Tom> Again, that seems unnecessarily non-orthogonal --- who's to say which Tom> functions would want to constrain the number of direct arguments and Tom> which wouldn't? (I wonder whether having this info in the catalogs Tom> isn't the wrong thing anyhow, as opposed to expecting the functions Tom> themselves to check the argument count at runtime.) Not checking the number of arguments to a function until runtime seems a bit on the perverse side. Having a fixed number of direct args is the "normal" case (as seen from the fact that all the non-hypothetical ordered set functions in the spec and in our patch have fixed argument counts). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
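(For readers not following the patch, the distinction discussed above in SQL terms -- percentile_disc is an ordered-set aggregate proper, rank a hypothetical-set one; both are among the functions the patch provides:)

-- Ordered-set: the direct argument (0.5) is a fraction, not a row value.
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY salary) FROM emp;

-- Hypothetical-set: what rank *would* a salary of 42000 have if added
-- to the sorted input? This is why the executor needs a way to tell
-- the hypothetical row apart from the real ones (the flag column).
SELECT rank(42000) WITHIN GROUP (ORDER BY salary) FROM emp;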
Re: [HACKERS] Status of FDW pushdowns
On Wed, Dec 04, 2013 at 12:43:44PM -0600, Merlin Moncure wrote: > On Mon, Dec 2, 2013 at 10:26 PM, David Fetter wrote: > > On Tue, Dec 03, 2013 at 11:15:36AM +0800, Craig Ringer wrote: > >> On 11/28/2013 03:24 AM, David Fetter wrote: > >> > WITH, or SRF, or whatever, the point is that we need to be able to > >> > specify what we're sending--probably single opaque strings delimited > >> > just as we do other strings--and what we might get back--errors only, > >> > rows, [sets of] refcursors are the ones I can think of offhand. > >> > >> So, you're thinking of something like: > >> > >> WITH FOREIGN somecte AS $$... foreign query ...$$ > >> SELECT ... > >> FROM somecte; > > > > I was picturing something a little more like an SRF which would take > > one opaque string, the remote command, some descriptor, perhaps an > > enum, of what if anything might come back. Long ago, I implemented a > > similar thing in DBI-Link. It was called > > > > remote_exec_dbh(data_source_id integer, query text, returns_rows bool) > > Couple thoughts: > *) Any 'pass through' API should support parameterization (the FDW may > not support that, but many will and API should allow for it). Lack > of parameterization is a major downside of dblink. The function could > be set up to be variadic for the parameters. I don't know for sure that that needs to be in version 1 of this. It definitely shouldn't block implementing the non-parameterized one. > *) For a connectivity APIs of this style, Dblink-ish mechanic of > separating command execution from data returning commands is likely > the right way to go. Also, probably better to stick with SRF > mechanics if we go the 'function route'. So basically we are making > dblink for FDW, adding parameterization and some concept of utilizing > the foreign server. Yes, modulo the above. > All this is assuming we are adding a special remote execution function > ('fdwlink'). While that would be great, it's a significant deviation > from the standard into postgresql specific SRF syntax. What the standard has is literally insane. > If some of the qual pushdown deparsing functionality could be put > inside the internal FDW API, then you'd get the best of both worlds. If this were flawless on the PostgreSQL side (i.e. our path generator understood everything perfectly including aggregates) and trivial to implement correctly in FDWs, certainly. The idea here is that such a happy situation will not obtain until much later, if ever, and meanwhile, we need a way to get things accomplished even if it's inelegant, inefficient, etc. The alternative is that those things simply will not get accomplished at all. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed feature: Selective Foreign Keys
Andrew Dunstan writes: >>> Well I guess we could say something like: >>> >>> FOREIGN KEY (a-col) WHERE (a-condition) REFERENCES b(b-col) WHERE >>> (b-condition) >>> >>> But it's somewhat ugly. > OK, those make sense. I wonder whether this should be done via a USING > clause on the constraint that pointed to the partial unique index. Or > would that be too obscure? I like what you have above. Yeah, it requires the more verbose syntax for declaring a foreign key, but this feature is not going to be so heavily used that anyone will be in danger of worsening their carpal tunnel syndrome. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/04/2013 07:33 AM, Jonathan Corbet wrote: Wow, Josh, I'm surprised to hear this from you. The active/inactive list mechanism works great for the vast majority of users. The second-use algorithm prevents a lot of pathological behavior, like wiping out your entire cache by copying a big file or running a backup. We *need* that kind of logic in the kernel. The amount of automated testing, including performance testing, has increased markedly in the last couple of years. I bet that it would not be hard at all to get somebody like Fengguang Wu to add some Postgres-oriented I/O tests to his automatic suite: https://lwn.net/Articles/571991/ Then we would all have a much better idea of how kernel releases are affecting one of our most important applications; developers would pay attention to that information. Or you could go off and do your own thing, but I believe that would leave us all poorer. Thank you for your very well-thought-out and knowledgeable response. This is certainly helpful and highlights what a lot of us were already stating. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Magnus Hagander writes: > On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane wrote: >> I assume what would happen is the slave would PANIC upon seeing a WAL >> record code it didn't recognize. > I wonder if we should for the future have the START_REPLICATION command (or > the IDENTIFY_SYSTEM would probably make more sense - or even adding a new > command like IDENTIFY_CLIENT. The point is, something in the replication > protocol) have walreceiver include its version when connecting to the master. That > way we could have the walsender identify a walreceiver that's too old and > disconnect it right away - with a much nicer error message than a PANIC. Meh. That only helps for the case of streaming replication, and not for the thirty-seven other ways that some WAL might arrive at something that wants to replay it. It might be worth doing anyway, but I can't get excited about it for this scenario. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
Jeff Davis writes: > On Tue, 2013-12-03 at 14:31 -0500, Tom Lane wrote: >> Stephen Frost writes: >>> When it comes to dump/reload, I'd much rather see a mechanism which uses >>> our deep understanding of the extension's objects (as database objects) >>> to implement the dump/reload than a text blob which is carried forward >>> from major version to major version and may even fail to run. >> Note that we're already doing that in the binary_upgrade code path. >> I agree that generalizing that approach sounds like a better idea >> than keeping a text blob around. > The reason for doing it that way in pg_upgrade was to preserve OIDs for > types, etc.: That was *a* reason, but not the only one, I believe. > That doesn't seem to apply to ordinary dump/reload. Do you think it's > good for other reasons, as well? I think Stephen has already argued why it could be a good idea here. But in a nutshell: it seems like there are two use-cases to be supported, one where you want "CREATE EXTENSION hstore" to give you some appropriate version of hstore, and one where you want to restore exactly what you had on the previous installation. It seems to me that "exploding" the extension by dumping, rather than suppressing, its component objects is by far the most reliable way of accomplishing the latter. To point out just one reason why, we've never made any effort to prohibit suitably-privileged users from modifying the objects within an extension. So even if you'd kept around the originally defining text string, it might not represent current reality. And as for relying on some URL or other --- whoever proposed that doesn't live in the same internet I do. URLs aren't immutable, even on days when you can get to them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
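To illustrate that last point concretely (assuming the stock contrib hstore extension is installed; any suitably privileged user can do this):

    CREATE EXTENSION hstore;
    -- alter a member object after the fact; a stored creation-time script
    -- would now describe something other than what the database contains
    ALTER FUNCTION hstore_out(hstore) COST 25;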
Re: [HACKERS] Status of FDW pushdowns
David Fetter writes: > The idea here is that such a happy situation will not obtain until > much later, if ever, and meanwhile, we need a way to get things > accomplished even if it's inelegant, inefficient, etc. The > alternative is that those things simply will not get accomplished at > all. If that's the argument, why not just use dblink or dbilink, and be happy? This discussion sounds a whole lot like it's trending to a conclusion of wanting one of those in core, which is not where I'd like to end up. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
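For comparison, the existing contrib dblink pattern being alluded to (real interface; the connection details are illustrative, and note the query is a single opaque string with no parameter support):

    CREATE EXTENSION dblink;
    SELECT *
      FROM dblink('host=db1 dbname=remote',
                  'SELECT id, name FROM customers')
        AS t(id int, name text);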
Re: [HACKERS] Proposed feature: Selective Foreign Keys
On 12/04/2013 02:40 PM, Tom Lane wrote: Andrew Dunstan writes: Well I guess we could say something like: FOREIGN KEY (a-col) WHERE (a-condition) REFERENCES b(b-col) WHERE (b-condition) But it's somewhat ugly. OK, those make sense. I wonder whether this should be done via a USING clause on the constraint that pointed to the partial unique index. Or would that be too obscure? I like what you have above. Yeah, it requires the more verbose syntax for declaring a foreign key, but this feature is not going to be so heavily used that anyone will be in danger of worsening their carpal tunnel syndrome. Fair enough. I guess in terms of *this* feature TomD would then need to adjust the location of his WHERE clause so it's before the REFERENCES clause. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
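Spelled out, the proposal would read something like this (hypothetical: this syntax is not implemented; the referenced-side condition would be backed by a partial unique index):

    CREATE TABLE shapes (id int, kind text);
    CREATE UNIQUE INDEX shapes_circle_uq ON shapes (id) WHERE kind = 'circle';

    CREATE TABLE labels (
        shape_id int,
        kind     text,
        FOREIGN KEY (shape_id) WHERE (kind = 'circle')
            REFERENCES shapes (id) WHERE (kind = 'circle')
    );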
Re: [HACKERS] Extension Templates S03E11
On Tue, Dec 3, 2013 at 11:44 AM, Dimitri Fontaine wrote: >> We should also consider the possibility of a user trying to >> deliberately install an older release. For example, if the user has >> 1.0, 1.0--1.1, 1.1, 1.1--1.2, and 1.2--1.0 (a downgrade script) with >> default_full_version = 1.2, an attempt to install 1.0 should run just >> the 1.0 script, NOT 1.2 and then 1.2--1.0. > > In what I did, if you want version 1.0 and we have a script --1.0.sql > around, then we just use that script, never kicking the path chooser. Oh, right. Duh. Sorry, bad example. I do think we want to avoid using a downgrade script as part of an install though - and to install from the newest possible full version (I kind of like the term "base" version) whenever possible. >> break the tie by choosing which version number appears first in the >> aforementioned list. If that still doesn't break the tie, either >> because none of the starting points are mentioned in that list or >> because there are multiple equal-length paths starting in the same >> place, we give up and emit an error. > > Jeff also mentioned tiebreakers without going into any level > of detail. > > We won't be able to just use default_version as the tiebreaker list > here, because of the following example: > > default_version = 1.2, 1.0 > > create extension foo version '1.1'; > > With such a setup it would prefer 1.2--1.1 to 1.0--1.1, which doesn't > look like what we want. Instead, we want > > default_version = 1.2 > create_from_version_candidates = 1.0 > > create extension foo version '1.1'; > > Then the tie breaker is the 1.0 in "create_from_version_candidates" so > we would run foo--1.0.sql and then foo--1.0--1.1.sql. I guess one way to skin this cat would be to just let the user provide an ordering for the versions i.e. version_ordering = 1.0 1.1 1.2 When the user asks for version X, we reject any paths that pass through a newer version (so that we never downgrade), and start with the path that begins as close to the target version as possible. For scenarios where people might be installing either an older or newer version, that might be easier to understand than a base-version preference list. > Barring objections, I'm going to prepare a new branch to support > developing that behavior against only file based extensions, and submit > a spin-off patch to the current CF entry. Not totally sure we're all on the same page yet, but that's not necessarily meant to dissuade you. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
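Putting the pieces together, the control file for that last idea might read as follows (hypothetical: "version_ordering" is only a proposal in this thread, not an existing control parameter):

    # foo.control (sketch; version_ordering is invented for illustration)
    default_version  = '1.2'
    version_ordering = '1.0 1.1 1.2'

With foo--1.0.sql, foo--1.0--1.1.sql, and foo--1.1--1.2.sql on disk (and no foo--1.1.sql base script), CREATE EXTENSION foo VERSION '1.1' would then run foo--1.0.sql followed by foo--1.0--1.1.sql; no path through 1.2 is considered, since installation must never pass through a newer version.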
Re: [HACKERS] WITHIN GROUP patch
Andrew Gierth writes: > "Tom" == Tom Lane writes: > Tom> Well, okay, but you've not said anything that wouldn't be > Tom> handled just as well by some logic that adds a fixed > Tom> integer-constant-zero flag column to the rows going into the > Tom> tuplesort. > Adding such a column unconditionally even for non-hypothetical > functions would break the optimization for sorting a single column > (which is a big deal, something like 3x speed difference, for by-value > types). Well, sure, but I was only suggesting adding it when the aggregate asks for it, probably via a new flag column in pg_aggregate. The question you're evading is what additional functionality could be had if the aggregate could demand a different datatype or constant value for the flag column. > Adding it only for hypothetical set functions is making a distinction > in how functions are executed that I don't think is warranted - That seems like rather a curious argument from someone who's willing to give up the ability to specify a regular transition value concurrently with the flag column. But anyway, what I'm thinking right now is that these questions would all go away if the aggregate transfunction were receiving the rows and sticking them into the tuplestore. It could add whatever columns it felt like. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Wed, Dec 4, 2013 at 1:39 PM, David Fetter wrote: > On Wed, Dec 04, 2013 at 12:43:44PM -0600, Merlin Moncure wrote: >> On Mon, Dec 2, 2013 at 10:26 PM, David Fetter wrote: >> > On Tue, Dec 03, 2013 at 11:15:36AM +0800, Craig Ringer wrote: >> >> On 11/28/2013 03:24 AM, David Fetter wrote: >> >> > WITH, or SRF, or whatever, the point is that we need to be able to >> >> > specify what we're sending--probably single opaque strings delimited >> >> > just as we do other strings--and what we might get back--errors only, >> >> > rows, [sets of] refcursors are the ones I can think of offhand. >> >> >> >> So, you're thinking of something like: >> >> >> >> WITH FOREIGN somecte AS $$... foreign query ...$$ >> >> SELECT ... >> >> FROM somecte; >> > >> > I was picturing something a little more like an SRF which would take >> > one opaque string, the remote command, some descriptor, perhaps an >> > enum, of what if anything might come back. Long ago, I implemented a >> > similar thing in DBI-Link. It was called >> > >> > remote_exec_dbh(data_source_id integer, query text, returns_rows bool) >> >> Couple thoughts: >> *) Any 'pass through' API should support parameterization (the FDW may >> not support that, but many will, and the API should allow for it). Lack >> of parameterization is a major downside of dblink. The function could >> be set up to be variadic for the parameters. > > I don't know for sure that that needs to be in version 1 of this. It > definitely shouldn't block implementing the non-parameterized one. I'm not making the case it should be version anything. But, if you went dblink style, you'd want to go variadic. It's not really any extra work and you can always embed the string if the FDW driver doesn't support parameterization. > What the standard has is literally insane. Not sure I agree. The guiding principle of the standard implementation AIUI is that it wants to do connectivity management via syntax and keep the DML abstractions clean (minus some un-implementable things like RI triggers). In other words, you write exactly the same queries for native and foreign tables. This makes things much easier for people who just want to write SQL the classical way and not get into funky vendor-specific APIs. The downside of SQL-MED, particularly the way postgres implemented the driver API, is that each driver is responsible for all optimization efforts, and I think this is bad. So I'm openly wondering if the FDW API should expose optional query rewriting hooks. The odbc-fdw and jdbc-fdw drivers, for example, could then benefit from those hooks so that qual pushdown could be implemented with far less code duplication and effort and a *much* broader set of problems could be addressed by FDW. For non- or exotic-SQL implementations those hooks could be implemented locally by the driver or disabled if it doesn't make sense to use them. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
Merlin Moncure writes: > The downside of SQL-MED, particularly the way postgres implemented the > driver API, is that each driver is responsible for for all > optimization efforts and I think this is bad. There was never any intention that that would be the final state of things. All the FDW APIs are quite experimental at this point, and subject to change, and one of the reasons for change is going to be to improve the optimization situation. At the same time, it's hard to say what might constitute optimization for FDWs that aren't backed by a remote SQL database. There are always going to be reasons why an FDW will have to do some of that work for itself. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
On Wed, Dec 4, 2013 at 3:39 AM, Jeff Davis wrote: > On Tue, 2013-12-03 at 10:23 -0500, Robert Haas wrote: >> In more normal cases, however, the system can (and probably should) >> figure out what was intended by choosing the *shortest* path to get to >> the intended version. For example, if someone ships 1.0, 1.0--1.1, >> 1.1, and 1.1--1.2, the system should choose to run 1.1 and then >> 1.1--1.2, not 1.0 and then 1.0--1.1 and then 1.1--1.2. But that can >> be automatic: only if there are two paths of equal length (as in the >> example in the previous paragraph) do we need help from the user to >> figure out what to do. > > Why do we need help from the user? Just pick a path. > > For an extension update, I understand why someone wouldn't want to > accidentally downgrade 5 versions (dropping all of their dependent > objects) before updating to the latest. But this doesn't apply to > creation. I suppose. But suppose we have 1.0, 1.1, 1.0--1.2, and 1.1--1.2. Suppose further that 1.1 drops some interfaces present in 1.0, and 1.2 adds new stuff. If the system chooses to run 1.0 and then 1.0--1.2, it'll create all the deprecated interfaces and then drop them again. Now maybe that won't cause any problems, but I bet it will. For example, consider hstore again. If we eventually disallow => as an operator altogether, the 1.0 script won't even run any more. Of course that doesn't matter for core because we've removed it entirely from our repository and don't ship it any more, but an out-of-core extension might well keep around more old scripts than we do, to make it easier to use the same bundle with multiple server versions. Imagine, for example, that 1.0 only works on 9.4 or earlier and 1.2 only works on releases 9.2 or later. The extension author wants to keep the 1.0 script around for the benefit of people who haven't upgraded, so that they can still install the older version that works there; but he also needs the 1.1 base version to be preferred to the 1.0 base version, else installation of 1.2 on 10.0+ will fail completely. My experience with software upgrades is that cases like this, and even weirder things, happen pretty routinely, so I think more control is good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
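Concretely, such a bundle might ship the following script set (illustrative, not from any real extension):

    foo--1.0.sql        -- legacy base script; only runs on older servers
    foo--1.1.sql        -- newer base script, which must be preferred
    foo--1.0--1.2.sql   -- upgrade path from the old base
    foo--1.1--1.2.sql   -- upgrade path from the new base

On a new server, CREATE EXTENSION foo VERSION '1.2' has to pick foo--1.1.sql plus foo--1.1--1.2.sql; choosing the 1.0 base there would fail outright.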
Re: [HACKERS] Why we are going to have to go DirectIO
On Wed, 04 Dec 2013 11:07:04 -0800 Josh Berkus wrote: > On 12/04/2013 07:33 AM, Jonathan Corbet wrote: > > Wow, Josh, I'm surprised to hear this from you. > > Well, I figured it was too angry to propose for an LWN article. ;-) So you're going to make us write it for you :) > > The active/inactive list mechanism works great for the vast majority of > > users. The second-use algorithm prevents a lot of pathological behavior, > > like wiping out your entire cache by copying a big file or running a > > backup. We *need* that kind of logic in the kernel. > > There's a large body of research on 2Q algorithms going back to the 80s, > which is what this is. As far as I can tell, the modification was > performed without any reading of this research, since that would have > easily shown that 50/50 was unlikely to be a good division, and that in > fact there is nothing which would work except a tunable setting, because > workloads are different. In general, the movement of useful information between academia and real-world programming seems to be minimal at best. Neither side seems to find much that is useful or interesting in what the other is doing. Unfortunate. For those interested in the details... (1) It's not quite 50/50, that's one bound for how the balance is allowed to go. (2) Anybody trying to add tunables to the kernel tends to run into resistance. Exposing thousands of knobs tends to lead to a situation where you *have* to be an expert on all those knobs to get decent behavior out of your system. So there is a big emphasis on having the kernel tune itself whenever possible. Here is a situation where that is not always happening, but a fix (which introduces no knob) is in the works. As an example, I've never done much with the PostgreSQL knobs on the LWN server. I just don't have the time to mess with it, and things Work Well Enough. > However, this particular issue concerns me less than the general > attitude that it's OK to push in experimental IO changes which can't be > disabled by users into release kernels, as exemplified by several > problematic and inadequately tested IO changes in the 3.X kernels -- > most notably the pdflush bug. It speaks of a policy that the Linux IO > stack is not production software, and it's OK to tinker with it in ways > that break things for many users. Bugs and regressions happen, and I won't say that we do a good enough job in that regard. There has been some concern recently that we're accepting too much marginal stuff. We have problems getting enough people to adequately review code — I think I've heard of another project or two with similar issues :). But nobody sees the kernel as experimental or feels that the introduction of bugs is an acceptable thing. > I also wasn't exaggerating the reception I got when I tried to talk > about IO and PostgreSQL at LinuxCon and other events. The majority of > Linux hackers I've talked to simply don't want to be bothered with > PostgreSQL's performance needs, and I've heard similar things from my > collegues at the MySQL variants. Greg KH was the only real exception. > > Heck, I went to a meeting of filesystem geeks at LinuxCon and the main > feedback I received, from Linux FS developers (Chris and Ted), was > "PostgreSQL should implement its own storage and use DirectIO, we don't > know why you're even trying to use the Linux IO stack." I think you're talking to the wrong people. Nothing you've described is a filesystem problem; you're contending with memory management problems. 
Chris and Ted weren't helpful because there's actually little they can do to help you. I would be happy to introduce you to some people who would be more likely to take your problems to heart. Mel Gorman, for example, is working on putting together a set of MM benchmarks in the hopes of quantifying changes and catching regressions before new code is merged. He's one of the people who has to deal with performance regressions when they show up in enterprise kernels, and I get the sense he'd rather do less of that. Perhaps even better: the next filesystem, storage, and memory management summit is March 24-25. A session on your pain points there would bring in a substantial portion of the relevant developers at all levels. LSFMM is arguably the most productive kernel event I see over the course of a year; it's where I would go first to make progress on this issue. I'm not an LSFMM organizer, but I would be happy to work to make such a session happen if somebody from the PostgreSQL community wanted to be there. > > This code has been a bit slow getting into the mainline for a few reasons, > > but one of the chief ones is this: nobody is saying from the sidelines > > that they need it! If somebody were saying "Postgres would work a lot > > better with this code in place" and had some numbers to demonstrate that, > > we'd be far more likely to see it get into an upcoming release. > > Well, Citus did that; do you need mor
Re: [HACKERS] Why we are going to have to go DirectIO
On Wed, Dec 4, 2013 at 9:31 PM, Jonathan Corbet wrote: > > I also wasn't exaggerating the reception I got when I tried to talk > > about IO and PostgreSQL at LinuxCon and other events. The majority of > > Linux hackers I've talked to simply don't want to be bothered with > > PostgreSQL's performance needs, and I've heard similar things from my > > collegues at the MySQL variants. Greg KH was the only real exception. > > > > Heck, I went to a meeting of filesystem geeks at LinuxCon and the main > > feedback I received, from Linux FS developers (Chris and Ted), was > > "PostgreSQL should implement its own storage and use DirectIO, we don't > > know why you're even trying to use the Linux IO stack." > > I think you're talking to the wrong people. Nothing you've described is a > filesystem problem; you're contending with memory management problems. > Chris and Ted weren't helpful because there's actually little they can do > to help you. I would be happy to introduce you to some people who would be > more likely to take your problems to heart. > > Mel Gorman, for example, is working on putting together a set of MM > benchmarks in the hopes of quantifying changes and catching regressions > before new code is merged. He's one of the people who has to deal with > performance regressions when they show up in enterprise kernels, and I get > the sense he'd rather do less of that. > > Perhaps even better: the next filesystem, storage, and memory management > summit is March 24-25. A session on your pain points there would bring in > a substantial portion of the relevant developers at all levels. LSFMM > is arguably the most productive kernel event I see over the course of a > year; it's where I would go first to make progress on this issue. I'm not > an LSFMM organizer, but I would be happy to work to make such a session > happen if somebody from the PostgreSQL community wanted to be there. > I think that's an excellent idea. If one of our developers could find the time to attend that, I think that could be very productive. While I'm not on the funds team, I'd definitely vote for funding such participation out of community funds if said developer can't do it on his own. But it should definitely be a developer with interest and skills in that particular area as well of course :) So don't think I'm proposing myself, I definitely am not :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Wed, Dec 4, 2013 at 8:43 PM, Tom Lane wrote: > Magnus Hagander writes: > > On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane wrote: > >> I assume what would happen is the slave would PANIC upon seeing a WAL > >> record code it didn't recognize. > > > I wonder if we should for the future have the START_REPLICATION command > (or > > the IDENTIFY_SYSTEM would probably make more sense - or even adding a new > > command like IDENTIFY_CLIENT. The point is, something in the replication > > protocol) have walreceiver include its version when connecting to the master. That > > way we could have the walsender identify a walreceiver that's too old and > > disconnect it right away - with a much nicer error message than a PANIC. > > Meh. That only helps for the case of streaming replication, and not for > the thirty-seven other ways that some WAL might arrive at something that > wants to replay it. > > It might be worth doing anyway, but I can't get excited about it for this > scenario. > It does, but I bet it's by far one of the most common cases. I'd say it's that one and restore-from-backup that would cover a huge majority of all cases. If we can cover those, we don't have to be perfect - so unless it turns out to be ridiculously complicated, I think it would be worthwhile having. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
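A sketch of the exchange being proposed, on a replication connection (IDENTIFY_SYSTEM is a real replication-protocol command; IDENTIFY_CLIENT and its syntax are purely hypothetical here):

    IDENTIFY_SYSTEM;                  -- existing: returns system id, timeline,
                                      -- and current WAL position
    IDENTIFY_CLIENT VERSION '9.3.2';  -- hypothetical: lets the walsender reject
                                      -- a too-old walreceiver with a clean error
                                      -- instead of letting replay PANIC later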
Re: [HACKERS] Why we are going to have to go DirectIO
* Magnus Hagander (mag...@hagander.net) wrote: > I think that's an excellent idea. If one of our developers could find the > time to attend that, I think that could be very productive. While I'm not > on the funds team, I'd definitely vote for funding such participation out > of community funds if said developer can't do it on his own. > > But it should definitely be a developer with interest and skills in that > particular area as well of course :) So don't think I'm proposing myself, I > definitely am not :) For my part, I'm definitely interested and those dates currently look like they'd work for me. Not sure if I really meet Magnus' qualifications above, but I'd be happy to try. ;) Stark and I were having a pretty good discussion with Ted Ts'o at pgconf.eu and he certainly seemed interested and willing to at least discuss things with us.. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] Performance optimization of btree binary search
Having nothing better to do over the holiday weekend, I decided to pursue a number of ideas for improving performance that I thought about a long time ago. These include: * Pre-fetching list node pointers. This looks to be moderately promising, but I'm certainly not going to be the one to land it, given present obligations. Stephen Frost may wish to pick it up, given his previous interest in the matter. This is slightly controversial, because it uses a GCC intrinsic (__builtin_prefetch), but also because the Linux kernel removed this optimization from their generic list data structure [1]. However, that list was what we'd call an embedded list, so we probably shouldn't be totally deterred. The amount of effort that I put into this was, frankly, pretty low. A motivated person, willing to do the appropriate analysis, could probably bring it further. For one thing, every single foreach() has a call to this intrinsic, even where the list doesn't store pointers (which is not undefined). At the very least that's going to bloat things up, frequently for no conceivable gain, and yet with the patch applied we're still able to see quite tangible benefits, even if it isn't exactly a stellar improvement. I have an idea that prefetching the last element at the start of the loop could be much better than what I did, because we know that those lists are mostly pretty small in practice, and that's likely to help pipelining - prefetching too late or even too early makes the optimization useless, because you may still get a cache miss. * Optimizing index scans - I noticed that binary searching accounted for many cache misses during a pgbench select.sql benchmark, instrumented with "perf record -e cache-misses". This warranted further investigation. I won't say anything further about the former optimization, except to note that it's included for comparative purposes in the set of benchmarks I've run (I haven't included a patch). The benchmark results are here: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/results I took two approaches to the latter. This was the more interesting piece of work. Test sets include: * Master baseline (green) * List optimization (as mentioned above, not really relevant to the main topic of this mail) (red) * "fib btree", earlier patch, please disregard (blue) * "Fixed fib patch", Fibonacci search, no specialization (purple) * The interesting one, Fibonacci search + specialization - "fib + no jump" (turquoise, see below for details) Initially, I had a little bit of success with Fibonacci search [2] in place of binary search, in the hope that it would better take advantage of CPU cache characteristics - Fibonacci search is said to have advantages where non-uniform memory access is an issue - it minimizes the final interval. I wasn't all that optimistic that it would work that well given the smallish size of BLCKSZ relative to modern CPU L1 cache sizes [3], but it did make an appreciable dent on its own. I suppose old habits die hard, because next I hacked up _bt_compare and had it do an int4btcmp directly, in the event of encountering a scankey that had as its comparator the relevant pg_proc oid. This is very much in the style (and the spirit) of the grotty early draft patches for the inlining-comparators-for-sorting patch. Patch is attached. This is a draft, a POC, posted only to facilitate discussion and to allow my results to be independently duplicated/verified. 
Note that there is a bug (attributable to the new search code) that causes the regression tests to fail in exactly one place (i.e. one line of difference). I didn't think it was worth deferring discussion to deal with that, though, since I don't think it undermines anything. I'm not sure how valuable the comparator trick is if we stick with binary search - I didn't test that. I'm sure it's a question that must be considered, though. I have a fairly huge amount of data here, having run plenty of benchmarks over several nights. The short version is that the 'select' benchmark has just over 18% greater throughput on this machine at some client counts (in particular, when there are 4 clients - there are 4 cores, but 8 logical cores) with the attached patch. There is a 3.5% regression with one client, which is certainly not accounted for by noise. Note, however, that latency appears consistently better with the patch applied. This is a machine running on dedicated hardware: a 4-core i7-4770. The working set easily fits in its 32GiB of DDR3 RAM at all pgbench scales tested (1, 10 and 100). The kernel used is "3.8.0-31-generic #46~precise1-Ubuntu SMP". Postgres settings are typical for this kind of thing (6GiB shared_buffers), but you can refer to my pgbench-tools results for full details (drill down to an individual pgbench run for that - they're all the same). I'm kind of curious as to what this benchmark would look like on a server with many more cores. I guess I could write a proper patch to have code setting up a scankey also set a flag that indicated that it was acceptable to assume that the special built-in comparator would do fine.
Re: [HACKERS] WITHIN GROUP patch
> "Tom" == Tom Lane writes: Tom> Well, sure, but I was only suggesting adding it when the Tom> aggregate asks for it, probably via a new flag column in Tom> pg_aggregate. Sure, I was only pointing out the necessity. Tom> The question you're evading is what additional functionality Tom> could be had if the aggregate could demand a different datatype Tom> or constant value for the flag column. I don't really see a question there to answer - I simply chose to provide a general mechanism rather than make assumptions about what future users of the code would desire. I have no specific application in mind that would require some other type. >> Adding it only for hypothetical set functions is making a >> distinction in how functions are executed that I don't think is >> warranted - Tom> That seems like rather a curious argument from someone who's Tom> willing to give up the ability to specify a regular transition Tom> value concurrently with the flag column. In the current patch the idea of also specifying a regular transition value is meaningless since there is no transition function. Tom> But anyway, what I'm thinking right now is that these questions Tom> would all go away if the aggregate transfunction were receiving Tom> the rows and sticking them into the tuplestore. It could add Tom> whatever columns it felt like. True, but this ends up duplicating the sorting functionality of nodeAgg that we are leveraging off in the first place. I think this will be somewhat more intrusive and likely slower. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
Jonathan, > For those interested in the details... (1) It's not quite 50/50, that's one > bound for how the balance is allowed to go. (2) Anybody trying to add > tunables to the kernel tends to run into resistance. Exposing thousands of > knobs tends to lead to a situation where you *have* to be an expert on all > those knobs to get decent behavior out of your system. So there is a big > emphasis on having the kernel tune itself whenever possible. Here is a > situation where that is not always happening, but a fix (which introduces > no knob) is in the works. Yeah, we get into this argument all the time. The problem is when you run into situations where there is no optimal (or even acceptable) setting for all, or even most, users. And I'll say in advance that 2Q is one of those situations. > As an example, I've never done much with the PostgreSQL knobs on the LWN > server. I just don't have the time to mess with it, and things Work Well > Enough. Sure, and even when I teach fiddling with the knobs, there's only 12-20 knobs 95% of users need to have any interest in. But we have ~~220 settings for the other 5%, and those users would be screwed without them. > Bugs and regressions happen, and I won't say that we do a good enough job > in that regard. There has been some concern recently that we're accepting > too much marginal stuff. We have problems getting enough people to > adequately review code — I think I've heard of another project or two with > similar issues :). But nobody sees the kernel as experimental or feels > that the introduction of bugs is an acceptable thing. OK. The chain of events over the pdflush bug really felt like what I said earlier, especially since problems *were* reported shortly after kernel release and ignored. > I think you're talking to the wrong people. Quite possibly. > Perhaps even better: the next filesystem, storage, and memory management > summit is March 24-25. Link? I can't find anything Googling by that name. I'm pretty sure we can get at least one person there. > Gee, if only there were a web site where one could read about changes to > the Linux kernel :) Even you don't cover 100% of performance-changing commits. And I'll admit to missing issues of LWN when I'm travelling. > Seriously, though, one of the best things to do would be to make a point of > picking up a kernel around -rc3 (right around now, say, for 3.13) and > running a few benchmarks on it. If you report a performance regression at > that stage, it will get attention. Yeah, back to the "we need resources for good benchmarks" discussion fork ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On Wed, Dec 4, 2013 at 2:31 PM, Jonathan Corbet wrote: > For those interested in the details... (1) It's not quite 50/50, that's one > bound for how the balance is allowed to go. (2) Anybody trying to add > tunables to the kernel tends to run into resistance. Exposing thousands of > knobs tends to lead to a situation where you *have* to be an expert on all > those knobs to get decent behavior out of your system. So there is a big > emphasis on having the kernel tune itself whenever possible. Here is a > situation where that is not always happening, but a fix (which introduces > no knob) is in the works. I think there are interesting parallels here with the 'query plan hints' debate. In both cases I think the conservative voices are correct: better not to go crazy adding knobs. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On Wed, 04 Dec 2013 13:01:37 -0800 Josh Berkus wrote: > > Perhaps even better: the next filesystem, storage, and memory management > > summit is March 24-25. > > Link? I can't find anything Googling by that name. I'm pretty sure we > can get at least one person there. It looks like the page for the 2014 event isn't up yet. It will be attached (as usual) to the LF collaboration summit: http://events.linuxfoundation.org/events/collaboration-summit I'll make a personal note to send something here when the planning process begins and the CFP goes out. Napa Valley...one can do worse...:) jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance optimization of btree binary search
Peter Geoghegan writes: > I guess I could write a proper patch to have code setting up a scankey > also set a flag that indicated that it was acceptable to assume that > the special built-in comparator would do fine. ... > I'd be happy with a scheme with only one built-in comparator, and > allowed a few types to be cataloged such that it was indicated that > just using the "built-in" comparator was acceptable, knowledge that > could be passed to _bt_compare via the scankey. I'm thinking of just > int4, and maybe date and a few other such int4 "covariant" types. If what you're proposing is that we have a fast path that compares Datums as Datums, I should think that that would work fine for int2 as well, *and* for int8 on machines where int8 is pass-by-value. (Does anyone still care much about PG's performance on 32-bit hardware?) We might have to fool a bit with the fooGetDatum macros in some cases, eg I think Int16GetDatum isn't careful about sign extension. Admittedly, that might introduce an offsetting cost on some hardware, but I think on most machines sign-extension isn't noticeably more expensive than zero-extension. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance optimization of btree binary search
On Wed, Dec 4, 2013 at 4:28 PM, Tom Lane wrote: > Peter Geoghegan writes: >> I guess I could write a proper patch to have code setting up a scankey >> also set a flag that indicated that it was acceptable to assume that >> the special built-in comparator would do fine. ... >> I'd be happy with a scheme with only one built-in comparator, and >> allowed a few types to be cataloged such that it was indicated that >> just using the "built-in" comparator was acceptable, knowledge that >> could be passed to _bt_compare via the scankey. I'm thinking of just >> int4, and maybe date and a few other such int4 "covariant" types. > > If what you're proposing is that we have a fast path that compares Datums > as Datums, I should think that that would work fine for int2 as well, > *and* for int8 on machines where int8 is pass-by-value. (Does anyone > still care much about PG's performance on 32-bit hardware?) We might > have to fool a bit with the fooGetDatum macros in some cases, eg > I think Int16GetDatum isn't careful about sign extension. Admittedly, > that might introduce an offsetting cost on some hardware, but I think > on most machines sign-extension isn't noticeably more expensive than > zero-extension. Yeah, I think if we can make something like this work, it would be neat-o. Getting this working for int4 would be a good win, as Peter says, but getting it working for both int4 and int8 with the same code would be a significantly better one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Andrew Gierth writes: > "Tom" == Tom Lane writes: > Tom> But anyway, what I'm thinking right now is that these questions > Tom> would all go away if the aggregate transfunction were receiving > Tom> the rows and sticking them into the tuplestore. It could add > Tom> whatever columns it felt like. > True, but this ends up duplicating the sorting functionality of > nodeAgg that we are leveraging off in the first place. I think this > will be somewhat more intrusive and likely slower. Hm, it's just a refactoring of the same code we'd have to have anyway, so I'm not seeing a reason to assume it'd be slower. If anything, this approach would open more opportunities for function-specific optimizations, which in the long run could be faster. (I'm not claiming that any such optimizations would be in the first version.) In hindsight I wonder if it wasn't a mistake to embed ordered-aggregate support in nodeAgg.c the way we did. We could have dumped that responsibility into some sort of wrapper around specific aggregates, with an option for some aggregates to skip the wrapper and handle it themselves. A trivial, and perhaps not very useful, example is that non-order-sensitive aggregates like MIN/MAX/COUNT could have been coded to simply ignore any ordering request. I can't immediately think of any examples that are compelling enough to justify such a refactoring now --- unless it turns out to make WITHIN GROUP easier. Anyway, I'm going to go off and look at the WITHIN GROUP patch with these ideas in mind. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
On 12/2/13, 9:14 AM, Dimitri Fontaine wrote: > What I want to build is an “extension distribution” software that knows > how to prepare anything from PGXN (and other places) so that it's fully > ready for being used in the database. Then the main client would run as > a CREATE EXTENSION "ddl_command_start" Event Trigger and would fetch the > prepared extension for you and make it available, then leaving the main > command operate as intended. > > Which is what I think the pex extension is doing, and that's not > coincidental, but it runs the build step on the PostgreSQL server itself > and needs to have a non-trivial set of file-system privileges to be > doing so, and even needs to get root privileges with sudo for some of > its operations. You're thinking of autopex, and while that works, and can be made to work better with certain small changes, I don't think it can ever be the only solution. Many interesting extensions will have external package build- and run-time dependencies, and you need file-system level access to manage that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
On 12/2/13, 2:33 PM, Greg Stark wrote: > Just tossing an idea out there. What if you could install an extension > by specifying not a local file name but a URL. Obviously there's a > security issue but for example we could allow only https URLs with > verified domain names that are in a list of approved domain names > specified by a GUC. This is similar to what autopex does (https://github.com/petere/autopex). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
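At the SQL level, the URL idea might surface as something like this (entirely hypothetical: neither the GUC nor the syntax exists; shown only to make the allow-list notion concrete):

    SET extension_url_domains = 'api.pgxn.org';  -- hypothetical https allow-list
    CREATE EXTENSION myext
        USING URL 'https://api.pgxn.org/dist/myext/myext-1.0.0.zip';  -- hypothetical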