Re: [HACKERS] More thoughts about planner's cost estimates
Greg Stark <[EMAIL PROTECTED]> writes:
> Perhaps what this indicates is that the real meat is in track sampling,
> not block sampling.

Fwiw, I've done a little benchmarking and I'm starting to think this isn't a bad idea. I see a dramatic speed improvement for samples of 1-10% as the block size increases. Presumably this is, as Hannu said, because it reduces the number of tracks necessary to cover the sample.

I see improvements up to around 256M blocks or so, but my data is pretty questionable since I'm busy watching TV in MythTV in another window. It's on another drive, but it still seems to be making the numbers jump around a bit.

I expect there's a trade-off between keeping enough blocks for the sample of blocks to be representative on the one hand, and large blocks being much faster to read in on the other. I would suggest setting the block size in the block sampling algorithm to something like max(8k, sqrt(table size)). That gives 8k blocks for anything up to 255M, but takes better advantage of the speed increase available from sequential I/O for larger tables; from my experiments, about a 50% increase in speed. Actually, maybe something even more aggressive would be better, say (table size)^0.75, so that it kicks in sooner than on 256M tables and gets to larger block sizes on reasonably sized tables.

Note, this doesn't mean anything like changing page sizes; it just means selecting more blocks that hopefully lie on the same track when possible.

-- greg

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
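Greg's heuristic is easy to sketch out. The following is my own toy illustration, not anything from the PostgreSQL source: the function name and byte-based units are made up, and it simply compares the max(8k, sqrt(table size)) rule against the more aggressive 0.75-exponent variant he floats.

```python
def sample_block_size(table_size_bytes, exponent=0.5, floor=8192):
    """Proposed sampling chunk size: max(8k, table_size ** exponent)."""
    return max(floor, int(table_size_bytes ** exponent))

# Compare the sqrt rule with the more aggressive 0.75 exponent.
for size in (1 << 20, 64 << 20, 1 << 30):  # 1 MB, 64 MB, 1 GB
    print(size, sample_block_size(size), sample_block_size(size, exponent=0.75))
```

With the sqrt rule, the 8k floor applies until the table passes 64 MB (since sqrt(64 MB) = 8k); the 0.75 exponent leaves the floor much sooner and grows the chunk size much faster, which matches his "kicks in sooner" intent.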
Re: [HACKERS] Faster Updates
On Saturday 03 June 2006 17:27, Tom Lane wrote:
> PFC <[EMAIL PROTECTED]> writes:
> > [snip - complicated update logic proposal]
> > What do you think ?
>
> Sounds enormously complicated and of very doubtful net win --- you're
> [snip - ... bad idea reasoning]

:) What if every backend, while processing a transaction, collected a list of touched records, probably with a max number of entries (GUC) collected per transaction? Then, when the transaction completes, the list of tuples is sent to pg_autovacuum, or possibly a new process that selectively goes after only those tuples. Of course it should have some kind of logic attached, so we don't visit the tuples for vacuum unless we are quite sure no running transactions would block adding the blocks to the FSM. We might be able to queue up the blocks until a later time (GUC queue-max-time + queue-size-limit) if we cannot determine that it would be safe to FSM the blocks at the current time.

I guess this has probably been suggested before, and there is probably a reason why it cannot be done or wouldn't be effective. But it could probably be a big win for common workloads like webpages. Where it would be troublesome is systems with long-running transactions; it might as well just be disabled there.

Best regards,
Nicolai Petri
Re: [HACKERS] Going for "all green" buildfarm results
Original Message From: Tom Lane <[EMAIL PROTECTED]>
> kudu HEAD: one-time failure 6/1/06 in statement_timeout test, never seen
> before. Is it possible the system was under enough load that the 1-second
> timeout fired before control reached the exception block?

The load here was no different than any other day, so as to whether it's a real issue or not, I have no idea. It is a virtual machine that is subject to the load on other VMs, but none of them were scheduled to do anything at the time.

Kris Jurka
Re: [HACKERS] Possible TODO item: copy to/from pipe
Andreas Pflug wrote:
> Tom Lane wrote:
> > After re-reading what I just wrote to Andreas about how compression of
> > COPY data would be better done outside the backend than inside, it
> > struck me that we are missing a feature that's fairly common in Unix
> > programs. Perhaps COPY ought to have the ability to pipe its output
> > to a shell command, or read input from a shell command. Maybe something
> > like
> >
> > COPY mytable TO '| gzip >/home/tgl/mytable.dump.gz';

For a use case, consider this:

  COPY mytable TO '| rsh [EMAIL PROTECTED] > test ';

so you can COPY to another server directly.

--
Bruce Momjian  http://candle.pha.pa.us
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
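Until a COPY-to-pipe feature exists server-side, the same compress-as-you-stream effect is available on the client. Here is a hedged sketch using only Python's stdlib gzip; the row data is a stand-in for what "COPY mytable TO STDOUT" would emit (no live PostgreSQL connection is involved):

```python
import gzip
import io

# Stand-in for the text stream COPY would produce (tab-delimited rows).
rows = ["1\talice\n", "2\tbob\n"]

# Compress the stream as it is written, the way
# COPY mytable TO '| gzip > mytable.dump.gz' would, but client-side.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    for row in rows:
        gz.write(row.encode("utf-8"))

compressed = buf.getvalue()
restored = gzip.decompress(compressed).decode("utf-8")
```

In practice the shell one-liner `psql -c "COPY mytable TO STDOUT" | gzip > mytable.dump.gz` achieves the same thing, which is exactly the "do it outside the backend" argument.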
Re: [HACKERS] Possible TODO item: copy to/from pipe
Martijn van Oosterhout wrote:
> On Wed, May 31, 2006 at 01:08:28PM -0700, Steve Atkins wrote:
> > On May 31, 2006, at 12:58 PM, Dave Page wrote:
> > > On 31/5/06 19:13, "Andreas Pflug" <[EMAIL PROTECTED]> wrote:
> > > > I wonder if we'd be able to ship gzip with the windows installer,
> > > > to ensure proper integration.
> > >
> > > 'Fraid not. It's GPL'd.
> >
> > Well, one implementation of it is. zlib is new-bsd-ish, though, and
> > includes minigzip, which should be just fine for use in a pipe on
> > windows.
>
> Even then it's not relevant. The Windows Installer is already GPL'd by
> the fact it includes readline.

zlib is indeed straight BSD-like. I assume gzip would be a binary in the installer. Does putting a GPL binary in the installer make the entire thing GPL? I don't think so.

--
Bruce Momjian  http://candle.pha.pa.us
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] COPY (query) TO file
Greg Stark wrote:
> It would have been awfully nice to be able to do
>
>   SELECT ... FROM (VALUES (a,b,c),(d,e,f),(g,h,i))

Tom Lane wrote:
> The trouble with supporting it for any case other than INSERT is that
> you have to work out what the column datatypes of the construct ought
> to be. This is the same as the equivalent problem for UNION constructs,
> but the UNION type resolution algorithm looks kinda ugly for thousands
> of inputs :-(

Greg Stark wrote:
> I always thought UNION just decided on the type based on the first
> branch and then coerced all the others to that type. I always cast all
> the columns on the first union branch just in case.

Could we get away with requiring an explicit type expression where there's some ambiguity or uncertainty, like this?

  SELECT ... FROM (VALUES (a,b,c),(d,e,f),(g,h,i)) AS (a int, b text, c float)

That's what you have to do with an SRF that returns SETOF RECORD in the same situation, after all.

cheers

andrew
Re: [HACKERS] More thoughts about planner's cost estimates
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Disks can read at full rotation speed, so skipping (not reading) some
> blocks will not make reading the remaining blocks from the same track
> faster. And if there are more than 20 8k pages per track, you still have
> a very high probability you need to read all tracks..

Well, if there are exactly 20 you would expect a 50% chance of having to read a track, so you would expect double the effective bandwidth. It would have to be substantially bigger to not have any noticeable effect.

> You may be able to move to the next track a little earlier compared to
> reading all blocks, but then you are likely to miss the block from next
> track and have to wait a full rotation.

No, I don't think that works out. You always have a chance of missing the block from the next track and having to wait a full rotation, and your chance isn't increased or decreased by seeking earlier in the rotation. So you would expect each track to take <20 block reads> less time on average.

> Your test program could have got a little better results, if you had
> somehow managed to tell the system all the block numbers to read in one
> go, not each time the next one after getting the previous one.

I was trying to simulate the kind of read pattern that postgres generates, which I believe looks like that.

> The fact that 5% was not slower than seqscan seems to indicate that
> actually all track reads were cached inside the disk or controller.

I dunno, your first explanation seemed pretty convincing and doesn't depend on specific assumptions about the caching. Moreover, this doesn't explain why you *do* get a speedup when reading less than 5%.

Perhaps what this indicates is that the real meat is in track sampling, not block sampling.

-- greg
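The per-track odds Greg estimates can be checked with a back-of-envelope model (my own, not from the thread): if each block is sampled independently with probability p and a track holds B blocks, the track must be read with probability 1 - (1 - p)^B.

```python
def track_touch_probability(p, blocks_per_track):
    """Chance a track holds at least one sampled block, assuming
    independent per-block sampling at rate p."""
    return 1 - (1 - p) ** blocks_per_track

# With 20 8k pages per track, a 5% block sample still touches ~64% of
# the tracks, which is consistent with block sampling costing nearly as
# much as a sequential scan.
print(track_touch_probability(0.05, 20))
```

Under this model the figure comes out near 64% rather than Greg's rough 50%, which if anything strengthens the point that the sample fraction has to fall a long way before whole tracks can be skipped.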
Re: [HACKERS] More thoughts about planner's cost estimates
pgbench appears to already support arbitrary SQL queries with the -f switch, so why couldn't we just make it a little smarter, have people enable SQL query logging for a day or two, then pass the log off to pgbench:

  pgbench -f <logfile>

Seems to me like that wouldn't be too difficult to do, and would give much closer "real-world" results than pgbench's built-in benchmark. On top of that, the community could start offering up "template" benchmarks like "busy website", "data warehouse", "forums", "financial", and distribute them with pgbench:

  pgbench -f templates/data_warehouse.pgbench
  pgbench -f templates/forums.pgbench
  ...

From that point, a brute-force auto-tune utility would be pretty straightforward to write:

  pgautotune -f templates/data_warehouse.bench,myapp.sqllog

Or, if one server runs multiple custom apps that you want to tune for:

  pgautotune -f myapp1.sqllog,myapp2.sqllog,myapp3.sqllog

Even if it took 48hrs to run, it would be a good burn-in test for a brand new server. ;)

On Fri, 2006-06-02 at 19:38 -0400, Tom Lane wrote:
> Rod Taylor <[EMAIL PROTECTED]> writes:
> >> One objection to this is that after moving "off the gold standard" of
> >> 1.0 = one page fetch, there is no longer any clear meaning to the
> >> cost estimate units; you're faced with the fact that they're just an
> >> arbitrary scale. I'm not sure that's such a bad thing, though. For
> >> instance, some people might want to try to tune their settings so that
> >> the estimates are actually comparable to milliseconds of real time.
>
> > Any chance that the correspondence to time could be made a part of the
> > design on purpose and generally advise people to follow that rule?
>
> We might eventually get to that point, but I'm hesitant to try to do it
> immediately. For one thing, I really *don't* want to get bug reports
> from newbies complaining that the cost estimates are always off by a
> factor of X. (Not but what we haven't gotten some of those anyway :-()
>
> In the short term I see us sticking to the convention that seq_page_cost
> is 1.0 in a "typical" database, while anyone who's really hot to try to
> make the other happen is free to experiment.
>
> > If we could tell people to run *benchmark* and use those numbers
> > directly as a first approximation tuning, it could help quite a bit
> > for people new to PostgreSQL experiencing poor performance.
>
> We don't have such a benchmark ... if we did, we could have told
> people how to use it to set the variables already. I'm very very
> suspicious of any suggestion that it's easy to derive appropriate
> numbers for these settings from one magic benchmark.
>
> regards, tom lane

--
Mike Benoit <[EMAIL PROTECTED]>
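The hypothetical pgautotune described above is essentially a brute-force grid search. A minimal sketch of that loop follows; the tool, the parameter names, and the benchmark function are all made up here, and only the idea (try every combination of settings, keep the fastest) comes from the post:

```python
import itertools

def autotune(param_grid, run_benchmark):
    """Return the settings dict with the lowest benchmark time."""
    best, best_time = None, float("inf")
    for combo in itertools.product(*param_grid.values()):
        settings = dict(zip(param_grid.keys(), combo))
        t = run_benchmark(settings)  # would run the workload under these settings
        if t < best_time:
            best, best_time = settings, t
    return best

# Dummy stand-in benchmark: pretends bigger shared_buffers always helps
# and extra work_mem adds a small cost. A real tool would run pgbench here.
grid = {"shared_buffers_mb": [128, 512, 1024], "work_mem_mb": [4, 64]}
best = autotune(grid, lambda s: 100.0 / s["shared_buffers_mb"] + s["work_mem_mb"] * 0.01)
```

The 48-hour estimate in the post is plausible precisely because this search is exhaustive: the number of benchmark runs is the product of the option counts for every parameter.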
Re: [HACKERS] More thoughts about planner's cost estimates
On Fri, 2006-06-02 at 16:23, Greg Stark wrote:
> And a 5% sample is pretty big. In fact my tests earlier showed the i/o
> from 5% block sampling took just as long as reading all the blocks. Even
> if we figure out what's causing that (IMHO surprising) result and
> improve matters, I would only expect it to be 3-4x faster than a full
> scan.

You should not be surprised by this once you visualise what happens at the disk level, with all those platters spinning and heads moving :)

Disks can read at full rotation speed, so skipping (not reading) some blocks will not make reading the remaining blocks from the same track faster. And if there are more than 20 8k pages per track, you still have a very high probability you need to read all tracks.

You may be able to move to the next track a little earlier compared to reading all blocks, but then you are likely to miss the block from the next track and have to wait a full rotation. You will get some win from skipping pages only once your % falls so low that you can also skip a significant number of tracks.

> http://archives.postgresql.org/pgsql-hackers/2006-01/msg00285.php

Your test program could have got a little better results if you had somehow managed to tell the system all the block numbers to read in one go, not each time the next one after getting the previous one. In the current version it is quite likely that it had to wait several disk rotations for even the sectors from the same track, as for small steps it may have missed the next sector. That does not apply to disks which always read a full track into RAM cache, but even there all tracks are actually read.

The fact that 5% was not slower than seqscan seems to indicate that actually all track reads were cached inside the disk or controller.

--
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] COPY (query) TO file
Tom Lane <[EMAIL PROTECTED]> writes:
> The interesting point here is that a <subquery> is defined as a
> parenthesized <query expression>, which means that you ought to be able
> to use a parenthesized VALUES list anyplace you could use a
> parenthesized SELECT. So FROM lists, IN clauses, = ANY and friends, etc
> all really ought to be able to support this.

That's actually pretty neat. I've occasionally had to write queries with the idiom

  SELECT ... FROM (SELECT a,b,c UNION ALL SELECT d,e,f UNION ALL SELECT g,h,i) WHERE ...

That's pretty awful. It would have been awfully nice to be able to do

  SELECT ... FROM (VALUES (a,b,c),(d,e,f),(g,h,i))

> The trouble with supporting it for any case other than INSERT is that
> you have to work out what the column datatypes of the construct ought
> to be. This is the same as the equivalent problem for UNION constructs,
> but the UNION type resolution algorithm looks kinda ugly for thousands
> of inputs :-(

I always thought UNION just decided on the type based on the first branch and then coerced all the others to that type. I always cast all the columns on the first union branch just in case.

-- greg
Re: [HACKERS] More thoughts about planner's cost estimates
On Sat, 2006-06-03 at 10:43, Jim Nasby wrote:
> Might also be worth adding analyze delay settings, ala
> vacuum_cost_delay.

Actually, we should have delay settings for all potential (almost-)full-scan service ops - VACUUM, ANALYZE, CREATE INDEX, ADD CONSTRAINT, maybe more - so that there would be better chances of running those on busy databases without disastrous effects. Probably we should use the same settings for all of these, not invent a new set for each.

--
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] COPY (query) TO file
Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > PFC <[EMAIL PROTECTED]> writes:
> > > MySQL already does this for INSERT :
> > > INSERT INTO x (a,b) VALUES (1,2), (3,4), (5,6)...;
>
> The above syntax is SQL-standard, so we ought to support it sometime,
> performance benefits or no. I agree it might be tricky to avoid eating
> an unreasonable amount of memory for a very long list in Postgres :-(
>
> Supporting VALUES only in INSERT would be relatively trivial BTW, but
> the spec actually thinks it should be allowed as a <query expression>
> in FROM ...

Can we just start with the simple case?

cheers

andrew
Re: [HACKERS] COPY (query) TO file
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Supporting VALUES only in INSERT would be relatively trivial BTW, but
>> the spec actually thinks it should be allowed as a <query expression>
>> in FROM ...
>
> How does that syntax work?

If you look at SQL92, INSERT ... VALUES is actually not a direct production in their BNF. They define <insert statement> as

  <insert statement> ::=
      INSERT INTO <table name> <insert columns and source>

  <insert columns and source> ::=
      [ <left paren> <insert column list> <right paren> ] <query expression>
    | DEFAULT VALUES

and then one of the alternatives for <query expression> is <table value constructor>, which is

  <table value constructor> ::=
      VALUES <table value constructor list>

  <table value constructor list> ::=
      <row value constructor> [ { <comma> <row value constructor> }... ]

(Another alternative for <query expression> is <query specification>, which accounts for the INSERT ... SELECT syntax.)

The interesting point here is that a <subquery> is defined as a parenthesized <query expression>, which means that you ought to be able to use a parenthesized VALUES list anyplace you could use a parenthesized SELECT. So FROM lists, IN clauses, = ANY and friends, etc all really ought to be able to support this. (A quick look at mysql's grammar suggests that they don't handle those cases.)

The trouble with supporting it for any case other than INSERT is that you have to work out what the column datatypes of the construct ought to be. This is the same as the equivalent problem for UNION constructs, but the UNION type resolution algorithm looks kinda ugly for thousands of inputs :-(

regards, tom lane
Re: [HACKERS] 'CVS-Unknown' buildfarm failures?
Jim Nasby wrote:
> > yes, it's a file/directory it doesn't know about. At one stage I
> > suppressed these checks, but I found that too many times we saw
> > errors due to unclean repos. So now buildfarm insists on having a
> > clean repo. I suppose I could provide a switch to turn it off ... in
> > one recent case the repo was genuinely not clean, though, so I am not
> > terribly keen on that approach - but I am open to persuasion.
>
> Another option would be to re-run cvs up one more time if we get any
> unexpected files. It sounds like that would fix this issue on windows
> machines, while still ensuring we had a clean repo to work from.

Please see the new release of the buildfarm client, in which I have followed Tom's suggestion of removing the -P flag from the checkout and update commands - that should solve the Windows problem, as it will no longer try to remove the directory. I hope that solves the problem - if not, I'll have a look at other solutions.

cheers

andrew
Re: [HACKERS] COPY (query) TO file
Tom Lane <[EMAIL PROTECTED]> writes:
> Supporting VALUES only in INSERT would be relatively trivial BTW, but
> the spec actually thinks it should be allowed as a <query expression>
> in FROM ...

How does that syntax work?

  INSERT INTO x (a,b) from select x,y,z from t from select x2,y2,z2 from t

? Doesn't seem to be very sensible?

-- greg
Re: [HACKERS] More thoughts about planner's cost estimates
On Jun 2, 2006, at 5:24 PM, Todd A. Cook wrote:
> Josh Berkus wrote:
> > Greg, Tom,
> > > But for most users analyze doesn't really have to run as often as
> > > vacuum. One sequential scan per night doesn't seem like that big a
> > > deal to me.
> >
> > Clearly you don't have any 0.5 TB databases.
>
> Perhaps something like "ANALYZE FULL"? Then only those who need the
> more precise statistics would pay the cost for a full scan.

Might also be worth adding analyze delay settings, ala vacuum_cost_delay.
Re: [HACKERS] 'CVS-Unknown' buildfarm failures?
On Jun 2, 2006, at 10:27 AM, Andrew Dunstan wrote:
> Joshua D. Drake wrote:
> > Tom Lane wrote:
> > > Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > > > What's happening here is that cvs actually creates the directory
> > > > and then later prunes it when it finds it is empty.
> > >
> > > I find that explanation pretty unconvincing. Why would cvs print a
> > > "?" for such a directory?
> >
> > cvs will print a ? if it doesn't know what it is... or is that svn?
>
> yes, it's a file/directory it doesn't know about. At one stage I
> suppressed these checks, but I found that too many times we saw errors
> due to unclean repos. So now buildfarm insists on having a clean repo.
> I suppose I could provide a switch to turn it off ... in one recent
> case the repo was genuinely not clean, though, so I am not terribly
> keen on that approach - but I am open to persuasion.

Another option would be to re-run cvs up one more time if we get any unexpected files. It sounds like that would fix this issue on windows machines, while still ensuring we had a clean repo to work from.
Re: [HACKERS] Faster Updates
PFC <[EMAIL PROTECTED]> writes:
> What do you think ?

Sounds enormously complicated and of very doubtful net win --- you're moving a lot of overhead into SELECT in order to make UPDATE cheaper, and on top of that the restriction to same-page will limit the usefulness quite a lot (unless we deliberately keep pages less than full, which costs a lot in distributed extra I/O).

Basically this is an extension of the notion of update tuple chains. Anyone who's been around the project awhile will know that we've had an unreasonably large number of corner-case bugs associated with tuple chains (particularly in VACUUM FULL), so adding a second level of complexity to 'em doesn't sound very appealing to me.

regards, tom lane
Re: [HACKERS] COPY (query) TO file
Mark Woodward wrote:
> > Mark Woodward wrote: ...
> > > This runs completely in the background and can serve as a running
> > > backup.
> >
> > And you are sure it would be much faster than a server-local running
> > psql just dumping the result of a query?
>
> No, I can't be sure of that at all, but the COPY command has a specific
> use that is understood and an operation that is separate from the
> normal query mechanism.

Unless you change it to actually execute a query ;)

> > (And you could more easily avoid race conditions, in contrast to
> > several remote clients trying to trigger your above backup)
>
> Again, the examples may not have been precise in presenting "why"; the
> focus was mostly "what" so it could be discussed. As a generic feature
> it has many potential uses. Trying to debate and defend a specific use
> limits the potential scope of the feature.

That's why I'm asking. I'm still wondering which use case actually justifies integrating the result-set formatter into the backend vs. just doing it in the frontend (in both places there are already some routines which could be used to implement it).

> Why have COPY anyway? Why not just use "SELECT * FROM TABLE?"

Because the special SELECT * FROM TABLE can apparently be optimized. Ah yes, and if the usual result fetch requires storing the result set in server RAM, it would be nicer to change that if possible. I think we run SELECT ... much more often than COPY ;-)

(And I hope nobody comes up with the idea, if COPY were implemented to execute queries, of generally using COPY instead of SELECT for large result sets in applications. Goodbye portability...)
[HACKERS] Faster Updates
Hello,

Sometimes people complain that UPDATE is slow in postgres. UPDATE...
- generates dead tuples which must be vacuumed;
- needs to hit all indexes even if only one column was modified.

From what I know, UPDATE creates a new copy of the old row with the relevant C/TIDs, then indexes it. On COMMIT the old version becomes dead, but stays in the table and indexes until VACUUM.

I propose a simple idea, which may be idiotic, but who knows.

When a row is UPDATEd, instead of storing a new copy of the entire row, only a differential is stored. The old row stays in the page anyway, so we might as well only store the binary-encoded equivalent of "Use the row version number X and change column A to value Y". This is possible only if the differential fits in the free space on the page. In this case, a lot less dead space is generated. VACUUM would consolidate the differentials for committed transactions into a new base value for the row.

While reading the page looking for a specific version of a row, all differences would need to be consolidated. This adds overhead, but it might be a win.

With this method, it could be possible to avoid updating the indexes for unmodified columns. This is a big win.

What do you think ?
Re: [HACKERS] COPY (query) TO file
> > MySQL already does this for INSERT :
> > INSERT INTO x (a,b) VALUES (1,2), (3,4), (5,6)...;
>
> Does MySQL really let you stream that? Trying to do syntax like that in
> Postgres wouldn't work because the parser would try to build up a parse
> tree for the whole statement before running the command.

Hehe, I don't know, but I suppose it's parsed in one shot then executed, and not streamed, because:
- you can append modifiers at the end of the statement (IGNORE...),
- mysql barfs if the complete SQL including data is larger than the query buffer specified in the config file.

The second point leads to an interesting fact, ie. dumps generated by phpmyadmin and mysqldump need a parameter specifying how long, in bytes, the insert commands can be, so that hopefully they can be reloaded later. If one of the inserted values violates a "constraint", it is substituted by "some other default value". Still, it's useful; and one interesting part is that everything happens in the same SQL command (wrt concurrency).
Re: [HACKERS] COPY (query) TO file
Greg Stark <[EMAIL PROTECTED]> writes:
> PFC <[EMAIL PROTECTED]> writes:
> > MySQL already does this for INSERT :
> > INSERT INTO x (a,b) VALUES (1,2), (3,4), (5,6)...;
>
> Does MySQL really let you stream that? Trying to do syntax like that in
> Postgres wouldn't work because the parser would try to build up a parse
> tree for the whole statement before running the command.

A quick look at MySQL's grammar doesn't show any indication that they don't build a full parse tree too.

The above syntax is SQL-standard, so we ought to support it sometime, performance benefits or no. I agree it might be tricky to avoid eating an unreasonable amount of memory for a very long list in Postgres :-(

Supporting VALUES only in INSERT would be relatively trivial BTW, but the spec actually thinks it should be allowed as a <query expression> in FROM ...

regards, tom lane
Re: [HACKERS] COPY (query) TO file
PFC <[EMAIL PROTECTED]> writes:
> > I was also vaguely pondering whether all the DDL commands could be
> > generalized to receive or send COPY formatted data for repeated
> > execution. It would be neat to be able to prepare an UPDATE with
> > placeholders and stream data in COPY format as parameters to the
> > UPDATE to execute it thousands or millions of times without any
> > protocol overhead or network pipeline stalls.
>
> MySQL already does this for INSERT :
> INSERT INTO x (a,b) VALUES (1,2), (3,4), (5,6)...;

Does MySQL really let you stream that? Trying to do syntax like that in Postgres wouldn't work because the parser would try to build up a parse tree for the whole statement before running the command.

-- greg
Re: [HACKERS] COPY (query) TO file
> I was also vaguely pondering whether all the DDL commands could be
> generalized to receive or send COPY formatted data for repeated
> execution. It would be neat to be able to prepare an UPDATE with
> placeholders and stream data in COPY format as parameters to the UPDATE
> to execute it thousands or millions of times without any protocol
> overhead or network pipeline stalls.

MySQL already does this for INSERT :

  INSERT INTO x (a,b) VALUES (1,2), (3,4), (5,6)...;

> allowing arbitrary SELECT statements as a COPY source seems much more
> powerful and flexible than just supporting COPY FROM VIEW.

MySQL already does this:

  SELECT INTO OUTFILE blah FROM table...

Now in both cases the MySQL syntax sucks, but it's still quite practical, and the multi-row INSERT saves some overhead (parsing, acquiring locks...) and is quite a bit faster than a regular INSERT.
Re: [HACKERS] Generalized concept of modules
Think about version API compatibility. Suppose you have a working database on server A which uses module foo version 1. Some time passes; you buy another server B and install postgres on it. Meanwhile the module foo has evolved into version 2, which is cooler but has some minor API incompatibilities. You dump the database on server A and reload it on server B. pg_dump issues an INSTALL MODULE which installs foo version 2 on the new server. Due to the "minor API incompatibilities", your database breaks.

It's really cool not to pollute the dumps (and the global namespace...) with all the module functions; however, implementing module functionality can be tricky. So don't forget about versions and possible incompatibilities. Versions also mean you might need an UPGRADE MODULE which does more than uninstall + reinstall: suppose a module has created some tables for its own use; these shouldn't be dumped when upgrading to a new version, yet maybe the new version will want to add a column...

Think Gentoo Portage, for instance. This excellent package system is a lot more evolved than the module system needs to be, but having a look at its feature list would be a good inspiration, maybe.