Re: [HACKERS] prefix btree implementation
Qingqing Zhou [EMAIL PROTECTED] writes: 1/ What types of prefix compression shall we support? Given the requirement of datatype independence, this idea seems a complete nonstarter to me... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [pgadmin-hackers] pgAdmin guru hints
I noticed a typo in hints/conn-hba.html: the second internal IP address is missing a '.'. You're invited to make your comments on the hints: are these correct, is there something missing or misleading? Cheers, -- ^(B(astia{2}n)?)(\s)?(W(ak{2}ie)?)$
Re: [HACKERS] [PERFORM] A Better External Sort?
On Mon, Oct 03, 2005 at 01:34:01PM -0700, Josh Berkus wrote:
>>> Realistically, you can't do better than about 25MB/s on a single-threaded I/O on current Linux machines
>> What on earth gives you that idea? Did you drop a zero?
> Nope, LOTS of testing, at OSDL, GreenPlum and Sun. For comparison, A Big-Name Proprietary Database doesn't get much more than that either.

You seem to be talking about database IO, which isn't what you said.
Re: [HACKERS] Vacuum and Transactions
On T, 2005-10-04 at 11:10 -0400, Tom Lane wrote:
> Rod Taylor [EMAIL PROTECTED] writes:
>> The catch is that there are some other very active structures (like pg_listener for Slony) which after a couple of hours without vacuuming will quickly have the DB at an unreasonably high load (low tens) which seems to all but halt the vacuum on the large structure.
> Yeah. We desperately need to reimplement listen/notify :-( ... that code was never designed to handle high event rates.

Sure. But it handles event rates up to a few hundred events per second amazingly well - given that pg_listener is cleaned up often enough. Above a few hundred eps it starts getting stuck on locks.

It also seems that Slony can be modified to not use LISTEN/NOTIFY in high-load situations (akin to high-performance network cards, which switch from interrupt-driven mode to polling mode if the number of packets per second reaches certain thresholds).

Unfortunately Slony and Listen/Notify is not the only place where high-update-rate tables start to suffer from vacuum's inability to clean out dead tuples when working in parallel with other, slower vacuums. In real life there are other database tasks which also need some tables to stay small, while others must be huge in order to work effectively. Putting small and big tables in different databases and using dblink-like functionality when accessing them is one solution for such cases, but it is rather ugly :(

-- Hannu Krosing [EMAIL PROTECTED]
Re: [HACKERS] Vacuum and Transactions
On T, 2005-10-04 at 00:26 -0400, Rod Taylor wrote:
> As I understand it vacuum operates outside of the regular transaction and if you stop it (SIGTERM, or pg_cancel_backend()) some of the work it accomplished will be kept when it rolls back. For large structures with a ton of dead entries (of which I seem to have a case), running vacuum takes long enough that high-churn structures begin to experience difficulties. Is it reasonable to cancel and restart the vacuum process periodically (say every 12 hours) until it manages to complete the work? It takes about 2 hours to do the table scan, and should get in about 10 hours of index work each round.

It seems that the actual work done by LAZY VACUUM is not rolled back when you kill the backend doing the vacuum (though VACUUM is quite hard to kill, and may require kill -9 to accomplish, with all the downsides of kill -9).

So, yes, as a last resort you can kill VACUUM (or rather limit its lifetime with SET statement_timeout = XXX) and get some work done in each run. It only makes sense if the timeout is big enough for vacuum to complete the first scan (collect dead tuples) over the heap and then do some actual work. For a table with 3 indexes the timeout must be at least (1st heap scan + 3 index scans with no work + some portion of the 2nd (cleanout) heap scan) to ever get the table completely cleaned up.

The "vacuum ignores vacuum transaction" concept looks handy right now. There is a patch for 8.1 in the PATCHES list (postponed to 8.2 :( ). This can be backported to 8.0 quite easily.

-- Hannu Krosing [EMAIL PROTECTED]
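The statement_timeout approach above can be sketched in a couple of lines; a minimal sketch, assuming a hypothetical table name "bigtable" and a 12-hour budget:

```sql
-- Bound each VACUUM run instead of kill -9'ing the backend.
-- "bigtable" is a placeholder; the timeout must exceed the first heap
-- scan plus the index scans, or no cleanup work ever completes.
SET statement_timeout = 43200000;  -- 12 hours, in milliseconds
VACUUM ANALYZE bigtable;           -- cancelled automatically if it overruns
RESET statement_timeout;
```

Note that the timeout applies per statement, so it should be set in the session that issues the VACUUM, not globally.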
Re: [HACKERS] Vacuum and Transactions
> Is it reasonable to cancel and restart the vacuum process periodically (say every 12 hours) until it manages to complete the work? It takes about 2 hours to do the table scan, and should get in about 10 hours of index work each round.

If we started the vacuum with the indexes, remembered a lowest xid per index, we could then vacuum the heap up to the lowest of those xids, no? We could then also vacuum each index separately.

Andreas
[HACKERS] Slony RPM issue
Hi Devrim, I ran into another RPM issue, this time with Slony. I grabbed the RPM from http://developer.postgresql.org/~devrim/slony/1.1.0/rpms/PG8.0.3/

Trying to run slon_start, I got errors such as:

$ slon_start --config /etc/slon_tools.conf 2
Invoke slon for node 2 - @@@/slon -s 1000 -d2 -g 80 replication 'host=dev2 dbname=booktown user=postgres port=5432' 2>&1 > /var/log/slony/slony1/node2/booktown-2005-10-05_16:23:53.log
sh: line 1: @@@/slon: No such file or directory
Slon failed to start for cluster replication, node node2

This turned out to be because /usr/lib/pgsql/slon-tools.pm contained several un-expanded @@PGBINDIR@@ prefixes - once I replaced them with /usr/bin it all ran just fine. Please let me know if I can assist in fixing these issues.

Also, thanks to David and Gavin for their help on IRC - it made a big difference to my day!

Regards, Philip.

-- Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Brian W. Kernighan
Re: [HACKERS] [PERFORM] Query in SQL statement
Thanks. I've already understood that I need to post it in another list. Sorry for wasting your precious time.

-- Rajesh R

-----Original Message-----
From: Richard Huxton [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 05, 2005 2:24 PM
To: R, Rajesh (STSD)
Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query in SQL statement

R, Rajesh (STSD) wrote:
> Am trying to port a mysql statement to postgres. Please help me in finding the error in this,

Can I recommend the reference section of the manuals for this sort of thing? There is an excellent section detailing the valid SQL for the CREATE TABLE command. Also - the pgsql-hackers list is for discussion of database development, and the performance list is for performance problems. This would be better posted on pgsql-general or -sql or -novice.

> CREATE SEQUENCE ai_id;

This line is causing the first error:
  ERROR: relation "ai_id" already exists
That's because you've already successfully created the sequence, so it already exists. Either drop it and recreate it, or stop trying to recreate it.

> CREATE TABLE badusers (
>   id int DEFAULT nextval('ai_id') NOT NULL,
>   UserName varchar(30),
>   Date datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,

Well, "Date" is a type-name, "datetime" isn't, and even if it was, '0000-00-00' isn't a valid date, is it?

>   Reason varchar(200),
>   Admin varchar(30) DEFAULT '-',
>   PRIMARY KEY (id),
>   KEY UserName (UserName),
>   KEY Date (Date)

The word KEY isn't valid here either - are you trying to define an index? If so, see the CREATE INDEX section of the SQL reference. http://www.postgresql.org/docs/8.0/static/sql-commands.html

If you reply to this message, please remove the pgsql-hackers CC:

-- Richard Huxton Archonet Ltd
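Putting Richard's corrections together, a ported version of the table might look something like this (a sketch only; the column types and the dropped zero-date default are my choices, since PostgreSQL rejects MySQL's zero dates):

```sql
CREATE SEQUENCE ai_id;

CREATE TABLE badusers (
    id       int DEFAULT nextval('ai_id') NOT NULL,
    UserName varchar(30),
    "Date"   timestamp,          -- "datetime" and '0000-00-00' are MySQL-isms
    Reason   varchar(200),
    Admin    varchar(30) DEFAULT '-',
    PRIMARY KEY (id)
);

-- MySQL's inline KEY clauses become separate CREATE INDEX statements:
CREATE INDEX badusers_username ON badusers (UserName);
CREATE INDEX badusers_date     ON badusers ("Date");
```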
Re: [HACKERS] [PERFORM] A Better External Sort?
On Wed, Oct 05, 2005 at 05:41:25AM -0400, Michael Stone wrote:
> On Sat, Oct 01, 2005 at 06:19:41PM +0200, Martijn van Oosterhout wrote:
>> COPY TO /dev/null WITH binary 13MB/s 55% user 45% system (ergo, CPU bound) [snip] the most expensive. But it does point out that the whole process is probably CPU bound more than anything else.
> Note that 45% of that cpu usage is system--which is where IO overhead would end up being counted. Until you profile where your system time is going it's premature to say it isn't an IO problem.

It's a dual CPU system, so 50% is the limit for a single process. Since system usage < user usage, PostgreSQL is the limiter. Sure, the system is taking a lot of time, but PostgreSQL is still the limiting factor. Anyway, the later measurements using gprof exclude system time altogether and it still shows CPU being the limiting factor. Fact is, extracting tuples from pages is expensive.

-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
Re: [HACKERS] Vacuum and Transactions
Rod Taylor wrote:
> I have maintenance_work_mem set to about 1GB in size.

Isn't that a bit too much?

Regards Gaetano Mendola
Re: [HACKERS] prefix btree implementation
On Wed, Oct 05, 2005 at 12:50:41AM -0400, Tom Lane wrote:
> Qingqing Zhou [EMAIL PROTECTED] writes:
>> 1/ What types of prefix compression shall we support?
> Given the requirement of datatype independence, this idea seems a complete nonstarter to me...

How about having each type optionally provide the required routines? Thus we could provide them at least for the most common datatypes, and the system would continue working as currently for the rest (including user-defined types). Cross-column prefixes would be hard to handle I guess, as well as TOASTed data.

One problem I do see is what happens if I need to insert a new tuple in the page that doesn't share the prefix. It obviously would have to be the leftmost or rightmost item on the page, but it's possible.

-- Alvaro Herrera  http://www.PlanetPostgreSQL.org
A male gynecologist is like an auto mechanic who never owned a car. (Carrie Snow)
Re: [HACKERS] Vacuum and Transactions
> The "vacuum ignores vacuum transaction" concept looks handy right now. There is a patch for 8.1 in the PATCHES list (postponed to 8.2 :( ). This can be backported to 8.0 quite easily.

Understood. I've seen them, but until they're well tested in the newest version I won't be using them in a production environment. I do appreciate the goal, and look forward to this concept being applied, or a method of splitting up the work vacuum needs to do, in the future.

--
Re: [HACKERS] Vacuum and Transactions
On Wed, 2005-10-05 at 09:53 +0300, Hannu Krosing wrote:
> On T, 2005-10-04 at 11:10 -0400, Tom Lane wrote:
>> Rod Taylor [EMAIL PROTECTED] writes:
>>> The catch is that there are some other very active structures (like pg_listener for Slony) which after a couple of hours without vacuuming will quickly have the DB at an unreasonably high load (low tens) which seems to all but halt the vacuum on the large structure.
>> Yeah. We desperately need to reimplement listen/notify :-( ... that code was never designed to handle high event rates.
> Sure. But it handles amazingly well event rates up to a few hundred events per second - given that pg_listener is cleaned up often enough.

Accomplishing the pg_listener cleanup often enough can be difficult in some circumstances.

> It also seems that Slony can be modified to not use LISTEN/NOTIFY in high load situations (akin to high performance network cards, which switch from interrupt driven mode to polling mode if the number of packets per second reaches certain thresholds).

I have other items in this database with high churn as well. Slony was just an example.

--
[HACKERS] current_user versus current_role
hello

I used info from current_user for logging of some operations (who, when, ...). From what I can see, current_user is equal to the current_role function. I had a problem with it, because a user (if he is a member of any group role) can change his identity.

Example: peter is a member of role users. But peter can do SET ROLE TO users. From this moment I lose the possibility of getting the identity of the user, because current_user returns users and not peter. I can check it (whether some user changed identity) ~ for example by testing if the role has the login privilege, and if not I can stop any operations. But is there some possibility to get the identity of the user without the impact of a change of role? (default role?)

thank you
Pavel Stehule
Re: [HACKERS] current_user versus current_role
On Wed, Oct 05, 2005 at 03:17:25PM +0200, Pavel Stehule wrote:

Hi,

> I used info from current_user for logging of some operations (who, when, ...). From what I can see, current_user is equal to the current_role function. I had a problem with it, because a user (if he is a member of any group role) can change his identity. Example: peter is a member of role users. But peter can do SET ROLE TO users. From this moment I lose the possibility of getting the identity of the user, because current_user returns users and not peter. I can check it (whether some user changed identity) ~ for example by testing if the role has the login privilege, and if not I can stop any operations.

I believe you can use session_user for this.

-- Alvaro Herrera  Developer, http://www.PostgreSQL.org
Amanece. (Ignacio Reyes) El Cerro San Cristóbal me mira, cínicamente, con ojos de virgen
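A quick illustration of the difference (the role names here are the hypothetical ones from Pavel's example):

```sql
-- peter logs in, then switches role. current_user follows the role in
-- effect; session_user keeps the name the session authenticated as,
-- so it is the one to log for auditing.
SET ROLE users;
SELECT current_user, session_user;
--  current_user | session_user
-- --------------+--------------
--  users        | peter
```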
Re: [HACKERS] [PERFORM] A Better External Sort?
On Sat, Oct 01, 2005 at 06:19:41PM +0200, Martijn van Oosterhout wrote:
> COPY TO /dev/null WITH binary 13MB/s 55% user 45% system (ergo, CPU bound) [snip] the most expensive. But it does point out that the whole process is probably CPU bound more than anything else.

Note that 45% of that cpu usage is system--which is where IO overhead would end up being counted. Until you profile where your system time is going it's premature to say it isn't an IO problem.

Mike Stone
Re: [HACKERS] [PERFORM] A Better External Sort?
On Tue, Oct 04, 2005 at 12:43:10AM +0300, Hannu Krosing wrote:
> Just FYI, I run a count(*) on a 15.6GB table on a lightly loaded db and it run in 163 sec. (Dual opteron 2.6GHz, 6GB RAM, 6 x 74GB 15k disks in RAID10, reiserfs). A little less than 100MB sec.

And none of that 15G table is in the 6G RAM?

Mike Stone
[HACKERS] Interesting optimizer's supposition in 8.1
Tsearch2 has the function to_tsquery defined as:

CREATE FUNCTION to_tsquery(oid, text)
    RETURNS tsquery
    AS '$libdir/tsearch2'
    LANGUAGE 'c' WITH (isstrict, iscachable);

And let us take 2 essentially equivalent queries:

# explain select book.id from to_tsquery('foo') as t, book where book.fts @@ t;
                              QUERY PLAN
----------------------------------------------------------------------
 Nested Loop  (cost=13.19..6550.69 rows=290 width=4)
   Join Filter: ("inner".fts @@ "outer".t)
   ->  Function Scan on t  (cost=0.00..12.50 rows=1000 width=32)
   ->  Materialize  (cost=13.19..16.09 rows=290 width=36)
         ->  Seq Scan on book  (cost=0.00..12.90 rows=290 width=36)

# explain select book.id from book where book.fts @@ to_tsquery('foo');
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on book  (cost=0.00..13.62 rows=1 width=4)
   Filter: (fts @@ '''foo'''::tsquery)

Why does the planner suppose that the t 'table' will return 1000 rows? Obviously the function returns only one value, because it isn't marked as 'returns setof'.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] Slony RPM issue
Hi,

On Wed, 5 Oct 2005, Philip Yarra wrote:
> Hi Devrim, I ran into another RPM issue, this time with Slony.

:-)

> I grabbed the RPM from http://developer.postgresql.org/~devrim/slony/1.1.0/rpms/PG8.0.3/ Trying to run slon_start, I got errors such as:
>
> $ slon_start --config /etc/slon_tools.conf 2
> Invoke slon for node 2 - @@@/slon -s 1000 -d2 -g 80 replication 'host=dev2 dbname=booktown user=postgres port=5432' 2>&1 > /var/log/slony/slony1/node2/booktown-2005-10-05_16:23:53.log
> sh: line 1: @@@/slon: No such file or directory
> Slon failed to start for cluster replication, node node2
>
> This turned out to be because /usr/lib/pgsql/slon-tools.pm contained several un-expanded @@PGBINDIR@@ prefixes - once I replaced them with /usr/bin it all ran just fine.

Hrrm. It seems that I've installed the wrong file in the spec :( However, the slon-tools file is not updated by the Makefile, so I'll try to fix that, too. Thanks for the report. It will be fixed in CVS and all the RPM sets later today. Always feel free to send me a patch if you want; I can apply your patch, too.

BTW, could you please report possible Slony problems to slony1-general@gborg.postgresql.org in future?

Regards,
-- Devrim GUNDUZ
Kivi Bilişim Teknolojileri - http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org
Re: [HACKERS] prefix btree implementation
Qingqing Zhou wrote:
> I am not sure if this idea was mentioned before. The basic prefix btree idea is quite straightforward, i.e., try to compress the key items within a data page by sharing the common prefix. Thus the fanout of the page is increased and the benefit is obvious theoretically. [snip] So together, there are basically four types of possible sharing: column-wise (case 1), character-wise (case 2), column-character-wise (case 3), and byte-wise (case 4).

Oracle implements something similar called index compression, but I believe it is only for common column values. I haven't checked in versions >9r1, so maybe there are other options implemented by now. Jonathan Lewis describes some pros and cons here: http://www.jlcomp.demon.co.uk/faq/compress_ind.html

--
Re: [HACKERS] [PERFORM] A Better External Sort?
Nope - it would be disk wait. COPY is CPU bound on I/O subsystems faster than 50 MB/s on COPY (in) and about 15 MB/s (out).

- Luke

-----Original Message-----
From: Michael Stone [mailto:[EMAIL PROTECTED]]
Sent: Wed Oct 05 09:58:41 2005
To: Martijn van Oosterhout
Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

On Sat, Oct 01, 2005 at 06:19:41PM +0200, Martijn van Oosterhout wrote:
> COPY TO /dev/null WITH binary 13MB/s 55% user 45% system (ergo, CPU bound) [snip] the most expensive. But it does point out that the whole process is probably CPU bound more than anything else.

Note that 45% of that cpu usage is system--which is where IO overhead would end up being counted. Until you profile where your system time is going it's premature to say it isn't an IO problem.

Mike Stone
Re: [HACKERS] Announcing Veil
Tom, Thanks for your response. Unless I am missing your point, to add more locks we require a minor code change to the postgres server. I am happy to submit a patch but this will not help Veil work with existing versions of Postgres. I am aiming for compatibility with 7.4 onward. Your views on this would be appreciated.

Assuming that simply allocating a few extra LWLocks for user-defined functions is acceptable, here are some patches:

--cut---
*** ./src/backend/storage/lmgr/lwlock.c	Sat Aug 20 16:26:24 2005
--- lwlock.c	Wed Oct  5 08:20:31 2005
***************
*** 120,126 ****
  	 */
  	numLocks += 2 * NUM_SLRU_BUFFERS;

! 	/* Perhaps create a few more for use by user-defined modules? */

  	return numLocks;
  }
--- 120,127 ----
  	 */
  	numLocks += 2 * NUM_SLRU_BUFFERS;

! 	/* Create a few more for use by user-defined modules. */
! 	numLocks += NUM_USER_DEFINED_LWLOCKS;

  	return numLocks;
  }
--cut---
*** ./src/include/storage/lwlock.h	Sat Aug 20 16:26:34 2005
--- lwlock.h	Wed Oct  5 08:22:26 2005
***************
*** 53,58 ****
--- 53,62 ----
  	MaxDynamicLWLock = 10
  } LWLockId;

+ /*
+  * Allocate a few LWLocks for user-defined functions.
+  */
+ #define NUM_USER_DEFINED_LWLOCKS 4

  typedef enum LWLockMode
  {
--cut---

__
Marc Munro

On Tue, 2005-10-04 at 22:51 -0400, Tom Lane wrote:
> Marc Munro [EMAIL PROTECTED] writes:
>> Since I was unable to dynamically assign a LWLock using LWLockAssign (none available), I have fairly arbitrarily overloaded the use of existing LWLocks. When the flames die down perhaps we can discuss making a small number (one would be enough for me) of LWLocks available.
> Perhaps you missed the comment in NumLWLocks()?
> regards, tom lane
[HACKERS] wrong optimization ( postgres 8.0.3 )
Hi all, take a look at this simple function and view:

CREATE OR REPLACE FUNCTION sp_connected_test ( INTEGER )
RETURNS BOOLEAN AS '
DECLARE
    a_id_user ALIAS FOR $1;
BEGIN
    PERFORM * FROM v_current_connection WHERE id_user = a_id_user;
    IF NOT FOUND THEN
        RETURN FALSE;
    END IF;
    RETURN TRUE;
END;
' LANGUAGE 'plpgsql';

CREATE VIEW v_current_connection_test AS
    SELECT ul.id_user, cc.connected
    FROM current_connection cc, user_login ul
    WHERE cc.id_user = ul.id_user AND connected = TRUE;

SELECT * FROM v_current_connection_test WHERE sp_connected_test(id_user) = FALSE;

This line should produce no rows, but randomly does. If I put a RAISE NOTICE before RETURN FALSE with the id_user, I notice that the function is called on records present in user_login but discarded because the join with current_connection has connected = FALSE! I can work around the problem by rewriting the view:

CREATE VIEW v_current_connection_test AS
    SELECT cc.id_user, cc.connected
    FROM current_connection cc, user_login ul
    WHERE cc.id_user = ul.id_user AND connected = TRUE;

Regards Gaetano Mendola
Re: [HACKERS] Interesting optimizer's supposition in 8.1
Teodor Sigaev [EMAIL PROTECTED] writes:
> Why does the planner suppose that the t 'table' will return 1000 rows?

Because set_function_size_estimates() is only a stub :-(

regards, tom lane
Re: [HACKERS] [PERFORM] A Better External Sort?
I've now gotten verification from multiple working DBAs that DB2, Oracle, and SQL Server can achieve ~250MBps ASTR (with as much as ~500MBps ASTR in setups akin to Oracle RAC) when attached to a decent (not outrageous, but decent) HD subsystem... I've not yet had any RW DBA verify Jeff Baker's supposition that ~1GBps ASTR is attainable. Cache based bursts that high, yes. ASTR, no. The DBAs in question run RW installations that include Solaris, M$, and Linux OSs for companies that just about everyone on these lists is likely to recognize.

Also, the implication of these pg IO limits is that money spent on even moderately priced 300MBps SATA II based RAID HW is wasted $'s. In total, this situation is a recipe for driving potential pg users to other DBMSs. 25MBps in and 15MBps out is =BAD=.

Have we instrumented the code in enough detail that we can tell _exactly_ where the performance drainage is? We have to fix this.

Ron

-----Original Message-----
From: Luke Lonergan [EMAIL PROTECTED]
Sent: Oct 5, 2005 11:24 AM
To: Michael Stone [EMAIL PROTECTED], Martijn van Oosterhout kleptog@svana.org
Cc: pgsql-hackers@postgresql.org, pgsql-performance@postgresql.org
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

Nope - it would be disk wait. COPY is CPU bound on I/O subsystems faster than 50 MB/s on COPY (in) and about 15 MB/s (out).

- Luke

-----Original Message-----
From: Michael Stone [mailto:[EMAIL PROTECTED]]
Sent: Wed Oct 05 09:58:41 2005
To: Martijn van Oosterhout
Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

On Sat, Oct 01, 2005 at 06:19:41PM +0200, Martijn van Oosterhout wrote:
> COPY TO /dev/null WITH binary 13MB/s 55% user 45% system (ergo, CPU bound) [snip] the most expensive. But it does point out that the whole process is probably CPU bound more than anything else.

Note that 45% of that cpu usage is system--which is where IO overhead would end up being counted.
Until you profile where your system time is going it's premature to say it isn't an IO problem.

Mike Stone
Re: [HACKERS] prefix btree implementation
Bricklen Anderson [EMAIL PROTECTED] wrote:
> Oracle implements something similar called index compression, but I believe it is only for common column values. I haven't checked in versions >9r1, so maybe there are other options implemented by now. Jonathan Lewis describes some pros and cons here: http://www.jlcomp.demon.co.uk/faq/compress_ind.html

Oracle 9 uses grammar like this:

CREATE INDEX ... [ COMPRESS number_of_first_columns ]

So it gives the user the flexibility of choosing the optimal number of columns. The script mentioned in the article guesses the optimal number by estimating the size of each choice. But I am thinking we can do it better: (1) we don't require that the compressed number of columns on each page be the same; (2) when we build the index bottom-up, we can determine this number for each page automatically by maximizing the number of items within a page.

Regards, Qingqing
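For a concrete illustration of that grammar (Oracle syntax, not PostgreSQL; the table and column names are hypothetical):

```sql
-- Oracle: store the leading (owner) column once per leaf block and
-- repeat only the trailing object_name part of each key.
CREATE INDEX t_owner_name ON t (owner, object_name) COMPRESS 1;
```

The trade-off Lewis describes applies here: compressing a leading column with few repeats can make the index larger, which is why choosing the column count per page, as proposed above, is attractive.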
Re: [HACKERS] [PERFORM] A Better External Sort?
> We have to fix this. Ron

The source is freely available for your perusal. Please feel free to point us in specific directions in the code where you may see some benefit. I am positive all of us that can, would put resources into fixing the issue had we a specific direction to attack.

Sincerely, Joshua D. Drake

-- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240 PostgreSQL Replication, Consulting, Custom Programming, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Re: [HACKERS] [PERFORM] A Better External Sort?
First I wanted to verify that pg's IO rates were inferior to The Competition. Now there's at least an indication that someone else has solved similar problems. Existence proofs make some things easier ;-)

Is there any detailed programmer-level architectural doc set for pg? I know the best doc is the code, but the code in isolation is often the Slow Path to understanding with systems as complex as a DBMS IO layer.

Ron

-----Original Message-----
From: Joshua D. Drake [EMAIL PROTECTED]
Sent: Oct 5, 2005 1:18 PM
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

The source is freely available for your perusal. Please feel free to point us in specific directions in the code where you may see some benefit. I am positive all of us that can, would put resources into fixing the issue had we a specific direction to attack.

Sincerely, Joshua D. Drake
Re: [HACKERS] wrong optimization ( postgres 8.0.3 )
Gaetano Mendola [EMAIL PROTECTED] writes:
> What I'm experiencing is a problem (I upgraded today from 7.4.x to 8.0.3) that I explain here: The following function just returns how many records there are inside the view v_current_connection
>
> CREATE OR REPLACE FUNCTION sp_count ( ) RETURNS INTEGER AS '
> DECLARE
>     c INTEGER;
> BEGIN
>     SELECT count(*) INTO c FROM v_current_connection;
>     RETURN c;
> END;
> ' LANGUAGE 'plpgsql';
>
> I have the following select
>
> # select count(*), sp_count() from v_current_connection;
>  count | sp_count
> -------+----------
>    977 |      978
>
> as you can see the two counts are returning different record numbers (in the meantime other transactions are updating tables behind the view v_current_connection).

This isn't surprising at all, if other transactions are actively changing the table. See the release notes for 8.0:

: Observe the following incompatibilities:
:
: In READ COMMITTED serialization mode, volatile functions now see
: the results of concurrent transactions committed up to the
: beginning of each statement within the function, rather than up to
: the beginning of the interactive command that called the function.
:
: Functions declared STABLE or IMMUTABLE always use the snapshot of
: the calling query, and therefore do not see the effects of actions
: taken after the calling query starts, whether in their own
: transaction or other transactions. Such a function must be
: read-only, too, meaning that it cannot use any SQL commands other
: than SELECT.

If you want this function to see the same snapshot as the calling query sees, declare it STABLE.

regards, tom lane
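Applying that advice to the function from the report, the fix is just the volatility marker; a sketch:

```sql
-- Declared STABLE, the function uses the calling query's snapshot, so
-- "select count(*), sp_count() from v_current_connection" can no longer
-- return two different numbers under concurrent updates.
CREATE OR REPLACE FUNCTION sp_count ( ) RETURNS INTEGER AS '
DECLARE
    c INTEGER;
BEGIN
    SELECT count(*) INTO c FROM v_current_connection;
    RETURN c;
END;
' LANGUAGE 'plpgsql' STABLE;
```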
Re: [HACKERS] [PERFORM] A Better External Sort?
On Wed, Oct 05, 2005 at 11:24:07AM -0400, Luke Lonergan wrote: Nope - it would be disk wait. I said I/O overhead; i.e., it could be the overhead of calling the kernel for I/O's. E.g., the following process is having I/O problems: time dd if=/dev/sdc of=/dev/null bs=1 count=1000 1000+0 records in 1000+0 records out 1000 bytes transferred in 8.887845 seconds (1125132 bytes/sec) real0m8.889s user0m0.877s sys 0m8.010s it's not in disk wait state (in fact the whole read was cached) but it's only getting 1MB/s. Mike Stone ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fwd: 8.1beta2 vacuum analyze hanging on idle database
I see that my initial post never made it through to the list. I assume this was some technical failure, so I'm adding it back for this reply. It doesn't appear that we did stop postmaster between incidents. We have now done so. The software we are running is a build from the beta2 release, with no special options specified at ./configure time. Would you expect such a build to include the debug info you wanted? We will include the --enable-debug in our next build, but I wondered because I was showing our DBA manager the diagnostic steps, and ran gdb bt against an idle connection, and got: (gdb) bt #0 0x40197b46 in recv () from /lib/i686/libc.so.6 #1 0x0813485f in secure_read () #2 0x08138f7b in pq_recvbuf () #3 0x081393a9 in pq_getbyte () #4 0x08195565 in PostgresMain () #5 0x081716c5 in ServerLoop () #6 0x0817232e in PostmasterMain () #7 0x0813aad8 in main () Which seemed to show reasonable information, to my untrained eye. That got me wondering whether the (corrupt stack?) note on the previous backtrace might be something real. Both were run against processes running the same copy of the backend software. -Kevin Tom Lane [EMAIL PROTECTED] 10/04/05 4:22 PM Kevin Grittner [EMAIL PROTECTED] writes: I can't hold the database in the problem state much longer -- if there are any other diagnostic steps you'd like me to take before we clear the problem, please let me know very soon. Not at the moment ... INFO: vacuuming pg_catalog.pg_constraint INFO: index pg_constraint_conname_nsp_index now contains 35 row versions in 2 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index pg_constraint_conrelid_index now contains 35 row versions in 2 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. [Hanging here for about 2 hours so far.] Interesting that it seems to consistently be having a problem with a pg_constraint index. 
Have you restarted the postmaster at any point since this trouble began? If it were something like an unreleased buffer pin, then it could persist indefinitely until postmaster restart. (gdb) bt #0 0x40198488 in semop () from /lib/i686/libc.so.6 #1 0x4a2c8cf8 in ?? () #2 0xbfffb2e0 in ?? () #3 0xbfffb308 in ?? () #4 0x0816a3d4 in PGSemaphoreLock () Previous frame inner to this frame (corrupt stack?) This is fairly unhelpful :-(. The next stack frame down would have told us something useful, but really we need to see the whole call stack. It may be that you need to rebuild Postgres with --enable-debug in order to get something gdb can work with. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fwd: 8.1beta2 vacuum analyze hanging on idle database
Kevin Grittner [EMAIL PROTECTED] writes: The software we are running is a build from the beta2 release, with no special options specified at ./configure time. Would you expect such a build to include the debug info you wanted? No, you need configure --enable-debug, which is not the default. For working with a beta release, --enable-cassert isn't a bad idea either, though it is probably not relevant to your problem. (gdb) bt #0 0x40197b46 in recv () from /lib/i686/libc.so.6 #1 0x0813485f in secure_read () #2 0x08138f7b in pq_recvbuf () #3 0x081393a9 in pq_getbyte () #4 0x08195565 in PostgresMain () #5 0x081716c5 in ServerLoop () #6 0x0817232e in PostmasterMain () #7 0x0813aad8 in main () Which seemed to show reasonable information, to my untrained eye. Yeah, that looks expected for a non-debug build. (Debug build would show call parameters too, which is why it would be more helpful even apart from the (corrupt stack?) problem.) That got me wondering whether the (corrupt stack?) note on the previous backtrace might be something real. More likely, it's specific to particular places in the code that got optimized in a way that gdb couldn't figure out. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
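The rebuild being discussed amounts to reconfiguring with debug symbols and reattaching gdb; a rough sketch (install prefix and pid placeholder are illustrative):

```shell
# Rebuild with symbols (--enable-debug) and runtime sanity checks
# (--enable-cassert, with some slowdown); the prefix is illustrative.
./configure --prefix=/usr/local/pgsql-debug --enable-debug --enable-cassert
make && make install
# After restarting on the debug build, grab a full backtrace from the
# stuck backend (substitute the real backend pid):
gdb -p <backend_pid> -batch -ex bt
```

With symbols present, the backtrace should include call parameters and be less likely to end in the (corrupt stack?) truncation seen above.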
Re: [HACKERS] [PERFORM] A Better External Sort?
On Wed, 2005-10-05 at 12:14 -0400, Ron Peacetree wrote: I've now gotten verification from multiple working DBA's that DB2, Oracle, and SQL Server can achieve ~250MBps ASTR (with as much as ~500MBps ASTR in setups akin to Oracle RAC) when attached to a decent (not outrageous, but decent) HD subsystem... I've not yet had any RW DBA verify Jeff Baker's supposition that ~1GBps ASTR is attainable. Cache based bursts that high, yes. ASTR, no. I find your tone annoying. That you do not have access to this level of hardware proves nothing, other than pointing out that your repeated emails on this list are based on supposition. If you want 1GB/sec STR you need: 1) 1 or more Itanium CPUs 2) 24 or more disks 3) 2 or more SATA controllers 4) Linux Have fun. -jwb ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fwd: 8.1beta2 vacuum analyze hanging on idle database
On Wed, Oct 05, 2005 at 02:27:32PM -0400, Tom Lane wrote: Kevin Grittner [EMAIL PROTECTED] writes: The software we are running is a build from the beta2 release, with no special options specified at ./configure time. Would you expect such a build to include the debug info you wanted? No, you need configure --enable-debug, which is not the default. For working with a beta release, --enable-cassert isn't a bad idea either, though it is probably not relevant to your problem. Also, note that --enable-cassert will reduce performance somewhat, and may make the bug go away. -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 En el principio del tiempo era el desencanto. Y era la desolación. Y era grande el escándalo, y el destello de monitores y el crujir de teclas. (Sean los Pájaros Pulentios, Daniel Correa) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Fwd: 8.1beta2 vacuum analyze hanging on idle database
Alvaro Herrera [EMAIL PROTECTED] writes: On Wed, Oct 05, 2005 at 02:27:32PM -0400, Tom Lane wrote: For working with a beta release, --enable-cassert isn't a bad idea either, though it is probably not relevant to your problem. Also, note that --enable-cassert will reduce performance somewhat, and may make the bug go away. True --- but there's also a chance it could expose the bug immediately. It'd be worth trying both ways. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [Slony1-general] Re: [HACKERS] Slony RPM issue
On Wed, Oct 05, 2005 at 05:32:40PM +0300, Devrim GUNDUZ wrote: Hi, On Wed, 5 Oct 2005, Philip Yarra wrote: Hi Devrim, I ran into another RPM issue, this time with Slony. :-) I grabbed the RPM from http://developer.postgresql.org/~devrim/slony/1.1.0/rpms/PG8.0.3/ Trying to run slon_start, I got errors such as: $ slon_start --config /etc/slon_tools.conf 2 Invoke slon for node 2 - @@@/slon -s 1000 -d2 -g 80 replication 'host=dev2 dbname=booktown user=postgres port=5432' 21 /var/log/slony/slony1/node2/booktown-2005-10-05_16:23:53.log sh: line 1: @@@/slon: No such file or directory Slon failed to start for cluster replication, node node2 This turned out to be because /usr/lib/pgsql/slon-tools.pm contained several un-expanded @@PGBINDIR@@ prefixes - once I replaced them with /usr/bin it all ran just fine. This is also a problem for 64bit machines which like to install things in lib64. The @@PGBINDIR@@ is also unset in these cases. Generally a short sed (or perl if you like) script will fix these up. But it is really a pretty obscure trail for people to find the exact problem. --elein Hrrm. It seems that I've installed the wrong file in the spec :( However the slon-tools file is not updated by the Makefile, so I'll try to fix that, too. Thanks for the report. It will be fixed in CVS and all the RPM sets later today. Always feel free to send me a patch if you want, I can apply your patch, too. BTW, could you please report the possible Slony problems to slony1-general@gborg.postgresql.org later? Regards, -- Devrim GUNDUZ Kivi Bilişim Teknolojileri - http://www.kivi.com.tr devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org ___ Slony1-general mailing list Slony1-general@gborg.postgresql.org http://gborg.postgresql.org/mailman/listinfo/slony1-general ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
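The short sed script elein mentions can be a single substitution. A sketch against a stand-in file (the real target would be /usr/lib/pgsql/slon-tools.pm, and the bin directory varies by platform):

```shell
# Create a stand-in for the broken file, with the placeholder unexpanded.
f=/tmp/slon_tools_demo.pm
printf '%s\n' 'my $slon = "@@PGBINDIR@@/slon";' > "$f"
# Expand the placeholder the way the RPM build should have (GNU sed -i):
sed -i 's|@@PGBINDIR@@|/usr/bin|g' "$f"
cat "$f"
```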
Re: [HACKERS] [DOCS] Updated documentation for new sequence binding
On Sun, Oct 02, 2005 at 10:54:10PM -0400, Bruce Momjian wrote: pgman wrote: I have merged Tom's description of the new sequence binding with text I was working on. I modified it to follow the existing we used to do X, now we do Y pattern in the surrounding entries: http://candle.pha.pa.us/tmp/pgsql/release.html#RELEASE-8-1 Sorry, this is a better URL: http://candle.pha.pa.us/main/writings/pgsql/sgml/release.html#RELEASE-8-1 Out of curiosity, how is this file maintained as development is done? The reason I'm asking is because it would be nice to have links to more information, especially for the 'one-liner' items in 1.3.1 for example, and it seems like that would be easier to do along-the-way rather than waiting for the end. Even a link to a mailing list discussion would be better than nothing... That said, what's Improve the optimizer, including auto-resizing of hash joins (Tom) mean? On full_page_writes, are we certain that all battery-backed disk caches ensure that partial-page writes can't happen? Maybe pg_column_size should just be included in the item for all the other size functions brought into the backend? If not, ISTM they should at least be one after the other in the list... Finally, weren't more changes made in contrib than what's listed? Nothing's said about pg_autovacuum for example. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] prefix btree implementation
Alvaro Herrera [EMAIL PROTECTED] wrote On Wed, Oct 05, 2005 at 12:50:41AM -0400, Tom Lane wrote: Qingqing Zhou [EMAIL PROTECTED] writes: 1/ What types of prefix compression shall we support? Given the requirement of datatype independence, this idea seems a complete nonstarter to me... How about having each type optionally provide the required routines? Thus we could provide them at least for the most common datatypes, and the system would continue working as currently for the rest (including user-defined types). Cross-column prefixes would be hard to handle I guess, as well as TOASTed data. Yes, column-wise should not be difficult since it requires no knowledge of the data types. We can treat cross-column or incomplete-column share in two ways. One way (binary-comparison) is that we just compare the items byte-by-byte, i.e., without knowing what is actually stored. The other way (datatype-comparison) is that we compare the items with some knowledge of the data types. For example, suppose the index is defined on a varchar column and the example data look like this: {3|'aaa'}{4|'aaab'}{5|'aaabc'} The binary-comparison way can't share prefix 'aaa' at all, because it will see 3, 4 and 5 are totally different. The datatype-comparison way can share the prefix 'aaa', but it has to know that each varchar column is associated with a length header. When it compares, it ignores the header. The first way is easy to implement, and works for some data types like integers, but is not acceptable I guess, since it does not even support varchar column prefix sharing. We can find a way to handle the above case, but it is better to find a general way to handle any data types (including UDTs). Having each type optionally provide the required routines could be a way; more details? One problem I do see is what happens if I need to insert a new tuple in the page that doesn't share the prefix. It obviously would have to be the leftmost or rightmost item on the page, but it's possible. 
We do the prefix sharing when we build up index only, never on the fly. Regards, Qingqing ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
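The varchar example in this thread can be sketched in a few lines. This is an illustrative model (Python, with the length header represented explicitly), not PostgreSQL code:

```python
# Keys are modeled as (length_header, payload) pairs, mimicking the
# varchar layout described above ({3|'aaa'}{4|'aaab'}{5|'aaabc'}).
# All names here are invented for the sketch.

def _shared(a, b):
    """Length of the common byte prefix of two byte strings."""
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i

def common_prefix_binary(items):
    """Binary-comparison: header bytes are compared too, so the
    differing lengths (3, 4, 5) destroy any shared prefix."""
    raw = [bytes([ln]) + payload for ln, payload in items]
    prefix = raw[0]
    for r in raw[1:]:
        prefix = prefix[:_shared(prefix, r)]
    return prefix

def common_prefix_typed(items):
    """Datatype-comparison: skip the length header and compare only
    the payload bytes."""
    prefix = items[0][1]
    for _, p in items[1:]:
        prefix = prefix[:_shared(prefix, p)]
    return prefix

keys = [(3, b"aaa"), (4, b"aaab"), (5, b"aaabc")]
print(common_prefix_binary(keys))  # headers differ in the first byte -> b''
print(common_prefix_typed(keys))   # -> b'aaa'
```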
[HACKERS] [EMAIL PROTECTED]: [BUGS] Race condition in dropdb;createdb]
Never got a reply on -bugs... do people think this is an issue? - Forwarded message from Jim C. Nasby [EMAIL PROTECTED] - [EMAIL PROTECTED]:32]~:1%createdb test; CREATE DATABASE [EMAIL PROTECTED]:32]~:2%dropdb test; createdb test; DROP DATABASE createdb: database creation failed: ERROR: source database template1 is being accessed by other users [EMAIL PROTECTED]:32]~:3% I realize that the changes made in 8.1 WRT using the postgres database fix this particular case, but is there a bigger issue with there being a race condition here? Presumably dropdb returns before the connection is actually closed from the database's standpoint... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PERFORM] A Better External Sort?
On 10/6/05, Michael Stone [EMAIL PROTECTED] wrote: On Wed, Oct 05, 2005 at 11:24:07AM -0400, Luke Lonergan wrote: Nope - it would be disk wait. I said I/O overhead; i.e., it could be the overhead of calling the kernel for I/O's. E.g., the following process is having I/O problems: time dd if=/dev/sdc of=/dev/null bs=1 count=1000 1000+0 records in 1000+0 records out 1000 bytes transferred in 8.887845 seconds (1125132 bytes/sec) real0m8.889s user0m0.877s sys 0m8.010s it's not in disk wait state (in fact the whole read was cached) but it's only getting 1MB/s. Mike Stone ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings I think you only proved that dd isn't the smartest tool out there... or that using it with a blocksize of 1 byte doesn't make too much sense. [EMAIL PROTECTED]:~]$ time dd if=/dev/sr0 of=/dev/null bs=2048 count=4883 4883+0 records in 4883+0 records out real0m6.824s user0m0.010s sys 0m0.060s [EMAIL PROTECTED]:~]$ time dd if=/dev/sr0 of=/dev/null bs=1 count=1000 1000+0 records in 1000+0 records out real0m18.523s user0m7.410s sys 0m10.310s [EMAIL PROTECTED]:~]$ time dd if=/dev/sr0 of=/dev/null bs=8192 count=1220 1220+0 records in 1220+0 records out real0m6.796s user0m0.000s sys 0m0.070s That's with caching, and all. Or did I miss the point of your post completely? Interestingly, the CPU usage with the bs=1 goes up to 97%, it stays at a mellow 3% with the 8192 and 2048. Cheers, Andrej ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [EMAIL PROTECTED]: [BUGS] Race condition in dropdb;createdb]
Jim C. Nasby [EMAIL PROTECTED] writes: Never got a reply on -bugs... do people think this is an issue? Not really. The only way to fix it would be to make PQfinish do a synchronous close, which seems like more of a performance loss than it's worth. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PERFORM] A Better External Sort?
Ron, This thread is getting on my nerves. Your tone in some of the other posts (as well as this one) is getting very annoying. Yes, PostgreSQL's storage manager (like all other open source databases), lacks many of the characteristics and enhancements of the commercial databases. Unlike Oracle, Microsoft, etc., the PostgreSQL Global Development Group doesn't have the tens of millions of dollars required to pay hundreds of developers around the world for round-the-clock development and R&D. Making sure that every little tweak, on every system, is taken advantage of is expensive (in terms of time) for an open source project where little ROI is gained. Before you make a statement like, I wanted to verify that pg's IO rates were inferior to The Competition, think about how you'd write your own RDBMS from scratch (in reality, not in theory). As for your question regarding developer docs for the storage manager and related components, read the READMEs and the code... just like everyone else. Rather than posting more assumptions and theory, please read through the code and come back with actual suggestions. -Jonah 2005/10/5, Ron Peacetree [EMAIL PROTECTED]: First I wanted to verify that pg's IO rates were inferior to The Competition. Now there's at least an indication that someone else has solved similar problems. Existence proofs make some things easier ;-) Is there any detailed programmer-level architectural doc set for pg? I know the best doc is the code, but the code in isolation is often the Slow Path to understanding with systems as complex as a DBMS IO layer. Ron -Original Message- From: Joshua D. Drake [EMAIL PROTECTED] Sent: Oct 5, 2005 1:18 PM Subject: Re: [HACKERS] [PERFORM] A Better External Sort? The source is freely available for your perusal. Please feel free to point us in specific directions in the code where you may see some benefit. I am positive all of us that can, would put resources into fixing the issue had we a specific direction to attack. 
Sincerely, Joshua D. Drake ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Respectfully, Jonah H. Harris, Database Internals Architect EnterpriseDB Corporation http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [DOCS] Updated documentation for new sequence binding
Jim C. Nasby wrote: On Sun, Oct 02, 2005 at 10:54:10PM -0400, Bruce Momjian wrote: pgman wrote: I have merged Tom's description of the new sequence binding with text I was working on. I modified it to follow the existing we used to do X, now we do Y pattern in the surrounding entries: http://candle.pha.pa.us/tmp/pgsql/release.html#RELEASE-8-1 Sorry, this is a better URL: http://candle.pha.pa.us/main/writings/pgsql/sgml/release.html#RELEASE-8-1 Out of curiosity, how is this file maintained as development is done? The reason I'm asking is because it would be nice to have links to more information, especially for the 'one-liner' items in 1.3.1 for example, and it seems like that would be easier to do along-the-way rather than waiting for the end. Even a link to a mailing list discussion would be better than nothing... I go through the CVS commits and make the release notes just before beta. Doing it incrementally is much more work. That said, what's Improve the optimizer, including auto-resizing of hash joins (Tom) mean? We would sometimes fail in a query where the allocated memory was larger than our hash could hold. This fixed that. On full_page_writes, are we certain that all battery-backed disk caches ensure that partial-page writes can't happen? Well, I would think so, but I have no documentation to state that. The battery-backed memory is supposed to allow for power failure and keep writes in the cache until they hit the disk. Maybe pg_column_size should just be included in the item for all the other size functions brought into the backend? If not, ISTM they should at least be one after the other in the list... Not really. It is different in that it does not aggregate values, but just reports the storage of a column. Finally, weren't more changes made in contrib than what's listed? Nothing's said about pg_autovacuum for example. Moved to the main server. That is mentioned. 
-- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] wrong optimization ( postgres 8.0.3 )
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Tom Lane wrote: Gaetano Mendola [EMAIL PROTECTED] writes: What I'm experiencing is a problem ( I upgraded today from 7.4.x to 8.0.3 ) that I explain here: The following function just returns how many records there are inside the view v_current_connection CREATE OR REPLACE FUNCTION sp_count ( ) RETURNS INTEGER AS' DECLARE c INTEGER; BEGIN SELECT count(*) INTO c FROM v_current_connection; RETURN c; END; ' LANGUAGE 'plpgsql'; I have the following select # select count(*), sp_count() from v_current_connection; count | sp_count - ---+-- 977 | 978 as you can see the two counts are returning different record numbers ( in the meantime other transactions are updating tables behind the view v_current_connection ). This isn't surprising at all, if other transactions are actively changing the table. See the release notes for 8.0: : Observe the following incompatibilities: : : In READ COMMITTED serialization mode, volatile functions now see : the results of concurrent transactions committed up to the : beginning of each statement within the function, rather than up to : the beginning of the interactive command that called the function. : : Functions declared STABLE or IMMUTABLE always use the snapshot of : the calling query, and therefore do not see the effects of actions : taken after the calling query starts, whether in their own : transaction or other transactions. Such a function must be : read-only, too, meaning that it cannot use any SQL commands other : than SELECT. If you want this function to see the same snapshot as the calling query sees, declare it STABLE. I think I understood :-( Just to be clear: select work_on_connected_user(id_user) from v_connected_user; if that function is not stable then it can work on an id_user that is no longer in the view v_connected_user. Is this right ? 
Regards Gaetano Mendola -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFDRDPh7UpzwH2SGd4RAnPVAJ9PdcVoUoOh7U4poR0Hd9uT4l/QgACg9nXg sebdHozcBV7t7JZslluGzB8= =rFgE -END PGP SIGNATURE- ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Resultset duplicates (was Re: [HACKERS] prefix btree implementation)
Qingqing Zhou wrote: Oracle 9 uses the grammar like this: CREATE INDEX ... [ COMPRESS number_of_first_columns ] So it gives the flexibility of choosing the optimal number of columns to the user. The script mentioned in the article guesses the optimal number by estimating the size of each choice. But I am thinking we can do it better: (1) we don't require that the compressed number of columns on each page are the same; (2) when we build up the index bottom-up, we can determine this number for each page automatically by maximizing the number of items within a page. Are there any gains in eliminating duplicate values in result-sets? I'd guess that many/most large result-sets are sorted which should make it possible to get away with a same as last row marker when the whole set is returned to a client. Of course, this is where someone turns around and tells me we do this already :-) -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: explain analyze is your friend
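Richard's same-as-last-row idea amounts to run-length encoding a sorted stream. A toy sketch (invented names, nothing like the actual wire protocol):

```python
# Runs of identical rows in a sorted result set are sent once with a
# repeat count; the client reinflates them. Purely illustrative.

def encode_sorted(rows):
    """Collapse consecutive duplicate rows into (row, count) pairs."""
    out = []
    for row in rows:
        if out and out[-1][0] == row:
            out[-1] = (row, out[-1][1] + 1)
        else:
            out.append((row, 1))
    return out

def decode(encoded):
    """Client side: expand (row, count) pairs back into rows."""
    rows = []
    for row, n in encoded:
        rows.extend([row] * n)
    return rows

rows = [("a",), ("a",), ("a",), ("b",), ("c",), ("c",)]
enc = encode_sorted(rows)
print(enc)                  # [(('a',), 3), (('b',), 1), (('c',), 2)]
assert decode(enc) == rows  # lossless round trip
```

The win obviously depends on the duplicate density; for a sorted set with few distinct values the savings can be large, while a unique-keyed result gains nothing.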
Re: [HACKERS] [PERFORM] A Better External Sort?
I'm putting in as much time as I can afford thinking about pg related performance issues. I'm doing it because of a sincere desire to help understand and solve them, not to annoy people. If I didn't believe in pg, I wouldn't be posting thoughts about how to make it better. It's probably worth some review (suggestions marked with a +): +I came to the table with a possibly better way to deal with external sorts (that now has branched into 2 efforts: short term improvements to the existing code, and the original from-the-ground-up idea). That suggestion was based on a great deal of prior thought and research, despite what some others might think. Then we were told that our IO limit was lower than I thought. +I suggested that as a Quick Fix we try making sure we do IO transfers in large enough chunks based on the average access time of the physical device in question so as to achieve the device's ASTR (i.e., at least 600KB per access for a 50MBps ASTR device with a 12ms average access time) whenever circumstances allow us. As far as I know, this experiment hasn't been tried yet. I asked some questions about physical layout and format translation overhead being possibly suboptimal that seemed to be agreed to, but specifics as to where we are taking the hit don't seem to have been made explicit yet. +I made the from left field suggestion that perhaps a pg native fs format would be worth consideration. This is a major project, so the suggestion was to at least some extent tongue-in-cheek. +I then made some suggestions about better code instrumentation so that we can more accurately characterize where the bottlenecks are. We were also told that evidently we are CPU bound far before one would naively expect to be based on the performance specifications of the components involved. Double checking among the pg developer community led to some differing opinions as to what the actual figures were and under what circumstances they were achieved. 
Further discussion seems to have converged on both accurate values and a better understanding as to the HW and SW needed; _and_ we've gotten some RW confirmation as to what current reasonable expectations are within this problem domain from outside the pg community. +Others have made some good suggestions in this thread as well. Since I seem to need to defend my tone here, I'm not detailing them here. That should not be construed as a lack of appreciation of them. Now I've asked for the quickest path to detailed understanding of the pg IO subsystem. The goal being to get more up to speed on its coding details. Certainly not to annoy you or anyone else. At least from my perspective, this for the most part seems to have been a useful and reasonable engineering discussion that has exposed a number of important things. Regards, Ron ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
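The 600KB figure in the chunk-size suggestion above follows directly from the quoted device numbers: a transfer only amortizes the access time once it takes at least as long as one average access. A worked version (function name invented for the sketch):

```python
# For each random access costing avg_access_ms, a transfer shorter than
# ASTR * access_time spends more time seeking than transferring. This
# computes the break-even chunk size from the figures quoted above.

def min_chunk_bytes(astr_bytes_per_sec, avg_access_ms):
    """Smallest transfer for which transfer time >= access time,
    i.e. the point where sustained throughput reaches half of ASTR."""
    return astr_bytes_per_sec * avg_access_ms // 1000

astr = 50 * 1000 * 1000   # 50 MBps sustained transfer rate
access_ms = 12            # 12 ms average access time
print(min_chunk_bytes(astr, access_ms))  # 600000 -> the ~600KB figure
```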
Re: [HACKERS] [PERFORM] A Better External Sort?
Michael, On 10/5/05 8:33 AM, Michael Stone [EMAIL PROTECTED] wrote: real0m8.889s user0m0.877s sys 0m8.010s it's not in disk wait state (in fact the whole read was cached) but it's only getting 1MB/s. You've proven my point completely. This process is bottlenecked in the CPU. The only way to improve it would be to optimize the system (libc) functions like fread where it is spending most of its time. In COPY, we found lots of libc functions like strlen() being called ridiculous numbers of times, in one case it was called on every timestamp/date attribute to get the length of TZ, which is constant. That one function call was in the system category, and was responsible for several percent of the time. By the way, system routines like fgetc/getc/strlen/atoi etc, don't appear in gprof profiles of dynamic linked objects, nor by default in oprofile results. If the bottleneck is in I/O, you will see the time spent in disk wait, not in system. - Luke ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
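The strlen()-on-a-constant pattern Luke describes looks like this in miniature (a Python stand-in with a call counter, purely illustrative of the hoisting fix, not the actual COPY code):

```python
# Count how often the length of a constant string gets recomputed when
# the call sits inside a per-attribute loop versus hoisted out of it.
calls = {"n": 0}

def strlen(s):
    """Stand-in for libc strlen, instrumented to count invocations."""
    calls["n"] += 1
    return len(s)

TZ = "PST8PDT"   # invented constant; stands in for the fixed TZ string
rows = range(1000)

# Before: length recomputed for every attribute processed.
for _ in rows:
    width = strlen(TZ)
per_row_calls = calls["n"]

# After: computed once, since TZ never changes inside the loop.
calls["n"] = 0
width = strlen(TZ)
for _ in rows:
    pass  # body would use the cached `width`

print(per_row_calls, calls["n"])  # 1000 1
```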
Re: [HACKERS] Slony RPM issue
Hi Devrim, On Thu, 6 Oct 2005 12:32 am, Devrim GUNDUZ wrote: Thanks for the report. It will fixed in CVS and all the RPM sets later today. Always feel free to send me a patch if you want, I can apply your patch, too. OK, you got my previous email about why pgsql-libs was dependent on compat-libs RPM? I don't even know where to start with fixing RPM issues, but I guess I can learn if you would like help? BTW, could you please report the possible Slony problems to slony1-general@gborg.postgresql.org later? Sure will. Ermmm, there isn't something you want to tell me is there? :-) I'd rather hear about gotchas now rather than finding them myself at the eleventh hour. Regards, Philip. -- Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Brian W. Kernighan - Utiba Pty Ltd This message has been scanned for viruses and dangerous content by Utiba mail server and is believed to be clean. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Vacuum and Transactions
[EMAIL PROTECTED] (Hannu Krosing) writes: It also seems that Slony can be modified to not use LISTEN/NOTIFY in high load situations (akin to high performance network cards, which switch from interrupt driven mode to polling mode if the number of packets per second reaches certain thresholds). Yeah, I want to do some more testing of that; it should be easy to improve the abuse of pg_listener a whole lot. Unfortunately Slony and Listen/Notify is not the only place where high- update rate tables start to suffer from vacuum's inability to clean out dead tuples when working in parallel with other slower vacuums. In real life there are other database tasks which also need some tables to stay small, while others must be huge in order to work effectively. Putting small and big tables in different databases and using dblink-like functionality when accessing them is one solution for such cases, but it is rather ugly :( That eliminates the ability to utilize transactions on things that ought to be updated in a single transaction... -- output = (cbbrowne @ ntlug.org) http://cbbrowne.com/info/lsf.html MS-Windows: Proof that P.T. Barnum was correct. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] prefix btree implementation
Hello all, I have also been examining a similar compression method. Qingqing Zhou wrote: We can find a way to handle the above case, but it is better to find a general way to handle any data types(include UDT). Each type optionally provide the required routines could be a way, more details? How about using difference information relative to the high key? Since the high key information also exists in the route page, it seems it could be put to good use (e.g. for new tuple insertion). # There is a problem in that the rightmost page has no high key... I have only just started considering this, so whether it really works out is not yet clear. --- Junji Teramoto ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Bitmap scans vs. the statistics views
Quite some time ago I complained about the fact that bitmap index scans weren't being counted sanely by the statistics mechanism: http://archives.postgresql.org/pgsql-hackers/2005-04/msg00675.php That discussion trailed off without deciding how to fix it, but we really can't let this go without fixing it in 8.1. I studied the code some more and realized that we had been operating under some fundamental misconceptions. The distinction made in the existing stats code between tuples fetched and tuples returned has nothing whatever to do with live vs. dead tuples --- all these counts are made only after determining that a tuple is visible. The way it really works in 8.0 is: table tuples_returned: tuples returned by heap_getnext, ie, live tuples found by seqscans table tuples_fetched: tuples returned by heap_fetch under conditions other than being invoked by an indexscan (this covers various random cases like ANALYZE and TID scans) index tuples_fetched: tuples returned by heap_fetch when invoked by an indexscan on this index index tuples_returned: actually, exactly the same as tuples_fetched. This possibly explains why the original design of the pg_stat_all_tables view exposed only two of the seemingly four interesting counts. I have just committed changes that redefine the counts like this: table tuples_returned: same as before, ie, live tuples found by seqscans table tuples_fetched: tuples returned by heap_fetch when invoked by a bitmap scan (the random other cases no longer get counted at all) index tuples_fetched: same as before, ie, live tuples fetched by simple indexscans using this index index tuples_returned: number of index entries returned from the index AM, counting both simple and bitmap scans. The pg_stat_all_tables view is modified to add the table's tuples_fetched count to the sum of the per-index tuples_fetched counts, so that idx_tup_fetch counts both simple and bitmap index scans. 
It's possible to break these out by looking at the low-level statistics functions, however. With the new definitions you can get some weak information about the numbers of dead tuples fetched by indexscans, which was not possible at all before. (It's weak because it's not easy to distinguish differences due to dead tuples from differences due to bitmap scanning.) In the earlier discussion, Josh commented that getting stats about dead tuples probably belongs somewhere else anyway, and I'm inclined to agree with that; so I don't feel too bad about not having provided more complete information. regards, tom lane
[HACKERS] fixing LISTEN/NOTIFY
Applications that frequently use LISTEN/NOTIFY can suffer from performance problems because of the MVCC bloat created by frequent insertions into pg_listener. A solution to this has been suggested in the past: rewrite LISTEN/NOTIFY to use shared memory rather than system catalogs. The problem is that there is a static amount of shared memory and a potentially unbounded number of notifications, so we can run out of memory.

There are two ways to solve this: we can do as sinval does and clear the shared memory queue, then effectively issue a NOTIFY ALL that awakens all listeners. I don't like this behaviour: it seems ugly to expose an implementation detail (static sizing of shared memory) to applications. While a lot of applications are only using LISTEN/NOTIFY for cache invalidation (and so spurious notifications are just a performance hit), this behaviour still seems unfortunate to me. Using NOTIFY ALL also makes NOTIFY 'msg' far less useful, which is a feature several users have asked for in the past.

I think it would be better to either fail the NOTIFY when there is not enough shared memory to add a new notification to the queue, or have the NOTIFY block until shared memory does become available (applications could of course implement the latter on top of the former by using savepoints and a loop, either on the client side or in PL/PgSQL). I guess we could add an option to NOTIFY to specify how to handle failures.

A related question is when to add the notification to the shared memory queue. We don't want the notification to fire until the NOTIFY's transaction commits, so one alternative would be to delay appending to the queue until transaction-commit time. However, that would mean we wouldn't notice NOTIFY failure until the end of the transaction, or else that we would block waiting for free space during the transaction-commit process.
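The fail-versus-block choice above can be modeled with a toy bounded queue. This is a hypothetical sketch, not the proposed implementation; the class and method names are invented for illustration:

```python
from collections import deque

class SharedNotifyQueue:
    # Toy model of a fixed-size shared-memory notification queue.
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def notify(self, channel, payload=None, on_full="fail"):
        if len(self.entries) >= self.capacity:
            if on_full == "fail":
                # Fail the NOTIFY; the caller could retry it inside
                # a savepoint-and-loop, client-side or in PL/PgSQL.
                raise MemoryError("out of notification queue space")
            # on_full == "block": a real implementation would sleep
            # here until listeners drain entries from the queue.
            raise NotImplementedError("blocking mode not modeled")
        self.entries.append((channel, payload))
```

The "fail" branch is what makes a client-side retry loop possible: the error arrives at NOTIFY time, inside the transaction, rather than at commit.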
I think it would be better to add an entry to shared memory during the NOTIFY itself, and stamp that entry with the NOTIFY's toplevel XID. Other backends can see the notification immediately (and once all the backends have seen it, the notification can be removed from the queue). Each backend can use the XID to determine when to fire the notification (and if the notifying backend rolls back, they can just discard the notification). This scheme is more expensive when the notifying transaction rolls back, but I don't think that is the common case. Comments? (I'm still thinking about how to organize the shared memory queue, and whether any of the sinval stuff can be reused...) -Neil
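The XID-stamping scheme implies a simple three-way decision in each listening backend. A minimal sketch, assuming a `xid_status` lookup that stands in for a real transaction-status check (such as clog would provide):

```python
def deliverable(entry_xid, xid_status):
    # Decide what a listener does with a queue entry stamped with
    # the notifier's toplevel XID.
    status = xid_status.get(entry_xid, "in progress")
    if status == "committed":
        return "fire"      # deliver the notification to the client
    if status == "aborted":
        return "discard"   # notifier rolled back: drop the entry
    return "wait"          # still in progress: check again later
```

The "discard" branch is the extra cost on rollback mentioned above: aborted entries occupy queue space until every listener has skipped past them.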
Re: [HACKERS] fixing LISTEN/NOTIFY
Neil Conway [EMAIL PROTECTED] writes: [ various ideas about reimplementing LISTEN/NOTIFY ] I really dislike the idea of pushing notifies out to the shared queue before commit. That essentially turns "forever do notify foo" into a global DOS tool: you can drive everybody else's backend into swap hell along with your own. The idea of blocking during commit until shmem becomes available might work. There are some issues here about transaction atomicity, though: how do you guarantee that all or none of your notifies get sent? (Actually, supposing that the notifies ought to be sent post-commit, "all" is the only acceptable answer. So maybe you just never give up.) regards, tom lane
Re: [HACKERS] fixing LISTEN/NOTIFY
On Thu, 2005-06-10 at 01:14 -0400, Tom Lane wrote: The idea of blocking during commit until shmem becomes available might work. There's some issues here about transaction atomicity, though: how do you guarantee that all or none of your notifies get sent? (Actually, supposing that the notifies ought to be sent post-commit, all is the only acceptable answer. So maybe you just never give up.) Yeah, I think that would work. We could also write to shared memory before the commit proper, and embed an XID in the message to allow other backends to determine when/if to fire the notification. However, I don't really like the idea of blocking the backend for a potentially significant amount of time in a state half-way between committed and ready for the next query. -Neil