Re: [HACKERS] PostGIS Integration
> I can't see any way to handle parameterized types without extending the
> grammar individually for each one --- otherwise it's too hard to tell them
> apart from function calls. That makes it a bit hard to do 'em as plug-ins
> :-(. The grammar hacks are certainly ugly though, and if someone could
> think of a way, I'm all ears...

Disallow it in table definitions, but allow it in domain definitions...

Chris

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PostGIS Integration
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
>> I can't see any way to handle parameterized types without extending the
>> grammar individually for each one --- otherwise it's too hard to tell
>> them apart from function calls.
> Disallow it in table definitions, but allow it in domain definitions...

Those two cases are not hard, because in those scenarios the parser knows it is expecting a type specification. The real problem is this syntax for typed literals:

    typename 'string'

which occurs in ordinary expressions. So when you see "name(" you aren't really sure whether you're seeing the start of a function call or the start of a typed-literal construct. And it's very hard to postpone that decision until you see what comes after the right paren.

regards, tom lane
Re: [HACKERS] PostGIS Integration
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
>> Those two cases are not hard, because in those scenarios the parser
>> knows it is expecting a type specification. The real problem is this
>> syntax for typed literals: typename 'string'
> Just disallow that particular case for custom types :P

Well, maybe we could --- comments? Tom Lockhart went to some lengths to support that, but now that he's gafiated we could perhaps rethink it. AFAICS the SQL spec only requires this syntax for certain built-in types. Tom wanted to generalize that to all datatypes that Postgres supports, and that seems like a reasonable goal ... but if it prevents getting to other reasonable goals then we ought to think twice.

> Will this work: 'string'::typename

Yes, since the :: cues the parser to expect a typename next.

regards, tom lane
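To make the ambiguity concrete, here is a small SQL sketch (the `mytype` name is invented for illustration, not taken from the thread):

```sql
-- Typed-literal syntax: works because the parser knows the built-in
-- type names up front.
SELECT timestamp '2004-02-04';

-- The ambiguous case: having read "mytype(", the parser cannot yet tell
-- whether this is a function call mytype('...') or a typed literal of a
-- parameterized type, mytype(n) '...'.
SELECT mytype(6) '...';

-- The cast syntax sidesteps the problem, since :: always introduces a
-- type name.
SELECT '...'::mytype;
```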
Re: [HACKERS] Recursive queries?
Christopher Kings-Lynne wrote:
> There is a website somewhere where a guy posts his patch he is
> maintaining that does it. I'll try to find it... Found it. Check it out:
> http://gppl.terminal.ru/index.eng.html
> Patch is current for 7.4, Oracle syntax.
> Chris

I had a look at the patch. It is still in development but it seems to work nicely - at least I have been able to get the same results with Oracle. I will try it with a lot of data this afternoon so that we can compare Oracle vs. Pg performance. I expect horrible results ;).

Does this patch have a serious chance to make it into Pg some day? I think Oracle's syntax is not perfect but it is easy to handle and many people are used to it. In people's minds recursive queries = CONNECT BY, and many people (like me) miss it sadly. If this patch has a serious chance I'd like to do some investigation and some real-world data testing.

Regards,
Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at
Re: [HACKERS] PITR Dead horse?
I would like to join this effort too. I was afraid that the people at Red Hat were already halfway through and were to release their work shortly, but that does not seem to be the case.

Regards,
Nicolai

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:pgsql-hackers-[EMAIL PROTECTED]] On Behalf Of Koichi Suzuki
Sent: Wednesday, February 04, 2004 11:25 AM
To: Tatsuo Ishii
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [HACKERS] PITR Dead horse?

Hi, this is Suzuki from NTT DATA Intellilink. I told Bruce Momjan at the BOF at NY LW2004 that I and my colleagues are interested in implementing PITR. NTT's laboratory is very interested in this issue and I'm planning to work with them. I hope we can cooperate.

Tatsuo Ishii wrote:
>> Has this been beaten to death now? Just curious if PITR was in the dev
>> tree yet. Been out of the loop. TIA.
> I and my co-workers are very interested in implementing PITR. We will
> tackle this for 7.5 if no one objects.
> --
> Tatsuo Ishii
Re: [HACKERS] Recursive queries?
I haven't had any problems with it so far, although I haven't really stressed it yet. I was going to make this very plea... I agree that the syntax can probably be improved, but it's familiar to those of us unfortunate enough to have used (or still have to use) Oracle. I imagine that bringing it more in line with any standard would be what people would prefer.

On Feb 4, 2004, at 5:28 AM, Hans-Jürgen Schönig wrote:
> Christopher Kings-Lynne wrote:
>> There is a website somewhere where a guy posts his patch he is
>> maintaining that does it. I'll try to find it... Found it. Check it
>> out: http://gppl.terminal.ru/index.eng.html
>> Patch is current for 7.4, Oracle syntax.
>> Chris
>
> I had a look at the patch. It is still in development but it seems to
> work nicely - at least I have been able to get the same results with
> Oracle. I will try it with a lot of data this afternoon so that we can
> compare Oracle vs. Pg performance. I expect horrible results ;).
>
> Does this patch have a serious chance to make it into Pg some day? I
> think Oracle's syntax is not perfect but it is easy to handle and many
> people are used to it. In people's minds recursive queries = CONNECT BY,
> and many people (like me) miss it sadly. If this patch has a serious
> chance I'd like to do some investigation and some real-world data
> testing.
>
> Regards,
> Hans

Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com
[HACKERS] array surprising behavior
Hi,

I think this is most surprising behavior -- shouldn't the UPDATE raise an error?

alvherre=# create table foo (a int[]);
CREATE TABLE
alvherre=# insert into foo values (null);
INSERT 33649 1
alvherre=# update foo set a[3] = '42';
UPDATE 1
alvherre=# select a, a is null from foo;
 a | ?column?
---+----------
   | t
(1 fila)

alvherre=# select version();
                              version
--------------------------------------------------------------------
 PostgreSQL 7.5devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
 3.3.1 (Mandrake Linux 9.2 3.3.1-2mdk)
(1 fila)

(This is CVS tip as of a couple of weeks ago.)

--
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"El destino baraja y nosotros jugamos" ("Destiny shuffles the cards and we play") (A. Schopenhauer)
Re: [HACKERS] Recursive queries?
Clinging to sanity, [EMAIL PROTECTED] (Andrew Rawnsley) mumbled into her beard:
> I haven't had any problems with it so far, although I haven't really
> stressed it yet. I was going to make this very plea... I agree that the
> syntax can probably be improved, but it's familiar to those of us
> unfortunate enough to have used (or still have to use) Oracle. I imagine
> that bringing it more in line with any standard would be what people
> would prefer.

The SQL:1999 form is, instead, of the form:

  with recquery (a,b,c,d) as
    (select a1,b1,c1,d1 from some_table
      where d1 < 21)
  select * from recquery;

Notice that I have indented this the same way a Lisp programmer would indent a LET form...

  (let ((a value-for-a)
        (b value-for-b)
        (c compute-c)
        (d 42))   ;;; The ultimate answer...
    (compute-something-with-values a b c d))

In ML, there is an analogous let/in construct:

  # let a = 1 and b = 2 and c = 3 in a + b * c;;
  - : int = 7

That example is oversimplified, a bit, as it does not do anything recursive. In order to express a recursive relationship, the query likely needs to have a UNION ALL, and look more like the following:

  with recquery (a,b,c,d) as
    (select a,b,c,d
       from base_table root           -- Root level entries
      where c < 200
     union all
     select child.a,child.b,child.c,child.d
       from recquery parent,
            base_table child          -- Self-reference here
      where parent.a = child.b        -- The link between nodes...
        and c < 200)
  select a,b,c,d from recquery;

The fact that the form of this resembles that of the Lisp/ML let forms means that WITH can be useful in structuring queries as well. For instance, supposing you're computing a value that gets used several times, putting it into a WITH clause might allow evading the need to compute it more than once.
  with notrec (radius, pi, month) as
    (select radius, 3.1412, date_trunc('month', txn_date)
       from pie_table)
  select month, sum(pi * radius * radius) as area, count(*)
    from notrec
   where month between '2003-01-01' and '2004-01-01'
   group by month;

has some 'elegance' by virtue of only using date_trunc once, versus

  select date_trunc('month', txn_date) as month,
         sum(3.1412 * radius * radius) as area,
         count(*)
    from pie_table
   where date_trunc('month', txn_date) between '2003-01-01' and '2004-01-01'
   group by month;

Admittedly, date_trunc() may not be an ideal example, as the date constraint would work as well with an untruncated date; the point is that in the no-WITH approach, there is an extra use of date_trunc(). But the recomputation that takes place when a functional value is used both in the result clause and in the WHERE clause is something that WITH can eliminate.

--
aa454,@,freenet.carleton.ca
http://www.ntlug.org/~cbbrowne/emacs.html
Lisp Users: Due to the holiday next Monday, there will be no garbage collection.
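The evaluation of such a WITH RECURSIVE query can also be sketched outside SQL. The following is a minimal Python model (the table data and linkage rule are invented for illustration): it builds the result from the root rows plus repeated applications of the recursive term, stopping at a fixpoint, which is essentially how `recquery` above is defined.

```python
# Minimal model of WITH RECURSIVE evaluation: start with the root rows,
# then repeatedly apply the recursive term until no new rows appear.
# base_table rows are (a, b) pairs, where b is the "parent" link.
base_table = [(1, None), (2, 1), (3, 1), (4, 2), (5, 9)]

def with_recursive(root_pred, link):
    result = [row for row in base_table if root_pred(row)]   # root term
    working = list(result)
    while working:
        # recursive term: join the working set against base_table
        new = [child for parent in working
                     for child in base_table
                     if link(parent, child) and child not in result]
        result.extend(new)
        working = new
    return result

# Roots are rows with no parent; a child links to a parent via b = parent.a
tree = with_recursive(lambda r: r[1] is None, lambda p, c: c[1] == p[0])
print(tree)   # row (5, 9) is not reachable from any root, so it is absent
```

The fixpoint iteration is the essential difference from a plain subquery: each pass may add rows that the next pass can join against.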
Re: [HACKERS] PITR Dead horse?
On Wed, 4 Feb 2004, Tatsuo Ishii wrote:
>> I and some other developers are also interested in it. Do you think we
>> can work together?
> Sure. Why not. I think it would be practical to decide who is the leader
> of this project, though.

Is this something large enough, like the win32 stuff, that having a side list for discussions is worth setting up?

----
Marc G. Fournier          Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]          Yahoo!: yscrappy          ICQ: 7615664
Re: [HACKERS] Recursive queries?
On Wed, 2004-02-04 at 05:28, Hans-Jürgen Schönig wrote:
> I had a look at the patch [http://gppl.terminal.ru/index.eng.html]. It is
> still in development but it seems to work nicely - at least I have been
> able to get the same results with Oracle. [...] Does this patch have a
> serious chance to make it into Pg some day? I think Oracle's syntax is
> not perfect but it is easy to handle and many people are used to it. If
> this patch has a serious chance I'd like to do some investigation and
> some real-world data testing.

Seems it has no chance of getting in as it is GPL'd code... so step one would be convincing him to relicense it.

As a side note, I thought Joe Conway also had an implementation of this...

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] Recursive queries?
Hans-Jürgen Schönig [EMAIL PROTECTED] writes:
> Does this patch have a serious chance to make it into Pg some day? I
> think Oracle's syntax is not perfect but it is easy to handle and many
> people are used to it. In people's minds recursive queries = CONNECT BY,
> and many people (like me) miss it sadly.

I would prefer to see us supporting the SQL-standard syntax (WITH etc), as it is (1) standard and (2) more flexible than CONNECT BY. The Red Hat work mentioned earlier in the thread was aimed at supporting the standard syntax.

regards, tom lane
Re: [HACKERS] PITR Dead horse?
Marc G. Fournier [EMAIL PROTECTED] writes:
> Is this something large enough, like the win32 stuff, that having a side
> list for discussions is worth setting up?

In terms of the amount of code to be written, I expect it's larger than the win32 porting effort. And it should be mostly pretty separate from hacking the core backend, since most of what remains to do is writing external management utilities (I think).

I've been dissatisfied with having the separate pgsql-hackers-win32 list; I feel it just fragments the discussion, and people tend to end up crossposting to -hackers anyway. But a separate list for PITR work might be a good idea despite that experience, since it seems like it'd be a more separable project. Any other opinions out there?

regards, tom lane
Re: [HACKERS] Recursive queries?
Robert Treat wrote on Wed, 04.02.2004 at 16:55:
> Seems it has no chance of getting in as it is GPL'd code... so step one
> would be convincing him to relicense it. As a side note, I thought Joe
> Conway also had an implementation of this...

IIRC Joe Conway had the simple join-by-parent-id variant done as a set-returning function.

---------------
Hannu
Re: [HACKERS] Recursive queries?
Tom Lane wrote on Wed, 04.02.2004 at 06:04:
> Christopher Kings-Lynne [EMAIL PROTECTED] writes:
>> Wasn't there some guy at RedHat doing it?
> Andrew Overholt did some work on SQL99 recursive queries, but went back
> to university without having gotten to the point where it actually
> worked. One of the many things on my to-do list is to pick up and finish
> Andrew's work on this. If someone has time to work on it, let me know
> and I'll try to get what he had over to you.

I attach my early attempts at doing the same. I also sent this to Andrew while he was working on it, and he claimed that his version was similar. I think he sent me a slightly more advanced version of this, but I'm currently unable to locate it.

This has mainly the syntax part (without recursion control, IIRC) and some initial documentation (in Python's StructuredText and HTML formats).

If I find Andrew's variant I'll try to post it too.

-
Hannu

pg-with-recursive.tar.gz
Description: application/compressed-tar
Re: [HACKERS] Recursive queries?
Christopher Browne wrote on Wed, 04.02.2004 at 15:10:
> The fact that the form of this resembles that of the Lisp/ML let forms
> means that WITH can be useful in structuring queries as well. For
> instance, supposing you're computing a value that gets used several
> times, putting it into a WITH clause might allow evading the need to
> compute it more than once.

The main difference between giving the subquery in WITH and in FROM is that the subqueries given in the FROM clause don't see each other, while the ones given in WITH see the ones preceding them, and the ones in WITH RECURSIVE see all queries in the WITH RECURSIVE clause.

--
Hannu
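A sketch of that visibility difference (all names invented):

```sql
-- Subqueries in the FROM clause cannot refer to each other; this fails
-- because "a" is not visible inside the second subquery:
SELECT *
FROM (SELECT 1 AS x) AS a,
     (SELECT a.x + 1 AS y) AS b;      -- error: "a" not visible here

-- In WITH, each query sees the ones preceding it:
WITH a AS (SELECT 1 AS x),
     b AS (SELECT x + 1 AS y FROM a)  -- "a" is visible here
SELECT * FROM b;
```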
Re: [HACKERS] PostGIS Integration
tgl wrote:
> Well, maybe we could --- comments? Tom Lockhart went to some lengths to
> support that, but now that he's gafiated we could perhaps rethink it.
> AFAICS the SQL spec only requires this syntax for certain built-in
> types. Tom wanted to generalize that to all datatypes that Postgres
> supports, and that seems like a reasonable goal ... but if it prevents
> getting to other reasonable goals then we ought to think twice.

If it's not for SQL conformance I don't think we really need to generalize that, as long as there are other means to gain the same result...

    'string'::type(parameter)

can be the general Postgres version, while

    varchar(2) 'string'

can be the standard SQL version (not general).

--strk;
Re: [HACKERS] Recursive queries?
Tom Lane wrote:
> I would prefer to see us supporting the SQL-standard syntax (WITH etc),
> as it is (1) standard and (2) more flexible than CONNECT BY. The Red Hat
> work mentioned earlier in the thread was aimed at supporting the
> standard syntax.

I had already expected an answer like that. In my very personal opinion (don't cut my head off) I'd vote for both syntaxes. The reasons for that are fairly easy to explain:

- I have to agree with Tom (1, 2).
- CONNECT BY makes sense because it is easier to build applications supporting Oracle and PostgreSQL. In the case of more complex applications (CONNECT BY is definitely more than pure storage of simple data), Oracle-Pg compliance is really important (I have seen that a dozen times).

From a marketing point of view both versions make sense - Oracle-Pg migration is an increasing market share. From a technical point of view I completely agree with Tom (I have learned in the past that Tom is usually right).

Regards,
Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at
Re: [HACKERS] Recursive queries?
Hans-Jürgen Schönig [EMAIL PROTECTED] writes:
> In my very personal opinion (don't cut my head off) I'd vote for both
> syntaxes.

I'm not opposed to that, although it would be a good idea to check that Oracle doesn't have some patent covering their syntax. However, if we go for that then what we probably want to do is implement the SQL-spec syntax and then add something to translate the Oracle syntax into a SQL parsetree. We shouldn't need two implementations in the guts of the system, and I'd expect that translating in the other direction (SQL WITH to an Oracle internal implementation) wouldn't work, because WITH does more. I dunno whether the patch mentioned earlier in this thread could serve as a starting point for that or not.

regards, tom lane
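As a sketch of what such a translation might produce, using the usual Oracle emp example (the table and columns are the textbook example, not anything from the patch):

```sql
-- Oracle:
SELECT empno, ename, mgr
  FROM emp
 START WITH mgr IS NULL
 CONNECT BY PRIOR empno = mgr;

-- A possible SQL:1999 rendering of the same traversal:
WITH RECURSIVE emp_tree (empno, ename, mgr) AS (
    SELECT empno, ename, mgr FROM emp
     WHERE mgr IS NULL                 -- START WITH condition
    UNION ALL
    SELECT e.empno, e.ename, e.mgr     -- CONNECT BY step
      FROM emp_tree t, emp e
     WHERE e.mgr = t.empno             -- PRIOR empno = mgr
)
SELECT * FROM emp_tree;
```

Such a rewrite covers the basic traversal; Oracle extras like LEVEL and ORDER SIBLINGS BY would need additional machinery.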
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint
I am concerned that the bgwriter will not be able to keep up with the I/O generated by even a single backend restoring a database, let alone a busy system. To me, the write() performed by the bgwriter, because it is I/O, will typically be the bottleneck on any system that is I/O bound (especially as the kernel buffers fill), and it will not be able to keep up with active backends now freed from writes. The idea of falling back when the bgwriter cannot keep up is to have the backends sync the data, which seems like it would just slow down an I/O-bound system further.

I talked to Magnus about this, and we considered various ideas, but could not come up with a clean way of having the backends communicate to the bgwriter about their own non-sync writes. We had the ideas of using shared memory or a socket, but these seemed like choke points.

Here is my new idea. (I will keep throwing out ideas until I hit on a good one.) The bgwriter is going to have to check before every write to determine if the file is already recorded as needing fsync during checkpoint. My idea is to have that checking happen during the bgwriter buffer scan, rather than at write time. If we add a shared memory boolean for each buffer, backends needing to write buffers can write the buffers already recorded as safe to write by the bgwriter scanner.

I don't think the bgwriter is going to be able to keep up with I/O-bound backends, but I do think it can scan and set those booleans fast enough for the backends to then perform the writes. (We might need a separate bgwriter thread or a separate process to do this.) As I remember, our new queue system has a list of buffers that are most likely to be replaced, so the bgwriter can scan those first and make sure they have their booleans set. There is an issue that these booleans are set without locking, so there might need to be a double-check of them by backends: first before the write, then again just before they replace the buffer.
The bgwriter would clear the bits before the checkpoint starts.

Now that no one is ill from my fsync buffer boolean idea, let me give some implementation details. :-)

First, we need to add a bit to each shared buffer descriptor (sbufdesc) that indicates whether the background writer (bgwriter) has recorded the file associated with the buffer as needing fsync. This bit will be set only by the background writer, usually during its normal buffer scan looking for buffers to write. The background writer doesn't write all dirty buffers on each buffer pass, but it could record the buffers that need fsync on each pass, allowing backends to write those buffers if buffer space becomes limited. (Not sure, but perhaps the buffer bit could be set with only a shared lock on the buffer, because no one else sets the bit.)

(One idea would be to move the fsync bit into its own byte in shared memory so it is more centralized and no locking is required to set the bit. Also, should we have one byte per shared buffer to indicate dirty buffers so the bgwriter can find them more efficiently?)

The bit can be cleared if either the background writer writes the page, or a backend writes the page.

Right now, the checkpoint process writes out all dirty buffers. We might need to change this so the background writer does it instead, because only it can record files needing fsync. During checkpoint, the background writer should write out all buffers. It will not be recording any new fsync bits during this scan because it is writing every dirty buffer. (If it did do this, it could mark an fsync bit for a write that happened only during or after the fsync it performs later.) Once it is done, it should move the hash of files needing fsync to a backup pointer, create a new empty list, and do a scan so backends can do writes. A subprocess should do the fsync of all files, either using fork() and having the child read the saved pointer hash, or, for EXEC_BACKEND, writing a temp file that the child can read.
--
Bruce Momjian                    | http://candle.pha.pa.us
[EMAIL PROTECTED]                | (610) 359-1001
+ If your life is a hard drive,  | 13 Roberts Road
+ Christ can be your backup.     | Newtown Square, Pennsylvania 19073
Re: [HACKERS] array surprising behavior
Alvaro Herrera wrote:
> I think this is most surprising behavior -- shouldn't the UPDATE raise
> an error?

Surprising, but not new (it probably goes back to the Berkeley code), and it has come up before on one of the lists (I think it might even have been pgsql-bugs).

regression=# select version();
                              version
--------------------------------------------------------------------
 PostgreSQL 7.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
(1 row)

regression=# create table foo (a int[]);
CREATE TABLE
regression=# insert into foo values (null);
INSERT 1104092 1
regression=# update foo set a[3] = '42';
UPDATE 1
regression=# select a, a is null from foo;
 a | ?column?
---+----------
   | t
(1 row)

I'm still hoping to scrounge up the time to continue working on arrays for 7.5, including figuring out how to deal with this.

Joe
Re: [HACKERS] implemented missing bitSetBit() and bitGetBit()
David Helgason [EMAIL PROTECTED] writes:
> I needed these, so I went and implemented them myself.

I didn't see any followup to this: do we want to include this in the main tree, in contrib/, or not at all?

-Neil (who has no opinion on the matter, but just wants to make sure this doesn't fall through the cracks)
Re: [HACKERS] implemented missing bitSetBit() and bitGetBit()
Bit sets are remarkably useful functionality. I suggest putting it into the core product.

-----Original Message-----
From: Neil Conway [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 04, 2004 11:22 AM
To: David Helgason
Cc: [EMAIL PROTECTED]
Subject: Re: [HACKERS] implemented missing bitSetBit() and bitGetBit()

David Helgason [EMAIL PROTECTED] writes:
> I needed these, so I went and implemented them myself.

I didn't see any followup to this: do we want to include this in the main tree, in contrib/, or not at all?

-Neil (who has no opinion on the matter, but just wants to make sure this doesn't fall through the cracks)
[HACKERS] Question on database backup
We have customers who prefer to use their own backup facilities instead of what we provide in the app (we use pg_dump). I hear speed is at least one consideration. The questions I need to answer are these:

1) Is it absolutely safe to do a file copy (cpio, or something else, whatever the robust backup app would use) on the Postgres db when it's completely shut down?

2) Same question, but the database is up and running in read-only mode. We're making sure that no updates are taking place.

If it matters - this is on Solaris, HP, Linux. We're using v7.3.2 running on UDS and v7.3.4 on TCP/IP. We provide no explicit settings for WAL, fsync and the like. And (yes, I know) they often install it on NFS.

Thank you much.
Mike.
Re: [HACKERS] implemented missing bitSetBit() and bitGetBit()
Neil Conway wrote:
> David Helgason [EMAIL PROTECTED] writes:
>> I needed these, so I went and implemented them myself.
> I didn't see any followup to this: do we want to include this in the
> main tree, in contrib/, or not at all?

getbit sounds a lot like what substring() does. So perhaps setbit could actually be handled by replace()? That would be a more general solution (since it would handle more than one bit at a time).
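For instance, using only the bit-string substring() and concatenation operators that already exist, something along these lines covers both operations (an untested sketch, not from the proposed patch):

```sql
-- "Get bit": a length-1 substring of a bit string.
SELECT substring(B'101101' FROM 3 FOR 1);     -- the third bit, B'1'

-- "Set bit": splice a replacement bit in by concatenating the pieces
-- before and after position 3.
SELECT substring(B'101101' FROM 1 FOR 2) || B'0' ||
       substring(B'101101' FROM 4);           -- third bit cleared: B'100101'
```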
Re: [HACKERS] implemented missing bitSetBit() and bitGetBit()
On 4. feb 2004, at 20:51, Peter Eisentraut wrote:
> getbit sounds a lot like what substring() does. So perhaps setbit could
> actually be handled by replace()? That would be a more general solution
> (since it would handle more than one bit at a time).

I sort of agree, but it's currently documented like I implemented it (afaics), so it's a simple thing to include. I feel a bit bad for not having done a full patch with test cases and .bki modifications etc., but it seemed a pretty daunting task (for my schedule at least). Hope someone can use it though.

David Helgason,
Over the Edge Entertainments
Re: [HACKERS] Question on database backup
On Wed, 4 Feb 2004, Michael Brusser wrote:
> 1) Is it absolutely safe to do a file copy (cpio, or something else,
> whatever the robust backup app would use) on the Postgres db when it's
> completely shut down?

Yes, it is.

> 2) Same question, but the database is up and running in read-only mode.
> We're making sure that no updates are taking place.

Most likely, but there are no guarantees if the database is up and running, even if you're certain there are no updates happening. Also, you can use a snapshotting file system to make a backup image and then back up the image, while the database is up and being accessed, both by readers and writers, assuming you get REAL snapshots.

> If it matters - this is on Solaris, HP, Linux. We're using v7.3.2
> running on UDS and v7.3.4 on TCP/IP. We provide no explicit settings
> for WAL, fsync and the like. And (yes, I know) they often install it
> on NFS.

OK, here's a couple more issues to chew on as well. Suppose you have a backup, and the database server was compiled with UNKNOWN switches. The machine burns to the ground. Now you get to try to figure out how to compile the database on the new server so it can read the old dataset. This may or may not be a complete friggin' nightmare for you. Dumps can move between versions / hardware configs / differently compiled versions of PostgreSQL with some amount of reliability; binary copies may not move so easily.

Scenario II: the HP burns to the ground, and your boss just buys a big old Intel box. How do you get your data up and running with a binary backup? You don't.

Scenario III: subtle corruption gets into your dataset due to a bad block or whatnot. No one notices for a while. Suddenly, someone notices.
With only file system backups, and no error messages in them, how do you determine when the corruption occurred and get the uncorrupted data out, leaving the corrupted data behind?

Plain and simple: PostgreSQL is designed to be backed up by pg_dump. Using anything else isn't supported, so to speak, and may cause you untold grief in the future. That said, sometimes file backups are the perfect solution; just go into it with your eyes open to the possible problems, and I'd make a pg_dump every so often just in case.
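The two approaches can be sketched as follows (the data directory path and database name are made up; a file-level copy is only safe with the postmaster fully stopped):

```shell
# Supported: logical dump; portable across versions, compile switches,
# and architectures.
pg_dump mydb > mydb.dump

# File-level copy: safe ONLY with the server completely shut down.
pg_ctl stop -D /var/lib/pgsql/data
tar cf /backup/pgdata.tar -C /var/lib/pgsql data
pg_ctl start -D /var/lib/pgsql/data
```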
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint
Tom Lane wrote:
> Kevin Brown [EMAIL PROTECTED] writes:
>> Instead, have each backend maintain its own separate list in shared
>> memory. The only readers of a given list would be the backend it
>> belongs to and the bgwriter, and the only time bgwriter attempts to
>> read the list is at checkpoint time. The sum total size of all the
>> lists shouldn't be that much larger than it would be if you maintained
>> it as a global list.
> I fear that is just wishful thinking. Consider the system catalogs as a
> counterexample of files that are likely to be touched/modified by many
> different backends.

Oh, I'm not arguing that there won't be a set of files touched by a lot of backends, just that the number of such files is likely to be relatively small -- a few tens of files, perhaps. But that admittedly can add up fast. But see below.

> The bigger problem though with this is that it makes the problem of
> list overflow much worse. The hard part about shared memory management
> is not so much that the available space is small, as that the available
> space is fixed --- we can't easily change it after postmaster start.
> The more finely you slice your workspace, the more likely it becomes
> that one particular part will run out of space. So the inefficient case
> where a backend isn't able to insert something into the appropriate
> list will become considerably more of a factor.

Well, running out of space in the list isn't that much of a problem. If the backends run out of list space (and the max size of the list could be a configurable thing, either as a percentage of shared memory or as an absolute size), then all that happens is that the background writer might end up fsync()ing some files that have already been fsync()ed. But that's not that big of a deal -- the fact they've already been fsync()ed means that there shouldn't be any data in the kernel buffers left to write to disk, so subsequent fsync()s should return quickly.
How quickly depends on the individual kernel's implementation of the dirty buffer list as it relates to file descriptors. Perhaps a better way to do it would be to store the list of all the relfilenodes of everything in pg_class, with a flag for each indicating whether or not an fsync() of the file needs to take place. When anything writes to a file without O_SYNC or a trailing fsync(), it sets the flag for the relfilenode of what it's writing. Then at checkpoint time, the bgwriter can scan the list and fsync() everything that has been flagged. The relfilenode list should be relatively small in size: at most 16 bytes per item (and that on a 64-bit machine). A database that has 4096 file objects would have a 64K list at most. Not bad. Because each database backend can only see the class objects associated with the database it's connected to or the global objects (if there's a way to see all objects I'd like to know about it, but pg_class only shows objects in the current database or objects which are visible to all databases), the relfilenode list might have to be broken up into one list per database, with perhaps a separate list for global objects. The interesting question in that situation is how to handle object creation and removal, which should be a relatively rare occurrence (fortunately), so it presumably doesn't have to be all that efficient. -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
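[A conceptual sketch of the flag-table proposal above, in Python purely for illustration; the real structure would be a fixed-size array in C shared memory, and the class and method names here are invented:]

```python
class FsyncFlagTable:
    """Toy model of the proposed per-relfilenode needs-fsync flags."""

    def __init__(self):
        # relfilenode -> True when the file has writes not yet fsync()ed
        self.flags = {}

    def note_write(self, relfilenode):
        # Called after any write done without O_SYNC or a trailing fsync()
        self.flags[relfilenode] = True

    def checkpoint(self, fsync_fn):
        """At checkpoint time the bgwriter scans the table, fsyncs every
        flagged file, and clears its flag."""
        synced = []
        for node, dirty in self.flags.items():
            if dirty:
                fsync_fn(node)
                self.flags[node] = False
                synced.append(node)
        return synced
```

At the quoted 16 bytes per entry, 4096 file objects would indeed cost on the order of 64K, so the space argument in the message holds for this shape of structure.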
Beta freeze? (was Re: [HACKERS] array surprising behavior)
On Wed, Feb 04, 2004 at 11:06:29AM -0800, Joe Conway wrote: Alvaro Herrera wrote: I think this is most surprising behavior -- shouldn't the UPDATE raise an error? Surprising, but not new (it has probably been there since the Berkeley code), and has come up before on one of the lists (I think it might even have been pgsql-bugs). Too bad :-( I'm still hoping to scrounge up the time to continue working on arrays for 7.5, including figuring out how to deal with this. I have the same hope, though I know for sure that I won't have any time to work on anything until March, and then I will probably devote most of my time to Pg. When is beta freeze supposed to happen? -- Alvaro Herrera (alvherre[a]dcc.uchile.cl) "The gods must always be fed, even when the earth is dry" (Orual) ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint
Kevin Brown [EMAIL PROTECTED] writes: Tom Lane wrote: The more finely you slice your workspace, the more likely it becomes that one particular part will run out of space. So the inefficient case where a backend isn't able to insert something into the appropriate list will become considerably more of a factor. Well, running out of space in the list isn't that much of a problem. If the backends run out of list space (and the max size of the list could be a configurable thing, either as a percentage of shared memory or as an absolute size), then all that happens is that the background writer might end up fsync()ing some files that have already been fsync()ed. But that's not that big of a deal -- the fact they've already been fsync()ed means that there shouldn't be any data in the kernel buffers left to write to disk, so subsequent fsync()s should return quickly. Yes, it's a big deal. You're arguing as though the bgwriter is the thing that needs to be fast, when actually what we care about is the backends being fast. If the bgwriter isn't doing the vast bulk of the writing (and especially the fsync waits) then we are wasting our time having one at all. So we need a scheme that makes it as unlikely as possible that backends will have to do their own fsyncs. Small per-backend fsync lists aren't the way to do that. Perhaps a better way to do it would be to store the list of all the relfilenodes of everything in pg_class, with a flag for each indicating whether or not an fsync() of the file needs to take place. You're forgetting that we have a fixed-size workspace to do this in ... and no way to know at postmaster start how many relations there are in any of our databases, let alone predict how many there might be later on. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] PITR Dead horse?
Tom Lane wrote: Marc G. Fournier [EMAIL PROTECTED] writes: Is this something large enough, like the win32 stuff, that having a side list for discussions is worth setting up? In terms of the amount of code to be written, I expect it's larger than the win32 porting effort. And it should be mostly pretty separate from hacking the core backend, since most of what remains to do is writing external management utilities (I think). Yes it is! I'd like to start the discussion about PITR and try to go through some functional requirements and how those might be implemented. The Win32 port has a self-evident set of functional requirements; I'm not sure that the PITR stuff is as clear - so I couldn't pass any judgement at all (even if I knew the code well enough) on how big a coding task that is, but I can see that the analysis and discussion is large indeed. I've been dissatisfied with having the separate pgsql-hackers-win32 list; I feel it just fragments the discussion, and people tend to end up crossposting to -hackers anyway. But a separate list for PITR work might be a good idea despite that experience, since it seems like it'd be a more separable project. I'd vote for a new list dedicated to discussing Robustness issues, such as PITR and the fsync/sync issues. IMHO, PostgreSQL has the Functionality and Performance, it just needs some rock-solid analysis of where-things-can-go-wrong with it, so that the business data centre people will be able to use it with absolute confidence...even if the answer is we've got every base covered. For me, the issues about robustness are as much to do with risk reduction and confidence building as they are about specific features in that area. [Wow, I expect some flames on those comments!] The list probably would remain clearly differentiated, in the same way [Performance] covers lots of areas not discussed in [Hackers]. Not hung up on the name either, just something that indicates breadth-of-scope, e.g.
Availability or Data Protection or Resilience, etc.; maybe the Advocates would like to name it? It might even be a press release: PostgreSQL community focuses new efforts towards Robustness features for its next major release. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] PITR Dead horse?
Simon Riggs [EMAIL PROTECTED] writes: I'd vote for a new list dedicated to discussing Robustness issues, such as PITR and the fsync/sync issues. IMHO, PostgreSQL has the Functionality and Performance, it just needs some rock-solid analysis of where-things-can-go-wrong with it, so that the business data centre people will be able to use it with absolute confidence...even if the answer is we've got every base covered. For me, the issues about robustness are as much to do with risk reduction and confidence building as they are about specific features in that area. [Wow, I expect some flames on those comments!] You're right. Exactly where do you expect to find the expertise and interest to do such an analysis? On pghackers, that's where. There's no reason to invent a new mailing list for what should be a continuing topic in pghackers. And to the extent that you were to move such a discussion somewhere else, you'd just risk losing the attention of the pair of eyeballs that might notice a hole in your analysis. Not hung up on the name either, just something that indicates breadth-of-scope, e.g. Availability or Data Protection or Resilience etc..; maybe the Advocates would like to name it? It might even be a press-release: PostgreSQL community focuses new efforts towards Robustness features for its next major release. I think such a press release would be counterproductive, as it would immediately make people question whether we have reliability problems. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: Beta freeze? (was Re: [HACKERS] array surprising behavior)
Alvaro Herrera [EMAIL PROTECTED] writes: When is beta freeze supposed to happen? AFAIK, no date has been set at all. I doubt we'll even think about it until we see how the Windows port effort goes. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] PITR Dead horse?
Totally agree. Robustness and rock-solidness are the only things missing for PostgreSQL to become the killer of certain commercial enterprise databases out there. And the only thing that is missing in this respect is PITR. PITR should be there; INGRES had it in '84, and some people ask why PostgreSQL does not have it. I am well versed in the internals of the PITR features of a certain leading enterprise-class database out there, and would like to contribute (write code) to this effort as much as I can. Best regards, Nicolai Tufar ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] COPY from question
Hi Kevin, On Tue, 3 Feb 2004, Kevin Brown wrote: Slavisa Garic wrote: Using the pg module in Python I am trying to run the COPY command to populate the large table. I am using this to replace the INSERT, which takes about a few hours to add 7 entries where COPY takes a minute and a half. That difference in speed seems quite large. Too large. Are you batching your INSERTs into transactions (you should be, in order to get good performance)? Do you have a ton of indexes on the table? Does it have triggers on it or some other thing (if so then COPY may well wind up doing the wrong thing, since the triggers won't fire for the rows it inserts)? I don't know what kind of schema you're using, but it takes perhaps a couple of hours to insert 2.5 million rows on my system. But the rows in my schema may be much smaller than yours. You are right about the indexes. There are quite a few of them (5-6 without looking at the schema). The problem is that I do need those indexes, as I have a lot of SELECTs on that table and inserts only happen once. You are also right about the rows (I think), as I have about 15-20 columns. This could be split into a few other tables, and it used to be, but I have merged them because of the requirement for faster SELECTs. With the current schema, most of my modules that access the database are not required to do expensive JOINs as they used to. Because faster SELECTs are more important to me than faster INSERTs, I had to do this. This wasn't a problem for me until I started creating experiments which had more than 20 thousand jobs, which translates to 20 thousand rows in this big table. I do batch INSERTs into one big transaction (1000 rows at a time). While I did get some improvement compared to a single transaction per insert, it was still not fast enough (well, not for me :) ). Could you please elaborate on the triggers? I have no idea what kind of triggers there are in PGSQL or relational databases.
With regards to my problem, I did solve it by piping the data into COPY's stdin. Now I have about 75000 rows inserted in 40 seconds, which is extremely good for me. Thank you for your help, Regards, Slavisa -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
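[The "piping the data into COPY's stdin" approach can be sketched as follows. The thread used the pg module; psycopg2's copy_from is assumed here as one common way to stream COPY data from Python, and the helper names are invented for illustration:]

```python
import io

def rows_to_copy_buffer(rows):
    """Serialize rows into the tab-separated text format COPY expects."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(str(col) for col in row) + "\n")
    buf.seek(0)
    return buf

def bulk_load(conn, table, rows):
    # copy_from streams the buffer through COPY ... FROM STDIN in a
    # single command, avoiding per-row round trips -- the kind of
    # change behind the ~75000 rows in 40 seconds figure above.
    with conn.cursor() as cur:
        cur.copy_from(rows_to_copy_buffer(rows), table)
    conn.commit()
```

Note that values containing tabs, newlines, or NULLs would need the escaping rules of COPY's text format; this sketch assumes plain scalar data.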
Re: [HACKERS] PITR Dead horse?
Simon, I'd vote for a new list dedicated to discussing Robustness issues, such as PITR and the fsync/sync issues. snip The list probably would remain clearly differentiated, in the same way [Performance] covers lots of areas not discussed in [Hackers]. Actually, Simon, you're welcome to bring this discussion over to PERFORMANCE. We discuss scalability and HA on Performance frequently, and I don't feel that the discussion you refer to would be out of place. But Tom is right that you need the feedback of a lot of the people on Hackers once you start discussing a code solution, so there's not much point in starting a new mailing list that all the same people need to subscribe to. Certainly Jan had enough trouble getting meaningful feedback on the sync issue here; on his own list he'd still be talking to himself. As far as promoting an image of reliability, that belongs on Advocacy. The image and the reality don't sync much; we're already about 500% more reliable than MS SQL Server but ask any ten CIOs what they think? That's just marketing. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster