Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote: I think this is a mistake -- the same mistake that got us into trouble with Turkish. Hashing depends on the concept of equality which is integral to the type. Two things are either the same or they aren't, and that can't change based on context. So someone who wants a case-insensetive search actually doesn't want Foo to equal foo? If you're arguing that that should be a different type, well, that's a possibility. But does that mean someone who wants an accent insensetive match also needs a new type? And a phonebook match, where Mc and Mac are the same? It was my understanding that the problem with Turkish/Hungarian was the we only allow one collation for strings over the whole database. The point is that in the future you will be able to select this on a per column/index/query basis, so we don't need to stick to such a restriction if the user explicitly asks to ignore it. On a more practical level, a Hash Join needs to produce the same results as a Merge Join, so if (a = b) then (hash(a) = hash(b)). So if the user types (a = b COLLATE ignorecase) then the hash function needs to change to match. Specifically in the case of strings, two strings should only be considered equal if they consist of the exact same series of characters. (That is, they could be encoded differently but they have to encode the same actual characters.) That they happen to sort equally compared to all other strings doesn't mean that they're equal. Sure, for straight strings (COLLATE POSIX), that's absolutly a requirement. But people also have other requirements, like treating strings case-insensetively. I don't think we should restrict ourselves to not being able to support their wishes. You do bring up the possibility of secondary sort functions. Functions which are not involved in testing for equality, but provide addition sorting so that even in a case-insensetive sort, the different variations in case appear together. All variations are equal, but some are more equal than others type setup. Thanks for the feedback, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgp1xpk5wgBqf.pgp Description: PGP signature
Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
Bruce Momjian wrote: The --single-transaction mode would apply even if the dump was created using an earlier version of pg_dump. pg_dump has *not* been altered at all. (And I would again add that the idea was not my own) I assume you mean this: http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php I guess with the ALTER commands I don't see much value in the --single-transaction flag. I am sure others suggested it, but would they suggest it now given our current direction. I just want to add that --single-transaction has a value of it's own. There were times when I wanted to restore parts of a dump all-or-nothing. This is possible with PostgreSQL, unlike many other DBM systems, because people like Tom Lane have invested in ensuring that all DDL is working without implicitly committing an enclosing transaction. Using pg_restore directly into a database, it is not possible to get a single transaction right now. One has to restore to a file and manually added BEGIN/COMMIT. Just for that I think --single-transaction is a great addition and a missing feature. I think more people have a use-case for that. Best Regards, Michael Paesold -- Telefonieren Sie schon oder sparen Sie noch? NEU: GMX Phone_Flat http://www.gmx.net/de/go/telefonie ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] EINTR error in SunOS
Greg Stark [EMAIL PROTECTED] writes: Qingqing Zhou [EMAIL PROTECTED] writes: I have patched IO routines in backend/storage that POSIX says EINTR is possible except unlink(). Though POSIX says EINTR is not possible, during many regressions, I found it sometimes sets this errno on NFS (I still don't know where is the smoking-gun): Well there is a reason intr is not the default for NFS mounts. It's precisely because it breaks the traditional unix filesystem interface. Yeah. We have looked at this before and decided that trying to defend against it is too invasive and too fragile (how will you ever be sure you've fixed everyplace, or keep other places from sneaking in later?) What I'd rather do is document prominently that running a DB over NFS isn't recommended, and running it over NFS with interrupts allowed is just not going to work. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] EINTR error in SunOS
On Sat, 31 Dec 2005, Tom Lane wrote: What I'd rather do is document prominently that running a DB over NFS isn't recommended, and running it over NFS with interrupts allowed is just not going to work. Agreed. IO syscalls is not the only problem for NFS -- if we can't fix them in a run, then don't do it. Regards, Qingqing ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] EINTR error in SunOS
On Sat, 2005-12-31 at 14:40 -0500, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Qingqing Zhou [EMAIL PROTECTED] writes: I have patched IO routines in backend/storage that POSIX says EINTR is possible except unlink(). Though POSIX says EINTR is not possible, during many regressions, I found it sometimes sets this errno on NFS (I still don't know where is the smoking-gun): Well there is a reason intr is not the default for NFS mounts. It's precisely because it breaks the traditional unix filesystem interface. What I'd rather do is document prominently that running a DB over NFS isn't recommended, and running it over NFS with interrupts allowed is just not going to work. Are there issues with having an archive_command which does things with NFS based filesystems? -- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
Martijn van Oosterhout kleptog@svana.org writes: On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote: Two things are either the same or they aren't, and that can't change based on context. So someone who wants a case-insensetive search actually doesn't want Foo to equal foo? That nice simple worldview falls down in other areas as well. An example is zero and minus zero in IEEE math: they are equal for some purposes but not others. I think you really have to say that equality is defined with respect to a particular datatype and a particular set of operators. The example of case-insensitive sorting suggests that we need to assume that sort comparison functions can make finer-grained comparisons than the associated equals operator does. The current infrastructure forces these to be exactly the same, but as long as we're busy reinventing stuff, we could have two comparison functions associated with a btree opclass: one that mimics the operators' behavior and one that makes finer-grained comparisons and defines the actual sort order. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Sat, Dec 31, 2005 at 02:54:18PM -0500, Tom Lane wrote: The example of case-insensitive sorting suggests that we need to assume that sort comparison functions can make finer-grained comparisons than the associated equals operator does. The current infrastructure forces these to be exactly the same, but as long as we're busy reinventing stuff, we could have two comparison functions associated with a btree opclass: one that mimics the operators' behavior and one that makes finer-grained comparisons and defines the actual sort order. Indeed, that's exactly the thought I had this afternoon, distiguish a collation and a comparison function. It's certainly a lot easier to implement than anything else I could think of Have a great New Year everyone, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpLhmlXjIqDY.pgp Description: PGP signature
Re: [HACKERS] EINTR error in SunOS
Qingqing Zhou [EMAIL PROTECTED] writes: On Sat, 31 Dec 2005, Tom Lane wrote: What I'd rather do is document prominently that running a DB over NFS isn't recommended, and running it over NFS with interrupts allowed is just not going to work. Agreed. IO syscalls is not the only problem for NFS -- if we can't fix them in a run, then don't do it. I don't think that's reasonable. The NFS intr option breaks the traditional unix filesystem semantics which breaks a lot of older or naive programs. But that's no reason to decide that Postgres can't handle the new semantics. Handling EINTR after all file system calls doesn't sound like it would be terribly hard. And Postgres of all systems has the infrastructure necessary to handle error conditions, abort and roll back the transaction when a file system error occurs. I think mainly this means it would be possible to hit C-c or shut down postgres (uncleanly) when there's a network outage. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] EINTR error in SunOS
On Sat, 31 Dec 2005, Greg Stark wrote: I don't think that's reasonable. The NFS intr option breaks the traditional unix filesystem semantics which breaks a lot of older or naive programs. But that's no reason to decide that Postgres can't handle the new semantics. Is that by default the EINTR is truned off in NFS? If so, I don't see that will be a problem. Sorry for my limited knowledge, is there any requirements/benefits that people turn on EINTR? Handling EINTR after all file system calls doesn't sound like it would be terribly hard. The problem is not restricted to file system. Actually my patched version(only backend/storage) passed hundreds times of regression without any problem, but EINTR can hurt other syscalls as well. Find out *all* the EINTR situtations may need big efforts AFAICS. Regards, Qingqing ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] EINTR error in SunOS
Qingqing Zhou [EMAIL PROTECTED] writes: On Sat, 31 Dec 2005, Greg Stark wrote: I don't think that's reasonable. The NFS intr option breaks the traditional unix filesystem semantics which breaks a lot of older or naive programs. But that's no reason to decide that Postgres can't handle the new semantics. Is that by default the EINTR is truned off in NFS? If so, I don't see that will be a problem. Sorry for my limited knowledge, is there any requirements/benefits that people turn on EINTR? That's why the intr option (and the soft) option has traditionally not been enabled by default in NFS implementations. But many people don't like that when their NFS server disappears their client applications become unkillable. They like to be able to hit C-c and stop whatever is running. In the case of Postgres having intr off on the NFS mount point would mean you couldn't C-c a query stuck because the database is on NFS. Of course it's not like you would be able to run any more queries after that, but you might want your terminal back. You wouldn't even be able to shut down Postgres, even with kill -9. If your NFS server is unrecoverable and you want to bring up a Postgres instance using a backup restored some other place you would have to bring it up on another port or reboot your machine. That's the kind of thing that leads lots of sysadmins to use the intr and soft options. And those sysadmins generally aren't aware of these kinds of consequences since it's more of a programming level issue. Handling EINTR after all file system calls doesn't sound like it would be terribly hard. The problem is not restricted to file system. Actually my patched version(only backend/storage) passed hundreds times of regression without any problem, but EINTR can hurt other syscalls as well. Find out *all* the EINTR situtations may need big efforts AFAICS. Well NFS is only going to affect filesystem calls. If there are other syscalls that can signal EINTR on some obscure platform where Postgres isn't handling it then that's just a run-of-the-mill porting issue. But like I mentioned in the other thread POSIX is of no help here. With the exception of the pthreads syscalls POSIX doesn't prohibit functions from signalling errors other than the ones documented in the specification. So in other words, just about any function can signal just about any error including errors that are proprietary additions any time. Good luck :) -- greg ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] EINTR error in SunOS
On Sat, 31 Dec 2005, Greg Stark wrote: Qingqing Zhou [EMAIL PROTECTED] writes: Is that by default the EINTR is truned off in NFS? If so, I don't see that will be a problem. Sorry for my limited knowledge, is there any requirements/benefits that people turn on EINTR? That's why the intr option (and the soft) option has traditionally not been enabled by default in NFS implementations. But many people don't like that when their NFS server disappears their client applications become unkillable. They like to be able to hit C-c and stop whatever is running. Thanks Greg and Martin, I now understand better of intr :-) So we can killed Postgres or not depends on our signal handler. Query Cancel signal won't work because ImmediateInterruptOK forbids it and the retry style code in read/write will put the Postgres process into uninterruptable sleep again. But die signal will work I think. Regards, Qingqing ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] EINTR error in SunOS
Rod Taylor [EMAIL PROTECTED] writes: Are there issues with having an archive_command which does things with NFS based filesystems? Well, whatever command you use for archive_command -- probably just cp if you're using NFS would hang if the NFS server went away. What would happen then might be interesting. If Postgres finds the archive_command hanging indefinitely will it correctly avoid recycling the WAL log indefinitely? I assume so. What's nonoptimal here is that I don't think there would be any warning that anything was wrong until the WAL logs eventually filled up their filesystem and then postgres stopped running. In the meantime your archived WAL logs would be getting older and older and you would have no indication that anything was failing. This was the intention with the NFS error handling. The theory being that eventually the server comes back up and things resume functioning exactly where they left off with no lost operations. The upside is you don't have things failing, then resuming later and unhandled errors in the meantime leading to data corruption. The downside is there's no way for cp and ultimately Postgres to know anything's wrong except to have a timeout itself and an arbitrary maximum amount of time to expect operations to take. -- greg ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] default resource limits
Andrew Dunstan [EMAIL PROTECTED] writes: In experimenting I needed to set this at 20 for it to bite much. If we wanted to fine tune it I'd be inclined to say that we wanted 20*connections buffers for the first, say, 50 or 100 connections and 10 or 16 times for each connection over that. But that might be getting a little too clever - something we should leave to a specialised tuning tool. After all, we try these in fairly discrete jumps anyway. Maybe a simple factor around 20 would be sufficient. I think 10 is probably a good compromise value. If we set it to 20 we'll find make check failing on Darwin because Apple's standard SHMMAX value doesn't allow more than about 300 buffers ... and the regression tests want max_connections to be at least 20. I noticed while fooling with this on my laptop that initdb was selecting a shared_buffers value less than the value it had just proved would work :-(. This is because the list of shared_buffers values it was probing did not include all the values corresponding to values tried during the max_connections scan. I've added documentation about that gotcha. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] EINTR error in SunOS
EINTR on read() or write() is not unique to NFS. It can happen on many file systems - it is just seen less frequently on most of them. The code should be able to handle ANY valid read() and write() errno. And EINTR is documented on Linux, BSD, Solaris (1 and 2), and POSIX. Even the Linux man pages can return ENTER on read() and write(). This can happen on soft-mirrors, SCSI disks, and SOME other disk drivers when they have errors. The 'intr' option to NFS is not the same as EINTR. It it means 'if the server does not respond for a while, then return an EINTR', just like any other disk read() or write() does when it fails to reply. I have seen lots of open source code that assumes that all disk reads and writs work 100% or fail 100%. Many do not check the return value to see if all data was written or read from disk. And many do not look at errno at all. I have NOT looked to see how postgres does it. If storage/*.c is where the reads occur, it does very LITTLE when checking for errors. Handling EINTR after all file system calls doesn't sound like it would be terribly hard. The problem is not restricted to file system. Actually my patched version(only backend/storage) passed hundreds times of regression without any problem, but EINTR can hurt other syscalls as well. Find out *all* the EINTR situtations may need big efforts AFAICS. Well NFS is only going to affect filesystem calls. If there are other syscalls that can signal EINTR on some obscure platform where Postgres isn't handling it then that's just a run-of-the-mill porting issue. But like I mentioned in the other thread POSIX is of no help here. With the exception of the pthreads syscalls POSIX doesn't prohibit functions from signalling errors other than the ones documented in the specification. So in other words, just about any function can signal just about any error including errors that are proprietary additions any time. Good luck :) -- Doug Royer | http://INET-Consulting.com ---|- We Do Standards - You Need Standards begin:vcard fn:Doug Royer n:Royer;Doug org:INET-Consulting.com adr:;;U.S.A email;internet:[EMAIL PROTECTED] title:CEO tel;work:866-594-8574 tel;fax:866-594-8574 note;quoted-printable:AOL: SupportUnix=0D=0A= MSN: [EMAIL PROTECTED] Yahoo: Help4Unix x-mozilla-html:FALSE url:http://Royer.com version:2.1 end:vcard smime.p7s Description: S/MIME Cryptographic Signature
Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
As a user and a list lurker I very much like Bruce's proposed ALTER TABLE syntax. COPY LOCK (and the variants I can imagine being required for all the other types of cases) don't seem as appealing. And ALTER TABLE seems to make it clear it is an object level change, feels like it fits the internal model of the change better. As a user a quick note that I've really found the documentation strong, which makes a big difference. Probably a low glory thing but much appreciated. Big props everyone I see posting for keeping an eye (and code) on the various optimizations, a lot of the use cases are ones I can relate too (beyond even the ETL ones). I'm always impressed at the range of areas folks are looking at improving, and hope to see it encouraged, even with something as useless as thanks :) To a good new year... and a great past one. - August ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] EINTR error in SunOS
Doug Royer [EMAIL PROTECTED] writes: The 'intr' option to NFS is not the same as EINTR. It it means 'if the server does not respond for a while, then return an EINTR', just like any other disk read() or write() does when it fails to reply. No, you're thinking of 'soft'. 'intr' (which is actually a modifier to the 'hard' setting) causes the I/O to hang until the server comes back or the process gets a signal (in which case EINTR is returned). -Doug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] EINTR error in SunOS
Greg Stark [EMAIL PROTECTED] wrote Well NFS is only going to affect filesystem calls. If there are other syscalls that can signal EINTR on some obscure platform where Postgres isn't handling it then that's just a run-of-the-mill porting issue. Ok, NFS just affects filesystem calls(I mix it with another problem). If possible, I hope we can draw some conclusion / schetch a fix plan here for future developers who want to come up with a patch. The question is: Where and how should we fix exactly in order to incorporate intr NFS in server side? More details we write down here, more feasible/infeasible plan we can get. I could think of these places: + direct file system calls - open() family, fopen() family in backend/storage - scattered open() etc in the whole backend (seems unlink is with biggest problem) The problem of above is if a signal sneaks in, these syscalls will fail. With a retry, we can fix it. + indirect file system calls - system(xxx) calls, xxx = cp, etc. If intr NFS is enabled, what's the problem exactly? Any others? Regards, Qingqing ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] EINTR error in SunOS
Qingqing Zhou [EMAIL PROTECTED] writes: The problem of above is if a signal sneaks in, these syscalls will fail. With a retry, we can fix it. It's a bit stickier than that but only a bit. If you just retry then you're saying users have to use kill -9 to get away from the situation. For some filesystem operations that may be the best we can do. But for most it ought to be possible to CHECK_FOR_INTERRUPTS() and handle the regular signals like C-c or kill -1 normally. Even having the single backend exit (to avoid file resource leaks) is nicer than having to restart the entire instance. -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On 2005-12-31, Martijn van Oosterhout kleptog@svana.org wrote: You do bring up the possibility of secondary sort functions. Functions which are not involved in testing for equality, but provide addition sorting so that even in a case-insensetive sort, the different variations in case appear together. All variations are equal, but some are more equal than others type setup. Doesn't this result in incorrect output in multi-column sorts? i.e. if 'Foo' = 'foo', but for sorting purposes you always sort them with 'Foo' first, then a multicolumn sort of the following data: ('Foo',1) ('foo',2) ('Foo',3) would produce the wrong output, no? -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] EINTR error in SunOS
On Sun, 1 Jan 2006, Greg Stark wrote: Qingqing Zhou [EMAIL PROTECTED] writes: The problem of above is if a signal sneaks in, these syscalls will fail. With a retry, we can fix it. It's a bit stickier than that but only a bit. If you just retry then you're saying users have to use kill -9 to get away from the situation. For some filesystem operations that may be the best we can do. But for most it ought to be possible to CHECK_FOR_INTERRUPTS() and handle the regular signals like C-c or kill -1 normally. Even having the single backend exit (to avoid file resource leaks) is nicer than having to restart the entire instance. I understand put a CHECK_FOR_INTERRUPTS() in the retry-loop may make more graceful stop, but it won't work in some cases -- notice that the io routines we will patch can be used before the signal mechanism is setup. Regards, Qingqing ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq