Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-31 Thread Martijn van Oosterhout
On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote:
 I think this is a mistake -- the same mistake that got us into trouble with
 Turkish.
 
 Hashing depends on the concept of equality which is integral to the type. Two
 things are either the same or they aren't, and that can't change based on
 context.

So someone who wants a case-insensetive search actually doesn't want
Foo to equal foo? If you're arguing that that should be a different
type, well, that's a possibility. But does that mean someone who wants
an accent insensetive match also needs a new type? And a phonebook
match, where Mc and Mac are the same?

It was my understanding that the problem with Turkish/Hungarian was the
we only allow one collation for strings over the whole database. The
point is that in the future you will be able to select this on a per
column/index/query basis, so we don't need to stick to such a
restriction if the user explicitly asks to ignore it.

On a more practical level, a Hash Join needs to produce the same
results as a Merge Join, so if (a = b) then (hash(a) = hash(b)). So if
the user types (a = b COLLATE ignorecase) then the hash function needs
to change to match.

 Specifically in the case of strings, two strings should only be considered
 equal if they consist of the exact same series of characters. (That is, they
 could be encoded differently but they have to encode the same actual
 characters.) That they happen to sort equally compared to all other strings
 doesn't mean that they're equal.

Sure, for straight strings (COLLATE POSIX), that's absolutly a
requirement. But people also have other requirements, like treating
strings case-insensetively. I don't think we should restrict ourselves
to not being able to support their wishes.

You do bring up the possibility of secondary sort functions. Functions
which are not involved in testing for equality, but provide addition
sorting so that even in a case-insensetive sort, the different
variations in case appear together. All variations are equal, but some
are more equal than others type setup.

Thanks for the feedback,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgp1xpk5wgBqf.pgp
Description: PGP signature


Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and

2005-12-31 Thread Michael Paesold
Bruce Momjian wrote:

  The --single-transaction mode would apply even if the dump was created
  using an earlier version of pg_dump. pg_dump has *not* been altered at
  all. (And I would again add that the idea was not my own)
 
 I assume you mean this:
 
   http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php
 
 I guess with the ALTER commands I don't see much value in the
 --single-transaction flag.  I am sure others suggested it, but would
 they suggest it now given our current direction.

I just want to add that --single-transaction has a value of it's own. There
were times when I wanted to restore parts of a dump all-or-nothing. 

This is possible with PostgreSQL, unlike many other DBM systems, because
people like Tom Lane have invested in ensuring that all DDL is working
without implicitly committing an enclosing transaction.

Using pg_restore directly into a database, it is not possible to get a
single transaction right now. One has to restore to a file and manually
added BEGIN/COMMIT. Just for that I think --single-transaction is a great
addition and a missing feature.

I think more people have a use-case for that.

Best Regards,
Michael Paesold

-- 
Telefonieren Sie schon oder sparen Sie noch?
NEU: GMX Phone_Flat http://www.gmx.net/de/go/telefonie

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Qingqing Zhou [EMAIL PROTECTED] writes:
 I have patched IO routines in backend/storage that POSIX says EINTR is
 possible except unlink(). Though POSIX says EINTR is not possible, during
 many regressions, I found it sometimes sets this errno on NFS (I still
 don't know where is the smoking-gun):

 Well there is a reason intr is not the default for NFS mounts. It's precisely
 because it breaks the traditional unix filesystem interface.

Yeah.  We have looked at this before and decided that trying to defend
against it is too invasive and too fragile (how will you ever be sure
you've fixed everyplace, or keep other places from sneaking in later?)

What I'd rather do is document prominently that running a DB over NFS
isn't recommended, and running it over NFS with interrupts allowed is
just not going to work.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou


On Sat, 31 Dec 2005, Tom Lane wrote:

 What I'd rather do is document prominently that running a DB over NFS
 isn't recommended, and running it over NFS with interrupts allowed is
 just not going to work.


Agreed. IO syscalls is not the only problem for NFS -- if we can't fix
them in a run, then don't do it.

Regards,
Qingqing

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Rod Taylor
On Sat, 2005-12-31 at 14:40 -0500, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Qingqing Zhou [EMAIL PROTECTED] writes:
  I have patched IO routines in backend/storage that POSIX says EINTR is
  possible except unlink(). Though POSIX says EINTR is not possible, during
  many regressions, I found it sometimes sets this errno on NFS (I still
  don't know where is the smoking-gun):
 
  Well there is a reason intr is not the default for NFS mounts. It's 
  precisely
  because it breaks the traditional unix filesystem interface.

 What I'd rather do is document prominently that running a DB over NFS
 isn't recommended, and running it over NFS with interrupts allowed is
 just not going to work.

Are there issues with having an archive_command which does things with
NFS based filesystems?

-- 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-31 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote:
  Two things are either the same or they aren't, and that can't change
 based on context.

 So someone who wants a case-insensetive search actually doesn't want
 Foo to equal foo?

That nice simple worldview falls down in other areas as well.  An
example is zero and minus zero in IEEE math: they are equal for some
purposes but not others.  I think you really have to say that equality
is defined with respect to a particular datatype and a particular set
of operators.

The example of case-insensitive sorting suggests that we need to assume
that sort comparison functions can make finer-grained comparisons than
the associated equals operator does.  The current infrastructure
forces these to be exactly the same, but as long as we're busy
reinventing stuff, we could have two comparison functions associated
with a btree opclass: one that mimics the operators' behavior and one
that makes finer-grained comparisons and defines the actual sort order.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-31 Thread Martijn van Oosterhout
On Sat, Dec 31, 2005 at 02:54:18PM -0500, Tom Lane wrote:
 The example of case-insensitive sorting suggests that we need to assume
 that sort comparison functions can make finer-grained comparisons than
 the associated equals operator does.  The current infrastructure
 forces these to be exactly the same, but as long as we're busy
 reinventing stuff, we could have two comparison functions associated
 with a btree opclass: one that mimics the operators' behavior and one
 that makes finer-grained comparisons and defines the actual sort order.

Indeed, that's exactly the thought I had this afternoon, distiguish a
collation and a comparison function. It's certainly a lot easier to
implement than anything else I could think of

Have a great New Year everyone,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpLhmlXjIqDY.pgp
Description: PGP signature


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Greg Stark

Qingqing Zhou [EMAIL PROTECTED] writes:

 On Sat, 31 Dec 2005, Tom Lane wrote:
 
  What I'd rather do is document prominently that running a DB over NFS
  isn't recommended, and running it over NFS with interrupts allowed is
  just not going to work.
 
 Agreed. IO syscalls is not the only problem for NFS -- if we can't fix
 them in a run, then don't do it.

I don't think that's reasonable. The NFS intr option breaks the traditional
unix filesystem semantics which breaks a lot of older or naive programs. But
that's no reason to decide that Postgres can't handle the new semantics.

Handling EINTR after all file system calls doesn't sound like it would be
terribly hard. And Postgres of all systems has the infrastructure necessary to
handle error conditions, abort and roll back the transaction when a file
system error occurs. I think mainly this means it would be possible to hit C-c
or shut down postgres (uncleanly) when there's a network outage.

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou


On Sat, 31 Dec 2005, Greg Stark wrote:


 I don't think that's reasonable. The NFS intr option breaks the traditional
 unix filesystem semantics which breaks a lot of older or naive programs. But
 that's no reason to decide that Postgres can't handle the new semantics.


Is that by default the EINTR is truned off in NFS? If so, I don't see that
will be a problem. Sorry for my limited knowledge, is there any
requirements/benefits that people turn on EINTR?

 Handling EINTR after all file system calls doesn't sound like it would be
 terribly hard.

The problem is not restricted to file system. Actually my patched
version(only backend/storage) passed hundreds times of regression without
any problem, but EINTR can hurt other syscalls as well. Find out *all* the
EINTR situtations may need big efforts AFAICS.

Regards,
Qingqing

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Greg Stark

Qingqing Zhou [EMAIL PROTECTED] writes:

 On Sat, 31 Dec 2005, Greg Stark wrote:
 
 
  I don't think that's reasonable. The NFS intr option breaks the traditional
  unix filesystem semantics which breaks a lot of older or naive programs. But
  that's no reason to decide that Postgres can't handle the new semantics.
 
 
 Is that by default the EINTR is truned off in NFS? If so, I don't see that
 will be a problem. Sorry for my limited knowledge, is there any
 requirements/benefits that people turn on EINTR?

That's why the intr option (and the soft) option has traditionally not
been enabled by default in NFS implementations. But many people don't like
that when their NFS server disappears their client applications become
unkillable. They like to be able to hit C-c and stop whatever is running.

In the case of Postgres having intr off on the NFS mount point would mean
you couldn't C-c a query stuck because the database is on NFS. Of course it's
not like you would be able to run any more queries after that, but you might
want your terminal back.

You wouldn't even be able to shut down Postgres, even with kill -9. If your
NFS server is unrecoverable and you want to bring up a Postgres instance using
a backup restored some other place you would have to bring it up on another
port or reboot your machine.

That's the kind of thing that leads lots of sysadmins to use the intr and
soft options. And those sysadmins generally aren't aware of these kinds of
consequences since it's more of a programming level issue.

  Handling EINTR after all file system calls doesn't sound like it would be
  terribly hard.
 
 The problem is not restricted to file system. Actually my patched
 version(only backend/storage) passed hundreds times of regression without
 any problem, but EINTR can hurt other syscalls as well. Find out *all* the
 EINTR situtations may need big efforts AFAICS.

Well NFS is only going to affect filesystem calls. If there are other syscalls
that can signal EINTR on some obscure platform where Postgres isn't handling
it then that's just a run-of-the-mill porting issue.

But like I mentioned in the other thread POSIX is of no help here. With the
exception of the pthreads syscalls POSIX doesn't prohibit functions from
signalling errors other than the ones documented in the specification. So in
other words, just about any function can signal just about any error including
errors that are proprietary additions any time. Good luck :)

-- 
greg


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou


On Sat, 31 Dec 2005, Greg Stark wrote:

 Qingqing Zhou [EMAIL PROTECTED] writes:

 
  Is that by default the EINTR is truned off in NFS? If so, I don't see that
  will be a problem. Sorry for my limited knowledge, is there any
  requirements/benefits that people turn on EINTR?

 That's why the intr option (and the soft) option has traditionally not
 been enabled by default in NFS implementations. But many people don't like
 that when their NFS server disappears their client applications become
 unkillable. They like to be able to hit C-c and stop whatever is running.


Thanks Greg and Martin, I now understand better of intr :-) So we can
killed Postgres or not depends on our signal handler. Query Cancel signal
won't work because ImmediateInterruptOK forbids it and the retry style
code in read/write will put the Postgres process into uninterruptable
sleep again. But die signal will work I think.

Regards,
Qingqing


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Greg Stark

Rod Taylor [EMAIL PROTECTED] writes:

 Are there issues with having an archive_command which does things with
 NFS based filesystems?

Well, whatever command you use for archive_command -- probably just cp if
you're using NFS would hang if the NFS server went away. What would happen
then might be interesting. If Postgres finds the archive_command hanging
indefinitely will it correctly avoid recycling the WAL log indefinitely? I
assume so.

What's nonoptimal here is that I don't think there would be any warning that
anything was wrong until the WAL logs eventually filled up their filesystem
and then postgres stopped running. In the meantime your archived WAL logs
would be getting older and older and you would have no indication that
anything was failing.

This was the intention with the NFS error handling. The theory being that
eventually the server comes back up and things resume functioning exactly
where they left off with no lost operations. The upside is you don't have
things failing, then resuming later and unhandled errors in the meantime
leading to data corruption. The downside is there's no way for cp and
ultimately Postgres to know anything's wrong except to have a timeout itself
and an arbitrary maximum amount of time to expect operations to take.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] default resource limits

2005-12-31 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 In experimenting I needed to set this at 20 for it to bite much. If we 
 wanted to fine tune it I'd be inclined to say that we wanted 
 20*connections buffers for the first, say, 50  or 100 connections and 10 
 or 16 times for each connection over that. But that might be getting a 
 little too clever - something we should leave to a specialised tuning 
 tool. After all, we try these in fairly discrete jumps anyway. Maybe a 
 simple factor around 20 would be sufficient.

I think 10 is probably a good compromise value.  If we set it to 20
we'll find make check failing on Darwin because Apple's standard
SHMMAX value doesn't allow more than about 300 buffers ... and the
regression tests want max_connections to be at least 20.

I noticed while fooling with this on my laptop that initdb was selecting
a shared_buffers value less than the value it had just proved would work
:-(.  This is because the list of shared_buffers values it was probing
did not include all the values corresponding to values tried during the
max_connections scan.  I've added documentation about that gotcha.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Doug Royer


EINTR on read() or write() is not unique to NFS.
It can happen on many file systems - it is just seen
less frequently on most of them.

The code should be able to handle ANY valid read()
and write() errno. And EINTR is documented on Linux, BSD,
Solaris (1 and 2), and POSIX.

Even the Linux man pages can return ENTER on read() and
write(). This can happen on soft-mirrors, SCSI disks, and SOME
other disk drivers when they have errors.

The 'intr' option to NFS is not the same as EINTR. It
it means 'if the server does not respond for a while,
then return an EINTR', just like any other disk read()
or write() does when it fails to reply.

I have seen lots of open source code that assumes that all
disk reads and writs work 100% or fail 100%. Many do not
check the return value to see if all data was written or
read from disk. And many do not look at errno at all.
I have NOT looked to see how postgres does it.

If storage/*.c is where the reads occur, it does
very LITTLE when checking for errors.



Handling EINTR after all file system calls doesn't sound like it would be
terribly hard.


The problem is not restricted to file system. Actually my patched
version(only backend/storage) passed hundreds times of regression without
any problem, but EINTR can hurt other syscalls as well. Find out *all* the
EINTR situtations may need big efforts AFAICS.



Well NFS is only going to affect filesystem calls. If there are other syscalls
that can signal EINTR on some obscure platform where Postgres isn't handling
it then that's just a run-of-the-mill porting issue.

But like I mentioned in the other thread POSIX is of no help here. With the
exception of the pthreads syscalls POSIX doesn't prohibit functions from
signalling errors other than the ones documented in the specification. So in
other words, just about any function can signal just about any error including
errors that are proprietary additions any time. Good luck :)



--

Doug Royer | http://INET-Consulting.com
---|-

  We Do Standards - You Need Standards

begin:vcard
fn:Doug Royer
n:Royer;Doug
org:INET-Consulting.com
adr:;;U.S.A
email;internet:[EMAIL PROTECTED]
title:CEO
tel;work:866-594-8574
tel;fax:866-594-8574
note;quoted-printable:AOL: SupportUnix=0D=0A=
	MSN: [EMAIL PROTECTED]
	Yahoo: Help4Unix
x-mozilla-html:FALSE
url:http://Royer.com
version:2.1
end:vcard



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and

2005-12-31 Thread August Zajonc
As a user and a list lurker I very much like Bruce's proposed ALTER 
TABLE syntax. COPY LOCK (and the variants I can imagine being required 
for all the other types of cases) don't seem as appealing.


And ALTER TABLE seems to make it clear it is an object level change, 
feels like it fits the internal model of the change better.


As a user a quick note that I've really found the documentation strong, 
which makes a big difference. Probably a low glory thing but much 
appreciated.


Big props everyone I see posting for keeping an eye (and code) on the 
various optimizations, a lot of the use cases are ones I can relate too 
(beyond even the ETL ones). I'm always impressed at the range of areas 
folks are looking at improving, and hope to see it encouraged, even with 
something as useless as thanks :)


To a good new year... and a great past one.

- August

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Doug McNaught
Doug Royer [EMAIL PROTECTED] writes:

 The 'intr' option to NFS is not the same as EINTR. It
 it means 'if the server does not respond for a while,
 then return an EINTR', just like any other disk read()
 or write() does when it fails to reply.

No, you're thinking of 'soft'.  'intr' (which is actually a modifier
to the 'hard' setting) causes the I/O to hang until the server comes
back or the process gets a signal (in which case EINTR is returned).

-Doug

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou

Greg Stark [EMAIL PROTECTED] wrote

 Well NFS is only going to affect filesystem calls. If there are other 
 syscalls
 that can signal EINTR on some obscure platform where Postgres isn't 
 handling
 it then that's just a run-of-the-mill porting issue.


Ok, NFS just affects filesystem calls(I mix it with another problem). If 
possible, I hope we can draw some conclusion / schetch a fix plan here for 
future developers who want to come up with a patch. The question is:

Where and how should we fix exactly in order to incorporate intr NFS in 
server side?

More details we write down here, more feasible/infeasible plan we can get. I 
could think of these places:

+ direct file system calls
- open() family, fopen() family in backend/storage
- scattered open() etc in the whole backend (seems unlink is with 
biggest problem)

The problem of above is if a signal sneaks in, these syscalls will fail. 
With a retry, we can fix it.

+ indirect file system calls
- system(xxx) calls, xxx = cp, etc.

If intr NFS is enabled, what's the problem exactly?


Any others?

Regards,
Qingqing




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Greg Stark

Qingqing Zhou [EMAIL PROTECTED] writes:

 The problem of above is if a signal sneaks in, these syscalls will fail. 
 With a retry, we can fix it.

It's a bit stickier than that but only a bit. If you just retry then you're
saying users have to use kill -9 to get away from the situation. For some
filesystem operations that may be the best we can do. But for most it ought to
be possible to CHECK_FOR_INTERRUPTS() and handle the regular signals like C-c
or kill -1 normally. Even having the single backend exit (to avoid file
resource leaks) is nicer than having to restart the entire instance.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-31 Thread Andrew - Supernews
On 2005-12-31, Martijn van Oosterhout kleptog@svana.org wrote:
 You do bring up the possibility of secondary sort functions. Functions
 which are not involved in testing for equality, but provide addition
 sorting so that even in a case-insensetive sort, the different
 variations in case appear together. All variations are equal, but some
 are more equal than others type setup.

Doesn't this result in incorrect output in multi-column sorts?

i.e. if 'Foo' = 'foo', but for sorting purposes you always sort them
with 'Foo' first, then a multicolumn sort of the following data:

('Foo',1)
('foo',2)
('Foo',3)

would produce the wrong output, no?

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou


On Sun, 1 Jan 2006, Greg Stark wrote:

 Qingqing Zhou [EMAIL PROTECTED] writes:

  The problem of above is if a signal sneaks in, these syscalls will fail.
  With a retry, we can fix it.

 It's a bit stickier than that but only a bit. If you just retry then you're
 saying users have to use kill -9 to get away from the situation. For some
 filesystem operations that may be the best we can do. But for most it ought to
 be possible to CHECK_FOR_INTERRUPTS() and handle the regular signals like C-c
 or kill -1 normally. Even having the single backend exit (to avoid file
 resource leaks) is nicer than having to restart the entire instance.


I understand put a CHECK_FOR_INTERRUPTS() in the retry-loop may make more
graceful stop, but it won't work in some cases -- notice that the io
routines we will patch can be used before the signal mechanism is setup.

Regards,
Qingqing

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq