Re: [HACKERS] OID wraparound: summary and proposal

2001-08-03 Thread Nathan Myers

On Thu, Aug 02, 2001 at 09:28:18AM +0200, Zeugswetter Andreas SB wrote:
> 
> > Strangely enough, I've seen no objection to optional OIDs
> > other than mine. Probably it was my mistake to have formulated
> > a plan on the flimsy assumption. 
> 
> I for one am more concerned about adding additional per-tuple
> overhead (moving from 32 -> 64 bit) than losing OIDs on some large
> tables. Imho optional OIDs are the best way to combine both worlds.

At the same time that we announce support for optional OIDs,
we should announce that, in future releases, OIDs will only be 
guaranteed unique (modulo wraparounds) within a single table.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Bad timestamp external representation

2001-07-26 Thread Nathan Myers

On Thu, Jul 26, 2001 at 05:38:23PM -0400, Bruce Momjian wrote:
> Nathan Myers wrote:
> > Bruce wrote:
> > > 
> > > I can confirm that current CVS sources have the same bug.
> > > 
> > > > It's a bug in timestamp output.
> > > > 
> > > > # select '2001-07-24 15:55:59.999'::timestamp;
> > > >  ?column?  
> > > > ---
> > > >  2001-07-24 15:55:60.00-04
> > > > (1 row)
> > > > 
> > > > Richard Huxton wrote:
> > > > > 
> > > > > From: "tamsin" <[EMAIL PROTECTED]>
> > > > > 
> > > > > > Hi,
> > > > > >
> > > > > > Just created a db from a pg_dump file and got this error:
> > > > > >
> > > > > > ERROR:  copy: line 602, Bad timestamp external representation 
> > > > > > '2000-10-03 09:01:60.00+00'
> > > > > >
> > > > > > I guess its a bad representation because 09:01:60.00+00
> > > > > > is actually 09:02, but how could it have got into my
> > > > > > database/can I do anything about it? The value must have
> > > > > > been inserted by my app via JDBC, I can't insert that value
> > > > > > directly via psql.
> > > > >
> > > > > Seem to remember a bug in either pg_dump or timestamp
> > > > > rendering causing rounding-up problems like this. If no-one
> > > > > else comes up with a definitive answer, check the list
> > > > > archives. If you're not running the latest release, check the
> > > > > change-log.
> >
> > It is not a bug, in general, to generate or accept times like
> > 09:01:60. Leap seconds are inserted as the 60th second of a minute.
> > ANSI C defines the range of struct member tm.tm_sec as "seconds
> > after the minute [0-61]", inclusive, and strftime format %S as "the
> > second as a decimal number (00-61)". A footnote mentions "the range
> > [0-61] for tm_sec allows for as many as two leap seconds".
> >
> > This is not to say that pg_dump should misrepresent stored times,
> > but rather that PG should not reject those misrepresented times as
> > being ill-formed. We were lucky that PG has the bug which causes it
> > to reject these times, as it led to the other bug in pg_dump being
> > noticed.
>
> We should accept :60 seconds but we should round 59.99 to 1:00, right?

If the xx:59.999 occurred immediately before a leap second, rounding it
up to (xx+1):00.00 would introduce an error of 1.001 seconds.

As I understand it, the problem is in trying to round 59.999 to two
digits.  My question is, why is pg_dump representing times with less 
precision than PostgreSQL's internal format?  Should pg_dump be lossy?

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Bad timestamp external representation

2001-07-25 Thread Nathan Myers

On Wed, Jul 25, 2001 at 06:53:21PM -0400, Bruce Momjian wrote:
> 
> I can confirm that current CVS sources have the same bug.
> 
> > It's a bug in timestamp output.
> > 
> > # select '2001-07-24 15:55:59.999'::timestamp;
> >  ?column?  
> > ---
> >  2001-07-24 15:55:60.00-04
> > (1 row)
> > 
> > Richard Huxton wrote:
> > > 
> > > From: "tamsin" <[EMAIL PROTECTED]>
> > > 
> > > > Hi,
> > > >
> > > > Just created a db from a pg_dump file and got this error:
> > > >
> > > > ERROR:  copy: line 602, Bad timestamp external representation '2000-10-03
> > > > 09:01:60.00+00'
> > > >
> > > > I guess its a bad representation because 09:01:60.00+00 is actually
> > > > 09:02, but how could it have got into my database/can I do anything
> > > > about it? The value must have been inserted by my app via JDBC, I
> > > > can't insert that value directly via psql.
> > > 
> > > Seem to remember a bug in either pg_dump or timestamp rendering causing
> > > rounding-up problems like this. If no-one else comes up with a definitive
> > > answer, check the list archives. If you're not running the latest release,
> > > check the change-log.

It is not a bug, in general, to generate or accept times like 09:01:60.  
Leap seconds are inserted as the 60th second of a minute.  ANSI C 
defines the range of struct member tm.tm_sec as "seconds after the 
minute [0-61]", inclusive, and strftime format %S as "the second
as a decimal number (00-61)".  A footnote mentions "the range [0-61]
for tm_sec allows for as many as two leap seconds".
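
To make that concrete, here is a minimal standard-C illustration (not
PostgreSQL code): strftime() happily formats a tm_sec of 60, exactly the
value that appears in the dump above.

  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      /* The leap-second instant from the dump: 2000-10-03 09:01:60. */
      struct tm leap = {0};
      char buf[64];

      leap.tm_year = 2000 - 1900;   /* tm_year counts from 1900 */
      leap.tm_mon  = 10 - 1;        /* tm_mon is zero-based */
      leap.tm_mday = 3;
      leap.tm_hour = 9;
      leap.tm_min  = 1;
      leap.tm_sec  = 60;            /* the inserted leap second */

      /* %S is specified as 00-61, so this prints "2000-10-03 09:01:60". */
      strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &leap);
      puts(buf);
      return 0;
  }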

This is not to say that pg_dump should misrepresent stored times,
but rather that PG should not reject those misrepresented times as 
being ill-formed.  We were lucky that PG has the bug which causes
it to reject these times, as it led to the other bug in pg_dump being
noticed.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: RPM source files should be in CVS (was Re: [GENERAL] psql -l)

2001-07-20 Thread Nathan Myers

On Fri, Jul 20, 2001 at 07:05:46PM -0400, Trond Eivind Glomsrød wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
> 
> > BTW, the only python shebangs I can find in CVS look like
> > #! /usr/bin/env python
> > Isn't that OK on RedHat?
> 
> It is.

Probably the perl scripts should say, likewise, 

  #!/usr/bin/env perl

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] MySQL Gemini code

2001-07-18 Thread Nathan Myers

On Wed, Jul 18, 2001 at 06:37:48PM -0400, Trond Eivind Glomsrød wrote:
> Michael Widenius <[EMAIL PROTECTED]> writes:
> > Assigning over the code is also something that FSF requires for all
> > code contributions.  If you criticize us at MySQL AB, you should
> > also criticize the above.
> 
> This is slightly different - FSF wants it so it will have a legal
> position to defend its programs: ...
> MySQL and TrollTech requires copyright assignment in order to sell
> non-open licenses. Some people will have a problem with this, while
> not having a problem with the FSF copyright assignment.

Nobody who works on MySQL is unaware of MySQL AB's business model.
Anybody who contributes to the core server has to expect that MySQL 
AB will need to relicense anything accepted into the core; that's 
their right as originators.  Everybody who contributes has a choice 
to make: fork, or sign over.  (With the GPL, forking remains possible;
Apple and Sun "community" licenses don't allow it.)

Anybody who contributes to PG has to make the same choice: fork, 
or put your code under the PG license.  The latter choice is 
equivalent to "signing over" to all proprietary vendors, who are 
then free to take your code proprietary.  Some of us like that.

> > I had actually hoped to get support from you guys at PostgreSQL
> > regarding this.  You may have similar experience or at least
> > understand our position. The RedHat database may be a good thing
> > for PostgreSQL, but I am not sure if it's a good thing for RedHat
> > or for the main developers to PostgreSQL. 
> 
> This isn't even a remotely similar situation: ...

It's similar enough.  One difference is that PG users are less
afraid to fork.  Another is that without the GPL, we have elected 
not to (and indeed cannot) stop any company from doing with PG what 
NuSphere is doing with MySQL.

This is why characterizing the various licenses as more or less
"business-friendly" is misleading (i.e. dishonest) -- it evades the 
question, "friendly to whom?".  Businesses sometimes compete...

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] MySQL Gemini code

2001-07-18 Thread Nathan Myers

On Wed, Jul 18, 2001 at 08:35:58AM -0400, Jan Wieck wrote:
> And this press release
> 
> http://www.nusphere.com/releases/071601.htm
> 
> also  explains why they had to do it this way.

They were always free to fork, but doing it the way they did --
violating MySQL AB's license -- they shot the dog.

The lesson?  Ask somebody competent, first, before you bet your
company playing license games.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] MySQL Gemini code

2001-07-18 Thread Nathan Myers

On Wed, Jul 18, 2001 at 11:45:54AM -0400, Bruce Momjian wrote:
> > And this press release
> > 
> > http://www.nusphere.com/releases/071601.htm
> ...
> On a more significant note, I hear the word "fork" clearly suggested
> in that text.  It is almost like MySQL AB GPL'ed the MySQL code and
> now they may not be able to keep control of it.

Anybody is free to fork MySQL or PostgreSQL alike.  The only difference
is that all published MySQL forks must remain public, where PostgreSQL 
forks need not.  MySQL AB is demonstrating their legal right to keep as
much control as they choose, and NuSphere will lose if it goes to court.

The interesting event here is that since NuSphere violated the license 
terms, they no longer have any rights to use or distribute the MySQL AB 
code, and won't until they get forgiveness from MySQL AB.  MySQL AB 
would be within their rights to demand that the copyright to Gemini be 
signed over, before offering forgiveness.

If Red Hat forks PostgreSQL, nobody will have any grounds for complaint.
(It's been forked lots of times already, less visibly.)

Nathan Myers 
[EMAIL PROTECTED]




[HACKERS] dependent dependants

2001-07-18 Thread Nathan Myers


For the record:

  http://www.lineone.net/dictionaryof/englishusage/d0081889.html

dependent or dependant

  "Dependent is the adjective, used for a person or thing that depends
  on someone or something: Admission to college is dependent on A-level
  results. Dependant is the noun, and is a person who relies on someone
  for financial support: Do you have any dependants?"

This is not mailing-list pedantry, but just to make sure 
that the right spelling gets into the code.  (The page mentioned 
above was found by entering "dependent dependant" into Google.)

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-17 Thread Nathan Myers

On Thu, Jul 12, 2001 at 11:08:34PM +0200, Peter Eisentraut wrote:
> Nathan Myers writes:
> 
> > When the system is too heavily loaded (however measured), any further
> > login attempts will fail.  What I suggested is, instead of the
> > postmaster accept()ing the connection, why not leave the connection
> > attempt in the queue until we can afford a back end to handle it?
> 
> Because the new connection might be a cancel request.

Supporting cancel requests seems like a poor reason to ignore what
load-shedding support operating systems provide.  

To support cancel requests, it would suffice for PG to listen at 
another socket dedicated to administrative requests.  (It might 
even ignore MaxBackends for connections on that socket.)

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-16 Thread Nathan Myers

On Sat, Jul 14, 2001 at 11:38:51AM -0400, Tom Lane wrote:
> 
> The state of affairs in current sources is that the listen queue
> parameter is MIN(MaxBackends * 2, PG_SOMAXCONN), where PG_SOMAXCONN
> is a constant defined in config.h --- it's 10000, hence a non-factor,
> by default, but could be reduced if you have a kernel that doesn't
> cope well with large listen-queue requests.  We probably won't know
> if there are any such systems until we get some field experience with
> the new code, but we could have "configure" select a platform-dependent
> value if we find such problems.

Considering the Apache comment about some systems truncating the
backlog instead of limiting it... 10000 & 0xff is 16.  Maybe 10239
would be a better choice, or 16383.

> So, having thought that through, I'm still of the opinion that holding
> off accept is of little or no benefit to us.  But it's not as simple
> as it looks at first glance.  Anyone have a different take on what the
> behavior is likely to be?

After doing some more reading, I find that most OSes do not reject
connect requests that would exceed the specified backlog; instead,
they ignore the connection request and assume the client will retry
later.  Therefore, it appears we cannot use a small backlog to shed
load unless we assume that clients will time out quickly by themselves.

OTOH, maybe it's reasonable to assume that clients will time out,
and that in the normal case authentication happens quickly.

Then we can use a small listen() backlog, and never accept() if we
have more than MaxBackends back ends.  The OS will keep a small queue
corresponding to our small backlog, and the clients will do our load 
shedding for us.
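
That policy is easy to sketch in plain C.  CountActiveBackends() and
SpawnBackend() are hypothetical helpers, MaxBackends is assumed to be
set elsewhere, and error handling is elided:

  #include <sys/socket.h>
  #include <unistd.h>

  #define LISTEN_BACKLOG 16                  /* deliberately small */

  extern int  MaxBackends;                   /* assumed to exist elsewhere */
  extern int  CountActiveBackends(void);     /* hypothetical helper */
  extern void SpawnBackend(int sock);        /* hypothetical helper */

  void serve(int listen_fd)
  {
      listen(listen_fd, LISTEN_BACKLOG);

      for (;;)
      {
          int conn;

          if (CountActiveBackends() >= MaxBackends)
          {
              /* At capacity: don't accept().  Pending attempts sit in
               * the kernel's small queue; anything beyond that is
               * refused or retried by the client's own stack.  Wait
               * here for a backend to exit (SIGCHLD handling elided). */
              pause();
              continue;
          }

          conn = accept(listen_fd, NULL, NULL);   /* blocks until a client arrives */
          if (conn >= 0)
              SpawnBackend(conn);
      }
  }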

Nathan Myers
[EMAIL PROTECTED]




[HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-13 Thread Nathan Myers

On Fri, Jul 13, 2001 at 07:53:02AM -0400, mlw wrote:
> Zeugswetter Andreas SB wrote:
> > I liked the idea of min(MaxBackends, PG_SOMAXCONN), since there is no use in
> > accepting more than your total allowed connections concurrently.
> 
> I have been following this thread and I am confused why the queue
> argument to listen() has anything to do with Max backends. All the
> parameter to listen does is specify how long a list of sockets open
> and waiting for connection can be. It has nothing to do with the
> number of back end sockets which are open.

Correct.

> If you have a limit of 128 back end connections, and you have 127
> of them open, a listen with queue size of 128 will still allow 128
> sockets to wait for connection before turning others away.

Correct.

> It should be a parameter based on the time out of a socket connection
> vs the ability to answer connection requests within that period of
> time.

It's not really meaningful at all, at present.

> There are two was to think about this. Either you make this parameter
> tunable to give a proper estimate of the usability of the system, i.e.
> tailor the listen queue parameter to reject sockets when some number
> of sockets are waiting, or you say no one should ever be denied,
> accept everyone and let them time out if we are not fast enough.
>
> This debate could go on, why not make it a parameter in the config
> file that defaults to some system variable, i.e. SOMAXCONN.

With postmaster's current behavior there is no benefit in setting
the listen() argument to anything less than 1000.  With a small
change in postmaster behavior, a tunable system variable becomes
useful.

But using SOMAXCONN blindly is always wrong; it is often 5, which is
demonstrably too small.

> BTW: on linux, the backlog queue parameter is silently truncated to
> 128 anyway.

The 128 limit is common, applied on BSD and Solaris as well.
It will probably increase in future releases.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN

2001-07-13 Thread Nathan Myers

On Fri, Jul 13, 2001 at 10:36:13AM +0200, Zeugswetter Andreas SB wrote:
> 
> > When the system is too heavily loaded (however measured), any further 
> > login attempts will fail.  What I suggested is, instead of the 
> > postmaster accept()ing the connection, why not leave the connection 
> > attempt in the queue until we can afford a back end to handle it?  
> 
> Because the clients would time out ?

It takes a long time for half-open connections to time out, by default.
Probably most clients would time out, themselves, first, if PG took too
long to get to them.  That would be a Good Thing.

Once the SOMAXCONN threshold is reached (which would only happen when 
the system is very heavily loaded, because when it's not then nothing 
stays in the queue for long), new connection attempts would fail 
immediately, another Good Thing.  When the system is very heavily 
loaded, we don't want to spare attention for clients we can't serve.

> > Then, the argument to listen() will determine how many attempts can 
> > be in the queue before the network stack itself rejects them without 
> > the postmaster involved.
> 
> You cannot change the argument to listen() at runtime, or are you suggesting
> to close and reopen the socket when maxbackends is reached ? I think 
> that would be nonsense.

Of course that would not work, and indeed nobody suggested it.

If postmaster behaved a little differently, not accept()ing when
the system is too heavily loaded, then it would be reasonable to
call listen() (once!) with PG_SOMAXCONN set to (e.g.) N=20.  

When the system is not too heavily loaded, the postmaster accept()s
the connection attempts from the queue very quickly, and the number
of half-open connections never builds up to N.  (This is how PG has
been running already, under light load -- except that on Solaris with 
Unix sockets N has been too small.)

When the system *is* heavily loaded, the first N attempts would be 
queued, and then the OS would automatically reject the rest.  This 
is better than accept()ing any number of attempts and then refusing 
to authenticate.  The N half-open connections in the queue would be 
picked up by postmaster as existing back ends drop off, or time out 
and give up if that happens too slowly.  

> I liked the idea of min(MaxBackends, PG_SOMAXCONN), since there is no
> use in accepting more than your total allowed connections concurrently.

That might not have the effect you imagine, where many short-lived
connections are being made.  In some cases it would mean that clients 
are rejected that could have been served after a very short delay.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-12 Thread Nathan Myers

On Thu, Jul 12, 2001 at 10:14:44AM +0200, Zeugswetter Andreas SB wrote:
> 
> > The question is really whether you ever want a client to get a
> > "rejected" result from an open attempt, or whether you'd rather they 
> > got a report from the back end telling them they can't log in.  The 
> > second is more polite but a lot more expensive.  That expense might 
> > really matter if you have MaxBackends already running.
> 
> One of us has probably misunderstood the listen parameter.

I don't think so.

> It only limits the number of clients that can connect concurrently.
> It has nothing to do with the number of clients that are already 
> connected.  It sort of resembles a maximum queue size for the accept 
> loop.  Incoming connections fill the queue, accept frees the queue by
> taking the connection to a newly forked backend.

The MaxBackends constant and the listen() parameter have no effect
until the number of clients already connected, or trying to connect
and not yet noticed by the postmaster (respectively), exceeds some
threshold.  We would like to choose such thresholds so that we don't
promise service we can't deliver.

We can assume the administrator has tuned MaxBackends so that a
system with that many back ends running really _is_ heavily loaded.  
(We have talked about providing a better measure of load than the
gross number of back ends; is that on the Todo list?)

When the system is too heavily loaded (however measured), any further 
login attempts will fail.  What I suggested is, instead of the 
postmaster accept()ing the connection, why not leave the connection 
attempt in the queue until we can afford a back end to handle it?  
Then, the argument to listen() will determine how many attempts can 
be in the queue before the network stack itself rejects them without 
the postmaster involved.

As it is, the listen() queue limit is not useful.  It could be made
useful with a slight change in postmaster behavior.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-11 Thread Nathan Myers

On Wed, Jul 11, 2001 at 12:26:43PM -0400, Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > Tom Lane writes:
> >> Right.  Okay, it seems like just making it a hand-configurable entry
> >> in config.h.in is good enough for now.  When and if we find that
> >> that's inadequate in a real-world situation, we can improve on it...
> 
> > Would anything computed from the maximum number of allowed connections
> > make sense?
> 
> [ looks at code ... ]  Hmm, MaxBackends is indeed set before we arrive
> at the listen(), so it'd be possible to use MaxBackends to compute the
> parameter.  Offhand I would think that MaxBackends or at most
> 2*MaxBackends would be a reasonable value.
>
> Question, though: is this better than having a hardwired constant?
> The only case I can think of where it might not be is if some platform
> out there throws an error from listen() when the parameter is too large
> for it, rather than silently reducing the value to what it can handle.
> A value set in config.h.in would be simpler to adapt for such a platform.

The question is really whether you ever want a client to get a
"rejected" result from an open attempt, or whether you'd rather they 
got a report from the back end telling them they can't log in.  The 
second is more polite but a lot more expensive.  That expense might 
really matter if you have MaxBackends already running.

I doubt most clients have tested either failure case more thoroughly 
than the other (or at all), but the lower-level code is more likely 
to have been cut-and-pasted from well-tested code. :-)

Maybe PG should avoid accept()ing connections once it has MaxBackends
back ends already running (as hinted at by Ian), so that the listen()
parameter actually has some meaningful effect, and excess connections 
can be rejected more cheaply.  That might also make it easier to respond 
more adaptively to true load than we do now.

> BTW, while I'm thinking about it: why doesn't pqcomm.c test for a
> failure return from the listen() call?  Is this just an oversight,
> or is there a good reason to ignore errors?

The failure of listen() seems impossible.  In the Linux, NetBSD, and 
Solaris man pages, none of the error returns mentioned are possible 
with PG's current use of the function.  It seems as if the most that 
might be needed now would be to add a comment to the call to socket() 
noting that if any other address families are supported (besides 
AF_INET and AF_LOCAL aka AF_UNIX), the call to listen() might need to 
be looked at.  AF_INET6 (which PG will need to support someday)
doesn't seem to change matters.

Probably if listen() did fail, then one or other of bind(), accept(),
and read() would fail too.
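
For what it's worth, a defensive check costs almost nothing.  A sketch,
with real error reporting (elog) elided and PG_SOMAXCONN given an
illustrative value:

  #include <stdio.h>
  #include <sys/socket.h>
  #include <unistd.h>

  #define PG_SOMAXCONN 10000    /* illustrative; see the config.h discussion */

  /* Returns 0 on success, -1 on a failure we claim "cannot happen". */
  int listen_or_complain(int fd)
  {
      if (listen(fd, PG_SOMAXCONN) < 0)
      {
          perror("listen");     /* at least leave a trace in the log */
          close(fd);
          return -1;
      }
      return 0;
  }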

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: Encrypting pg_shadow passwords

2001-07-11 Thread Nathan Myers

On Wed, Jul 11, 2001 at 01:24:53PM +1000, Michael Samuel wrote:
> The crypt authentication currently used offers _no_ security. ...
> Of course, SSL *if done correctly with certificate verification* is the
> correct fix.  If no certificate verification is done, you fall victim to
> a man-in-the-middle attack.

It seems worth noting here that you don't have to depend on
SSL authentication; PG can do its own authentication over SSL
and avoid the man-in-the-middle attack that way.  

Of course, PG would have to do its authentication properly, e.g. 
with the HMAC method.  That seems better than depending on SSL 
authentication, because SSL certification seems to be universally
misconfigured.
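
To make "the HMAC method" concrete, here is a sketch of the server-side
check in a challenge/response exchange, using OpenSSL's HMAC() and SHA-1
purely for illustration; this is not PostgreSQL's protocol, and "secret"
stands in for a key derived from the stored password.  The server sends
a random challenge, the client answers with HMAC(secret, challenge),
and the password itself never crosses the wire.

  #include <string.h>
  #include <openssl/evp.h>
  #include <openssl/hmac.h>

  /* Server side: recompute HMAC(secret, challenge) and compare it with
   * what the client sent back.  Returns 1 on match, 0 otherwise. */
  int verify_response(const unsigned char *secret, int secret_len,
                      const unsigned char *challenge, size_t challenge_len,
                      const unsigned char *response, unsigned int response_len)
  {
      unsigned char expected[EVP_MAX_MD_SIZE];
      unsigned int  expected_len = 0;

      HMAC(EVP_sha1(), secret, secret_len,
           challenge, challenge_len, expected, &expected_len);

      return response_len == expected_len &&
             memcmp(expected, response, expected_len) == 0;
  }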

Nathan Myers
[EMAIL PROTECTED]




Re: SOMAXCONN (was Re: [HACKERS] Solaris source code)

2001-07-10 Thread Nathan Myers

On Tue, Jul 10, 2001 at 06:36:21PM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > All the OSes we know of fold it to 128, currently.  We can jump it 
> > to 10240 now, or later when there are 20GHz CPUs.
> 
> > If you want to make it more complicated, it would be more useful to 
> > be able to set the value lower for runtime environments where PG is 
> > competing for OS resources with another daemon that deserves higher 
> > priority.
> 
> Hmm, good point.  Does anyone have a feeling for the amount of kernel
> resources that are actually sucked up by an accept-queue entry?  If 128
> is the customary limit, is it actually worth worrying about whether
> we are setting it to 128 vs. something smaller?

I don't think the issue is the resources that are consumed by the 
accept-queue entry.  Rather, it's a tuning knob to help shed load 
at the entry point to the system, before significant resources have 
been committed.  An administrator would tune it according to actual
system and traffic characteristics.

It is easy enough for somebody to change, if they care, so it seems to
me we have already devoted more time to it than it deserves right now.

Nathan Myers
[EMAIL PROTECTED]




Re: SOMAXCONN (was Re: [HACKERS] Solaris source code)

2001-07-10 Thread Nathan Myers

On Tue, Jul 10, 2001 at 05:06:28PM -0400, Bruce Momjian wrote:
> > Mathijs Brands <[EMAIL PROTECTED]> writes:
> > > OK, I tried using 1024 (and later 128) instead of SOMAXCONN (defined to
> > > be 5 on Solaris) in src/backend/libpq/pqcomm.c and ran a few regression
> > > tests on two different Sparc boxes (Solaris 7 and 8). The regression
> > > test still fails, but for a different reason. The abstime test fails;
> > > not only on Solaris but also on FreeBSD (4.3-RELEASE).
> > 
> > The abstime diff is to be expected (if you look closely, the test is
> > comparing 'current' to 'June 30, 2001'.  Ooops).  If that's the only
> > diff then you are in good shape.
> > 
> > 
> > Based on this and previous discussions, I am strongly tempted to remove
> > the use of SOMAXCONN and instead use, say,
> > 
> > #define PG_SOMAXCONN 1000
> > 
> > defined in config.h.in.  That would leave room for configure to twiddle
> > it, if that proves necessary.  Does anyone know of a platform where this
> > would cause problems?  AFAICT, all versions of listen(2) are claimed to
> > be willing to reduce the passed parameter to whatever they can handle.
> 
> Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if SOMAXCONN
> is less than 1000?

All the OSes we know of fold it to 128, currently.  We can jump it 
to 10240 now, or later when there are 20GHz CPUs.

If you want to make it more complicated, it would be more useful to 
be able to set the value lower for runtime environments where PG is 
competing for OS resources with another daemon that deserves higher 
priority.

Nathan Myers
[EMAIL PROTECTED]




Re: AW: [HACKERS] pg_index.indislossy

2001-07-10 Thread Nathan Myers

On Tue, Jul 10, 2001 at 01:36:33PM -0400, Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > But why is this called lossy?  Shouldn't it be called "exceedy"?
> 
> Good point ;-).  "lossy" does sound like the index might "lose" tuples,
> which is exactly what it's not allowed to do; it must find all the
> tuples that match the query.
> 
> The terminology is correct by analogy to "lossy compression" --- the
> index loses information, in the sense that its result isn't quite the
> result you wanted.  But I can see where it'd confuse the unwary.
> Perhaps we should consult the literature and see if there is another
> term for this concept.

How about "hinty"? :-)

Seriously, "indislossy" is a singularly poor name for a predicate.
Also, are we so poor that we can't afford whole words, or even word 
breaks?  I propose "index_is_hint".  

Actually, is the "ind[ex]" part even necessary?  
How about "must_check_heap"?

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: Backup and Recovery

2001-07-09 Thread Nathan Myers

On Fri, Jul 06, 2001 at 06:52:49AM -0400, Bruce Momjian wrote:
> Nathan wrote:
> > How hard would it be to turn these row records into updates against a 
> > pg_dump image, assuming access to a good table-image file?
> 
> pg_dump is very hard because WAL contains only tids.  No way to match
> that to pg_dump-loaded rows.

Maybe pg_dump can write out a mapping of TIDs to line numbers, and the
back-end can create a map of inserted records' line numbers when the dump 
is reloaded, so that the original TIDs can be traced to the new TIDs.
I guess this would require a new option on IMPORT.  I suppose the
mappings could be temporary tables.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Doing authentication in backend

2001-06-15 Thread Nathan Myers

On Thu, Jun 14, 2001 at 01:42:26PM -0400, Tom Lane wrote:
> Also note that we could easily fix things so that the max-number-of-
> backends limit is not checked until we have passed the authentication
> procedure.  A PM child that's still busy authenticating doesn't have
> to count.

And impose a very short timeout on authentication.

> Another problem with the present setup is total cost of servicing each
> connection request.  We've seen several complaints about connection-
> refused problems under heavy load, occurring because the single
> postmaster process simply can't service the requests quickly enough to
> keep its accept() queue from overflowing.

This last could also be addressed (along with Solaris's Unix Sockets 
problem!) by changing the second argument to listen(2) from the current 
SOMAXCONN -- which is 5 in Solaris 2.7 -- to 127.  See the six-page
discussion in Stevens UNPv1 beginning at page 93.

This is not to say we shouldn't fork before authentication, for
the above and other reasons, but the fix to listen(2)'s argument 
should happen anyway.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] RE: Row Versioning, for jdbc updateable result sets

2001-06-15 Thread Nathan Myers

On Fri, Jun 15, 2001 at 10:21:37AM -0400, Tom Lane wrote:
> "Dave Cramer" <[EMAIL PROTECTED]> writes:
> > I had no idea that xmin even existed, but having a quick look I think this
> > is what I am looking for. Can I assume that if xmin has changed, then
> > another process has changed the underlying data ?
> 
> xmin is a transaction ID, not a process ID, but looking at it should
> work for your purposes at present.
> 
> There has been talk of redefining xmin as part of a solution to the
> XID-overflow problem: what would happen is that all "sufficiently old"
> tuples would get relabeled with the same special xmin, so that only
> recent transactions would need to have distinguishable xmin values.
> If that happens then your code would break, at least if you want to
> check for changes just at long intervals.

A simpler alternative would be to change all "sufficiently old" tuples to 
have an xmin value, N, equal to the oldest that would need to be distinguished.  
xmin values could then be compared using normal arithmetic: less(xminA, 
xminB) is just ((xminA - N) < (xminB - N)), with no special cases.
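
In C, with 32-bit transaction IDs, the comparison would look something
like this sketch (not actual PostgreSQL code):

  #include <stdint.h>

  /* Returns nonzero if xmin_a precedes xmin_b.  'oldest' is N above:
   * the oldest xmin that still needs to be distinguished.  Unsigned
   * subtraction wraps modulo 2^32, so the comparison keeps working
   * after the XID counter wraps around. */
  static int xmin_less(uint32_t xmin_a, uint32_t xmin_b, uint32_t oldest)
  {
      return (uint32_t) (xmin_a - oldest) < (uint32_t) (xmin_b - oldest);
  }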

> A hack that comes to mind is that when relabeling an old tuple this way,
> we could copy its original xmin into cmin while setting xmin to the
> permanently-valid XID.  Then, if you compare both xmin and cmin, you
> have only about a 1 in 2^32 chance of being fooled.  (At least if we
> use a wraparound style of allocating XIDs.  I think Vadim is advocating
> resetting the XID counter to 0 at each system restart, so the active
> range of XIDs might be a lot smaller than 2^32 in that scenario.)

That assumes a pretty frequent system restart.  Many of us prefer
to code to the goal of a system that could run for decades.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] What (not) to do in signal handlers

2001-06-14 Thread Nathan Myers

On Thu, Jun 14, 2001 at 05:10:58PM -0400, Tom Lane wrote:
> Doug McNaught <[EMAIL PROTECTED]> writes:
> > Tom Lane <[EMAIL PROTECTED]> writes:
> >> Hm.  That's one way, but is it really any cleaner than our existing
> >> technique?  Since you still need to assume you can do a system call
> >> in a signal handler, it doesn't seem like a real gain in
> >> bulletproofness to me.
> 
> > Doing write() in a signal handler is safe; doing fprintf() (and
> > friends) is not.
> 
> If we were calling the signal handlers from random places, then I'd
> agree.  But we're not: we use sigblock to ensure that signals are only
> serviced at the place in the postmaster main loop where select() is
> called.  So there's no actual risk of reentrant use of non-reentrant
> library functions.
> 
> Please recall that in practice the postmaster is extremely reliable.
> The single bug we have seen with the signal handlers in recent releases
> was the problem that they were clobbering errno, which was easily fixed
> by saving/restoring errno.  This same bug would have arisen (though at
> such low probability we'd likely never have solved it) in a signal
> handler that only invoked write().  So I find it difficult to buy the
> argument that there's any net gain in robustness to be had here.
> 
> In short: this code isn't broken, and so I'm not convinced we should
> "fix" it.
 
Formally speaking, it *is* broken: we depend on semantics that are
documented as unportable and undefined.  In a sense, we have been so 
unlucky as not to have perceived, thus far, the undefined effects.  

This is no different from depending on finding a NUL at *(char*)0, or 
on being able to say "free(p); p = p->next;".  Yes, it appears to work,
at the moment, on some platforms, but that doesn't make it correct.

It may not be terribly urgent to fix it right now, but that's far from
"isn't broken".  It at least merits a TODO entry.

Nathan Myers
[EMAIL PROTECTED]





Re: [HACKERS] What (not) to do in signal handlers

2001-06-14 Thread Nathan Myers

On Thu, Jun 14, 2001 at 04:27:14PM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > It could open a pipe, and write(2) a byte to it in the signal handler, 
> > and then have select(2) watch that pipe.  (SIGHUP could use the same pipe.)
> > Of course this is still a system call in a signal handler, but it can't
> > (modulo coding bugs) fail.
> 
> Hm.  That's one way, but is it really any cleaner than our existing
> technique?  Since you still need to assume you can do a system call
> in a signal handler, it doesn't seem like a real gain in
> bulletproofness to me.

Quoting Stevens (UNPv2, p. 90),

  Posix uses the term *async-signal-safe* to describe the functions that
  may be called from a signal handler.  Figure 5.10 lists these Posix
  functions, along with a few that were added by Unix98.

  Functions not listed may not be called from a signal handler.  Note that
  none of the standard I/O functions ... are listed.  Of all the IPC
  functions covered in this text, only sem_post, read, and write are
  listed (we are assuming the latter two would be used with pipes and
  FIFOs).

Restricting the handler to use those in the approved list seems like an 
automatic improvement to me, even in the apparent absence of evidence 
of problems on those platforms that happen to get tested most.  

> > A pipe per backend might be considered pretty expensive.
> 
> Pipe per postmaster, no?  That doesn't seem like a huge cost.  

I haven't looked at how complex the signal handling in the backends is;
maybe they don't need anything this fancy.  (OTOH, maybe they should be 
using a pipe to communicate with postmaster, instead of using signals.)

> I'd be
> more concerned about the two extra kernel calls (write and read) per
> signal received, actually.

Are there so many signals flying around?  The signal handler would check 
a flag before writing, so a storm of signals would result in only one 
call to write, and one call to read, per select loop.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] What (not) to do in signal handlers

2001-06-14 Thread Nathan Myers

On Thu, Jun 14, 2001 at 02:18:40PM -0400, Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > I notice that the signal handlers in postmaster.c do quite a lot of work,
> > much more than what they teach you in school they should do.
> 
> Yes, they're pretty ugly.  However, we have not recently heard any
> complaints suggesting problems with it.  Since we block signals
> everywhere except just around the select() for new input, there's not
> really any risk of recursive resource use AFAICS.
> 
> > ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP
> > handler, i.e., set a global variable and evaluate right after the
> > select().
> 
> I would love to see it done that way, *if* you can show me a way to
> guarantee that the signal response will happen promptly.  AFAIK there's
> no portable way to ensure that we don't end up sitting and waiting for a
> new client message before we get past the select().  

It could open a pipe, and write(2) a byte to it in the signal handler, 
and then have select(2) watch that pipe.  (SIGHUP could use the same pipe.)
Writing to and reading from your own pipe can be a recipe for deadlock, 
but here it would be safe if the signal handler knows not to get too far
ahead of select.  (The easy way would be to allow no more than one byte
in the pipe per signal handler.)

Of course this is still a system call in a signal handler, but it can't
(modulo coding bugs) fail.  See Stevens, "Unix Network Programming, 
Vol. 2, Interprocess Communication", p. 91, Figure 5.10, "Functions 
that are async-signal-safe".  The figure lists write() among others.
Sample code implementing the above appears on page 94.  Examples using 
other techniques (sigwait, nonblocking mq_receive) are presented also.
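
A minimal sketch of that arrangement, assuming a listen_fd and ignoring
setup error checking; the names are illustrative, not the actual
postmaster code:

  #include <errno.h>
  #include <signal.h>
  #include <unistd.h>
  #include <sys/select.h>

  static int sig_pipe[2];                    /* [0] read end, [1] write end */
  static volatile sig_atomic_t sig_pending;  /* at most one byte in flight */

  /* Only async-signal-safe calls here (write); errno is saved/restored. */
  static void wakeup_handler(int signo)
  {
      int save_errno = errno;

      (void) signo;
      if (!sig_pending)
      {
          sig_pending = 1;
          (void) write(sig_pipe[1], "x", 1);
      }
      errno = save_errno;
  }

  void server_loop(int listen_fd)
  {
      pipe(sig_pipe);
      signal(SIGHUP, wakeup_handler);
      signal(SIGTERM, wakeup_handler);

      for (;;)
      {
          fd_set rfds;
          int maxfd = (listen_fd > sig_pipe[0]) ? listen_fd : sig_pipe[0];

          FD_ZERO(&rfds);
          FD_SET(listen_fd, &rfds);
          FD_SET(sig_pipe[0], &rfds);

          if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
              continue;                      /* EINTR: just loop and retry */

          if (FD_ISSET(sig_pipe[0], &rfds))
          {
              char c;

              sig_pending = 0;
              (void) read(sig_pipe[0], &c, 1);
              /* ... do the real signal work here, outside the handler ... */
          }
          if (FD_ISSET(listen_fd, &rfds))
          {
              /* ... accept() a new connection and fork a backend ... */
          }
      }
  }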

A pipe per backend might be considered pretty expensive.  Does UNIX 
allocate a pipe buffer before there's anything to put in it?

Nathan Myers
[EMAIL PROTECTED]




[HACKERS] Re: Australian timezone configure option

2001-06-13 Thread Nathan Myers

On Thu, Jun 14, 2001 at 12:23:22AM +, Thomas Lockhart wrote:
> > Surely the correct solution is to have a config file somewhere
> > that gets read on startup? That way us Australians don't have to be the only
> > ones in the world that need a custom built postgres.
> 
> I will point out that "you Australians", and, well, "us 'mericans", are
> the only countries without the sense to choose unique conventions for
> time zone names.
> 
> It sounds like having a second lookup table for the Australian rules is
> a possibility, and this sounds fairly reasonable to me. Btw, is there an
> Australian convention for referring to North American time zones for
> those zones with naming conflicts?

For years I've been on the TZ list, the announcement list for a 
community-maintained database of time zones.  One point they have 
firmly established is that there is no reasonable hope of making 
anything like a standard system of time zone name abbreviations work.  
Legislators and dictators compete for arbitrariness in their time
zone manipulations.

Even if you assign, for your own use, an abbreviation to a particular
administrative region, you still need a history of legislation for that 
region to know what any particular time record (particularly an April 
or September one) really means.

The "best practice" for annotating times is to tag them with the numeric
offset from UTC at the time the sample is formed.  If the time sample is
the present time, you don't have to know very much to make or use it.  If 
it's in the past, you have to know the legislative history of the place 
to form a proper time record, but not to use it.  If the time is in the 
future, you cannot know what offset will be in popular use at that time, 
but at least you can be precise about what actual time you really mean,
even if you can't be sure about what the wall clock says.  (Actual wall 
clock times are not reliably predictable, a fact that occasionally makes 
things tough on airline passengers.)
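
For illustration, a few lines of C that tag the current time with its
numeric offset via strftime's %z rather than a zone abbreviation:

  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      char buf[64];
      time_t now = time(NULL);
      struct tm local;

      localtime_r(&now, &local);
      /* Prints e.g. "2001-06-13 16:05:00 -0700": unambiguous, no zone name. */
      strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %z", &local);
      puts(buf);
      return 0;
  }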

Things are a little more stable in some places (e.g. in Europe it is
improving) but worldwide all is chaos.

Assigning some country's current abbreviations at compile time is madness.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Idea: quicker abort after loss of client connection

2001-06-06 Thread Nathan Myers

On Tue, Jun 05, 2001 at 08:01:02PM -0400, Tom Lane wrote:
> 
> Thoughts?  Is there anything about this that might be unsafe?  Should
> QueryCancel be set after *any* failure of recv() or send(), or only
> if certain errno codes are detected (and if so, which ones)?

Stevens identifies some errno codes that are not significant;
in particular, EINTR, EAGAIN, and EWOULDBLOCK.  Of these, maybe
only the first occurs on a blocking socket.
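
So a recv() wrapper might treat EINTR as "try again" rather than as a
reason to cancel; a sketch, not the actual pqcomm.c code:

  #include <errno.h>
  #include <sys/types.h>
  #include <sys/socket.h>

  /* Retry recv() if it was merely interrupted by a signal; anything
   * else is reported to the caller (0 = EOF, -1 = genuine error). */
  ssize_t recv_retry(int sock, void *buf, size_t len)
  {
      ssize_t n;

      do
          n = recv(sock, buf, len, 0);
      while (n < 0 && errno == EINTR);

      return n;
  }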

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: Interesting Atricle

2001-06-04 Thread Nathan Myers

On Mon, Jun 04, 2001 at 04:55:13PM -0400, Bruce Momjian wrote:
> > This is getting off-topic, but ... 
> > 
> > I keep CSS, Javascript, Java, dynamic fonts, and images turned off, and
> > Netscape 4.77 stays up for many weeks at a time.  I also have no Flash 
> > plugin.  All together it makes for a far more pleasant web experience.
> > 
> > I didn't notice any problem with the Zend page.
> 
> You are running no images!  You may as well have Netscape minimized and
> say it is running for weeks.  :-)

Over 98% of the images on the web are either pr0n or wankage.  
If you don't need to see that, you can save a lot of time.

But it's usually Javascript that crashes Netscape.  (CSS appears to
be implemented using Javascript, because if you turn off Javascript,
then CSS stops working (and crashing).) That's not to say that Java 
doesn't also crash Netscape; it's just that pages with Java in them 
are not very common.

There's little point in bookmarking a site that depends on client-side
Javascript or Java, because it won't be up for very long.

But this is *really* off topic, now.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: Interesting Atricle

2001-06-04 Thread Nathan Myers

On Sat, Jun 02, 2001 at 10:59:20AM -0400, Vince Vielhaber wrote:
> On Fri, 1 Jun 2001, Bruce Momjian wrote:
> 
> > > > Thought some people might find this article interesting.
> > > > http://www.zend.com/zend/art/databases.php
> > >
> > > The only interesting thing I noticed is how fast it crashes my
> > > Netscape-4.76 browser ;)
> >
> > Yours too?  I turned off Java/Javascript to get it to load and I am on
> > BSD/OS.  Strange it so univerally crashes.
> 
> Really odd.  I have Java/Javascript with FreeBSD and Netscape 4.76 and
> read it just fine.  One difference tho probably, I keep style sheets
> shut off.  Netscape crashes about 1% as often as it used to.

This is getting off-topic, but ... 

I keep CSS, Javascript, Java, dynamic fonts, and images turned off, and
Netscape 4.77 stays up for many weeks at a time.  I also have no Flash 
plugin.  All together it makes for a far more pleasant web experience.

I didn't notice any problem with the Zend page.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Imperfect solutions

2001-05-31 Thread Nathan Myers

On Thu, May 31, 2001 at 10:07:36AM -0400, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > What got me thinking about this is that I don't think my gram.y fix
> > would be accepted given the current review process,
> 
> Not to put too fine a point on it: the project has advanced a long way
> since you did that code.  Our standards *should* be higher than they
> were then.
> 
> > and that is bad
> > because we would have to live with no LIKE optimization for 1-2 years
> > until we learned how to do it right.
> 
> We still haven't learned how to do it right, actually.  I think the
> history of the LIKE indexing problem is a perfect example of why fixes
> that work for some people but not others don't survive long.  We put out
> several attempts at making it work reliably in non-ASCII locales, but
> none of them have withstood the test of actual usage.
> 
> > I think there are a few rules we can use to decide how to deal with
> > imperfect solutions:
> 
> You forgot
> 
> * will the fix institutionalize user-visible behavior that will in the
>   long run be considered the wrong thing?
> 
> * will the fix contort new code that is written in the same vicinity,
>   thereby making it harder and harder to replace as time goes on?
> 
> The first of these is the core of my concern about %TYPE.

This list points up a problem that needs a better solution than a 
list: you have to put in questionable features now to get the usage 
experience you need to do it right later.  The set of prospective
features that meet that description does not resemble the set that
would pass all the criteria in the list.

This is really a familiar problem, with a familiar solution.  
When a feature is added that is "wrong", make sure it's "marked" 
somehow -- at worst, in the documentation, but ideally with a 
NOTICE or something when it's used -- as experimental.  If anybody 
complains later that when you ripped it out and redid it correctly, 
you broke his code, you can just laugh, and add, if you're feeling 
charitable, "experimental features are not to be depended on".

--
Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: charin(), text_char() should return something else for empty input

2001-05-29 Thread Nathan Myers

On Mon, May 28, 2001 at 02:37:32PM -0400, Tom Lane wrote:
> I wrote:
> > I propose that both of these operations should return a space character
> > for an empty input string.  This is by analogy to space-padding as you'd
> > get with char(1).  Any objections?
> 
> An alternative approach is to make charin and text_char map empty
> strings to the null character (\0), and conversely make charout and
> char_text map the null character to empty strings.  charout already
> acts that way, in effect, since it has to produce a null-terminated
> C string.  This way would have the advantage that there would still
> be a reversible dump and reload representation for a "char" field
> containing '\0', whereas space-padding would cause such a field to
> become ' ' after reload.  But it's a little strange if you think that
> "char" ought to behave the same as char(1).

Does the standard require any particular behavior with NUL 
characters?  I'd like to see PG move toward treating them as ordinary 
control characters.  I realize that at best it will take a long time 
to get there.  C is irretrievably mired in the "NUL is a terminator"
swamp, but SQL isn't C.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] BSD gettext

2001-05-24 Thread Nathan Myers

On Thu, May 24, 2001 at 10:30:01AM -0400, Bruce Momjian wrote:
> > The HPUX man page for mmap documents its failure return value as "-1",
> > so I hacked around this with
> > 
> > #ifndef MAP_FAILED
> > #define MAP_FAILED ((void *) (-1))
> > #endif
> > 
> > whereupon it built and passed the simple self-test you suggested.
> > However, I think it's pretty foolish to depend on mmap for such
> > little reason as this code does.  I suggest ripping out the mmap
> > usage and just reading the file with good old read(2).
> 
> Agreed.  Let read() use mmap() internally if it wants to.

The reason mmap() is faster than read() is that it can avoid copying 
data to the place you specify.  read() can "use mmap() internally" only 
in cases rare enough to hardly be worth checking for.  

Stdio is often able to use mmap() internally for parsing, and in 
glibc-2.x (and, I think, on recent Solaris and BSDs) it does.  Usually, 
therefore, it would be better to use stdio functions (except fread()!) 
in place of read(), where possible, to allow this optimization.

Using mmap() in place of disk read() almost always results in enough
performance improvement to make doing so worth a lot of disruption.
Today mmap() is used heavily enough, in important programs, that 
worries about unreliability are no better founded than worries about
read().
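
For reference, a sketch of the mmap() pattern under discussion,
including the MAP_FAILED workaround mentioned earlier in the thread
(illustrative only; the caller would munmap() the result when done):

  #include <fcntl.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>

  #ifndef MAP_FAILED
  #define MAP_FAILED ((void *) (-1))   /* e.g. for the HPUX case above */
  #endif

  /* Map a whole file read-only; returns NULL on any failure. */
  const char *map_file(const char *path, size_t *len_out)
  {
      struct stat st;
      void *p;
      int fd = open(path, O_RDONLY);

      if (fd < 0 || fstat(fd, &st) < 0)
      {
          if (fd >= 0)
              close(fd);
          return NULL;
      }

      p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
      close(fd);                       /* the mapping outlives the descriptor */

      if (p == MAP_FAILED)
          return NULL;

      *len_out = (size_t) st.st_size;
      return (const char *) p;
  }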

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] C++ Headers

2001-05-23 Thread Nathan Myers

On Wed, May 23, 2001 at 11:35:31AM -0400, Bruce Momjian wrote:
> > > > > We have added more const-ness to libpq++ for 7.2.
> > > > 
> > > > Breaking link compatibility without bumping the major version number
> > > > on the library seems to me serious no-no.
> > > > 
> > > > To const-ify member functions without breaking link compatibility,
> > > > you have to add another, overloaded member that is const, and turn
> > > > the non-const function into a wrapper.  For example:
> > > > 
> > > >   void Foo::bar() { ... }   // existing interface
> > > > 
> > > > becomes
> > > > 
> > > >   void Foo::bar() { ((const Foo*)this)->bar(); }   
> > > >   void Foo::bar() const { ... }   
> > > 
> > > Thanks.  That was my problem, not knowing when I break link compatiblity
> > > in C++.  Major updated.
> > 
> > Wouldn't it be better to add the forwarding function and keep
> > the same major number?  It's quite disruptive to change the
> > major number for what are really very minor changes.  Otherwise
> > you accumulate lots of near-copies of almost-identical libraries
> > to be able to run old binaries.
> > 
> > A major-number bump should usually be something planned for
> > and scheduled.
> 
> That const was just one of many const's added, and I am sure there will
> be more stuff happening to C++.  I changed a function returning short
> for tuple length to int.  Not worth mucking it up.
> 
> If it was just that one it would be OK.

I'll bet lots of people would like to see more careful planning about 
breaking link compatibility.  Other changes that break link compatibility 
include changing a struct or class referred to from inline functions, and 
adding a virtual function in a base class.

It's possible to make a lot of improvements without breaking link
compatibility, but it does take more care than in C.  If you wonder
whether a change would break link compatibility, please ask on the list.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] More pgindent follies

2001-05-23 Thread Nathan Myers

On Wed, May 23, 2001 at 11:58:51AM -0400, Bruce Momjian wrote:
> > > I don't see the problem here.  My assumption is that the comment is not
> > > part of the define, right?
> > 
> > Well, that's the question.  ANSI C requires comments to be replaced by
> > whitespace before preprocessor commands are detected/executed, but there
> > was an awful lot of variation in preprocessor behavior before ANSI.
> > I suspect there are still preprocessors out there that might misbehave
> > on this input --- for example, by leaving the text "* end-of-string */"
> > present in the preprocessor output.  Now we still go to considerable
> > lengths to support not-quite-ANSI preprocessors.  I don't like the idea
> > that all the work done by configure and c.h in that direction might be
> > wasted because of pgindent carelessness.
> 
> I agree, but in a certain sense, we would have found those compilers
> already.  This is not new behavour as far as I know, and clearly this
> would throw a compiler error.

This is good news!

Maybe this process can be formalized.  That is, each official release 
might contain a source file with various "modern" constructs which we 
suspect might break old compilers.

A comment block at the top requests that any breakage be reported.

A configure option would allow a user to avoid compiling it, and a
comment in the file would explain how to use the option.  After a
major release, any modern construct that caused no trouble in the 
last release is considered OK to use.

This process makes it easy to leave behind obsolete language 
restrictions: if you wonder if it's OK now to use a feature that once 
broke some crufty platform, drop it in modern.c and forget about it.  
After the next release, you know the answer.
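
A sketch of what such a canary file might look like; the constructs
shown are only examples of the idea:

  /* modern.c -- compiler-canary sketch.  Each construct below is one we
   * suspect a pre-ANSI toolchain might reject.  If this file fails to
   * compile on your platform, please report it to the mailing list;
   * a configure option (not shown) would let you skip building it. */

  #define EXAMPLE_MACRO 42        /* trailing comment on a #define */

  static const char greeting[] = "hello, " "world";   /* literal concatenation */

  static const int answer = EXAMPLE_MACRO;   /* const file-scope data */

  int modern_canary(void)
  {
      return answer + (int) sizeof greeting;
  }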

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] C++ Headers

2001-05-22 Thread Nathan Myers

On Tue, May 22, 2001 at 05:52:20PM -0400, Bruce Momjian wrote:
> > On Tue, May 22, 2001 at 12:19:41AM -0400, Bruce Momjian wrote:
> > > > This in fact has happened within ECPG. But since sizeof(bool) is
> > > > passed to libecpg it was possible to figure out which 'bool' is
> > > > requested.
> > > >
> > > > Another issue of C++ compatibility would be cleaning up the
> > > > usage of 'const' declarations. C++ is really strict about
> > > > 'const'ness. But I don't know whether postgres' internal headers
> > > > would need such a cleanup. (I suspect that in ecpg there is an
> > > > oddity left with respect to host variable declaration. I'll
> > > > check that later)
> > >
> > > We have added more const-ness to libpq++ for 7.2.
> > 
> > Breaking link compatibility without bumping the major version number
> > on the library seems to me serious no-no.
> > 
> > To const-ify member functions without breaking link compatibility,
> > you have to add another, overloaded member that is const, and turn
> > the non-const function into a wrapper.  For example:
> > 
> >   void Foo::bar() { ... }   // existing interface
> > 
> > becomes
> > 
> >   void Foo::bar() { ((const Foo*)this)->bar(); }   
> >   void Foo::bar() const { ... }   
> 
> Thanks.  That was my problem, not knowing when I break link compatiblity
> in C++.  Major updated.

Wouldn't it be better to add the forwarding function and keep
the same major number?  It's quite disruptive to change the
major number for what are really very minor changes.  Otherwise
you accumulate lots of near-copies of almost-identical libraries
to be able to run old binaries.

A major-number bump should usually be something planned for
and scheduled.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] C++ Headers

2001-05-22 Thread Nathan Myers

On Tue, May 22, 2001 at 12:19:41AM -0400, Bruce Momjian wrote:
> > This in fact has happened within ECPG. But since sizeof(bool) is passed to
> > libecpg it was possible to figure out which 'bool' is requested.
> > 
> > Another issue of C++ compatibility would be cleaning up the usage of
> > 'const' declarations. C++ is really strict about 'const'ness. But I don't
> > know whether postgres' internal headers would need such a cleanup. (I
> > suspect that in ecpg there is an oddity left with respect to host variable
> > declaration. I'll check that later)
> 
> We have added more const-ness to libpq++ for 7.2.

Breaking link compatibility without bumping the major version number
on the library seems to me a serious no-no.

To const-ify member functions without breaking link compatibility,
you have to add another, overloaded member that is const, and turn
the non-const function into a wrapper.  For example:

  void Foo::bar() { ... }   // existing interface

becomes

  void Foo::bar() { ((const Foo*)this)->bar(); }   
  void Foo::bar() const { ... }   

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Plans for solving the VACUUM problem

2001-05-18 Thread Nathan Myers

On Fri, May 18, 2001 at 06:10:10PM -0700, Mikheev, Vadim wrote:
> > Vadim, can you remind me what UNDO is used for?
> 
> Ok, last reminder -:))
> 
> On transaction abort, read WAL records and undo (rollback)
> changes made in storage. Would allow:
> 
> 1. Reclaim space allocated by aborted transactions.
> 2. Implement SAVEPOINTs.
>Just to remind -:) - in the event of error discovered by server
>- duplicate key, deadlock, command mistyping, etc, - transaction
>    will be rolled back to the nearest implicit savepoint set
>    just before query execution; or the transaction can be rolled back
>    by a ROLLBACK TO <savepoint> command to some explicit savepoint
>    set by the user. A transaction rolled back to a savepoint may be continued.
> 3. Reuse transaction IDs on postmaster restart.
> 4. Split pg_log into small files with ability to remove old ones (which
>do not hold statuses for any running transactions).

I missed the original discussions; apologies if this has already been
beaten into the ground.  But... mightn't sub-transactions be a 
better-structured way to expose this service?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Upgrade issue (again).

2001-05-18 Thread Nathan Myers

On Thu, May 17, 2001 at 12:43:49PM -0400, Rod Taylor wrote:
> The best way to upgrade might be to do something as simple as getting
> master-to-master replication working.

Master-to-master replication is not simple, and (fortunately) isn't 
strictly necessary.  The minimal sequence is,

1. Start a backup and a redo log at the same time.
2. Start the new database and read the backup.
3. Get the new database consuming the redo logs.
4. When the new database catches up, make it a hot failover for the old.
5. Turn off the old database and fail over.

The nice thing about this approach is that all the parts used are 
essential parts of an enterprise database anyway, regardless of their 
usefulness in upgrading.  

Master-to-master replication is nice for load balancing, but not
necessary for failover.  Its chief benefit, there, is that you wouldn't 
need to abort the uncompleted transactions on the old database when 
you make the switch.  But master-to-master replication is *hard* to
make work, and intrusive besides.

Nathan Myers
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



[HACKERS] storage density

2001-05-18 Thread Nathan Myers


When organizing available free storage for re-use, we will probably have
a choice whether to favor using space in (mostly-) empty blocks, or in 
mostly-full blocks.  Empty and mostly-empty blocks are quicker -- you 
can put lots of rows in them before they fill up and you have to choose 
another.   Preferring mostly-full blocks improves active-storage and 
cache density because a table tends to occupy fewer total blocks.
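
To make the tradeoff concrete, here is a rough sketch of the two
policies in C; the structure and function names are invented for
illustration and are not taken from the backend:

    #include <stddef.h>

    /* invented per-block free-space summary, for illustration only */
    typedef struct BlockFree
    {
        unsigned blkno;
        size_t   free_bytes;
    } BlockFree;

    /* Policy A: prefer the emptiest block that fits.  Quick to fill,
     * but tends to spread the table over more blocks. */
    static int
    pick_emptiest(const BlockFree *blocks, int n, size_t need)
    {
        int i, best = -1;

        for (i = 0; i < n; i++)
            if (blocks[i].free_bytes >= need &&
                (best < 0 || blocks[i].free_bytes > blocks[best].free_bytes))
                best = i;
        return best;            /* -1 means: extend the relation instead */
    }

    /* Policy B: prefer the fullest block that still fits the row.
     * Better packing and cache density, but the chosen block fills up
     * sooner, so a new candidate must be found more often. */
    static int
    pick_fullest_that_fits(const BlockFree *blocks, int n, size_t need)
    {
        int i, best = -1;

        for (i = 0; i < n; i++)
            if (blocks[i].free_bytes >= need &&
                (best < 0 || blocks[i].free_bytes < blocks[best].free_bytes))
                best = i;
        return best;
    }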

Does anybody know of papers that analyze the tradeoffs involved?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Re: "End-to-end" paper

2001-05-17 Thread Nathan Myers

On Thu, May 17, 2001 at 06:04:54PM +0800, Lincoln Yeoh wrote:
> At 12:24 AM 17-05-2001 -0700, Nathan Myers wrote:
> >
> >For those of you who have missed it, here
> >
> 
>>http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+end&hl=en
> >
> >is the paper some of us mention, "END-TO-END ARGUMENTS IN SYSTEM DESIGN"
> >by Saltzer, Reed, and Clark.
> >
> >The abstract is:
> >
> >This paper presents a design principle that helps guide placement
> >of functions among the modules of a distributed computer system.
> >The principle, called the end-to-end argument, suggests that
> >functions placed at low levels of a system may be redundant or
> >of little value when compared with the cost of providing them
> >at that low level. Examples discussed in the paper include
> >bit error recovery, security using encryption, duplicate
> >message suppression, recovery from system crashes, and delivery
> >acknowledgement. Low level mechanisms to support these functions
> >are justified only as performance enhancements.
> >
> >It was written in 1981 and is undiminished by the subsequent decades.
>
> Maybe I don't understand the paper.

Yes.  It bears re-reading.

> The end-to-end argument might be true if taking the monolithic
> approach. I find more useful ideas gleaned from the RFCs, TCP/IP and
> the OSI 7 layer model: modularity, "useful standard interfaces", "Be
> liberal in what you accept, and conservative in what you send" and so
> on.

The end-to-end principle has had profound effects on the design of 
Internet protocols, perhaps most importantly in keeping them simpler 
than OSI's.

> Within a module I figure the end to end argument might hold,

The end-to-end principle isn't particularly applicable within a module.
It's a system-design principle. Its prescription for individual modules
is: don't imagine that anybody else gets much value from your complex
error recovery shenanigans; they have to do their own error recovery
anyway. You provide more value by making a good effort.

> but the author keeps talking about networks and networking.

Of course networking is just an example, but it's a particularly
good example. Data storage (e.g. disk) is another good example; in
the context of the paper it may be thought of as a mechanism for
communicating with other (later) times. The point there is that the CRCs
and ECC performed by the disk are not sufficient to ensure reliability
for the system (e.g. database service); for that, end-to-end measures
such as hot-failover, backups, redo logs, and block- or record-level
CRCs are needed. The purpose of the disk CRCs is not reliability, a job
they cannot do alone, but performance: they help make the need to use
the backups and redo logs infrequent enough to be tolerable.

> SSL and TCP are useful. The various CRC checks down the IP stack to
> the datalink layer have their uses too.

Yes, of course they are useful. The authors say so in the paper, and
they say precisely how (and how not).

> By splitting stuff up at appropriate points, adding or substituting
> objects at various layers becomes so much easier. People can download
> Postgresql over token ring, Gigabit ethernet, X.25 and so on.

As noted in the paper, the principle is most useful in helping to decide
what goes in each layer.

> Splitting stuff up does mean that the bits and pieces now do have
> a certain responsibility. If those responsibilities involve some
> redundancies in error checking or encryption or whatever, so be
> it, because if done well people can use those bits and pieces in
> interesting ways never dreamed of initially.
>
> For example SSL over TCP over IPSEC over encrypted WAP works (even
> though IPSEC is way too complicated :)). There's so much redundancy
> there, but at the same time it's not a far fetched scenario - just
> someone ordering online on a notebook pc.

The authors quote a similar example in the paper, even though it was
written twenty years ago.

> But if a low level module never bothered with error
> correction/detection/handling or whatever and was optimized for
> an application specific purpose, it's harder to use it for other
> purposes. And if you do, some chap could post an article to Bugtraq on
> it, mentioning exploit, DoS or buffer overflow.

The point is that leaving that stuff _out_ is how you keep low-level
mechanisms useful for a variety of purposes. Putting in complicated
error-recovery stuff might suit it better for a particular application,
but make it less suitable for others.

This is why, at the IP layer, packets get tossed at the first sign of
congestion. It's why

[HACKERS] "End-to-end" paper

2001-05-17 Thread Nathan Myers


For those of you who have missed it, here

http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+end&hl=en

is the paper some of us mention, "END-TO-END ARGUMENTS IN SYSTEM DESIGN"
by Saltzer, Reed, and Clark.

The abstract is:

This paper presents a design principle that helps guide placement of
functions among the modules of a distributed computer system. The
principle, called the end-to-end argument, suggests that functions
placed at low levels of a system may be redundant or of little value
when compared with the cost of providing them at that low level.
Examples discussed in the paper include bit error recovery, security
using encryption, duplicate message suppression, recovery from
system crashes, and delivery acknowledgement. Low level mechanisms
to support these functions are justified only as performance
enhancements.

It was written in 1981 and is undiminished by the subsequent decades.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Configurable path to look up dynamic libraries

2001-05-15 Thread Nathan Myers

On Tue, May 15, 2001 at 05:53:36PM -0400, Bruce Momjian wrote:
> > But, if I may editorialize a little myself, this is just indicative of a 
> > 'Fortress PostgreSQL' attitude that is easy to get into.  'We've always
> 
> I have to admit I like the sound of 'Fortress PostgreSQL'.  :-)

Ye Olde PostgreSQL Shoppe
The PostgreSQL of Giza
Our Lady of PostgreSQL, Ascendant
PostgreSQL International Airport
PostgreSQL Galactica
PostgreSQL's Tavern

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] tables/indexes/logs on different volumes

2001-04-25 Thread Nathan Myers

On Wed, Apr 25, 2001 at 09:41:57AM -0300, The Hermit Hacker wrote:
> On Tue, 24 Apr 2001, Nathan Myers wrote:
> 
> > On Tue, Apr 24, 2001 at 11:28:17PM -0300, The Hermit Hacker wrote:
> > > I have a Dual-866, 1gig of RAM and strip'd file systems ... this past
> > > week, I've hit many times where CPU usage is 100%, RAM is 500Meg free
> > > and disks are pretty much sitting idle ...
> >
> > Assuming "strip'd" above means "striped", it strikes me that you
> > might be much better off operating the drives independently, with
> > the various tables, indexes, and logs scattered each entirely on one
> > drive.
> 
> have you ever tried to maintain a database doing this?  PgSQL is
> definitely not designed for this sort of setup; I had symlinks going
> everywhere, and with the new numbering scheme, this is even more
> difficult to try to do :)

Clearly you need to build a tool to organize it.  It would help a lot if 
PG itself could provide some basic assistance, such as calling a stored
procedure to generate the pathname of the file.
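
Purely as a sketch of the kind of assistance I mean -- no such hook
exists in PostgreSQL, and every name below is invented:

    #include <stddef.h>
    #include <stdio.h>

    /* administrator-supplied policy: decide where a relation's file lives */
    typedef void (*relpath_policy)(char *buf, size_t buflen,
                                   const char *datadir,
                                   const char *relname,
                                   unsigned relfilenode);

    static relpath_policy policy = NULL;    /* set from a config file, say */

    static void
    relation_file_path(char *buf, size_t buflen,
                       const char *datadir, const char *relname,
                       unsigned relfilenode)
    {
        if (policy != NULL)
            policy(buf, buflen, datadir, relname, relfilenode);
        else
            /* default: one file per relation under the database directory */
            snprintf(buf, buflen, "%s/%u", datadir, relfilenode);
    }

The organizing tool would then reduce to writing the policy function,
instead of maintaining a forest of symlinks by hand.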

Has there been any discussion of anything like that?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] Cursor support in pl/pg

2001-04-25 Thread Nathan Myers

Now that 7.1 is safely in the can, is it time to consider
this patch?  It provides cursor support in PL.

  http://www.airs.com/ian/postgresql-cursor.patch

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



[HACKERS] Re: refusing connections based on load ...

2001-04-24 Thread Nathan Myers

On Tue, Apr 24, 2001 at 11:28:17PM -0300, The Hermit Hacker wrote:
> I have a Dual-866, 1gig of RAM and strip'd file systems ... this past
> week, I've hit many times where CPU usage is 100%, RAM is 500Meg free and
> disks are pretty much sitting idle ...

Assuming "strip'd" above means "striped", it strikes me that you
might be much better off operating the drives independently, with
the various tables, indexes, and logs scattered each entirely on one 
drive.  That way the heads can move around independently reading and 
writing N blocks, rather than all moving in concert reading or writing 
only one block at a time.  (Striping the WAL file on a couple of raw 
devices might be a good idea along with the above.  Can we do that?)

But of course speculation is much less useful than trying it.  Some 
measurements before and after would be really, really interesting
to many of us.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Re: refusing connections based on load ...

2001-04-23 Thread Nathan Myers

On Tue, Apr 24, 2001 at 12:39:29PM +0800, Lincoln Yeoh wrote:
> At 03:09 PM 23-04-2001 -0300, you wrote:
> >Basically, it's great to set max clients to 256, but if load hits 50
> >as a result, the database is nearly useless ... if you set it to 256,
> >and 254 idle connections are going, load won't rise much, so it's safe,
> >but if half of those processes are active, it hurts ...
> 
> Sorry, but I still don't understand the reasons why one would want to do
> this. Could someone explain?
> 
> I'm thinking that if I allow 256 clients, and my hardware/OS bogs down
> when 60 users are doing lots of queries, I either accept that, or
> figure that my hardware/OS actually can't cope with that many clients
> and reduce the max clients or upgrade the hardware (or maybe do a
> little tweaking here and there).
>
> Why not be more deterministic about refusing connections and stick
> to reducing max clients? If not it seems like a case where you're
> promised something but when you need it, you can't have it.

The point is that "number of connections" is a very poor estimate of 
system load.  Sometimes a connection is busy, sometimes it's not.
Some connections are busy, some are not.  The goal is maximum 
throughput or some tradeoff of maximum throughput against latency.  
If system throughput varies nonlinearly with load (as it almost 
always does), then that optimum occurs at some particular load level.

Refusing a connection and letting the client try again later can be 
a way to maximize throughput by keeping the system at the optimum 
point.  (Waiting reduces delay.  Yes, this is counterintuitive, but 
why do we queue up at ticket windows?)

Delaying response, when under excessive load, to clients who already 
have a connection -- even if they just got one -- can have a similar 
effect, but with finer granularity and with less complexity in the 
clients.  

Nathan Myers
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] refusing connections based on load ...

2001-04-23 Thread Nathan Myers

On Mon, Apr 23, 2001 at 10:50:42PM -0400, Tom Lane wrote:
> Basically, if we do this then we are abandoning the notion that Postgres
> runs as an unprivileged user.  I think that's a BAD idea, especially in
> an environment that's open enough that you might feel the need to
> load-throttle your users.  By definition you do not trust them, eh?

No.  It's not a case of trust, but of providing an adaptive way
to keep performance reasonable.  The users may have no independent
way to cooperate to limit load, but the DB can provide that.

> A less dangerous way of approaching it might be to have an option
> whereby the postmaster invokes 'uptime' via system() every so often
> (maybe once a minute?) and throttles on the basis of the results.
> The reaction time would be poorer, but security would be a whole lot
> better.

Yes, this alternative looks much better to me.  On Linux there is an
even more efficient source, /proc/loadavg.  (I wouldn't use system(),
though.)
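
Reading it takes only a few lines of C -- a sketch, Linux-only by
construction, returning -1 where /proc/loadavg doesn't exist:

    #include <stdio.h>

    /* return the 1-minute load average, or -1.0 on failure */
    static double
    one_minute_load(void)
    {
        FILE   *f = fopen("/proc/loadavg", "r");
        double  load = -1.0;

        if (f != NULL)
        {
            if (fscanf(f, "%lf", &load) != 1)
                load = -1.0;
            fclose(f);
        }
        return load;
    }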

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] refusing connections based on load ...

2001-04-23 Thread Nathan Myers

On Mon, Apr 23, 2001 at 03:09:53PM -0300, The Hermit Hacker wrote:
> 
> Anyone thought of implementing this, similar to how sendmail does it?  If
> load > n, refuse connections?
> ... 
> If nobody is working on something like this, does anyone but me feel that
> it has merit to make use of?  I'll play with it if so ...

I agree that it would be useful.  Even more useful would be soft load 
shedding, where once some load average level is exceeded the postmaster 
delays a bit (proportionately) before accepting a connection.  
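
To make "delays a bit (proportionately)" concrete, a sketch of the
arithmetic only; the threshold, scale factor, and cap are placeholders,
and the load value would come from wherever the postmaster samples it
(e.g. /proc/loadavg on Linux):

    #include <unistd.h>

    /* delay before accepting a connection, in proportion to how far
     * the load average exceeds a configured threshold */
    static void
    soft_shed(double load, double threshold, double usec_per_unit)
    {
        if (load > threshold)
        {
            unsigned long usec =
                (unsigned long) ((load - threshold) * usec_per_unit);

            /* usleep() is only portable for delays under one second */
            if (usec > 900000UL)
                usec = 900000UL;
            usleep(usec);
        }
    }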

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Re: Is it possible to mirror the db in Postgres?

2001-04-20 Thread Nathan Myers

On Fri, Apr 20, 2001 at 04:53:43PM -0700, G. Anthony Reina wrote:
> Nathan Myers wrote:
> 
> > Does the replication have to be reliable?  Are you equipped to
> > reconcile databases that have got out of sync, when it's not?  
> > Will the different labs ever try to update the same existing 
> > record, or insert conflicting (unique-key) records?
> 
> (1) Yes, of course.  (2) Willing--yes; equipped--dunno.   (3) Yes,
> probably.

Hmm, good luck.  Replication, by itself, is not hard, but it's only
a tiny part of the job.  Most of the job is in handling failures
and conflicts correctly, for some (usually enormous) definition of
"correctly".

> > Reliable WAN replication is harder.  Most of the proprietary database
> > companies will tell you they can do it, but their customers will tell
> > you they can't.
> 
> Joel Burton suggested the rserv utility. I don't know how well it would
> work over a wide network.

The point about WANs is that things which work nicely in the lab, on a 
LAN, behave very differently when the communication medium is, like the 
Internet, only fitfully reliable.  You will tend to have events occurring
in unexpected order, and communications lost, and queues overflowing,
and conflicting entries in different instances which you must somehow 
reconcile after the fact.  Reconciliation by shipping the whole database 
across the WAN is often impractical, particularly when you're trying to
use it at the same time.

WAN replication is an important part of Zembu's business, and it's hard.
I would expect the rserv utility (about which I admit I know little) not
to have been designed for the job.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Is it possible to mirror the db in Postgres?

2001-04-20 Thread Nathan Myers

On Fri, Apr 20, 2001 at 03:33:38PM -0700, G. Anthony Reina wrote:
> We use Postgres 7.0.3 to store data for our scientific research. We have
> two other labs in St. Louis, MO and Tempe, AZ. I'd like to see if
> there's a way for them to mirror our database. They would be able to
> update our database when they received new results and we would be able
> to update theirs. So, in effect, we'd have 3 copies of the same db. Each
> copy would be able to update the other.
> 
> Any thoughts on if this is possible?

Does the replication have to be reliable?  Are you equipped to
reconcile databases that have got out of sync, if not?  Will the
different labs ever try to update the same existing record, or
insert conflicting (unique-key) records?

Symmetric replication is easy or impossible, but usually somewhere 
in between, depending on many details.  Usually when it's made to
work, it runs on a LAN.  

Reliable WAN replication is harder.  Most of the proprietary database 
companies will tell you they can do it, but their customers will tell 
you they can't.  

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] timeout on lock feature

2001-04-18 Thread Nathan Myers

On Wed, Apr 18, 2001 at 09:39:39PM -0400, Bruce Momjian wrote:
> > On Wed, Apr 18, 2001 at 07:33:24PM -0400, Bruce Momjian wrote:
> > > > What might be a reasonable alternative would be a BEGIN timeout:
> > > > report failure as soon as possible after N seconds unless the
> > > > timer is reset, such as by a commit. Such a timeout would be
> > > > meaningful at the database-interface level. It could serve as a
> > > > useful building block for application-level timeouts when the
> > > > client environment has trouble applying timeouts on its own.
> > > 
> > > Now that is a nifty idea. Just put it on one command, BEGIN, and
> > > have it apply for the whole transaction. We could just set an
> > > alarm and do a longjump out on timeout.
> > 
> > Of course, it begs the question why the client couldn't do that
> > itself, and leave PG out of the picture.  But that's what we've 
> > been talking about all along.
> 
> Yes, they can, but of course, they could code the database in the
> application too.  It is much easier to put the timeout in a psql script
> than to try and code it.

Good: add a timeout feature to psql.  

There's no limit to what features you might add to the database 
core once you decide that new features need have nothing to do with 
databases.  Why not (drum roll...) deliver e-mail?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] timeout on lock feature

2001-04-18 Thread Nathan Myers

On Wed, Apr 18, 2001 at 07:33:24PM -0400, Bruce Momjian wrote:
> > What might be a reasonable alternative would be a BEGIN timeout: report 
> > failure as soon as possible after N seconds unless the timer is reset, 
> > such as by a commit.  Such a timeout would be meaningful at the 
> > database-interface level.  It could serve as a useful building block 
> > for application-level timeouts when the client environment has trouble 
> > applying timeouts on its own.
> 
> Now that is a nifty idea.  Just put it on one command, BEGIN, and have
> it apply for the whole transaction.  We could just set an alarm and do a
> longjump out on timeout.

Of course, it begs the question why the client couldn't do that
itself, and leave PG out of the picture.  But that's what we've 
been talking about all along.
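
For reference, a rough sketch of the client doing it itself with libpq;
the connection string, query, and 30-second figure are placeholders,
and the caveat in the comment is part of why server-side help still
looks attractive:

    #include <libpq-fe.h>
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Client-side query timeout via SIGALRM + siglongjmp.  Caveat:
     * jumping out of PQexec() abandons the connection mid-protocol, so
     * the only safe follow-up is PQfinish() and a reconnect (or send
     * PQrequestCancel() and wait instead of jumping at all). */

    static sigjmp_buf timeout_jmp;

    static void
    on_alarm(int signo)
    {
        (void) signo;
        siglongjmp(timeout_jmp, 1);
    }

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=test");    /* placeholder */
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
            return 1;
        }

        signal(SIGALRM, on_alarm);

        if (sigsetjmp(timeout_jmp, 1) == 0)
        {
            alarm(30);                          /* 30-second budget */
            res = PQexec(conn, "SELECT count(*) FROM big_table");
            alarm(0);
            printf("status: %s\n", PQresStatus(PQresultStatus(res)));
            PQclear(res);
        }
        else
            fprintf(stderr, "query timed out; dropping the connection\n");

        PQfinish(conn);
        return 0;
    }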

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] timeout on lock feature

2001-04-18 Thread Nathan Myers

On Wed, Apr 18, 2001 at 09:54:11AM +0200, Zeugswetter Andreas SB wrote:
> > > In short, I think lock timeout is a solution searching in vain for a
> > > problem.  If we implement it, we are just encouraging bad application
> > > design.
> > 
> > I agree with Tom completely here.
> > 
> > In any real-world application the database is the key component of a 
> > larger system: the work it does is the most finicky, and any mistakes
> > (either internally or, more commonly, from misuse) have the most 
> > far-reaching consequences.  The responsibility of the database is to 
> > provide a reliable and easily described and understood mechanism to 
> > build on.
> 
> It is not something that makes anything unreliable or less robust.
> It is also simple: "I (the client) request that you (the backend)
> don't wait for any lock longer than x seconds"

Many things that are easy to say have complicated consequences.

> > Timeouts are a system-level mechanism that to be useful must refer to 
> > system-level events that are far above anything that PG knows about.
> 
> I think you are talking about different kinds of timeouts here.  

Exactly.  I'm talking about useful, meaningful timeouts, not random
timeouts attached to invisible events within the database.

> > The only way PG could apply reasonable timeouts would be for the 
> > application to dictate them, 
> 
> That is exactly what we are talking about here.

No.  You wrote elsewhere that the application sets "30 seconds" and
leaves it.  But that 30 seconds doesn't have any application-level
meaning -- an operation could take twelve hours without tripping your
30-second timeout.  For the application to dictate the timeouts
reasonably, PG would have to expose all its lock events to the client
and expect it to deduce how they affect overall behavior.

> > but the application can better implement them itself.
> 
> It can, but it makes the program more complicated (needs timers
> or threads), which violates your last statement, "simplest interface".

It is good for the program to be more complicated if it is doing a 
more complicated thing -- if it means the database may remain simple.  
People building complex systems have an even greater need for simple
components than people building little ones.
  
What might be a reasonable alternative would be a BEGIN timeout: report 
failure as soon as possible after N seconds unless the timer is reset, 
such as by a commit.  Such a timeout would be meaningful at the 
database-interface level.  It could serve as a useful building block 
for application-level timeouts when the client environment has trouble 
applying timeouts on its own.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] CRN article not updated

2001-04-18 Thread Nathan Myers

On Wed, Apr 18, 2001 at 02:22:48PM -0400, Bruce Momjian wrote:
> I just checked the CRN PostgreSQL article at:
> 
>http://www.crn.com/Sections/Fast_Forward/fast_forward.asp?ArticleID=25670
> 
> I see no changes to the article, even though Vince our webmaster, Geoff
> Davidson of PostgreSQL, Inc, and Dave Mele of Great Bridge have
> requested it be fixed.  

If _you_ had been deluged with that kind of vitriol, what kind of favors 
would you feel like doing?

> Not sure what we can do now.

It's too late.  "We" screwed it up.  (Thanks again, guys.)
The responses have done far more lasting damage than any article 
could ever have done.  The horse is dead.  

The best we can do is to plan for the future.  

1. What happens the next time a slightly inaccurate article is published? 
2. What happens when an openly hostile article is published?

Will our posse ride off again with guns blazing, making more enemies?  
Will they make us all look to potential users like a bunch of hotheaded, 
childish nobodies?

Or will we have somebody appointed, already, to write a measured,
rational, mature clarification?  Will we have articles already written,
and handed to more responsible reporters, so that an isolated badly-done 
article can do little damage?

We're not even on Oracle's radar yet.  When PG begins to threaten their 
income, their marketing department will go on the offensive.  Oracle 
marketing is very, very skillful, and very, very nasty.  If they find 
that by seeding the press with reasonable-sounding criticisms of PG, 
they can prod the PG community into making itself look like idiots, 
they will go to town on it.

Nathan Myers
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] timeout on lock feature

2001-04-17 Thread Nathan Myers

On Tue, Apr 17, 2001 at 12:56:11PM -0400, Tom Lane wrote:
> In short, I think lock timeout is a solution searching in vain for a
> problem.  If we implement it, we are just encouraging bad application
> design.

I agree with Tom completely here.

In any real-world application the database is the key component of a 
larger system: the work it does is the most finicky, and any mistakes
(either internally or, more commonly, from misuse) have the most 
far-reaching consequences.  The responsibility of the database is to 
provide a reliable and easily described and understood mechanism to 
build on.  

Timeouts are a system-level mechanism that to be useful must refer to 
system-level events that are far above anything that PG knows about.  
The only way PG could apply reasonable timeouts would be for the 
application to dictate them, but the application can better implement 
them itself.

You can think of this as another aspect of the "end-to-end" principle: 
any system-level construct duplicated in a lower-level system component 
can only improve efficiency, not provide the corresponding high-level 
service.  If we have timeouts in the database, they should be there to
enable the database to better implement its abstraction, and not pretend 
to be a substitute for system-level timeouts.

There's no upper limit on how complicated a database interface can
become (cf. Oracle).  The database serves its users best by having 
the simplest interface that can possibly provide the needed service. 

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Another news story in need of 'enlightenment'

2001-04-17 Thread Nathan Myers

On Tue, Apr 17, 2001 at 01:31:43PM -0400, Lamar Owen wrote:
> This one probably needs the 'iron hand and the velvet paw' touch.  The
> iron hand to pound some sense into the author, and the velvet paw to
> make him like having sense pounded into him. Title of article is 'Open
> Source Databases Won't Fly' --
> http://www.dqindia.com/content/enterprise/datawatch/101041201.asp

This one is best just ignored.  

It's content-free, just his frightened opinions.  The only things
that will change his mind are the improvements planned for releases
7.2 and 7.3, and lots of deployments.  Few will read his rambling.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Re: Hey guys, check this out.

2001-04-15 Thread Nathan Myers

On Sun, Apr 15, 2001 at 10:05:46PM -0400, Vince Vielhaber wrote:
> On Mon, 16 Apr 2001, Lincoln Yeoh wrote:
> 
> > Maybe you guys should get some Great Bridge marketing/PR person to handle
> > stuff like this.
> 
> After reading Ned's comments I figured that's how it got that way in
> the first place.  But that's just speculation.

You probably figured wrong.  

All those publications have editors who generally feel they're not 
doing their job if they don't introduce errors, usually without even 
talking to the reporter.  That's probably how the "FreeBSD" reference 
got in there: somebody saw "Berkeley" and decided "FreeBSD" would look 
more "techie".  It's stupid, but nothing to excoriate the reporter about.

Sam Williams's articles read completely differently according to 
who publishes them.  Typically the Linux magazines print what he 
writes, and thereby get it mostly right, but the finance magazines 
mangle them to total nonsense.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Fast Forward (fwd)

2001-04-15 Thread Nathan Myers

On Sun, Apr 15, 2001 at 11:44:48AM -0300, The Hermit Hacker wrote:
> On Sat, 14 Apr 2001, Nathan Myers wrote:
> 
> > This is probably a good time to point out that this is the _worst_
> > _possible_ response to erroneous reportage.  The perception by readers
> > will not be that the reporter failed, but that PostgreSQL advocates
> > are rabid weasels who don't appreciate favorable attention, and are
> 
> favorable attention??

Yes, totally favorable.   There wasn't a hint of the condescension 
typically accorded free software.  All of the details you find so 
objectionable (April vs. June?  "The" marketing arm vs. "a" marketing
arm?) would not even be noticed by a non-cultist.

> > dangerous to write anything about.  You can bet this reporter and her
> > editor will treat the topic very circumspectly (i.e. avoid it) in the
> > future.
> 
> woo hoo, if that is the result, then I think Vince did us a great service,
> not dis-service ...

False.  

This may have been the reporter's and the editor's first direct
exposure to free software advocates.  You guys came across as 
hate-filled religious whackos, and that reflects on all of us.  

> > Most reporters are ignorant, most reporters are lazy, and many are
> > both.  It's part of the job description.  Getting angry about it is
> > like getting angry at birds for fouling their cage.  Their job is to
> > regurgitate what they're given, and quickly.  They have no time to
> > learn the depths, or to write coherently about it, or even to check
> > facts.
> 
> Out of all the articles on PgSQL that I've read over the years, this one
> should have been shot before it hit the paper (so to say) ... it was the
> most blatantly inaccurate article I've ever read ...

It had a number of minor errors, easily corrected.  The next will 
probably talk about what a bunch of nasty cranks and lunatics 
PostgreSQL fans are, unless you who wrote can display a lot more 
finesse in your apologies.  Thanks a lot, guys.

> > It will be harder than the original mailings, but I urge each who
> > wrote to write again and apologize for attacking her.
> 
> In a way, I think you are right .. I think the attack was aimed at the
> wrong ppl :(  She obviously didn't get *any* of her information from ppl
> that belong *in* the Pg community, or that have any knowledge of how it
> works, or of its history :(

How is this reporter going to have developed contacts within the 
community?  She has just started.  Now you've burnt her to a crisp, 
and she will figure the less contact with that "community" she has, 
the happier she'll be.  Her editor will know that mentioning PG in
any context will result in a raft of hate mail from cranks, and will 
treat press releases from our community with the scorn they have earned.

Reporters are fragile creatures, and must be gently guided toward the
light.  They will always get facts wrong, but that matters not at all.
The overall tone of the writing is the only thing that stays with their
equally dim audience.  That dim audience controls the budgets for 
technology deployment, including databases.  Next time you propose a
deployment on PG instead of Oracle, thank Vince et al. when it's 
dismissed as a crank toy.

Finally, their talkback page was most probably implemented _not_ with 
MySQL, but with MS SQL Server.  These intramural squabbles (between 
MySQL and PG, between Linux and BSD, between NetBSD and OpenBSD) are 
justifiably seen as pathetic in the outside world.  Respectful attention 
among projects doesn't just create a better impression, it also allows 
you, maybe, to learn something.  (MySQL is not objectively as good as 
PG, but those guys are doing something right, in their presentation, 
that some of us could learn from.)

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Fast Forward (fwd)

2001-04-14 Thread Nathan Myers

On Sun, Apr 15, 2001 at 01:17:15AM -0400, Vince Vielhaber wrote:
> 
> Here's my response to the inaccurate article cmp produced.  After
> chatting with Marc I decided to post it myself.
> ... 
> Where do you get your info?  Do you just make it up?  PostgreSQL is
> not a product of Great Bridge and never has been.  It's 100% independent.
> Is Linux a keyword you figure you can use to draw readers?  Won't take
> long before folks determine you're full of it.  The PostgreSQL team takes
> great pride (not to be confused with great bridge) in ensuring that the
> work we do runs on ALL platforms; be it Mac's OSX, FreeBSD 4.3, or even
> Windows 2000.  So why do you figure this is a Great Bridge product?  Why
> do you figure it's Linux only?  What is it with you writers lately?  Are
> you getting lazy and simply using Linux as a quick out for a paycheck?

This is probably a good time to point out that this is the _worst_
_possible_ response to erroneous reportage.  The perception by readers
will not be that the reporter failed, but that PostgreSQL advocates are 
rabid weasels who don't appreciate favorable attention, and are dangerous 
to write anything about.  You can bet this reporter and her editor will 
treat the topic very circumspectly (i.e. avoid it) in the future.  
When they have to mention it, their reporting will be colored by their 
personal experience.  They (and their readers) don't run the code, 
so they must get their impressions from those who do.  

Most reporters are ignorant, most reporters are lazy, and many
are both.  It's part of the job description.  Getting angry about
it is like getting angry at birds for fouling their cage.  Their
job is to regurgitate what they're given, and quickly.  They have no 
time to learn the depths, or to write coherently about it, or even 
to check facts.

None of the errors in the article matter.  Nobody will develop an
enduring impression of PG from them.  What matters is that PG is being 
mentioned in the same article with Oracle.  In her limited way, she
did the PG community the biggest favor in her limited power, and all 
we can do is attack?

It will be harder than the original mailings, but I urge each who
wrote to write again and apologize for attacking her.  Thank her 
graciously for making an effort, and offer to help her check her 
facts next time.  PostgreSQL needs friends in the press, even if
they are ignorant or lazy.  It doesn't need any enemies in the press.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Anyone have any good addresses ... ?

2001-04-13 Thread Nathan Myers

On Fri, Apr 13, 2001 at 06:32:26PM -0400, Trond Eivind Glomsrød wrote:
> The Hermit Hacker <[EMAIL PROTECTED]> writes:
> 
> > Here is what we've always sent to to date ... anyone have any good ones
> > to add?
> > 
> > 
> > Addresses : [EMAIL PROTECTED],
> > [EMAIL PROTECTED],
> > [EMAIL PROTECTED],
> > [EMAIL PROTECTED],
> > [EMAIL PROTECTED],
> > [EMAIL PROTECTED],
> > [EMAIL PROTECTED],
> > [EMAIL PROTECTED],
> > [EMAIL PROTECTED]
> 
> Freshmeat, linuxtoday. If the release includes RPMs for Red Hat Linux,
> redhat-announce is also a suitable location.

Linux Journal: [EMAIL PROTECTED]
Freshmeat:  [EMAIL PROTECTED]
LinuxToday: http://linuxtoday.com/contribute.php3

-- 
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Truncation of object names

2001-04-13 Thread Nathan Myers

On Fri, Apr 13, 2001 at 04:27:15PM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > We are thinking about working around the name length limitation 
> > (encountered in migrating from other dbs) by allowing "foo.bar.baz" 
> > name syntax, as a sort of rudimentary namespace mechanism.
> 
> Have you thought about simply increasing NAMEDATALEN in your
> installation?  If you really are generating names that aren't unique
> in 31 characters, that seems like the way to go ...

We discussed that, and will probably do it (too).

One problem is that, having translated "foo.bar.baz" to "foo_bar_baz", 
you run into a collision when you encounter "foo.bar_baz" in subsequent code.
I.e., a separate delimiter character helps, even when name length isn't 
an issue.  Also, accepting the names as they appear in the source code 
already means the number of changes needed is much smaller, even when
you don't have true schema support.  

Nathan Myers
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Truncation of object names

2001-04-13 Thread Nathan Myers

On Fri, Apr 13, 2001 at 02:54:47PM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > Sorry, false alarm.  When I got the test case, it turned out to
> > be the more familiar problem:
> 
> >   create table foo_..._bar1 (id1 ...);
> > [notice, "foo_..._bar1" truncated to "foo_..._bar"]
> >   create table foo_..._bar (id2 ...);
> > [error, foo_..._bar already exists]
> >   create index foo_..._bar_ix on foo_..._bar(id2);
> > [notice, "foo_..._bar_ix" truncated to "foo_..._bar"]
> > [error, foo_..._bar already exists]
> > [error, attribute "id2" not found]
> 
> > It would be more helpful for the first "create" to fail so we don't 
> > end up cluttered with objects that shouldn't exist, and which interfere
> > with operations on objects which should.
> 
> Seems to me that if you want a bunch of CREATEs to be mutually
> dependent, then you wrap them all in a BEGIN/END block.

Yes, but...  The second and third commands weren't supposed to be 
related to the first at all, never mind dependent on it.  They were 
made dependent by PG crushing the names together.

We are thinking about working around the name length limitation 
(encountered in migrating from other dbs) by allowing "foo.bar.baz" 
name syntax, as a sort of rudimentary namespace mechanism.  It ain't
schemas, but it's better than "foo__bar__baz".

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] Truncation of object names

2001-04-13 Thread Nathan Myers

On Fri, Apr 13, 2001 at 01:16:43AM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > We have noticed here also that object (e.g. table) names get truncated 
> > in some places and not others.  If you create a table with a long name, 
> > PG truncates the name and creates a table with the shorter name; but 
> > if you refer to the table by the same long name, PG reports an error.
> 
> Example please?  This is clearly a bug.  

Sorry, false alarm.  When I got the test case, it turned out to
be the more familiar problem:

  create table foo_..._bar1 (id1 ...);
[notice, "foo_..._bar1" truncated to "foo_..._bar"]
  create table foo_..._bar (id2 ...);
[error, foo_..._bar already exists]
  create index foo_..._bar_ix on foo_..._bar(id2);
[notice, "foo_..._bar_ix" truncated to "foo_..._bar"]
[error, foo_..._bar already exists]
[error, attribute "id2" not found]

It would be more helpful for the first "create" to fail so we don't 
end up cluttered with objects that shouldn't exist, and which interfere
with operations on objects which should.

But I'm not proposing that for 7.1.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Re: Hand written parsers

2001-04-12 Thread Nathan Myers

On Wed, Apr 11, 2001 at 10:44:59PM -0700, Ian Lance Taylor wrote:
> Mark Butler <[EMAIL PROTECTED]> writes:
> > ...
> > The advantages of using a hand written recursive descent parser lie in
> > 1) ease of implementing grammar changes 
> > 2) ease of debugging
> > 3) ability to handle unusual cases
> > 4) ability to support context sensitive grammars
> > ...
> > Another nice capability is the ability to enable and disable grammar
> > rules at run time ...
>
> On the other hand, recursive descent parsers tend to be more ad hoc,
> they tend to be harder to maintain, and they tend to be less
> efficient.  ...  And I note that despite the
> difficulties, the g++ parser is yacc based.

Yacc and yacc-like programs are most useful when the target grammar (or 
your understanding of it) is not very stable.  With Yacc you can make 
sweeping changes much more easily; big changes can be a lot of work in 
a hand-coded parser.  Once your grammar stabilizes, though, hand coding 
can provide flexibility that is inconceivable in a parser generator, 
albeit at some cost in speed and compact description.  (I doubt parser 
speed is an issue for PG.)
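
For anyone who hasn't written one, a toy recursive-descent parser (for
arithmetic expressions, nothing to do with the SQL grammar) shows the
shape: one C function per grammar rule, which is exactly what makes
run-time enabling and disabling of rules, or context-sensitive hacks,
straightforward:

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* expr   := term { ('+'|'-') term }
     * term   := factor { ('*'|'/') factor }
     * factor := NUMBER | '(' expr ')'        -- evaluated as it parses */

    static const char *p;               /* cursor into the input */

    static double parse_expr(void);

    static void
    skip_ws(void)
    {
        while (isspace((unsigned char) *p))
            p++;
    }

    static double
    parse_factor(void)
    {
        skip_ws();
        if (*p == '(')
        {
            double v;

            p++;                        /* consume '(' */
            v = parse_expr();
            skip_ws();
            if (*p == ')')
                p++;                    /* consume ')' */
            return v;
        }
        else
        {
            char   *end;
            double  v = strtod(p, &end);

            p = end;
            return v;
        }
    }

    static double
    parse_term(void)
    {
        double v = parse_factor();

        for (;;)
        {
            skip_ws();
            if (*p == '*')      { p++; v *= parse_factor(); }
            else if (*p == '/') { p++; v /= parse_factor(); }
            else                return v;
        }
    }

    static double
    parse_expr(void)
    {
        double v = parse_term();

        for (;;)
        {
            skip_ws();
            if (*p == '+')      { p++; v += parse_term(); }
            else if (*p == '-') { p++; v -= parse_term(); }
            else                return v;
        }
    }

    int
    main(void)
    {
        p = "1 + 2 * (3 - 4) / 5";
        printf("%g\n", parse_expr());   /* prints 0.6 */
        return 0;
    }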

G++ has flirted seriously with switching to a recursive-descent parser,
largely to be able to offer meaningful error messages and to recover
better from errors, as well as to be able to parse some problematic
but conformant (if unlikely) programs.

Note that the choice is not just between Yacc and a hand-coded parser.
Since Yacc, many more powerful parser generators have been released,
one of which might be just right for PG.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Truncation of char, varchar types

2001-04-09 Thread Nathan Myers

On Mon, Apr 09, 2001 at 09:20:42PM +0200, Peter Eisentraut wrote:
> Excessively long values are currently silently truncated when they are
> inserted into char or varchar fields.  This makes the entire notion of
> specifying a length limit for these types kind of useless, IMO.  Needless
> to say, it's also not in compliance with SQL.
> 
> How do people feel about changing this to raise an error in this
> situation?  Does anybody rely on silent truncation?  Should this be
> user-settable, or can those people resort to using triggers?

Yes, detecting and reporting errors early is a Good Thing.  You don't 
do anybody any favors by pretending to save data, but really throwing 
it away.

We have noticed here also that object (e.g. table) names get truncated 
in some places and not others.  If you create a table with a long name, 
PG truncates the name and creates a table with the shorter name; but 
if you refer to the table by the same long name, PG reports an error.  
(Very long names may show up in machine- generated schemas.) Would 
patches for this, e.g. to refuse to create a table with an impossible 
name, be welcome?  

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Re: TODO list

2001-04-05 Thread Nathan Myers

On Thu, Apr 05, 2001 at 06:25:17PM -0400, Tom Lane wrote:
> "Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> >> If the reason that a block CRC isn't on the TODO list is that Vadim
> >> objects, maybe we should hear some reasons why he objects?  Maybe 
> >> the objections could be dealt with, and everyone satisfied.
> 
> > Unordered disk writes are covered by backing up modified blocks
> > in log. It allows not only catch such writes, as would CRC do,
> > but *avoid* them.
> 
> > So, for what CRC could be used? To catch disk damages?
> > Disk has its own CRC for this.
> 
> Blocks that have recently been written, but failed to make it down to
> the disk platter intact, should be restorable from the WAL log.  So we
> do not need a block-level CRC to guard against partial writes.

If a block is missing some sectors in the middle, how would you know
to reconstruct it from the WAL, without a block CRC telling you that
the block is corrupt?
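
For scale, here is roughly what a per-page CRC involves -- a sketch
with an invented page layout, not the actual PostgreSQL page format,
and a plain bitwise CRC-32 where a table-driven one would be used in
practice:

    #include <stddef.h>

    typedef unsigned int uint32;    /* assumes a 32-bit unsigned int */

    /* CRC-32, reflected form, polynomial 0xEDB88320 */
    static uint32
    crc32_calc(const unsigned char *buf, size_t len)
    {
        uint32 crc = 0xFFFFFFFFu;
        size_t i;
        int    bit;

        for (i = 0; i < len; i++)
        {
            crc ^= buf[i];
            for (bit = 0; bit < 8; bit++)
                crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        return crc ^ 0xFFFFFFFFu;
    }

    #define PAGE_SIZE 8192

    /* imagined layout: the CRC covers everything except its own field */
    typedef struct PageWithCrc
    {
        uint32        crc;
        unsigned char data[PAGE_SIZE - sizeof(uint32)];
    } PageWithCrc;

    static void
    page_stamp_crc(PageWithCrc *pg)     /* called just before writing */
    {
        pg->crc = crc32_calc(pg->data, sizeof(pg->data));
    }

    static int
    page_crc_ok(const PageWithCrc *pg)  /* called just after reading */
    {
        return pg->crc == crc32_calc(pg->data, sizeof(pg->data));
    }

A read that fails page_crc_ok() is the signal to go to the WAL copy or
a backup, which is exactly the information a torn write otherwise hides.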

 
> A block-level CRC might be useful to guard against long-term data
> lossage, but Vadim thinks that the disk's own CRCs ought to be
> sufficient for that (and I can't say I disagree).

The people who make the disks don't agree.  

They publish the error rate they guarantee, and they meet it, more 
or less.  They publish a rate that is _just_ low enough to satisfy 
noncritical requirements (on the correct assumption that they can't 
satisfy critical requirements in any case) and high enough not to 
interfere with benchmarks.  They assume that if you need better 
reliability you can and will provide it yourself, and rely on their 
CRC only as a performance optimization.

At the raw sector level, they get (and correct) errors very frequently; 
when they are not getting "enough" errors, they pack the bits more 
densely until they do, and sell a higher-density drive.

> So the only real benefit of a block-level CRC would be to guard against
> bits dropped in transit from the disk surface to someplace else, ie,
> during read or during a "cp -r" type copy of the database to another
> location.  That's not a totally negligible risk, but is it worth the
> overhead of updating and checking block CRCs?  Seems dubious at best.

Vadim didn't want to re-open this discussion until after 7.1 is out
the door, but that "dubious at best" demands an answer.  See the archive 
posting:

http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/msg00473.html

...

Incidentally, is the page at 

  http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/

the best place to find old messages?  It's never worked right for me.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Re: TODO list

2001-04-05 Thread Nathan Myers

On Thu, Apr 05, 2001 at 02:47:41PM -0700, Mikheev, Vadim wrote:
> > > So, for what CRC could be used? To catch disk damages?
> > > Disk has its own CRC for this.
> > 
> > OK, this was already discussed, maybe while Vadim was absent.  
> > Should I re-post the previous text?
> 
> Let's return to this discussion *after* 7.1 release.
> My main objection was (and is) - no time to deal with
> this issue for 7.1.

OK, everybody agreed on that before.  

This doesn't read like an objection to having it on the TODO list for
some future release.  

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Re: TODO list

2001-04-05 Thread Nathan Myers

On Thu, Apr 05, 2001 at 02:27:48PM -0700, Mikheev, Vadim wrote:
> > If the reason that a block CRC isn't on the TODO list is that Vadim
> > objects, maybe we should hear some reasons why he objects?  Maybe 
> > the objections could be dealt with, and everyone satisfied.
> 
> Unordered disk writes are covered by backing up modified blocks
> in log. It allows not only catch such writes, as would CRC do,
> but *avoid* them.
> 
> So, for what CRC could be used? To catch disk damages?
> Disk has its own CRC for this.

OK, this was already discussed, maybe while Vadim was absent.  
Should I re-post the previous text?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Re: TODO list

2001-04-05 Thread Nathan Myers

On Thu, Apr 05, 2001 at 04:25:42PM -0400, Ken Hirsch wrote:
> > > > TODO updated.  I know we did number 2, but did we agree on #1 and
> > > > is it done?
> > >
> > > #2 is indeed done.  #1 is not done, and possibly not agreed to ---
> > > I think Vadim had doubts about its usefulness, though personally I'd
> > > like to see it.
> >
> > That was my recollection too.  This was the discussion about testing the
> > disk hardware.  #1 removed.
> 
> What is recommended in the bible (Gray and Reuter), especially for larger
> disk block sizes that may not be written atomically, is to have a word at
> the end of the that must match a word at the beginning of the block.  It
> gets changed each time you write the block.

That only works if your blocks are atomic.  Even SCSI disks reorder
sector writes, and they are free to write the first and last sectors
of an 8k-32k block, and not have written the intermediate sectors
before the power goes out.  On IDE disks it is of course far worse.

(On many (most?) IDE drives, even when they have been told to report 
write completion only after data is physically on the platter, they will 
"forget" if they see activity that looks like benchmarking.  Others just 
ignore the command, and in any case they all default to unsafe mode.)

If the reason that a block CRC isn't on the TODO list is that Vadim
objects, maybe we should hear some reasons why he objects?  Maybe 
the objections could be dealt with, and everyone satisfied.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Re: Final call for platform testing

2001-04-03 Thread Nathan Myers

On Tue, Apr 03, 2001 at 11:19:04PM +, Thomas Lockhart wrote:
> > I saw three separate reports of successful builds on Linux 2.4.2 on x86
> > (including mine), but it isn't listed here.
> 
> It is listed in the comments in the real docs. At least one report was
> for an extensively patched 2.4.2, and I'm not sure of the true lineage
> of the others.

You could ask.  Just to ignore reports that you have asked for is not 
polite.  My report was based on a virgin, unpatched 2.4.2 kernel, and 
(as noted) the Debian-packaged glibc-2.2.2.  

If you are trying to trim your list, it would be reasonable to drop
Linux-2.0.x, because that version is not being maintained any more.

> I *could* remove the version info from the x86 listing, and mention both
> 2.2.x and 2.4.x in the comments.

Linux-2.2 and Linux-2.4 are different codebases.  It is worth noting,
besides, the glibc version tested along with each Linux kernel version.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Final call for platform testing

2001-04-03 Thread Nathan Myers

On Tue, Apr 03, 2001 at 03:31:25PM +, Thomas Lockhart wrote:
> 
> OK. So we are close to a final tally of supported machines.
> ...
> Here are the up-to-date platforms:
> 
> AIX 4.3.3 RS6000   7.1 2001-03-21, Gilles Darold
> BeOS 5.0.4 x86 7.1 2000-12-18, Cyril Velter
> BSDI 4.01  x86 7.1 2001-03-19, Bruce Momjian
> Compaq Tru64 4.0g Alpha 7.1 2001-03-19, Brent Verner
> FreeBSD 4.3 x867.1 2001-03-19, Vince Vielhaber
> HPUX PA-RISC   7.1 2001-03-19, 10.20 Tom Lane, 11.00 Giles Lean
> IRIX 6.5.11 MIPS   7.1 2001-03-22, Robert Bruccoleri
> Linux 2.2.x Alpha  7.1 2001-01-23, Ryan Kirkpatrick
> Linux 2.2.x armv4l 7.1 2001-03-22, Mark Knox
> Linux 2.0.x MIPS   7.1 2001-03-30, Dominic Eidson
> Linux 2.2.18 PPC74xx 7.1 2001-03-19, Tom Lane
> Linux 2.2.x S/390  7.1 2000-11-17, Neale Ferguson
> Linux 2.2.15 Sparc 7.1 2001-01-30, Ryan Kirkpatrick
> Linux 2.2.16 x86   7.1 2001-03-19, Thomas Lockhart
> MacOS X Darwin PPC 7.1 2000-12-11, Peter Bierman
> NetBSD 1.5 Alpha   7.1 2001-03-22, Giles Lean
> NetBSD 1.5E arm32  7.1 2001-03-21, Patrick Welche
> NetBSD m68k    7.0 2000-04-10 (Henry has lost machine)
> NetBSD Sparc   7.0 2000-04-13, Tom I. Helbekkmo
> NetBSD VAX 7.1 2001-03-30, Tom I. Helbekkmo
> NetBSD 1.5 x86 7.1 2001-03-23, Giles Lean
> OpenBSD 2.8 Sparc  7.1 2001-03-23, Brandon Palmer
> OpenBSD 2.8 x867.1 2001-03-22, Brandon Palmer
> SCO OpenServer 5 x86   7.1 2001-03-13, Billy Allie
> SCO UnixWare 7.1.1 x86 7.1 2001-03-19, Larry Rosenman
> Solaris 2.7-8 Sparc7.1 2001-03-22, Marc Fournier
> Solaris x86    7.1 2001-03-27, Mathijs Brands
> SunOS 4.1.4 Sparc  7.1 2001-03-23, Tatsuo Ishii
> WinNT/Cygwin x86   7.1 2001-03-16, Jason Tishler
> 
> And the "unsupported platforms":
> 
> DGUX m88k
> MkLinux DR1 PPC750 7.0 2000-04-13, Tatsuo Ishii
> NextStep x86
> QNX 4.25 x86   7.0 2000-04-01, Dr. Andreas Kardos
> System V R4 m88k
> System V R4 MIPS
> Ultrix MIPS7.1 2001-03-26, Alexander Klimov
> Windows/Win32 x86  7.1 2001-03-26, Magnus Hagander (clients only)

I saw three separate reports of successful builds on Linux 2.4.2 on x86
(including mine), but it isn't listed here.  

-- 
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Re: Changing the default value of an inherited column

2001-04-02 Thread Nathan Myers

On Mon, Apr 02, 2001 at 01:27:06PM -0400, Tom Lane wrote:
> Philip: the rule that pg_dump needs to apply w.r.t. defaults for
> inherited fields is that if an inherited field has a default and
> either (a) no parent table supplies a default, or (b) any parent
> table supplies a default different from the child's, then pg_dump
> had better emit the child field explicitly.

The rule above appears to work even if inherited-default conflicts 
are not taken as an error, but just result in a derived-table column 
with no default.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Re: Changing the default value of an inherited column

2001-04-02 Thread Nathan Myers

On Sun, Apr 01, 2001 at 03:15:56PM -0400, Tom Lane wrote:
> Christopher Masto <[EMAIL PROTECTED]> writes:
> > Another thing that seems kind of interesting would be to have:
> > CREATE TABLE base (table_id CHAR(8) NOT NULL [, etc.]);
> > CREATE TABLE foo  (table_id CHAR(8) NOT NULL DEFAULT 'foo');
> > CREATE TABLE bar  (table_id CHAR(8) NOT NULL DEFAULT 'bar');
> > Then a function on "base" could look at table_id and know which
> > table it's working on.  A waste of space, but I can think of
> > uses for it.
> 
> This particular need is superseded in 7.1 by the 'tableoid'
> pseudo-column.  However you can certainly imagine variants of this
> that tableoid doesn't handle, for example columns where the subtable
> creator can provide a useful-but-not-always-correct default value.

A bit of O-O doctrine... when you find yourself tempted to do something 
like the above, it usually means you're trying to do the wrong thing.  
You may not have a choice, in some cases, but you should know you are 
on the way to architecture meltdown.  "She'll blow, Cap'n!"

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Re: Changing the default value of an inherited column

2001-04-02 Thread Nathan Myers

On Sat, Mar 31, 2001 at 07:44:30PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> >> This seems pretty random.  It would be more reasonable if multiple
> >> (default) inheritance weren't allowed unless you explicitly specify a new
> >> default for the new column, but we don't have a syntax for this.
> 
> > I agree, but I thought the original issue was that PG _does_ now have 
> > syntax for it.  Any conflict in default values should result in either 
> > a failure, or "no default".  Choosing a default randomly, or according 
> > to an arbitrary and complicated rule (same thing), is a source of
> > bugs.
> 
> Well, we *do* have a syntax for specifying a new default (the same one
> that worked pre-7.0 and does now again).  I guess what you are proposing
> is the rule "If conflicting default values are inherited from multiple
> parents that each define the same column name, then an error is reported
> unless the child table redeclares the column and specifies a new default
> to override the inherited ones".
> 
> That is:
> 
>   create table p1 (f1 int default 1);
>   create table p2 (f1 int default 2);
>   create table c1 (f2 float) inherits(p1, p2);   # XXX
> 
> would draw an error about conflicting defaults for c1.f1, but
> 
>   create table c1 (f1 int default 3, f2 float) inherits(p1, p2);
> 
> would be accepted (and 3 would become the default for c1.f1).
> 
> This would take a few more lines of code, but I'm willing to do it if
> people think it's a safer behavior than picking one of the inherited
> default values.

I do.  

Allowing the line marked XXX above, but asserting no default for 
c1.f1 in that case, would be equally safe.  (A warning would be 
polite, anyhow.) User code that doesn't rely on the default wouldn't 
notice.  You only need to choose a default if somebody adding rows to 
c1 uses it.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Third call for platform testing (linux 2.4.x)

2001-03-31 Thread Nathan Myers


I just built and tested RC1 on Linux 2.4.2, with glibc-2.2.2 and
gcc-2.95.2 on a Debian 2.2+ x86 system.  ("+" implying some packages
from "unstable".)

I configured it --with-perl --with-openssl --with-CXX.
It built without errors, but with a few warnings.

This one seemed (portably) odd:
--
 In file included from gram.y:43:
 lex.plpgsql_yy.c: In function `plpgsql_yylex':
 lex.plpgsql_yy.c:972: warning: label `find_rule' defined but not used
--

And this:
--
 ar crs libpq.a `lorder fe-auth.o fe-connect.o fe-exec.o fe-misc.o fe-print.o 
fe-lobj.o pqexpbuffer.o dllist.o pqsignal.o | tsort`
 tsort: -: input contains a loop:

 tsort: dllist.o
--

And this:
--
 ar crs libecpg.a `lorder execute.o typename.o descriptor.o data.o error.o prepare.o 
memory.o connect.o misc.o | tsort`
 tsort: -: input contains a loop:

 tsort: connect.o
 tsort: execute.o
 tsort: data.o
--

And this:

--
 ar crs libplpgsql.a `lorder pl_parse.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o | 
tsort`
 tsort: -: input contains a loop:

 tsort: pl_comp.o
 tsort: pl_parse.o
--

I ran "make check".  It said:

--
 All 76 tests passed. 
--

Nathan Myers
[EMAIL PROTECTED]

On Sat, Mar 31, 2001 at 12:02:35PM +1200, Franck Martin wrote:
> I still don't see an entry for Linux 2.4.x
> 
> Cheers.
> 
> Thomas Lockhart wrote:
> 
> > Unreported or problem platforms:
> >
> > Linux 2.0.x MIPS   7.0 2000-04-13 (Tatsuo has lost machine)
> > mklinux PPC750 7.0 2000-04-13, Tatsuo Ishii
> > NetBSD m68k7.0 2000-04-10 (Henry has lost machine)
> > NetBSD Sparc   7.0 2000-04-13, Tom I. Helbekkmo
> > QNX 4.25 x86   7.0 2000-04-01, Dr. Andreas Kardos
> > Ultrix MIPS    7.1 2001-??-??, Alexander Klimov
> >
> > mklinux has failed Tatsuo's testing afaicr. Demote to unsupported?
> >
> > Any NetBSD partisans who can do testing or solicit testing from the
> > NetBSD crowd? Same for OpenBSD?
> >
> > QNX is known to have problems with 7.1. Any hope of fixing for 7.1.1? Is
> > there anyone able to work on it? If not, I'll move to the unsupported
> > list.
> >
> > And here are the up-to-date platforms; thanks for the reports:
> >
> > AIX 4.3.3 RS6000   7.1 2001-03-21, Gilles Darold
> > BeOS 5.0.3 x86 7.1 2000-12-18, Cyril Velter
> > BSDI 4.01  x86 7.1 2001-03-19, Bruce Momjian
> > Compaq Tru64 4.0g Alpha 7.1 2001-03-19, Brent Verner
> > FreeBSD 4.3 x86   7.1 2001-03-19, Vince Vielhaber
> > HPUX PA-RISC   7.1 2001-03-19, 10.20 Tom Lane, 11.00 Giles Lean
> > IRIX 6.5.11 MIPS   7.1 2001-03-22, Robert Bruccoleri
> > Linux 2.2.x Alpha  7.1 2001-01-23, Ryan Kirkpatrick
> > Linux 2.2.x armv4l 7.1 2001-03-22, Mark Knox
> > Linux 2.2.18 PPC750 7.1 2001-03-19, Tom Lane
> > Linux 2.2.x S/390  7.1 2000-11-17, Neale Ferguson
> > Linux 2.2.15 Sparc 7.1 2001-01-30, Ryan Kirkpatrick
> > Linux 2.2.16 x86   7.1 2001-03-19, Thomas Lockhart
> > MacOS X Darwin PPC 7.1 2000-12-11, Peter Bierman
> > NetBSD 1.5 alpha   7.1 2001-03-22, Giles Lean
> > NetBSD 1.5E arm32  7.1 2001-03-21, Patrick Welche
> > NetBSD 1.5S x86   7.1 2001-03-21, Patrick Welche
> > OpenBSD 2.8 x86   7.1 2001-03-22, Brandon Palmer
> > SCO OpenServer 5 x86   7.1 2001-03-13, Billy Allie
> > SCO UnixWare 7.1.1 x86 7.1 2001-03-19, Larry Rosenman
> > Solaris 2.7 Sparc  7.1 2001-03-22, Marc Fournier
> > Solaris x86    7.1 2001-03-27, Mathijs Brands
> > SunOS 4.1.4 Sparc  7.1 2001-03-23, Tatsuo Ishii
> > Windows/Win32 x86  7.1 2001-03-26, Magnus Hagander (clients only)
> > WinNT/Cygwin x86   7.1 2001-03-16, Jason Tishler
> >
> > ---(end of broadcast)---
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
> 
> 
> ---(end of broadcast)---
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Re: Changing the default value of an inherited column

2001-03-30 Thread Nathan Myers

On Fri, Mar 30, 2001 at 11:05:53PM +0200, Peter Eisentraut wrote:
> Tom Lane writes:
> 
> > 3. The new column will have a default value if any of the combined
> > column specifications have one.  The last-specified default (the one
> > in the explicitly given column list, or the rightmost parent table
> > that gives a default) will be used.
> 
> This seems pretty random.  It would be more reasonable if multiple
> (default) inheritance weren't allowed unless you explicitly specify a new
> default for the new column, but we don't have a syntax for this.

I agree, but I thought the original issue was that PG _does_ now have 
syntax for it.  Any conflict in default values should result in either 
a failure, or "no default".  Choosing a default randomly, or according 
to an arbitrary and complicated rule (same thing), is a source of bugs.

> > 4. All relevant constraints from all the column specifications will
> > be applied.  In particular, if any of the specifications includes NOT
> > NULL, the resulting column will be NOT NULL.  (But the current
> > implementation does not support inheritance of UNIQUE or PRIMARY KEY
> > constraints, and I do not have time to add that now.)
> 
> This is definitely a violation of that Liskov Substitution.  If a context
> expects a certain table and gets a more restricted table, it will
> certainly notice.

Not so.  The rule is that the base-table code only has to understand
the derived table.  The derived table need not be able to represent
all values possible in the base table. 

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Re: Changing the default value of an inherited column

2001-03-30 Thread Nathan Myers

On Fri, Mar 30, 2001 at 12:10:59PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > The O-O principle involved here is Liskov Substitution: if the derived
> > table is used in the context of code that thinks it's looking at the
> > base table, does anything break?
> 
> I propose the following behavior:
> 
> 1. A table can have only one column of a given name.  If the same
> column name occurs in multiple parent tables and/or in the explicitly
> specified column list, these column specifications are combined to
> produce a single column specification.  A NOTICE will be emitted to
> warn the user that this has happened.  The ordinal position of the
> resulting column is determined by its first appearance.

Treatment of like-named members of multiple base types is not done
consistently in the various O-O languages.  It's really a snakepit, and 
anything you do automatically will cause terrible problems for somebody.  
Nonetheless, for any given circumstances some possible approaches are 
clearly better than others.

In C++, as in most O-O languages, the like-named members are kept 
distinct.  When referred to in the context of a base type, the member 
chosen is the "right one".  Used in the context of the multiply-derived 
type, the compiler reports an ambiguity, and you are obliged to qualify 
the name explicitly to identify which among the like-named inherited 
members you meant.  You can declare which one is "really inherited".  
Some other languages presume to choose automatically which one they 
think you meant.  The real danger is from members inherited from way
back up the trees, which you might not even know are there.

Of course PG is different from any O-O language.  I don't know if PG 
has an equivalent to the "base-class context".  I suppose PG has a long 
history of merging like-named members, and that the issue is just 
the details of how the merge happens.  

> 4. All relevant constraints from all the column specifications will
> be applied.  In particular, if any of the specifications includes NOT
> NULL, the resulting column will be NOT NULL.  (But the current
> implementation does not support inheritance of UNIQUE or PRIMARY KEY
> constraints, and I do not have time to add that now.)

Sounds like a TODO item...

Do all the triggers of the base tables get applied, to be run one after 
another?

--
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Re: Changing the default value of an inherited column

2001-03-29 Thread Nathan Myers

On Thu, Mar 29, 2001 at 02:29:38PM +0100, Oliver Elphick wrote:
> Peter Eisentraut wrote:
>   >Tom Lane writes:
>   >
>   >> It seems that in pre-7.0 Postgres, this works:
>   >>
>   >> create table one(id int default 1, descr text);
>   >> create table two(id int default 2, tag text) inherits (one);
>   >>
>   >> with the net effect that table "two" has just one "id" column with
>   >> default value 2.
>   >
>   >Although the liberty to do anything you want seems appealing at first, I
>   >would think that allowing this is not correct from an OO point of view.
> 
> I don't agree; this is equivalent to redefinition of a feature (=method) in
> a descendant class, which is perfectly acceptable so long as the feature's
> signature (equivalent to column type) remains unchanged.

The O-O principle involved here is Liskov Substitution: if the derived
table is used in the context of code that thinks it's looking at the
base table, does anything break?

Changing the default value of a column should not break anything, 
because the different default value could as well have been entered 
in the column manually.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] MIPS test-and-set

2001-03-26 Thread Nathan Myers

On Mon, Mar 26, 2001 at 07:09:38PM -0500, Tom Lane wrote:
> Thomas Lockhart <[EMAIL PROTECTED]> writes:
> > That is not already available from the Irix support code?
> 
> What we have for IRIX is
> ... 
> Doesn't look to me like it's likely to work on anything but IRIX ...

I have attached linuxthreads/sysdeps/mips/pt-machine.h from glibc-2.2.2
below.  (Glibc linuxthreads has alpha, arm, hppa, i386, ia64, m68k, mips,
powerpc, s390, SH, and SPARC support, at least in some degree.)

Since the actual instruction sequence is probably lifted from the 
MIPS manual, it's probably much freer than GPL.  For the paranoid,
the actual instructions, extracted, are just

   1:
        ll    %0,%3
        bnez  %0,2f
         li   %1,1
        sc    %1,%2
        beqz  %1,1b
   2:

Nathan Myers
[EMAIL PROTECTED]

---
/* Machine-dependent pthreads configuration and inline functions.

   Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Ralf Baechle <[EMAIL PROTECTED]>.
   Based on the Alpha version by Richard Henderson <[EMAIL PROTECTED]>.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If
   not, write to the Free Software Foundation, Inc.,
   59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */

#include <sgidefs.h>
#include <sys/tas.h>

#ifndef PT_EI
# define PT_EI extern inline
#endif

/* Memory barrier.  */
#define MEMORY_BARRIER() __asm__ ("" : : : "memory")


/* Spinlock implementation; required.  */

#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)

PT_EI long int
testandset (int *spinlock)
{
  long int ret, temp;

  __asm__ __volatile__
("/* Inline spinlock test & set */\n\t"
 "1:\n\t"
 "ll    %0,%3\n\t"
 ".set  push\n\t"
 ".set  noreorder\n\t"
 "bnez  %0,2f\n\t"
 " li   %1,1\n\t"
 ".set  pop\n\t"
 "sc    %1,%2\n\t"
 "beqz  %1,1b\n"
 "2:\n\t"
 "/* End spinlock test & set */"
 : "=&r" (ret), "=&r" (temp), "=m" (*spinlock)
 : "m" (*spinlock)
 : "memory");

  return ret;
}

#else /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */

PT_EI long int
testandset (int *spinlock)
{
  return _test_and_set (spinlock, 1);
}
#endif /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */


/* Get some notion of the current stack.  Need not be exactly the top
   of the stack, just something somewhere in the current frame.  */
#define CURRENT_STACK_FRAME  stack_pointer
register char * stack_pointer __asm__ ("$29");


/* Compare-and-swap for semaphores. */

#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)

#define HAS_COMPARE_AND_SWAP
PT_EI int
__compare_and_swap (long int *p, long int oldval, long int newval)
{
  long int ret;

  __asm__ __volatile__
("/* Inline compare & swap */\n\t"
 "1:\n\t"
 "ll    %0,%4\n\t"
 ".set  push\n"
 ".set  noreorder\n\t"
 "bne   %0,%2,2f\n\t"
 " move %0,%3\n\t"
 ".set  pop\n\t"
 "sc    %0,%1\n\t"
 "beqz  %0,1b\n"
 "2:\n\t"
 "/* End compare & swap */"
 : "=&r" (ret), "=m" (*p)
 : "r" (oldval), "r" (newval), "m" (*p)
 : "memory");

  return ret;
}

#endif /* (_MIPS_ISA >= _MIPS_ISA_MIPS2) */

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



[HACKERS] Re: RELEASE STOPPER? nonportable int64 constant s in pg_crc.c

2001-03-24 Thread Nathan Myers

On Sat, Mar 24, 2001 at 02:05:05PM -0800, Ian Lance Taylor wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
> > Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> > > A safe way to construct a long long constant is to do it using an
> > > expression:
> > > ((((uint64) 0xdeadbeef) << 32) | (uint64) 0xfeedface)
> > > It's awkward, obviously, but it works with any compiler.
> > 
> > An interesting example.  That will work as intended if and only if the
> > compiler regards 0xfeedface as unsigned ...
> 
> True, for additional safety, do this:
> ((((uint64) (unsigned long) 0xdeadbeef) << 32) |
>   (uint64) (unsigned long) 0xfeedface)

For the paranoid,

   ((((uint64) 0xdead) << 48) | (((uint64) 0xbeef) << 32) | \
    (((uint64) 0xfeed) << 16) | ((uint64) 0xface))

Or, better

   #define FRAG64(bits,shift) (((uint64)(bits)) << (shift))
   #define LITERAL64(a,b,c,d) \
 (FRAG64(a,48) | FRAG64(b,32) | FRAG64(c,16) | FRAG64(d,0))
   LITERAL64(0xdead,0xbeef,0xfeed,0xface)

That might be overkill for just a single literal...
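
For what it's worth, here is a tiny standalone self-check of the macro
spelling against the suffixed form, usable on a compiler that happens to
accept the ULL suffix; the uint64 typedef is only an assumption for the
sketch:

    #include <assert.h>
    #include <stdio.h>

    typedef unsigned long long uint64;  /* assumed 64-bit type, for the sketch only */

    #define FRAG64(bits,shift) (((uint64)(bits)) << (shift))
    #define LITERAL64(a,b,c,d) \
      (FRAG64(a,48) | FRAG64(b,32) | FRAG64(c,16) | FRAG64(d,0))

    int main(void)
    {
        uint64 v = LITERAL64(0xdead, 0xbeef, 0xfeed, 0xface);

        /* Where the suffix form is accepted, both spellings must agree. */
        assert(v == 0xdeadbeeffeedfaceULL);
        printf("0x%016llx\n", (unsigned long long) v);
        return 0;
    }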

Nathan Myers
ncm

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] WAL & SHM principles

2001-03-12 Thread Nathan Myers

Sorry for taking so long to reply...

On Wed, Mar 07, 2001 at 01:27:34PM -0800, Mikheev, Vadim wrote:
> Nathan wrote:
> > It is possible to build a logging system so that you mostly don't care
> > when the data blocks get written 
[after being changed, as long as they get written by an fsync];
> > a particular data block on disk is 
> > considered garbage until the next checkpoint, so that you 
> 
> How to know if a particular data page was modified if there is no
> log record for that modification?
> (Ie how to know where is garbage? -:))

In such a scheme, any block on disk not referenced up to (and including) 
the last checkpoint is garbage, and is either blank or reflects a recent 
logged or soon-to-be-logged change.  Everything written (except in the 
log) after the checkpoint thus has to happen in blocks not otherwise 
referenced from on-disk -- except in other post-checkpoint blocks.

During recovery, the log contents get written to those pages during
startup. Blocks that actually got written before the crash are not
changed by being overwritten from the log, but that's ok. If they got
written before the corresponding log entry, too, nothing references
them, so they are considered blank.

> > might as well allow the blocks to be written any time,
> > even before the log entry.
> 
> And what to do with index tuples pointing to unupdated heap pages
> after that?

Maybe index pages are cached in shm and copied to mmapped blocks 
after it is ok for them to be written.

What platforms does PG run on that don't have mmap()?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...

2001-03-12 Thread Nathan Myers

On Mon, Mar 05, 2001 at 02:00:59PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> >   The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> > ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
> 
> >   From the README:
> 
> >   The code in this package has been derived from the BTLib package
> >   obtained from Christian Iseli <[EMAIL PROTECTED]>.
> >   From his mail:
> 
> >   The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> >   B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> >   Press.  Pages 896ff.
> 
> >   The generator polynomial is x64 + x4 + x3 + x1 + 1.
> 
> Nathan (or anyone else with a copy of "Numerical recipes in C", which
> I'm embarrassed to admit I don't own), is there any indication in there
> that anyone spent any effort on choosing that particular generator
> polynomial?  As far as I can see, it violates one of the standard
> guidelines for choosing a polynomial, namely that it be a multiple of
> (x + 1) ... which in modulo-2 land is equivalent to having an even
> number of terms, which this ain't got.  See Ross Williams'
> A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS, available from
> ftp://ftp.rocksoft.com/papers/crc_v3.txt among other places, which is
> by far the most thorough and readable thing I've ever seen on CRCs.
> 
> I spent some time digging around the net for standard CRC64 polynomials,
> and the only thing I could find that looked like it might have been
> picked by someone who understood what they were doing is in the DLT
> (digital linear tape) standard, ECMA-182 (available from
> http://www.ecma.ch/ecma1/STAND/ECMA-182.HTM):
> 
> x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 +
> x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 +
> x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 +
> x^7 + x^4 + x + 1

I'm sorry to have taken so long to reply.  

The polynomial chosen for SWISS-PROT turns out to be presented, in 
Numerical Recipes, just as an example of a primitive polynomial of 
that degree; no assertion is made about its desirability for error 
checking.  It is (in turn) drawn from E. J. Watson, "Mathematics of 
Computation", vol. 16, pp368-9.

Having (x + 1) as a factor guarantees to catch all errors in which
an odd number of bits have been changed.  Presumably you are then
infinitesimally less likely to catch all errors in which an even 
number of bits have been changed.

I would have posted the ECMA-182 polynomial if I had found it.  (That 
was good searching!)  One hopes that the ECMA polynomial was chosen more 
carefully than entirely at random.  High-degree codes are often chosen 
by Monte Carlo methods, by applying statistical tests to randomly-chosen 
values, because the search space is so large.

I have verified that Tom transcribed the polynomial correctly from
the PDF image.  The ECMA document doesn't say whether their polynomial
is applied "bit-reversed", but the check would be equally strong either
way.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Internationalized dates (was Internationalized error messages)

2001-03-12 Thread Nathan Myers

On Mon, Mar 12, 2001 at 11:11:46AM +0100, Karel Zak wrote:
> On Fri, Mar 09, 2001 at 10:58:02PM +0100, Kaare Rasmussen wrote:
> > Now you're talking about i18n, maybe someone could think about input and
> > output of dates in local language.
> > 
> > As fas as I can tell, PostgreSQL will only use English for dates, eg January,
> > February and weekdays, Monday, Tuesday etc. Not the local name.
> 
>  May be add special mask to to_char() and use locales for this, but I not
> sure. It isn't easy -- arbitrary size of strings, to_char's cache problems
> -- more and more difficult is parsing input with locales usage. 
> The other thing is speed...
> 
>  A solution is use number based dates without names :-(

ISO has published a standard on date/time formats, ISO 8601.  
Dates look like "2001-03-22".  Times look like "12:47:36".  
The only unfortunate feature is their standard format for a 
date/time: "2001-03-22T12:47:36".  To me the ISO date format
is far better than something involving month names. 

I'd like to see ISO 8601 as the default date format.
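
For illustration, a small standalone C sketch (not PG code) that emits
these forms with nothing beyond the standard library's strftime():

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        char buf[32];
        time_t now = time(NULL);
        struct tm *tmp = localtime(&now);

        strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", tmp);  /* date and time */
        printf("%s\n", buf);

        strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%S", tmp);  /* combined ISO form */
        printf("%s\n", buf);
        return 0;
    }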

--
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Banner links not working (fwd)

2001-03-12 Thread Nathan Myers

On Mon, Mar 12, 2001 at 08:05:26PM +, Peter Mount wrote:
> At 11:41 12/03/01 -0500, Vince Vielhaber wrote:
> >On Mon, 12 Mar 2001, Peter Mount wrote:
> >
> > > Bottom of every page (part of the template) is both my name and email
> > > address ;-)
> >
> >Can we slightly enlarge the font?
> 
> Can do. What size do you think is best?
> 
> I've always used size=1 for that line...

Absolute font sizes in HTML are always a mistake.  size="-1" would do.

--
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] doxygen & PG

2001-03-10 Thread Nathan Myers

On Sat, Mar 10, 2001 at 06:29:37PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > Is this page 
> >   http://members.fortunecity.com/nymia/postgres/dox/backend/html/
> > common knowledge?
> 
> Interesting, but bizarrely incomplete.  (Yeah, we have only ~100
> struct types ... sure ...)

It does say "version 0.0.1".  

What was interesting to me is that the interface seems a lot more 
helpful than the current CVS web gateway.  If it were to be completed, 
and could be kept up to date automatically, something like it could 
be very useful.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] doxygen & PG

2001-03-10 Thread Nathan Myers

Is this page 

  http://members.fortunecity.com/nymia/postgres/dox/backend/html/

common knowledge?  It appears to be an automatically-generated
cross-reference documentation web site.  My impression is that
appropriately-marked comments in the code get extracted to the 
web pages, too, so it is also a way to automate internal 
documentation.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Internationalized error messages

2001-03-09 Thread Nathan Myers

On Fri, Mar 09, 2001 at 12:05:22PM -0500, Tom Lane wrote:
> > Gettext takes care of this.  In the source you'd write
> 
> > elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"),
> > string, string);
> 
> Duh.  For some reason I was envisioning the localization substitution as
> occurring on the client side, but of course we'd want to do it on the
> server side, and before parameters are substituted into the message.
> Sorry for the noise.
> 
> I am not sure we can/should use gettext (possible license problems?),
> but certainly something like this could be cooked up.

I've been assuming that PG's needs are specialized enough that the
project wouldn't use gettext directly, but instead something inspired 
by it.  

If you look at my last posting on the subject, by the way, you will see 
that it could work without a catalog underneath; integrating a catalog 
would just require changes in a header file (and the programs to generate 
the catalog, of course).  That quality seems to me essential to allow the 
changeover to be phased in gradually, and to allow different underlying 
catalog implementations to be tried out.

Nathan
ncm

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Internationalized error messages

2001-03-08 Thread Nathan Myers

On Thu, Mar 08, 2001 at 09:00:09PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > Similar approaches have been tried frequently, and even enshrined 
> > in standards (e.g. POSIX catgets), but have almost always proven too
> > cumbersome.  The problem is that keeping programs that interpret the 
> > numeric code in sync with the program they monitor is hard, and trying 
> > to avoid breaking all those secondary programs hinders development on 
> > the primary program.  Furthermore, assigning code numbers is a nuisance,
> > and they add uninformative clutter.  
> 
> There's a difficult tradeoff to make here, but I think we do want to
> distinguish between the "official error code" --- the thing that has
> translations into various languages --- and what the backend is actually
> allowed to print out.  It seems to me that a fairly large fraction of
> the unique messages found in the backend can all be lumped under the
> category of "internal error", and that we need to have only one official
> error code and one user-level translated message for the lot of them.
> But we do want to be able to print out different detail messages for
> each of those internal errors.  There are other categories that might be
> lumped together, but that one alone is sufficiently large to force us
> to recognize it.  This suggests a distinction between a "primary" or
> "user-level" error message, which we catalog and provide translations
> for, and a "secondary", "detail", or "wizard-level" error message that
> exists only in the backend source code, and only in English, and so
> can be made up on the spur of the moment.

I suggest using different named functions/macros for different 
categories of message, rather than arguments to a common function.  
(I.e. "elog(ERROR, ...)" Considered Harmful.)  

You might even have more than one call at a site, one for the official
message and another for unofficial or unstable informative details.
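
To make that concrete, here is a minimal standalone sketch (not proposed
PG code): the names elog_query and elog_explain match the example call
shown further down, while vreport and the output tags are invented for
the sketch.

    #include <stdarg.h>
    #include <stdio.h>

    /* Carry the message category in the function name, so call sites stay
       uncluttered and a scanner can collect the primary (cataloged) strings
       separately from the free-form detail text. */

    static void vreport(const char *tag, const char *fmt, va_list args)
    {
        fprintf(stderr, "%s: ", tag);
        vfprintf(stderr, fmt, args);
        fputc('\n', stderr);
    }

    static void elog_query(const char *fmt, ...)    /* primary, cataloged */
    {
        va_list args;
        va_start(args, fmt);
        vreport("error", fmt, args);
        va_end(args);
    }

    static void elog_explain(const char *fmt, ...)  /* detail, English-only */
    {
        va_list args;
        va_start(args, fmt);
        vreport("explain", fmt, args);
        va_end(args);
    }

    int main(void)
    {
        elog_query("Attribute \"%s\" not found", "foo");
        elog_explain("An attribute or table name was not known within "
                     "the context of the query.");
        return 0;
    }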

> Another thing that I missed in Peter's proposal is how we are going to
> cope with messages that include parameters.  Surely we do not expect
> gettext to start with 'Attribute "foo" not found' and distinguish fixed
> from variable parts of that string?

The common way to deal with this is to catalog the format string itself,
with its embedded % directives.  The tricky bit, and what the printf 
family has had to be extended to handle, is that the order of the formal 
arguments varies with the target language.  The original string is an 
ordinary printf string, but the translations may have to refer to the 
substitution arguments by numeric position (as well as type).

There is probably Free code to implement that.
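
For reference, the "%n$" positional conversions from the Single UNIX
Specification (supported by glibc's printf family) do exactly this; a
standalone sketch:

    #include <stdio.h>

    int main(void)
    {
        const char *attr = "foo";
        const char *rel  = "bar";

        /* English catalog entry: attribute mentioned first */
        printf("Attribute \"%1$s\" not found in relation \"%2$s\"\n", attr, rel);

        /* A translation may need the relation mentioned first, with no
           change at the call site -- only the catalog string differs. */
        printf("In relation \"%2$s\" there is no attribute \"%1$s\"\n", attr, rel);
        return 0;
    }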

As much as possible, any compile-time annotations should be extracted 
into the catalog and filtered out of the source, to be reunited only
when you retrieve the catalog entry.  


> So it's clear that we need to devise a way of breaking an "error
> message" into multiple portions, including:
> 
>   Primary error message (localizable)
>   Parameters to insert into error message (user identifiers, etc)
>   Secondary (wizard) error message (optional)
>   Source code location
>   Query text location (optional)
> 
> and perhaps others that I have forgotten about.  One of the key things
> to think about is whether we can, or should try to, transmit all this
> stuff in a backwards-compatible protocol.  That would mean we'd have
> to dump all the info into a single string, which is doable but would
> perhaps look pretty ugly:
> 
>   ERROR: Attribute "foo" not found  -- basic message for dumb frontends
>   ERRORCODE: UNREC_IDENT  -- key for finding localized message
>   PARAM1: foo -- something to embed in the localized message
>   MESSAGE: Attribute or table name not known within context of query
>   CODELOC: src/backend/parser/parse_clause.c line 345
>   QUERYLOC: 22

Whitespace can be used effectively.  E.g. only primary messages appear
in column 0.  PG might emit this, which is easily filtered:

Attribute "foo" not found
    severity: cannot proceed
    explain: An attribute or table name was not known within
    explain: the context of the query.
    index: 237 Attribute \"%s\" not found
    location: src/backend/parser/parse_clause.c line 345
    query_position: 22

Here the first line is the localized replacement of what appears in the 
code, with arguments substituted in.   The other stuff comes from the
catalog.

The call looks like

  elog_query("Attribute \"%s\" not found", foo);
  elog_explain("An attribute or table name was not known within"

Re: [HACKERS] Internationalized error messages

2001-03-08 Thread Nathan Myers

On Thu, Mar 08, 2001 at 11:49:50PM +0100, Peter Eisentraut wrote:
> I really feel that translated error messages need to happen soon.
> Managing translated message catalogs can be done easily with available
> APIs.  However, translatable messages really require an error code
> mechanism (otherwise it's completely impossible for programs to interpret
> error messages reliably).  I've been thinking about this for much too long
> now and today I finally settled to the simplest possible solution.
> 
> Let the actual method of allocating error codes be irrelevant for now,
> although the ones in the SQL standard are certainly to be considered for a
> start.  Essentially, instead of writing
> 
> elog(ERROR, "disaster struck");
> 
> you'd write
> 
> elog(ERROR, "XYZ01", "disaster struck");
> 
> Now you'll notice that this approach doesn't make the error message text
> functionally dependend on the error code.  The alternative would have been
> to write
> 
> elog(ERROR, "XYZ01");
> 
> which makes the code much less clear.  Additonally, most of the elog()
> calls use printf style variable argument lists.  So maybe
> 
> elog(ERROR, "XYZ01", (arg + 1), foo);
> 
> This is not only totally obscure, but also incredibly cumbersome to
> maintain and very error prone.  One earlier idea was to make the "XYZ01"
> thing a macro instead that expands to a string with % arguments, that GCC
> can check as it does now.  But I don't consider this a lot better, because
> the initial coding is still obscured, and additonally the list of those
> macros needs to be maintained.  (The actual error codes might still be
> provided as readable macro names similar to the errno codes, but I'm not
> sure if we should share these between server and client.)
> 
> Finally, there might also be legitimate reasons to have different error
> message texts for the same error code.  For example, "type errors" (don't
> know if this is an official code) can occur in a number of places that
> might warrant different explanations.  Indeed, this approach would
> preserve "artistic freedom" to some extent while still maintaining some
> structure alongside.  And it would be rather straightforward to implement,
> too.  Those who are too bored to assign error codes to new code can simply
> pick some "zero" code as default.
> 
> On the protocol front, this could be pretty easy to do.  Instead of
> "message text" we'd send a string "XYZ01: message text".  Worst case, we
> pass this unfiltered to the client and provide an extra function that
> returns only the first five characters.  Alternatively we could strip off
> the prefix when returning the message text only.
> 
> At the end, the i18n part would actually be pretty easy, e.g.,
> 
> elog(ERROR, "XYZ01", gettext("stuff happened"));

Similar approaches have been tried frequently, and even enshrined 
in standards (e.g. POSIX catgets), but have almost always proven too
cumbersome.  The problem is that keeping programs that interpret the 
numeric code in sync with the program they monitor is hard, and trying 
to avoid breaking all those secondary programs hinders development on 
the primary program.  Furthermore, assigning code numbers is a nuisance,
and they add uninformative clutter.  

It's better to scan the program for elog() arguments, and generate
a catalog by using the string itself as the index code.  Those 
maintaining the secondary programs can compare catalogs to see what 
has been broken by changes and what new messages to expect.  elog()
itself can (optionally) invent tokens (e.g. catalog indices) to help 
out those programs.
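
A minimal sketch of that lookup, with a hand-written one-entry table
standing in for the generated catalog (the table layout and the sample
translation are invented for illustration):

    #include <stdio.h>
    #include <string.h>

    struct msg_entry {
        const char *key;         /* the string as it appears in the elog() call */
        const char *translated;  /* the localized text from the catalog */
    };

    static const struct msg_entry catalog[] = {
        { "Attribute \"%s\" not found",
          "Attribut \"%s\" nicht gefunden" },   /* example German entry */
    };

    static const char *lookup(const char *key)
    {
        size_t i;

        for (i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
            if (strcmp(catalog[i].key, key) == 0)
                return catalog[i].translated;
        return key;              /* unknown key: fall back to the English text */
    }

    int main(void)
    {
        printf(lookup("Attribute \"%s\" not found"), "foo");
        putchar('\n');
        return 0;
    }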

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Use SIGQUIT instead of SIGUSR1?

2001-03-08 Thread Nathan Myers

On Thu, Mar 08, 2001 at 04:06:16PM -0500, Tom Lane wrote:
> To implement the idea of performing a checkpoint after every so many
> XLOG megabytes (as well as after every so many seconds), I need to pick
> an additional signal number for the postmaster to accept.  Seems like
> the most appropriate choice for this is SIGUSR1, which isn't currently
> being used at the postmaster level.
> 
> However, if I just do that, then SIGUSR1 and SIGQUIT will have
> completely different meanings for the postmaster and for the backends,
> in fact SIGQUIT to the postmaster means send SIGUSR1 to the backends.
> This seems hopelessly confusing.
> 
> I think it'd be a good idea to change the code so that SIGQUIT is the
> per-backend quickdie() signal, not SIGUSR1, to bring the postmaster and
> backend signals back into some semblance of agreement.
> 
> For the moment we could leave the backends also accepting SIGUSR1 as
> quickdie, just in case someone out there is in the habit of sending
> that signal manually to individual backends.  Eventually backend SIGUSR1
> might be reassigned to mean something else.  (I suspect Bruce is
> coveting it already ;-).)

The number and variety of signals used in PG is already terrifying.

Attaching a specific meaning to SIGQUIT may be dangerous if the OS and 
its daemons also send SIGQUIT to mean something subtly different.  I'd 
rather see a reduction in the use of signals, and a movement toward more 
modern, better behaved interprocess communication mechanisms.  Still, 
"if it were done when 'tis done, then 'twere well It were done" cleanly.

--
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Proposed WAL changes

2001-03-07 Thread Nathan Myers

On Wed, Mar 07, 2001 at 12:03:41PM -0800, Mikheev, Vadim wrote:
> Ian wrote:
> > > I feel that the fact that
> > > 
> > > WAL can't help in the event of disk errors
> > > 
> > > is often overlooked.
> > 
> > This is true in general.  But, nevertheless, WAL can be written to
> > protect against predictable disk errors, when possible.  Failing to
> > write a couple of disk blocks when the system crashes 

or, more likely, when power drops; a system crash shouldn't keep the
disk from draining its buffers ...

> > is a reasonably predictable disk error.  WAL should ideally be 
> > written to work correctly in that situation.
> 
> But what can be done if fsync returns before pages flushed?

Just what Tom has done: preserve a little more history.  If it's not
too expensive, then it doesn't hurt you when running on sound hardware,
but it offers a good chance of preventing embarrassments for (the 
overwhelming fraction of) users on garbage hardware.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] WAL & SHM principles

2001-03-07 Thread Nathan Myers

On Wed, Mar 07, 2001 at 11:21:37AM -0500, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > The only problem is that we would no longer have control over which
> > pages made it to disk.  The OS would perhaps write pages as we modified
> > them.  Not sure how important that is.
> 
> Unfortunately, this alone is a *fatal* objection.  See nearby
> discussions about WAL behavior: we must be able to control the relative
> timing of WAL write/flush and data page writes.

Not so fast!

It is possible to build a logging system so that you mostly don't care
when the data blocks get written; a particular data block on disk is 
considered garbage until the next checkpoint, so that you might as well 
allow the blocks to be written any time, even before the log entry.

Letting the OS manage sharing of disk block images via mmap should be 
an enormous win vs. a fixed shm and manual scheduling by PG.  If that
requires changes in the logging protocol, it's worth it.

(What supported platforms don't have mmap?)
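
For what it's worth, a minimal sketch of the anonymous-mmap approach,
assuming MAP_ANONYMOUS is available (it is spelled MAP_ANON on the BSDs):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        size_t size = 1 << 20;              /* 1 MB shared region */
        char *region = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        if (region == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        if (fork() == 0) {                  /* child writes into the region */
            strcpy(region, "written by child");
            _exit(0);
        }
        wait(NULL);
        printf("parent sees: %s\n", region);  /* parent sees the child's write */
        return 0;
    }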

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Proposed WAL changes

2001-03-07 Thread Nathan Myers

On Wed, Mar 07, 2001 at 11:09:25AM -0500, Tom Lane wrote:
> "Vadim Mikheev" <[EMAIL PROTECTED]> writes:
> >> * Store two past checkpoint locations, not just one, in pg_control.
> >> On startup, we fall back to the older checkpoint if the newer one
> >> is unreadable.  Also, a physical copy of the newest checkpoint record
> 
> > And what to do if older one is unreadable too?
> > (Isn't it like using 2 x CRC32 instead of CRC64 ? -:))
> 
> Then you lose --- but two checkpoints gives you twice the chance of
> recovery (probably more, actually, since it's much more likely that
> the previous checkpoint will have reached disk safely).

Actually far more: if the checkpoints are minutes apart, even the 
worst disk drive will certainly have flushed any blocks written for 
the earlier checkpoint.

--
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Red Hat bashing

2001-03-06 Thread Nathan Myers

On Tue, Mar 06, 2001 at 04:20:13PM -0500, Lamar Owen wrote:
> Nathan Myers wrote:
> > That is why there is no problem with version skew in the syscall
> > argument structures on a correctly-configured Linux system.  (On a
> > Red Hat system it is very easy to get them out of sync, but RH fans
> > are used to problems.)
> 
> Is RedHat bashing really necessary here? 

I recognize that my last seven words above contributed nothing.
In the future I will only post strictly factual statements about
Red Hat and similarly charged topics, and keep the opinions to
myself.  I value the collegiality of this list too much to risk 
it further.  I offer my apologies for violating it.

By the way... do they call Red Hat "RedHat" at Red Hat? 

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Nathan Myers

On Tue, Mar 06, 2001 at 08:19:12PM +0100, Peter Eisentraut wrote:
> Alfred Perlstein writes:
> 
> > Seriously, there's some dispute on the type that 'shm_nattch' is,
> > under Solaris it's "shmatt_t" (unsigned long afaik), under FreeBSD
> > it's 'short' (i should fix this. :)).
> 
> What I don't like is that my /usr/include/sys/shm.h (through other
> headers) has:
> 
> typedef unsigned long int shmatt_t;
> 
> /* Data structure describing a set of semaphores.  */
> struct shmid_ds
>   {
> struct ipc_perm shm_perm;   /* operation permission struct */
> size_t shm_segsz;   /* size of segment in bytes */
> __time_t shm_atime; /* time of last shmat() */
> unsigned long int __unused1;
> __time_t shm_dtime; /* time of last shmdt() */
> unsigned long int __unused2;
> __time_t shm_ctime; /* time of last change by shmctl() */
> unsigned long int __unused3;
> __pid_t shm_cpid;   /* pid of creator */
> __pid_t shm_lpid;   /* pid of last shmop */
> shmatt_t shm_nattch;/* number of current attaches */
> unsigned long int __unused4;
> unsigned long int __unused5;
>   };
> 
> whereas /usr/src/linux/include/shm.h has:
> 
> struct shmid_ds {
> struct ipc_perm shm_perm;   /* operation perms */
> int shm_segsz;  /* size of segment (bytes) */
> __kernel_time_t shm_atime;  /* last attach time */
> __kernel_time_t shm_dtime;  /* last detach time */
> __kernel_time_t shm_ctime;  /* last change time */
> __kernel_ipc_pid_t  shm_cpid;   /* pid of creator */
> __kernel_ipc_pid_t  shm_lpid;   /* pid of last operator */
> unsigned short  shm_nattch; /* no. of current attaches */
> unsigned short  shm_unused; /* compatibility */
> void*shm_unused2;   /* ditto - used by DIPC */
> void*shm_unused3;   /* unused */
> };
> 
> 
> Not only note the shm_nattch type, but also shm_segsz, and the "unused"
> fields in between.  I don't know a thing about the Linux kernel sources,
> but this doesn't seem right.

On Linux, /usr/src/linux/include is meaningless for anything in userland; 
it's meant only for building the kernel and kernel modules.  That Red Hat 
tends to expose it to user-level builds is a long-standing bug in Red 
Hat's distribution, in violation of the File Hierarchy Standard as well 
as explicit instructions from Linus & crew and from the maintainer of the 
C library.

User-level programs see what's in /usr/include, which only has to match 
what the C library wants.  It's the C library's job to do any mapping 
needed, and it does.  The C library is maintained very, very carefully
to keep binary compatibility with all old versions.  (One sometimes
encounters commercial programs that rely on a bug or undocumented/
unsupported feature that disappears in a later library version.)

That is why there is no problem with version skew in the syscall
argument structures on a correctly-configured Linux system.  (On a
Red Hat system it is very easy to get them out of sync, but RH fans 
are used to problems.)

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-05 Thread Nathan Myers

On Mon, Mar 05, 2001 at 08:55:41PM -0500, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > killproc should send a kill -15 to the process, wait a few seconds for
> > it to exit.  If it does not, try kill -1, and if that doesn't kill it,
> > then kill -9.
> 
> Tell it to the Linux people ... this is their boot-script code we're
> talking about.

Not to be a zealot, but this isn't _Linux_ boot-script code, it's
_Red Hat_ boot-script code.  Red Hat would like for us all to confuse
the two, but they jes' ain't the same.  (As a rule of thumb, where it
works right, credit Linux; where it doesn't, blame Red Hat. :-)

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] WAL & RC1 status

2001-03-02 Thread Nathan Myers

On Fri, Mar 02, 2001 at 10:54:04AM -0500, Bruce Momjian wrote:
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > Is there a version number in the WAL file?
> > 
> > catversion.h will do fine, no?
> > 
> > > Can we put conditional code in there to create
> > > new log file records with an updated format?
> > 
> > The WAL stuff is *far* too complex already.  I've spent a week studying
> > it and I only partially understand it.  I will not consent to trying to
> > support multiple log file formats concurrently.
> 
> Well, I was thinking a few things.  Right now, if we update the
> catversion.h, we will require a dump/reload.  If we can update just the
> WAL version stamp, that will allow us to fix WAL format problems without
> requiring people to dump/reload.  I can imagine this would be valuable
> if we find we need to make changes in 7.1.1, where we can not require
> dump/reload.

It Seems to Me that after an orderly shutdown, the WAL files should be, 
effectively, slag -- they should contain no deltas from the current 
table contents.  In practice that means the only part of the format that 
*should* matter is whatever it takes to discover that they really are 
slag.

That *should* mean that, at worst, a change to the WAL file format should 
only require doing an orderly shutdown, and then (perhaps) running a simple
program to generate a new-format empty WAL.  It ought not to require an 
initdb.  

Of course the details of the current implementation may interfere with
that ideal, but it seems a worthy goal for the next beta, if it's not
possible already.  Given the opportunity to change the current WAL format, 
it ought to be possible to avoid even needing to run a program to generate 
an empty WAL.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...

2001-02-28 Thread Nathan Myers

On Wed, Feb 28, 2001 at 09:17:19PM -0500, Bruce Momjian wrote:
> > On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> > > I just took a close look at the COMP_CRC64 macro in xlog.c.
> > > 
> > > This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
> > > on just the odd-numbered bytes and one on just the even-numbered bytes
> > > of the datastream.  That's hardly any stronger than a single 32-bit CRC;
> > > it's certainly not what I thought we had agreed to implement.
> > > 
> > > We can't change this algorithm without forcing an initdb, which would be
> > > a rather unpleasant thing to do at this late stage of the release cycle.
> > > But I'm not happy with it.  Comments?
> > 
> > This might be a good time to update:
> > 
> >   The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> > 
> > ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
> > 
> >   From the README:
> > 
> >   The code in this package has been derived from the BTLib package
> >   obtained from Christian Iseli <[EMAIL PROTECTED]>.
> >   From his mail:
> > 
> >   The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> >   B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> >   Press.  Pages 896ff.
> > 
> >   The generator polynomial is x64 + x4 + x3 + x1 + 1.
> > 
> > I would suggest that if you don't change the algorithm, at least change
> > the name in the sources.  Were you to #ifdef in a real crc-64, and make 
> > a compile-time option to select the old one, you could allow users who 
> > wish to avoid the initdb a way to continue with the existing pair of 
> > CRC-32s.
>
> Added to TODO:
> 
>   * Correct CRC WAL code to be normal CRC32 algorithm 

Um, how about

  * Correct CRC WAL code to be a real CRC64 algorithm

instead?

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...

2001-02-28 Thread Nathan Myers

On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> I just took a close look at the COMP_CRC64 macro in xlog.c.
> 
> This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
> on just the odd-numbered bytes and one on just the even-numbered bytes
> of the datastream.  That's hardly any stronger than a single 32-bit CRC;
> it's certainly not what I thought we had agreed to implement.
> 
> We can't change this algorithm without forcing an initdb, which would be
> a rather unpleasant thing to do at this late stage of the release cycle.
> But I'm not happy with it.  Comments?

This might be a good time to update:

  The CRC-64 code used in the SWISS-PROT genetic database is (now) at:

ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz

  From the README:

  The code in this package has been derived from the BTLib package
  obtained from Christian Iseli <[EMAIL PROTECTED]>.
  From his mail:

  The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
  B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
  Press.  Pages 896ff.

  The generator polynomial is x64 + x4 + x3 + x1 + 1.

I would suggest that if you don't change the algorithm, at least change
the name in the sources.  Were you to #ifdef in a real crc-64, and make 
a compile-time option to select the old one, you could allow users who 
wish to avoid the initdb a way to continue with the existing pair of 
CRC-32s.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Re: [PATCHES] A patch for xlog.c

2001-02-26 Thread Nathan Myers

On Sun, Feb 25, 2001 at 11:28:46PM -0500, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > It allows no backing store on disk.  

I.e. it allows you to map memory without an associated inode; the memory
may still be swapped.  Of course, there is no problem with mapping an 
inode too, so that unrelated processes can join in.  Solaris has a flag
to pin the shared pages in RAM so they can't be swapped out.

> > It is the BSD solution to SysV
> > share memory.  Here are all the BSDi flags:
> 
> >  MAP_ANONMap anonymous memory not associated with any specific
> >  file.  The file descriptor used for creating MAP_ANON
> >  must be -1.  The offset parameter is ignored.
> 
> Hmm.  Now that I read down to the "nonstandard extensions" part of the
> HPUX man page for mmap(), I find
> 
>  If MAP_ANONYMOUS is set in flags:
> 
>   o  A new memory region is created and initialized to all zeros.
>      This memory region can be shared only with descendants of
>      the current process.

This is supported on Linux and BSD, but not on Solaris 7.  It's not 
necessary; you can just map /dev/zero on SysV systems that don't 
have MAP_ANON.
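
A minimal sketch of that fallback, under the assumption stated above that
a MAP_SHARED mapping of /dev/zero behaves like anonymous shared memory:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void *alloc_shared(size_t size)
    {
        int fd = open("/dev/zero", O_RDWR);
        void *p;

        if (fd < 0)
            return NULL;
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                  /* the mapping survives the close() */
        return (p == MAP_FAILED) ? NULL : p;
    }

    int main(void)
    {
        char *region = alloc_shared(1 << 20);

        if (region == NULL) {
            perror("alloc_shared");
            return 1;
        }
        region[0] = 'x';            /* usable like ordinary zero-filled memory */
        printf("mapped and wrote one byte\n");
        return 0;
    }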

> While I've said before that I don't think it's really necessary for
> processes that aren't children of the postmaster to access the shared
> memory, I'm not sure that I want to go over to a mechanism that makes it
> *impossible* for that to be done.  Especially not if the only motivation
> is to avoid having to configure the kernel's shared memory settings.

There are enormous advantages to avoiding the need to configure kernel 
settings.  It makes PG a better citizen.  PG is much easier to drop in 
and use if you don't need attention from the IT department.

But I don't know of any reason to avoid mapping an actual inode,
so using mmap doesn't necessarily mean giving up sharing among
unrelated processes.

> Besides, what makes you think there's not a limit on the size of shmem
> allocatable via mmap()?

I've never seen any mmap limit documented.  Since mmap() is how 
everybody implements shared libraries, such a limit would be equivalent 
to a limit on how much/many shared libraries are used.  mmap() with 
MAP_ANONYMOUS (or its SysV /dev/zero equivalent) is a common, modern 
way to get raw storage for malloc(), so such a limit would be a limit
on malloc() too.

The mmap architecture comes to us from the Mach microkernel memory
manager, backported into BSD and then copied widely.  Since it was
the fundamental mechanism for all memory operations in Mach, arbitrary
limits would make no sense.  That it worked so well is the reason it 
was copied everywhere else, so adding arbitrary limits while copying 
it would be silly.  I don't think we'll see any systems like that.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Nathan Myers

On Sun, Feb 25, 2001 at 12:41:28AM -0500, Tom Lane wrote:
> Attached are graphs from more thorough runs of pgbench with a commit
> delay that occurs only when at least N other backends are running active
> transactions. ...
> It's not entirely clear what set of parameters is best, but it is
> absolutely clear that a flat zero-commit-delay policy is NOT best.
> 
> The test conditions are postmaster options -N 100 -B 1024, pgbench scale
> factor 10, pgbench -t (transactions per client) 100.  (Hence the results
> for a single client rely on only 100 transactions, and are pretty noisy.
> The noise level should decrease as the number of clients increases.)

It's hard to interpret these results.  In particular, "delay 10k, sibs 20"
(10k,20), or cyan-triangle, is almost the same as "delay 50k, sibs 1" 
(50k,1), or green X.  Those are pretty different parameters to get such
similar results.

The only really bad performers were (0), (10k,1), (100k,20).  The best
were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
Why would 30k be a magic delay, regardless of siblings?  What happened
at 40?

At low loads, it seems (100k,1) (brown +) did best by far, which seems
very odd.  Even more odd, it did pretty well at very high loads but had 
problems at intermediate loads.  

Nathan Myers
[EMAIL PROTECTED]


