[HACKERS] pgadmin.postgresql.org displaying errors

2002-06-24 Thread Dave Cramer

I am getting lots of errors on pgadmin.postgresql.org

Dave





---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]





Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Curt Sampson

On Sun, 23 Jun 2002, Bruce Momjian wrote:

 Yes, I don't see writing to two files vs. one to be any win, especially
 when we need to fsync both of them.  What I would really like is to
 avoid the double I/O of writing to WAL and to the data file;  improving
 that would be a huge win.

You mean, the double I/O of writing the block to the WAL and data file?
(We'd still have to write the changed columns or whatever to the WAL,
right?)

I'd just add an option to turn it off. If you need it, you need it;
there's no way around that except to buy hardware that is really going
to guarantee your writes (which then means you don't need it).

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC









Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread J. R. Nield

On Sun, 2002-06-23 at 21:29, J. R. Nield wrote:

 If is impossible to do what you want. You can not protect against...
Wow. The number of typos in that last one was just amazing. I even
started with one.

Have a nice weekend everybody :-)

;jrnield

-- 
J. R. Nield
[EMAIL PROTECTED]











Re: [HACKERS] pg_dump and ALTER TABLE / ADD FOREIGN KEY

2002-06-24 Thread Christopher Kings-Lynne

 Some have expressed that this could be quite slow for large databases,
 and want a type of:

 SET CONSTRAINTS UNCHECKED;

 However, others don't believe constraints other than foreign keys
 should go unchecked.

Well, remember that at the moment the other SET CONSTRAINTS commands
only affect foreign keys.  However, there is a TODO to allow deferrable
unique constraints.

 Or would the below be more appropriate?:
 ALTER TABLE tab ADD FOREIGN KEY  TRUST EXISTING DATA;

Maybe instead of TRUST EXISTING DATA, it could just be WITHOUT CHECK or
something else that uses existing keywords?

Either way, it must be a superuser-only command.  I'm kinda beginning to
favour the latter now actually...

Then again, if we could make all constraints uncheckable, restoring a dump
would be really fast (but risky!).

Chris











[HACKERS] Use of LOCAL in SET command

2002-06-24 Thread Peter Eisentraut

Sorry to nag about this so late, but I fear that the new command SET LOCAL
will cause some confusion later on.

SQL uses LOCAL to mean the local node in a distributed system (SET LOCAL
TRANSACTION ...) and the current session as opposed to all sessions (local
temporary table).  The new SET LOCAL command adds the meaning "this
transaction only".  Instead we could simply use SET TRANSACTION, which
would be consistent in behaviour with the SET TRANSACTION ISOLATION LEVEL
command.

Comments?

-- 
Peter Eisentraut   [EMAIL PROTECTED]









Re: [HACKERS] Our archive searching stinks

2002-06-24 Thread Vince Vielhaber

On Thu, 20 Jun 2002, Bruce Momjian wrote:

 OK, I have finally decided that our archive searching stinks.  I have
 emails in my mailbox that don't appear in the archives.

 Our main site, http://archives.postgresql.org/, doesn't archive the
 'patches' list.  (It isn't listed on the main site, and I can't find
 postings via searching.)  Also, why does it open a separate window for
 each email?  That doesn't make any sense to me.

 My backup is Google,
 http://groups.google.com/groups?hl=en&group=comp.databases.postgresql,
 but that seems to be missing emails too.  Our email/news link regularly
 drops messages and therefore Google can't see them.

 It isn't one thing, but a general lack of quality in this area.  Heck,
 we had no usable archives for _months_.  Is this really only important
 to me?

 Oh, I see FTS is back working at http://fts.postgresql.org/db/mw/.  I
 like the output format, but all three give me different results.
 However, fts is invisible because I can't find a link to it from
 anywhere on our web pages.

 I guess I am asking:

   Can our main archive start doing the patches list?
   Can it stop opening a new window for every email?
   Can we find out why the email/news gateway drops messages?
   Can we link to the fts site?

The only thing I can help with is the fts link, but I'm hesitant to
link to something that disappears.  If it's going to be here and not
go away again I'll be happy to add it.

Vince.
-- 
==
Vince Vielhaber -- KA8CSH   email: [EMAIL PROTECTED]   http://www.pop4.net
 56K Nationwide Dialup from $16.00/mo at Pop4 Networking
   Online Campground Directory    http://www.camping-usa.com
   Online Giftshop Superstore     http://www.cloudninegifts.com
==











Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread J. R. Nield

On Sun, 2002-06-23 at 15:36, Bruce Momjian wrote:
 Yes, I don't see writing to two files vs. one to be any win, especially
 when we need to fsync both of them.  What I would really like is to
 avoid the double I/O of writing to WAL and to the data file;  improving
 that would be a huge win.
 

If is impossible to do what you want. You can not protect against
partial writes without writing pages twice and calling fdatasync between
them while going through a generic filesystem. The best disk array will
not protect you if the operating system does not align block writes to
the structure of the underlying device. Even with raw devices, you need
special support or knowledge of the operating system and/or the disk
device to ensure that each write request will be atomic to the
underlying hardware. 

All other systems rely on the fact that you can recover a damaged file
using the log archive. This means downtime in the rare case, but no data
loss. Until PostgreSQL can do this, it will not be acceptable for
really critical production use. This is not to knock PostgreSQL, because
it is a very good database system, and clearly the best open-source one.
It even has feature advantages over the commercial systems. But at the
end of the day, unless you have complete understanding of the I/O system
from write(2) through to the disk system, the only sure ways to protect
against partial writes are careful writes (in the WAL log or
elsewhere, writing pages twice), or requiring (and allowing) users to
do log-replay recovery when a file is corrupted by a partial write. As
long as there is a UPS, and the operating system doesn't crash, there
should still be no partial writes.

If we log pages to WAL, they are useless when archived (after a
checkpoint). So either we have a separate log for them (the ping-pong
file), or we should at least remove them when archived, which makes log
archiving more complex but is perfectly doable.

Finally, I would love to hear why we are using the operating system
buffer manager at all. The OS is acting as a secondary buffer manager
for us. Why is that? What flaw in our I/O system does this reveal? I
know that:

We sync only WAL, not the other pages, except for the sync() call we do
 during checkpoint when we discard old WAL files.

But this is probably not a good thing. We should only be writing blocks
when they need to be on disk. We should not be expecting the OS to write
them sometime later and avoid blocking (as long) for the write. If we
need that, then our buffer management is wrong and we need to fix it.
The reason we are doing this is because we expect the OS buffer manager
to do asynchronous I/O for us, but then we don't control the order. That
is the reason why we have to call fdatasync(), to create sequence
points.

The reason we have performance problems with either D_OSYNC or fdatasync
on the normal relations is because we have no dbflush process. This
causes an unacceptable amount of I/O blocking by other transactions.

The ORACLE people were not kidding when they said that they could not
certify Linux for production use until it supported O_DSYNC. Can you
explain why that was the case?

Finally, let me apologize if the above comes across as somewhat
belligerent. I know very well that I can't compete with you guys for
knowledge of the PostgreSQL system. I am still at a loss when I look at
the optimizer and executor modules, and it will take some time before I
can follow discussion of that area. Even then, I doubt my ability to
compare with people like Mr. Lane and Mr. Momjian in experience and
general intelligence, or in the field of database programming and
software development in particular. However, this discussion and a
search of the pgsql-hackers archives reveal this problem to be the KEY
area of PostgreSQL's failings, and of general misunderstanding, when
compared to its commercial competitors.

Sincerely, 

J. R. Nield

-- 
J. R. Nield
[EMAIL PROTECTED]











Re: [HACKERS] ODBC Driver 7.02.0001 (Win32) (Unicode mode): CRLF-LF works, LF-CRLF doesn't

2002-06-24 Thread Julian Mehnle

Hiroshi Inoue [EMAIL PROTECTED] wrote:
 Julian Mehnle [EMAIL PROTECTED] wrote:
  Recently I tried to use the new 7.02.0001 Win32 ODBC driver in the new
  (beta) Unicode mode in conjunction with MS Access 2000 and a UNICODE
  encoded database stored in a PostgreSQL 7.2.1 database running on a
  Linux system.
 
  I noticed that when the LF-CRLF Conversion option is *enabled* in
  the driver's settings dialog, only a CRLF-LF conversion (while writing
  to the database) is performed, but no LF-CRLF conversion (while reading
  from the database)!

 Could you try the snapshot dll at http://w2422.nsk.ne.jp/~inoue/ ?

There are several versions and I don't know which one you mean:
- psqlodbc.dll (version 7.02.0001),
- psqlodbc.dll (the multibyte version),
- psqlodbc30.dll (7.02.0001 ODBC3.0 trial driver),
- psqlodbc30.dll (the multibyte version),
- another trial driver (7.02.0001 + ODBC3.0 + Unicode)?

Please give me a hint! :-)

Julian Mehnle.









Re: [HACKERS] COPY syntax improvement

2002-06-24 Thread Bruce Momjian


Well, good points.  I know there were some people who wanted a clearer
syntax, so I supplied it.  Seems you don't.  I would like to hear from
someone else who doesn't like the improved syntax before I consider
changing things back.

---

Peter Eisentraut wrote:
 Bruce Momjian writes:
 
  I thought there were complaints that the old COPY syntax just had too
  many features stuffed in too many unusual places,
 
 Haven't ever seen one.  This command has no precedent in other products,
 only years of going virtually unchanged in PostgreSQL.  Changing it now
 and allowing countless permutations of the key words is going to be
 confusing, IMHO.
 
  e.g. delimiter after
  filename,
 
 COPY is the only command to use a delimiter, so this can hardly be
 qualified as an unusual place.
 
  oids after tablename,
 
 That's because the OIDs are in said table.
 
  binary after COPY,
 
 Which is consistent with DECLARE BINARY CURSOR.
 
  NULL after delimiter.
 
 OK, that order should perhaps be more flexible.
 
 -- 
 Peter Eisentraut   [EMAIL PROTECTED]
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-24 Thread Christopher Kings-Lynne

I was thinking of writing a command line tool like 'pgtune' that looks at
the stats views and will generate SQL code for, or do automatically the
following:

* Dropping indices that are never used
* Creating appropriate indices to avoid large, expensive sequential scans.

This would put us in the 'MySQL makes my indices for me by magic' league -
but would be far more powerful and flexible.  How to do multikey indices is
beyond me, though.

*sigh* I'm recovering from a septoplasty on my nose at the moment, so I might
have some time to do some coding!

Chris

- Original Message -
From: Bruce Momjian [EMAIL PROTECTED]
To: Martijn van Oosterhout [EMAIL PROTECTED]
Cc: PostgreSQL-development [EMAIL PROTECTED]
Sent: Friday, June 21, 2002 10:50 AM
Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector


 Martijn van Oosterhout wrote:
  Since it's currently all for collecting statistics on tables, why can't it
  collect another type of statistic, like:

  - How often the estimator gets it wrong?

  At the end of an index scan, the executor could compare the number of rows
  returned against what was estimated, and if it falls outside a certain
  range, flag it.

  Also, the average ratio of rows coming out of a distinct node vs the number
  going in.

  For a join clause, the amount of correlation between two columns (hard).

  etc.

  Ideally, the planner could then use this info to make better plans.
  Eventually, the whole system could become somewhat self-tuning.

  Does anyone see any problems with this?

 [ Discussion moved to hackers.]

 I have thought that some type of feedback from the executor back into
 the optimizer would be a good feature.  Not sure how to do it, but your
 idea makes sense.  It certainly could update the table statistics after
 a sequential scan.

 --
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 853-3000
   +  If your life is a hard drive, |  830 Blythe Avenue
   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?

 http://archives.postgresql.org










Re: [HACKERS] ecpg and bison again

2002-06-24 Thread Thomas Lockhart

 I get
 cvs [server aborted]: cannot write /cvsroot/CVSROOT/val-tags: Permission denied
 This seems to be a server message.

I see the same thing when trying to update a tree to this branch using
local cvs on mcvsup.postgresql.org. The file is owned by scrappy and has
no group write permissions.

I use CVSup, and looking at the permissions on my local CVS repository
(which does allow me to work with the branch) that file is group (and
world!?) writable:

29501576 -rw-rw-rw-  1 thomas   thomas   33 Jun 23 07:37 val-tags

And looking at another CVS repository with known good behavior I see
that the file is group-writable.

scrappy, can you adjust the permissions on /cvsroot/CVSROOT to allow
group writes, and adjust the permissions on /cvsroot/CVSROOT/val-tags to
allow group writes? Perhaps it is just the permissions on the directory
which are the problem, but it seems that no branch operations are
currently allowed :(

  - Thomas








Re: [HACKERS] ODBC Driver 7.02.0001 (Win32) (Unicode mode): CRLF-LF

2002-06-24 Thread Hiroshi Inoue
"Julian Mehnle, Linksystem Muenchen" wrote:
 
 Hi all!
 
 Recently I tried to use the new 7.02.0001 Win32 ODBC driver in the new
 (beta) Unicode mode in conjunction with MS Access 2000 and a "UNICODE"
 encoded database stored in a PostgreSQL 7.2.1 database running on a
 Linux system.
 
 I noticed that when the "LF-CRLF Conversion" option is *enabled* in
 the driver's settings dialog, only a CRLF-LF conversion (while writing
 to the database) is performed, but no LF-CRLF conversion (while reading
 from the database)!

Could you try the snapshot dll at http://w2422.nsk.ne.jp/~inoue/ ?

regards,
Hiroshi Inoue
http://w2422.nsk.ne.jp/~inoue/





[HACKERS] ident-des patches

2002-06-24 Thread David M. Kaplan

Hi,

I added the code to make IDENT authentication work even if the
responses are DES-encrypted.  The changes are contained in the attached
tar.gz file.

There is a readme included in the tar.gz which explains things.  The tar 
file contains the following files:

ident-des.patch
src/backend/libpq/ident-des.c
src/include/libpq/ident-des.h
README.ident-des

Thanks,
David Kaplan




ident-des.pgsql.tar.gz
Description: GNU Zip compressed data





Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Bruce Momjian

J. R. Nield wrote:
 On Sun, 2002-06-23 at 15:36, Bruce Momjian wrote:
  Yes, I don't see writing to two files vs. one to be any win, especially
  when we need to fsync both of them.  What I would really like is to
  avoid the double I/O of writing to WAL and to the data file;  improving
  that would be a huge win.
  
 
 If is impossible to do what you want. You can not protect against
 partial writes without writing pages twice and calling fdatasync between
 them while going through a generic filesystem. The best disk array will
 not protect you if the operating system does not align block writes to
 the structure of the underlying device. Even with raw devices, you need
 special support or knowledge of the operating system and/or the disk
 device to ensure that each write request will be atomic to the
 underlying hardware. 

Yes, I suspected it was impossible, but that doesn't mean I want it any
less.  ;-)

 All other systems rely on the fact that you can recover a damaged file
 using the log archive. This means downtime in the rare case, but no data
 loss. Until PostgreSQL can do this, it will not be acceptable for
 really critical production use. This is not to knock PostgreSQL, because
 it is a very good database system, and clearly the best open-source one.
 It even has feature advantages over the commercial systems. But at the
 end of the day, unless you have complete understanding of the I/O system
 from write(2) through to the disk system, the only sure ways to protect
 against partial writes are careful writes (in the WAL log or
 elsewhere, writing pages twice), or requiring (and allowing) users to
 do log-replay recovery when a file is corrupted by a partial write. As
 long as there is a UPS, and the operating system doesn't crash, there
 should still be no partial writes.

You are talking point-in-time recovery, a major missing feature right
next to replication, and I agree it makes PostgreSQL unacceptable for
some applications.  Point taken.

And the interesting thing you are saying is that with point-in-time
recovery, we don't need to write pre-write images of pages, because if we
detect a partial page write, we abort the database and tell the
user to do a point-in-time recovery, basically meaning we use the
previous full backup as our pre-write page image and roll forward using
the logical logs.  This is clearly a nice thing to be able to do because
it lets you take a pre-write image of the page once during full backup,
keep it offline, and bring it back in the rare case of a full-page-write
failure.  I now can see how the MSSQL tearoff-bits would be used: not
for recovery, but to detect a partial write and force a point-in-time
recovery by the administrator.


 If we log pages to WAL, they are useless when archived (after a
 checkpoint). So either we have a separate log for them (the ping-pong
 file), or we should at least remove them when archived, which makes log
 archiving more complex but is perfectly doable.

Yes, that is how we will do point-in-time recovery;  remove the
pre-write page images and archive the rest.  It is more complex, but
having the fsync all in one file is too big a win.

 Finally, I would love to hear why we are using the operating system
 buffer manager at all. The OS is acting as a secondary buffer manager
 for us. Why is that? What flaw in our I/O system does this reveal? I
 know that:
 
 We sync only WAL, not the other pages, except for the sync() call we do
  during checkpoint when we discard old WAL files.
 
 But this is probably not a good thing. We should only be writing blocks
 when they need to be on disk. We should not be expecting the OS to write
 them sometime later and avoid blocking (as long) for the write. If we
 need that, then our buffer management is wrong and we need to fix it.
 The reason we are doing this is because we expect the OS buffer manager
 to do asynchronous I/O for us, but then we don't control the order. That
 is the reason why we have to call fdatasync(), to create sequence
 points.

Yes.  I think I understand.  It is true we have to fsync WAL because we
can't control the individual writes by the OS.

 The reason we have performance problems with either D_OSYNC or fdatasync
 on the normal relations is because we have no dbflush process. This
 causes an unacceptable amount of I/O blocking by other transactions.

Uh, that would force writes all over the disk.  Why do we really care how
the OS writes them?  If we are going to fsync, let's just do the one
file and be done with it.  What would a separate flusher process really
buy us if it has to use fsync too?  The main backend doesn't have to
wait for the fsync, but then again, we can't say the transaction is
committed until it hits the disk, so how does a flusher help?

 The ORACLE people were not kidding when they said that they could not
 certify Linux for production use until it supported O_DSYNC. Can you
 explain why that was the case?

I don't see 

Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Bruce Momjian

Curt Sampson wrote:
 On 23 Jun 2002, J. R. Nield wrote:
 
  So since we have all this buffering designed especially to meet our
  needs, and since the OS buffering is in the way, can someone explain to
  me why postgresql would ever open a file without the O_DSYNC flag if the
  platform supports it?
 
 It's more code, if there are platforms out there that don't support
 O_DSYNC. (We still have to keep the old fsync code.) On the other hand,
 O_DSYNC could save us a disk arm movement over fsync() because it
 appears to me that fsync is also going to force a metadata update, which
 means that the inode blocks have to be written as well.

Again, see postgresql.conf:

#wal_sync_method = fsync# the default varies across platforms:
#   # fsync, fdatasync, open_sync, or open_datasync

 
  Maybe fsync would be slower with two files, but I don't see how
  fdatasync would be, and most platforms support that.
 
 Because, if both files are on the same disk, you still have to move
 the disk arm from the cylinder at the current log file write point
 to the cylinder at the current ping-pong file write point. And then back
 again to the log file write point cylinder.
 
 In the end, having a ping-pong file as well seems to me unnecessary
 complexity, especially when anyone interested in really good
 performance is going to buy a disk subsystem that guarantees no
 torn pages and thus will want to turn off the ping-pong file writes
 entirely, anyway.

Yes, I don't see writing to two files vs. one to be any win, especially
when we need to fsync both of them.  What I would really like is to
avoid the double I/O of writing to WAL and to the data file;  improving
that would be a huge win.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] Suggestions for implementing IS DISTINCT FROM?

2002-06-24 Thread Thomas Lockhart

 I've actually already done almost all the work for converting BETWEEN to a
 node but I have a couple of questions:
 Should I use a boolean in the node to indicate whether it is SYMMETRIC or
 ASYMMETRIC, or should I use some sort of integer to indicate whether it is
 SYMMETRIC, ASYMMETRIC or DEFAULT (ASYMMETRIC).  That way the reverse in
 rules and views could leave out the ASYMMETRIC if it wasn't specified
 originally, rather than always adding it in.  Which is better?

Great! 

I would use a boolean (or integer) to indicate two possibilities, not
three.

The language specifies what the default should be, and dump programs
could choose to omit the ASYMMETRIC if they choose. imho it is best to
resolve defaults earlier, rather than pushing the resolution deep into
the parser or even farther.

   - Thomas








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Curt Sampson

On 23 Jun 2002, J. R. Nield wrote:

 If is impossible to do what you want. You can not protect against
 partial writes without writing pages twice and calling fdatasync
 between them while going through a generic filesystem.

I agree with this.

 The best disk array will not protect you if the operating system does
 not align block writes to the structure of the underlying device.

This I don't quite understand. Assuming you're using a SCSI drive
(and this mostly applies to ATAPI/IDE, too), you can do naught but
align block writes to the structure of the underlying device. When you
initiate a SCSI WRITE command, you start by telling the device at which
block to start writing and how many blocks you intend to write. Then you
start passing the data.

(See http://www.danbbs.dk/~dino/SCSI/SCSI2-09.html#9.2.21 for parameter
details for the SCSI WRITE(10) command. You may find the SCSI 2
specification, at http://www.danbbs.dk/~dino/SCSI/ to be a useful
reference here.)

 Even with raw devices, you need special support or knowledge of the
 operating system and/or the disk device to ensure that each write
 request will be atomic to the underlying hardware.

Well, so here I guess you're talking about two things:

1. When you request, say, an 8K block write, will the OS really
write it to disk in a single 8K or multiple of 8K SCSI write
command?

2. Does the SCSI device you're writing to consider these writes to
be transactional. That is, if the write is interrupted before being
completed, does the SCSI device guarantee that the partially-sent
data is not written, and the old data is maintained? And of course,
does it guarantee that, when it acknowledges a write, that write is
now in stable storage and will never go away?

Both of these are not hard to guarantee, actually. For a BSD-based OS,
for example, just make sure that your filesystem block size is the
same as or a multiple of the database block size. BSD will never write
anything other than a block or a sequence of blocks to a disk in a
single SCSI transaction (unless you've got a really odd SCSI driver).
And for your disk, buy a Baydel or Clarion disk array, or something
similar.

Given that it's not hard to set up a system that meets these criteria,
and this is in fact commonly done for database servers, it would seem a
good idea for postgres to have the option to take advantage of the time
and money spent and adjust its performance upward appropriately.

 All other systems rely on the fact that you can recover a damaged file
 using the log archive.

Not exactly. For MS SQL Server, at any rate, if it detects a page tear
you cannot restore based on the log file alone. You need a full or
partial backup that includes that entire torn block.

 This means downtime in the rare case, but no data loss. Until
 PostgreSQL can do this, then it will not be acceptable for real
 critical production use.

It seems to me that it is doing this right now. In fact, it's more
reliable than some commercial systems (such as SQL Server) because it can
recover from a torn block with just the logfile.

 But at the end of the day, unless you have complete understanding of
 the I/O system from write(2) through to the disk system, the only sure
 ways to protect against partial writes are by careful writes (in
 the WAL log or elsewhere, writing pages twice), or by requiring (and
 allowing) users to do log-replay recovery when a file is corrupted by
 a partial write.

I don't understand how, without a copy of the old data that was in the
torn block, you can restore that block from just log file entries. Can
you explain this to me? Take, as an example, a block with ten tuples,
only one of which has been changed recently. (I.e., only that change
is in the log files.)

 If we log pages to WAL, they are useless when archived (after a
 checkpoint). So either we have a separate log for them (the
 ping-pong file), or we should at least remove them when archived,
 which makes log archiving more complex but is perfectly doable.

Right. That seems to me a better option, since we've now got only one
write point on the disk rather than two.

 Finally, I would love to hear why we are using the operating system
 buffer manager at all. The OS is acting as a secondary buffer manager
 for us. Why is that? What flaw in our I/O system does this reveal?

It's acting as a second-level buffer manager, yes, but to say it's
secondary may be a bit misleading. On most of the systems I've set
up, the OS buffer cache is doing the vast majority of the work, and the
postgres buffering is fairly minimal.

There are some good (and some perhaps not-so-good) reasons to do it this
way. I'll list them more or less in the order of best to worst:

1. The OS knows where the blocks physically reside on disk, and
postgres does not. Therefore it's in the interest of postgresql to
dispatch write responsibility back to the OS as quickly as possible
so that the OS can prioritize 

Re: [HACKERS] SQL99, CREATE CAST, and initdb

2002-06-24 Thread Peter Eisentraut

Rod Taylor writes:

  I'm also looking at the SQL99 INFORMATION_SCHEMA views. Is anyone
  already defining these? Is someone interested in picking this up? I've
  got some definitions in a contrib-style directory but have not yet
  mapped them to PostgreSQL.

 I have a few of the basics done, but nothing really significant.

I guess I'll polish what I have and will commit it so that the group can
fill in the rest at convenience.

-- 
Peter Eisentraut   [EMAIL PROTECTED]









Re: [HACKERS] pg_dump and ALTER TABLE / ADD FOREIGN KEY

2002-06-24 Thread Stephan Szabo


On Sat, 22 Jun 2002, Matthew T. O'Connor wrote:

  However, others don't believe constraints other than foreign keys
  should go unchecked.
 
  That said, is this functionality wanted outside of pg_dump /
  pg_restore?

 pg_dump should reload a database as it was stored in the previous database.
 If your old data is not clean, pg_dump / restore is not a very good tool for
 cleaning it up.  I think ignoring constraints is a good thing if it will load
 the data faster (at least when you are doing a database backup / restore).
 Why can't we do all alter table commands (that add constraints) after we load
 the data?  That way we don't need to alter syntax at all.

That doesn't help.  ALTER TABLE checks the constraint at the time the
alter table is issued, since the constraint must be satisfied by the
current data. Right now that check basically runs the trigger for each
row, which is probably sub-optimal since it could be one
statement, but changing that won't prevent it from being slow on big
tables.
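For illustration only (table and column names hypothetical), the per-row trigger check being described could in principle be collapsed into a single validation query along these lines:

```sql
-- Hypothetical sketch: validate an existing foreign key in one statement
-- instead of firing the check trigger once per row.  Rows in "child"
-- whose non-NULL parent_id has no match in "parent" violate the constraint.
SELECT c.id
FROM child c
LEFT JOIN parent p ON p.id = c.parent_id
WHERE c.parent_id IS NOT NULL
  AND p.id IS NULL;
-- If this returns no rows, the ADD FOREIGN KEY is satisfiable.
```

Note that even as one statement this still reads the whole child table, which is the point above about big tables.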









Re: [HACKERS] COPY syntax improvement

2002-06-24 Thread Peter Eisentraut

Bruce Momjian writes:

 I thought there were complaints that the old COPY syntax just had too
 many features stuffed in too many unusual places,

Haven't ever seen one.  This command has no precedent in other products,
only years of going virtually unchanged in PostgreSQL.  Changing it now
and allowing countless permutations of the key words is going to be
confusing, IMHO.

 e.g. delimiter after
 filename,

COPY is the only command to use a delimiter, so this can hardly be
qualified as an unusual place.

 oids after tablename,

That's because the OIDs are in said table.

 binary after COPY,

Which is consistent with DECLARE BINARY CURSOR.

 NULL after delimiter.

OK, that order should perhaps be more flexible.

-- 
Peter Eisentraut   [EMAIL PROTECTED]









Re: [HACKERS] Suggestions for implementing IS DISTINCT FROM?

2002-06-24 Thread Thomas Lockhart

  I'm looking at implementing IS DISTINCT FROM, among other things.
  ...
  I was thinking to implement this by simply expanding these rules within
  gram.y to be a tree of comparison tests.
 Please, please, do not do that.  Make a new expression node tree type,
 instead.  We've made this mistake before (eg for BETWEEN) and I don't
 want to do it again.

Uh, sure. If you don't quote out of context I think it is pretty clear
that I was looking for a helpful suggestion to do just that. Thanks,
I'll proceed with the assurance that you won't object to *that* too ;)
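For reference, the expansion under discussion (a sketch of the semantics, not the committed implementation) treats NULL as an ordinary value:

```sql
-- a IS DISTINCT FROM b is roughly equivalent to this expression,
-- which is true when the values differ, counting NULL as a value:
CASE
  WHEN a IS NULL AND b IS NULL THEN false   -- both NULL: not distinct
  WHEN a IS NULL OR  b IS NULL THEN true    -- exactly one NULL: distinct
  ELSE a <> b                               -- neither NULL: plain comparison
END
```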

 - Thomas








[HACKERS]

2002-06-24 Thread Foo

Version 7.2.1, RH 7.3, installed from RPM.

Following error occurs:

amber_ws= delete from samples;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!

The table looks like this:

CREATE TABLE samples (
id integer DEFAULT nextval('samples_id_seq'::varchar(32)),
fieldtrip_in integer
REFERENCES fieldtrips ON DELETE RESTRICT ON UPDATE CASCADE,
fieldtrip_out integer
REFERENCES fieldtrips ON DELETE RESTRICT ON UPDATE CASCADE,
site_name varchar(32)
REFERENCES sites ON DELETE RESTRICT ON UPDATE CASCADE,
collector_type varchar(32)
REFERENCES collector_types ON DELETE RESTRICT ON UPDATE CASCADE,
depth varchar(32)
REFERENCES depth_types ON DELETE RESTRICT ON UPDATE CASCADE,
replicate varchar(32)
REFERENCES replicate_set ON DELETE RESTRICT ON UPDATE CASCADE,
lost_bool boolean default false,
buoy varchar(32),
comments varchar(32),
Constraint samples_pkey Primary Key (id)
);

The single record in it looks like this:

amber_ws= select * from samples;
-[ RECORD 1 ]--+--
id| 1
fieldtrip_in   | 1
fieldtrip_out | 1
site_name|
collector_type |
depth  |
replicate  |
lost_bool | f
buoy   |
comments   |

No tables are using it as a REFERENCES target.

Let me know if I can help more.  I am not root on the box, so I am not going
to try attaching gdb to anything tonight.  However, the root user and I
would be quite happy to do so later.

Thanks
Webb







Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Bruce Momjian

J. R. Nield wrote:
 So since we have all this buffering designed especially to meet our
 needs, and since the OS buffering is in the way, can someone explain to
 me why postgresql would ever open a file without the O_DSYNC flag if the
 platform supports it? 

We sync only WAL, not the other pages, except for the sync() call we do
during checkpoint when we discard old WAL files.

  I concur with Bruce: the reason we keep page images in WAL is to
  minimize the number of places we have to fsync, and thus the amount of
  head movement required for a commit.  Putting the page images elsewhere
  cannot be a win AFAICS.
 
 
 Why not put all the page images in a single pre-allocated file and treat
 it as a ring? How could this be any worse than flushing them in the WAL
 log? 
 
 Maybe fsync would be slower with two files, but I don't see how
 fdatasync would be, and most platforms support that. 

We have fdatasync option for WAL in postgresql.conf.
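For reference, the postgresql.conf setting in question looks like this (a sketch; which values are available depends on the platform):

```
# postgresql.conf -- choose how WAL writes are flushed to disk.
# Typical choices: fsync, fdatasync, open_sync (O_SYNC), open_datasync (O_DSYNC).
wal_sync_method = fdatasync
```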

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] SQL99, CREATE CAST, and initdb

2002-06-24 Thread Thomas Lockhart

  It doesn't match perfectly in that one field is ignored as being
  (afaict) redundant for us. The basic definition from SQL99 is
  CREATE CAST(from AS to) WITH FUNCTION func(args) [AS ASSIGNMENT]
  I can map this to something equivalent to
  CREATE FUNCTION to(from) RETURNS to AS 'select func($1)' LANGUAGE 'sql';
  with another clause or two to get the implicit coersion enabled, and
  ignoring the args field(s).
 I think this is wrong.  When you call CREATE CAST ... WITH FUNCTION
 func(args)  then func(args) must already exist.

Right. And that is what is required for SQL99 also afaict. There are not
enough clauses in the SQL99 syntax to allow anything else!

 So the closest you could
 map it to would be
 ALTER FUNCTION to(from) IMPLICIT CAST

That would require that the function to be used as the cast have the
same name as the underlying PostgreSQL conventions for casting
functions. The implementation I've done does not require this; it
basically defines a new SQL function with a body of

select func($1)

where func is the name specified in the WITH FUNCTION func(args)
clause. It does hang together in the way SQL99 intends and in a way
which is consistent with PostgreSQL's view of the world.
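As a sketch of the mapping being described (all names here are hypothetical, for illustration only), the SQL99 form would be turned into an SQL-language wrapper function:

```sql
-- SQL99 form (hypothetical names):
--   CREATE CAST (int4 AS mytype) WITH FUNCTION make_mytype(int4) AS ASSIGNMENT;
-- would be implemented roughly as:
CREATE FUNCTION mytype(int4) RETURNS mytype
    AS 'select make_mytype($1)' LANGUAGE 'sql';
-- plus whatever catalog flag marks the function as an implicit coercion.
```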

But I've also implemented alternate forms which do not require one to
define a separate function beforehand. So the nice PostgreSQL feature of
allowing function names to differ from the entry points can be
used.

 iff the name of the function and the target data type agree.  (Of course
 this command doesn't exist, but you get the idea.)  The SQL99 feature is
 more general than ours, but in order to use it effectively we would need
 to maintain another index on pg_proc.  Tom Lane once opined that that
 would be too costly.

I don't follow you here, but the implementation I have is consistent
with SQL99 (or at least with the way I'm interpreting it :)

 - Thomas








Re: [HACKERS] SQL99, CREATE CAST, and initdb

2002-06-24 Thread Peter Eisentraut

Thomas Lockhart writes:

 It doesn't match perfectly in that one field is ignored as being
 (afaict) redundant for us. The basic definition from SQL99 is

 CREATE CAST(from AS to) WITH FUNCTION func(args) [AS ASSIGNMENT]

 I can map this to something equivalent to

 CREATE FUNCTION to(from) RETURNS to AS 'select func($1)' LANGUAGE 'sql';

 with another clause or two to get the implicit coersion enabled, and
 ignoring the args field(s).

I think this is wrong.  When you call CREATE CAST ... WITH FUNCTION
func(args)  then func(args) must already exist.  So the closest you could
map it to would be

ALTER FUNCTION to(from) IMPLICIT CAST

iff the name of the function and the target data type agree.  (Of course
this command doesn't exist, but you get the idea.)  The SQL99 feature is
more general than ours, but in order to use it effectively we would need
to maintain another index on pg_proc.  Tom Lane once opined that that
would be too costly.

-- 
Peter Eisentraut   [EMAIL PROTECTED]









Re: [HACKERS] A fairly obvious optimization?

2002-06-24 Thread Bruce Momjian


FAQ updated in section 4.8: My queries are slow or don't make use of the
indexes. Why?

is returned.  In fact, though MAX() and MIN() don't use indexes,   
it is possible to retrieve such values using an index with ORDER BY
and LIMIT:
PRE
SELECT col
FROM tab
ORDER BY col
LIMIT 1
/PRE

---

Zeugswetter Andreas SB SD wrote:
 
  The select(min) and select(max) took as long as the table scan to find
  the count.  It seems logical if a btree type index is available (such
  as pk_cnx_ds_sis_bill_detl_tb) where the most significant bit of the
  index is the column requested, it should be little more than a seek
  first or seek last in the btree.  Obviously, it won't work with a hashed
  index (which is neither here nor there).
 
 In the meantime you can use:
 select extr_stu_id from cnx_ds_sis_bill_detl_tb order by 1 desc limit 1; -- max
 select extr_stu_id from cnx_ds_sis_bill_detl_tb order by 1 asc limit 1; -- min
 
 I guess that is the reason why nobody felt really motivated to implement
 this optimization.  Besides, these statements are more powerful, since they
 can fetch other columns from this min/max row.  The down side is that this
 syntax varies across db vendors, but most (all?) have a corresponding feature nowadays.
 
 select first 1
 select top 1 ...
 
 This is actually becoming a FAQ :-)
 
 Andreas
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








[HACKERS] Sporatic Server Downtime ...

2002-06-24 Thread Marc G. Fournier


Just a quick heads up ... I've asked Rackspace to investigate *why* the
server crashes every 24-48hrs, and given them carte-blanche to get it
fixed ... they are planning on swapping out/in hardware, as right now that
appears to be where the error messages are indicating ...










[HACKERS] pg_restore: [archiver] input file does not appear to be a valid archive

2002-06-24 Thread James Thornton

I upgraded from PG 7.1.3 to 7.2, and I am trying to restore my dbs but I
keep getting:

[nsadmin@roam backup-20020622]$ pg_restore all-good.dmp 
pg_restore: [archiver] input file does not appear to be a valid archive






Re: [HACKERS] Suggestions for implementing IS DISTINCT FROM?

2002-06-24 Thread Christopher Kings-Lynne

 Please, please, do not do that.  Make a new expression node tree type,
 instead.  We've made this mistake before (eg for BETWEEN) and I don't
 want to do it again.

I've actually already done almost all the work for converting BETWEEN to a
node but I have a couple of questions:

Should I use a boolean in the node to indicate whether it is SYMMETRIC or
ASYMMETRIC, or should I use some sort of integer to indicate whether it is
SYMMETRIC, ASYMMETRIC or DEFAULT (ASYMMETRIC)?  That way the reverse-listing
in rules and views could leave out the ASYMMETRIC if it wasn't specified
originally, rather than always adding it in.  Which is better?

Chris










Re: [HACKERS] pgadmin.postgresql.org displaying errors

2002-06-24 Thread Marc G. Fournier


should already be fixed ...

On 23 Jun 2002, Dave Cramer wrote:

 I am getting lots of errors on pgadmin.postgresql.org

 Dave

















Re: [HACKERS] pg_dump and ALTER TABLE / ADD FOREIGN KEY

2002-06-24 Thread Rod Taylor

On 2002.06.23 01:23 Christopher Kings-Lynne wrote:
  Some have expressed that this could be quite slow for large
 databases,
  and want a type of:
 
  SET CONSTRAINTS UNCHECKED;
 
  However, others don't believe constraints other than foreign keys
  should go unchecked.
 
 Well, at the moment remember that all the other SET CONSTRAINTS commands
 only affect foreign keys.  However, this is a TODO to allow deferrable
 unique constraints.
 
  Or would the below be more appropriate?:
  ALTER TABLE tab ADD FOREIGN KEY  TRUST EXISTING DATA;
 
 Maybe instead of TRUST EXISTING DATA, it could just be WITHOUT CHECK or
 something that uses existing keywords?

WITHOUT CHECK doesn't sound right.   'Make a foreign key but don't 
enforce it'.

WITHOUT BACKCHECKING, WITHOUT ENFORCING CURRENT, ...

Any way you look at it, it's going to further break loading pgsql backups 
into another database.  At least the SET CONSTRAINTS line will be 
errored out on most other DBs -- but the foreign key will still be 
created.

SET FKEY_CONSTRAINTS TO UNCHECKED;

 Except if we could make all constraints uncheckable, then restoring a
 dump would be really fast (but risky!)

No more risky than simply avoiding foreign key constraints.  A unique 
key is a simple matter to fix usually, foreign keys are not so easy 
when you get into the double / triple keys









Re: [HACKERS] pgadmin.postgresql.org displaying errors

2002-06-24 Thread Dave Page



 -Original Message-
 From: Dave Cramer [mailto:[EMAIL PROTECTED]] 
 Sent: 24 June 2002 01:25
 To: PostgreSQL Hacker
 Subject: [HACKERS] pgadmin.postgresql.org displaying errors
 
 
 I am getting lots of errors on pgadmin.postgresql.org
 
 Dave

Looks OK now...

Thanks anyway, Dave.








Re: [HACKERS] Suggestions for implementing IS DISTINCT FROM?

2002-06-24 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 Should I use a boolean in the node to indicate whether it is SYMMETRIC or
 ASYMMETRIC, or should I use some sort of integer to indicate whether it is
 SYMMETRIC, ASYMMETRIC or DEFAULT (ASYMMETRIC).  That way the reverse in
 rules and views could leave out the ASYMMETRIC if it wasn't specified
 originally, rather than always adding it in.  Which is better?

My intention is to reverse-list as either BETWEEN or BETWEEN SYMMETRIC.
While I believe in reproducing the source text during reverse listing,
I don't take it to extremes ;-)

So a boolean is sufficient.

regards, tom lane








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 The only thing I've been able to think of that seems like it might
 improve matters is to make the WAL writing logic aware of the layout
 of buffer pages --- specifically, to know that our pages generally
 contain an uninteresting hole in the middle, and not write the hole.
 Optimistically this might reduce the WAL data volume by something
 approaching 50%; though pessimistically (if most pages are near full)
 it wouldn't help much.

 Good idea.  How about putting the page through our TOAST compression
 routine before writing it to WAL?  Should be pretty easy and fast and
 doesn't require any knowledge of the page format.

Easy, maybe, but fast definitely NOT.  The compressor is not speedy.
Given that we have to be holding various locks while we build WAL
records, I do not think it's a good idea to add CPU time there.

Also, compressing already-compressed data is not a win ...

regards, tom lane








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Tom Lane

 On Sun, 23 Jun 2002, Bruce Momjian wrote:
 Yes, I don't see writing to two files vs. one to be any win, especially
 when we need to fsync both of them.  What I would really like is to
 avoid the double I/O of writing to WAL and to the data file;  improving
 that would be a huge win.

I don't believe it's possible to eliminate the double I/O.  Keep in mind
though that in the ideal case (plenty of shared buffers) you are only
paying two writes per modified block per checkpoint interval --- one to
the WAL during the first write of the interval, and then a write to the
real datafile issued by the checkpoint process.  Anything that requires
transaction commits to write data blocks will likely result in more I/O
not less, at least for blocks that are modified by several successive
transactions.

The only thing I've been able to think of that seems like it might
improve matters is to make the WAL writing logic aware of the layout
of buffer pages --- specifically, to know that our pages generally
contain an uninteresting hole in the middle, and not write the hole.
Optimistically this might reduce the WAL data volume by something
approaching 50%; though pessimistically (if most pages are near full)
it wouldn't help much.

This was not very feasible when the WAL code was designed because the
buffer manager needed to cope with both normal pages and pg_log pages,
but as of 7.2 I think it'd be safe to assume that all pages have the
standard layout.

regards, tom lane








Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-24 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 I was thinking of writing a command line tool like 'pgtune' that looks at
 the stats views and will generate SQL code for, or do automatically the
 following:

 * Dropping indices that are never used
 * Creating appropriate indices to avoid large, expensive sequential scans.

Dropping unused indices sounds good --- but beware of dropping unique
indexes; they may be there to enforce a constraint, and not because of
any desire to use them in queries.
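A hedged sketch of the kind of query such a tool might run against the stats views (assumes the statistics collector is enabled; view and column names as in 7.2):

```sql
-- Find indexes that have never been used in an index scan.
-- pg_stat_user_indexes.idx_scan counts scans since the stats were reset;
-- exclude unique indexes, which may exist to enforce constraints.
SELECT s.indexrelname, s.relname
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique;
```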

I'm not sure how you're going to automatically intuit appropriate
indexes to add, though.  You'd need to look at a suitable workload
(ie, a representative set of queries) which is not data that's readily
available from the stats views.  Perhaps we could expect the DBA to
provide a segment of log output that includes debug_print_query
and show_query_stats results.

regards, tom lane








Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-24 Thread Bruce Momjian

Christopher Kings-Lynne wrote:
 I was thinking of writing a command line tool like 'pgtune' that looks at
 the stats views and will generate SQL code for, or do automatically the
 following:
 
 * Dropping indices that are never used
 * Creating appropriate indices to avoid large, expensive sequential scans.
 
 This would put us in the 'mysql makes my indices for me by magic' league -
 but would be far more powerful and flexible.  How to do multikey indices is
 beyond me tho.

This is a great idea.  I have been wanting to do something like this
myself but probably won't get the time.

Does MySQL really make indexes by magic?

Also, I had to look up the contraction for 'will not' because I always
get that confused (won't).  I just found a web page on it:

http://www.straightdope.com/mailbag/mwont.html

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] Page OpaqueData

2002-06-24 Thread Tom Lane

Manfred Koizar [EMAIL PROTECTED] writes:
 Is od_pagesize in any way more or less opaque than pd_lower, pd_upper,
 pd_special, etc?  If it is, why?

I surmise that there was once some idea of supporting multiple page
sizes simultaneously, but it's not real clear why the macros
PageGetPageSize/PageSetPageSize wouldn't be a sufficient abstraction
layer; the extra level of struct naming for pd_opaque has no obvious
usefulness.  In any case I doubt that dealing with multiple page sizes
would be worth the trouble of supporting it.

 If it's not, should I post a patch that puts pagesize directly into
 PageHeaderData?

If you're so inclined.  Given that pd_opaque is hidden in those macros,
there wouldn't be much of any gain in readability either, so I haven't
worried about changing the declaration.

regards, tom lane








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Bruce Momjian

Tom Lane wrote:
  On Sun, 23 Jun 2002, Bruce Momjian wrote:
  Yes, I don't see writing to two files vs. one to be any win, especially
  when we need to fsync both of them.  What I would really like is to
  avoid the double I/O of writing to WAL and to the data file;  improving
  that would be a huge win.
 
 I don't believe it's possible to eliminate the double I/O.  Keep in mind
 though that in the ideal case (plenty of shared buffers) you are only
 paying two writes per modified block per checkpoint interval --- one to
 the WAL during the first write of the interval, and then a write to the
 real datafile issued by the checkpoint process.  Anything that requires
 transaction commits to write data blocks will likely result in more I/O
 not less, at least for blocks that are modified by several successive
 transactions.
 
 The only thing I've been able to think of that seems like it might
 improve matters is to make the WAL writing logic aware of the layout
 of buffer pages --- specifically, to know that our pages generally
 contain an uninteresting hole in the middle, and not write the hole.
 Optimistically this might reduce the WAL data volume by something
 approaching 50%; though pessimistically (if most pages are near full)
 it wouldn't help much.

Good idea.  How about putting the page through our TOAST compression
routine before writing it to WAL?  Should be pretty easy and fast and
doesn't require any knowledge of the page format.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








[HACKERS] oids rollover?

2002-06-24 Thread Daniel Kalchev

I have a problem with an 7.1.3 database that has probably overflowed
the oid counter. The startup halts with these messages

DEBUG:  database system was interrupted at 2002-06-24 21:19:43 EEST
DEBUG:  CheckPoint record at (156, 1692817164)
DEBUG:  Redo record at (156, 1692775580); Undo record at (0, 0);
Shutdown FALSE
DEBUG:  NextTransactionId: 859255800; NextOid: 7098
FATAL 2:  Invalid NextTransactionId/NextOid
postmaster: Startup proc 4752 exited with status 512 - abort


Can something be done to recover the database?

Regards,
Daniel








Re: [HACKERS] Our archive searching stinks

2002-06-24 Thread Marc G. Fournier

On Sun, 23 Jun 2002, Vince Vielhaber wrote:

  Can we link to the fts site?

 The only thing I can help with is the fts link, but I'm hesitant to
 link to something that disappears.  If it's going to be here and not
 go away again I'll be happy to add it.

The only reason it disappeared was more my fault than anything ... I
spec'd out that server for what I *thought* we were using on the old one,
and didn't realize how much memory was required ... the upgrade to 4 gigs
appears to have helped ...










Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Tom Lane

J. R. Nield [EMAIL PROTECTED] writes:
 Also, postgreSQL can't recover from any other type of block corruption,
 while the commercial systems can.

Say again?

 Would it not be the case that things like read-ahead, grouping writes,
 and caching written data are probably best done by PostgreSQL, because
 only our buffer manager can understand when they will be useful or when
 they will thrash the cache?

I think you have been missing the point.  No one denies that there will
be some incremental gain if we do all that.  However, the conclusion of
everyone who has thought much about it (and I see Curt has joined that
group) is that the effort would be far out of proportion to the probable
gain.  There are a lot of other things we desperately need to spend time
on that would not amount to re-engineering large quantities of OS-level
code.  Given that most Unixen have perfectly respectable disk management
subsystems, we prefer to tune our code to make use of that stuff, rather
than follow the conventional wisdom that databases need to bypass it.

Oracle can afford to do that sort of thing because they have umpteen
thousand developers available.  Postgres does not.

regards, tom lane








Re: [HACKERS] Use of LOCAL in SET command

2002-06-24 Thread Thomas Lockhart

 SQL uses LOCAL to mean the local node in a distributed system (SET LOCAL
 TRANSACTION ...) and the current session as opposed to all sessions (local
 temporary table).  The new SET LOCAL command adds the meaning this
 transaction only.  Instead we could simply use SET TRANSACTION, which
 would be consistent in behaviour with the SET TRANSACTION ISOLATION LEVEL
 command.

Yes. If there is a possibility of confusion (now or later) over SQL99
syntax, we should do it The Right Way per spec.

 - Thomas








Re: [HACKERS] pg_restore: [archiver] input file does not appear to be a valid archive

2002-06-24 Thread Tom Lane

James Thornton [EMAIL PROTECTED] writes:
 I upgraded from PG 7.1.3 to 7.2, and I am trying to restore my dbs but I
 keep getting:
 [nsadmin@roam backup-20020622]$ pg_restore all-good.dmp 
 pg_restore: [archiver] input file does not appear to be a valid archive

How did you make the dump file exactly?

I'm betting that what you have is not a dump, but just a SQL script that
you are supposed to feed to psql not pg_restore ...

regards, tom lane








Re: [HACKERS] A fairly obvious optimization?

2002-06-24 Thread cbbrowne

On Sun, 23 Jun 2002 17:16:09 EDT, the world broke into rejoicing as
Bruce Momjian [EMAIL PROTECTED]  said:
 FAQ updated in section 4.8: My queries are slow or don't make use of the
 indexes. Why?
 
 is returned.  In fact, though MAX() and MIN() don't use indexes,   
 it is possible to retrieve such values using an index with ORDER BY
 and LIMIT:
 PRE
 SELECT col
 FROM tab
 ORDER BY col
 LIMIT 1
 /PRE

This sounds like the sort of thing that would be really nice to be able
to automate into the query optimizer...
--
(reverse (concatenate 'string moc.enworbbc@ sirhc))
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
I decry the current  tendency to seek  patents on algorithms.   There
are better ways to  earn a living  than to  prevent other  people from
making use of one's contributions to computer science.
-- D. E. Knuth








[HACKERS] Nonrecursive ALTER TABLE ADD/RENAME COLUMN is wrong

2002-06-24 Thread Tom Lane

Fernando Nasser of Red Hat reminded me that it really makes no sense
for ALTER TABLE ADD COLUMN and ALTER TABLE RENAME COLUMN to behave
non-recursively --- that is, they should *always* affect inheritance
children of the named table, never just the named table itself.

After a non-recursive ADD/RENAME, you'd have a situation wherein
SELECT * FROM foo would fail, because there'd be no corresponding
columns in the child table(s).  This seems clearly bogus to me.
(On the other hand, non-recursive DROP COLUMN, if we had one, would
be okay ... the orphaned child columns would effectively become
non-inherited added columns.  Similarly, non-recursive alterations of
defaults, constraints, etc seem reasonable.)

As of 7.2 we do accept ALTER TABLE ONLY foo forms of these commands,
but I think that's a mistake arising from thoughtless cut-and-paste
from the other forms of ALTER.  I believe it is better to give an error
if such a command is given.  Any objections?

Also, in the case where neither ONLY foo nor foo* is written, the
behavior currently depends on the SQL_INHERITANCE variable.  There's
no problem when SQL_INHERITANCE has its default value of TRUE, but what
if it is set to FALSE?  Seems to me we have two plausible choices:

* Give an error, same as if ONLY foo had been written.

* Assume the user really wants recursion, and do it anyway.

The second seems more user-friendly but also seems to violate the
principle of least surprise.  Anyone have an opinion about what to do?

regards, tom lane








Re: [HACKERS] SQL99, CREATE CAST, and initdb

2002-06-24 Thread Tom Lane

Thomas Lockhart [EMAIL PROTECTED] writes:
 So the closest you could
 map it to would be
 ALTER FUNCTION to(from) IMPLICIT CAST

 That would require that the function to be used as the cast have the
 same name as the underlying PostgreSQL conventions for casting
 functions. The implementation I've done does not require this; it
 basically defines a new SQL function with a body of
 select func($1)
 where func is the name specified in the WITH FUNCTION func(args)
 clause. It does hang together in the way SQL99 intends and in a way
 which is consistant with PostgreSQL's view of the world.

Urk.  Do you realize how expensive SQL functions are for such uses?
(I have had a to-do item for awhile to teach the planner to inline
trivial SQL functions, but it seems unlikely to happen for another
release or three.)

I see no real reason why we should not require casting functions to
follow the Postgres naming convention --- after all, what else would
you name a casting function?

So I'm with Peter on this one: make the SQL99 syntax a mere wrapper
for setting the IMPLICIT CAST bit on an existing function.  Otherwise,
people will avoid it as soon as they discover what it's costing them.

regards, tom lane








Re: [HACKERS] Use of LOCAL in SET command

2002-06-24 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 Sorry to nag about this so late, but I fear that the new command SET LOCAL
 will cause some confusion later on.

Okay...

 SQL uses LOCAL to mean the local node in a distributed system (SET LOCAL
 TRANSACTION ...) and the current session as opposed to all sessions (local
 temporary table).  The new SET LOCAL command adds the meaning this
 transaction only.  Instead we could simply use SET TRANSACTION, which
 would be consistent in behaviour with the SET TRANSACTION ISOLATION LEVEL
 command.

Hmm ... this would mean that the implicit parsing of SET TRANSACTION
ISOLATION LEVEL would change (instead of SET / TRANSACTION ISOLATION
LEVEL you'd now tend to read it as SET TRANSACTION / ISOLATION LEVEL)
but I guess that would still not create any parse conflicts.  I'm okay
with this as long as we can fix psql's command completion stuff to
handle it intelligently.  I hadn't gotten round to looking at that point
yet for the LOCAL case; do you have any thoughts?

regards, tom lane








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Does anyone know what the major barriers to infinite log replay are in
 PostgreSQL? I'm trying to look for everything that might need to be
 changed outside xlog.c, but surely this has come up before. Searching
 the archives hasn't revealed much.

 This has been brought up.  Could we just save WAL files and get replay? 
 I believe some things have to be added to WAL to allow this, but it
 seems possible.

The Red Hat group has been looking at this somewhat; so far there seem
to be some minor tweaks that would be needed, but no showstoppers.

 Somehow you would need a tar-type
 backup of the database, and with a running db, it is hard to get a valid
 snapshot of that.

But you don't *need* a valid snapshot, only a correct copy of
every block older than the first checkpoint in your WAL log series.
Any inconsistencies in your tar dump will look like repairable damage;
replaying the WAL log will fix 'em.

regards, tom lane








Re: [HACKERS] [SQL] Request for builtin function: Double_quote

2002-06-24 Thread Tom Lane

Josh Berkus [EMAIL PROTECTED] writes:
 Well, first off, quote_literal isn't in the documentation under
 Functions and Operators.So this is the first I've heard about it
 -- or probably anyone else outside the core team.   How long has it
 been around?

Awhile; however, the only documentation was in the discussion of EXECUTE
in the pl/pgsql chapter of the Programmer's Guide, which is probably not
the best place.

 Therefore, I withdraw my initial request, and request instead that
 quote_literal be added to the function documentation in String
 Functions and Operators.

Done; I also added its sister function quote_ident.  See the devel
docs at
http://candle.pha.pa.us/main/writings/pgsql/sgml/functions-string.html

regards, tom lane








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Bruce Momjian

J. R. Nield wrote:
  This I don't quite understand. Assuming you're using a SCSI drive
  (and this mostly applies to ATAPI/IDE, too), you can do naught but
  align block writes to the structure of the underlying device. When you
  initiate a SCSI WRITE command, you start by telling the device at which
  block to start writing and how many blocks you intend to write. Then you
  start passing the data.
  
 
 All I'm saying is that the entire postgresql block write must be
 converted into exactly one SCSI write command in all cases, and I don't
 know a portable way to ensure this. 

...

 I agree with this. My point was only that you need to know what
 guarantees your operating system/hardware combination provides on a
 case-by-case basis, and there is no standard way for a program to
 discover this. Most system administrators are not going to know this
 either, unless databases are their main responsibility.

Yes, agreed.  Only 1% are going to know the answer to this question, so
we have to assume the worst case.

  It seems to me that it is doing this right now. In fact, it's more
  reliable than some commercial systems (such as SQL Server) because it can
  recover from a torn block with just the logfile.
 
 Again, what I meant to say is that the commercial systems can recover
 with an old file backup + logs. How old the backup can be depends only
 on how much time you are willing to spend playing the logs forward. So
 if you do a full backup once a week, and multiplex and backup the logs,
 then even if a backup tape gets destroyed you can still survive. It just
 takes longer.
 
 Also, PostgreSQL can't recover from any other type of block corruption,
 while the commercial systems can. That's what I meant by the critical
 production use comment, which was sort-of unfair.
 
 So I would say they are equally reliable for torn pages (but not bad
 blocks), and the commercial systems let you trade potential recovery
 time for not having to write the blocks twice. You do need to back-up
 the log archives though.

Yes, good tradeoff analysis.  We recover from partial writes quicker,
and don't require saving of log files, _but_ we don't recover from bad
disk blocks.  Good summary.

 I'll back off on that. I don't know if we want to use the OS buffer
 manager, but shouldn't we try to have our buffer manager group writes
 together by files, and pro-actively get them out to disk? Right now, it
 looks like all our write requests are delayed as long as possible and
 the order in which they are written is pretty-much random, as is the
 backend that writes the block, so there is no locality of reference even
 when the blocks are adjacent on disk, and the write calls are spread-out
 over all the backends.
 
 Would it not be the case that things like read-ahead, grouping writes,
 and caching written data are probably best done by PostgreSQL, because
 only our buffer manager can understand when they will be useful or when
 they will thrash the cache?

The OS should handle all of this.  We are doing main table writes but no
sync until checkpoint, so the OS can keep those blocks around and write
them at its convenience.  It knows the size of the buffer cache and when
stuff is forced to disk.  We can't second-guess that.

 I may likely be wrong on this, and I haven't done any performance
 testing. I shouldn't have brought this up alongside the logging issues,
 but there seemed to be some question about whether the OS was actually
 doing all these things behind the scene.

It had better.  Looking at the kernel source is the way to know.

 Does anyone know what the major barriers to infinite log replay are in
 PostgreSQL? I'm trying to look for everything that might need to be
 changed outside xlog.c, but surely this has come up before. Searching
 the archives hasn't revealed much.

This has been brought up.  Could we just save WAL files and get replay? 
I believe some things have to be added to WAL to allow this, but it
seems possible.  However, the pg_dump is just a data dump and does not
have the file offsets and things.  Somehow you would need a tar-type
backup of the database, and with a running db, it is hard to get a valid
snapshot of that.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread J. R. Nield

On Sun, 2002-06-23 at 23:40, Curt Sampson wrote:
 On 23 Jun 2002, J. R. Nield wrote:
 
  If is impossible to do what you want. You can not protect against
  partial writes without writing pages twice and calling fdatasync
  between them while going through a generic filesystem.
 
 I agree with this.
 
  The best disk array will not protect you if the operating system does
  not align block writes to the structure of the underlying device.
 
 This I don't quite understand. Assuming you're using a SCSI drive
 (and this mostly applies to ATAPI/IDE, too), you can do naught but
 align block writes to the structure of the underlying device. When you
 initiate a SCSI WRITE command, you start by telling the device at which
 block to start writing and how many blocks you intend to write. Then you
 start passing the data.
 

All I'm saying is that the entire postgresql block write must be
converted into exactly one SCSI write command in all cases, and I don't
know a portable way to ensure this. 

  Even with raw devices, you need special support or knowledge of the
  operating system and/or the disk device to ensure that each write
  request will be atomic to the underlying hardware.
 
 Well, so here I guess you're talking about two things:
 
 1. When you request, say, an 8K block write, will the OS really
 write it to disk in a single 8K or multiple of 8K SCSI write
 command?
 
 2. Does the SCSI device you're writing to consider these writes to
 be transactional. That is, if the write is interrupted before being
 completed, does the SCSI device guarantee that the partially-sent
 data is not written, and the old data is maintained? And of course,
 does it guarantee that, when it acknowledges a write, that write is
 now in stable storage and will never go away?
 
 Both of these are not hard to guarantee, actually. For a BSD-based OS,
 for example, just make sure that your filesystem block size is the
 same as or a multiple of the database block size. BSD will never write
 anything other than a block or a sequence of blocks to a disk in a
 single SCSI transaction (unless you've got a really odd SCSI driver).
 And for your disk, buy a Baydel or Clarion disk array, or something
 similar.
 
 Given that it's not hard to set up a system that meets these criteria,
 and this is in fact commonly done for database servers, it would seem a
 good idea for postgres to have the option to take advantage of the time
 and money spent and adjust its performance upward appropriately.

I agree with this. My point was only that you need to know what
guarantees your operating system/hardware combination provides on a
case-by-case basis, and there is no standard way for a program to
discover this. Most system administrators are not going to know this
either, unless databases are their main responsibility.

 
  All other systems rely on the fact that you can recover a damaged file
  using the log archive.
 
 Not exactly. For MS SQL Server, at any rate, if it detects a page tear
 you cannot restore based on the log file alone. You need a full or
 partial backup that includes that entire torn block.
 

I should have been more specific: you need a backup of the file from
some time ago, plus all the archived logs from then until the current
log sequence number.

  This means downtime in the rare case, but no data loss. Until
  PostgreSQL can do this, then it will not be acceptable for real
  critical production use.
 
 It seems to me that it is doing this right now. In fact, it's more
 reliable than some commercial systems (such as SQL Server) because it can
 recover from a torn block with just the logfile.

Again, what I meant to say is that the commercial systems can recover
with an old file backup + logs. How old the backup can be depends only
on how much time you are willing to spend playing the logs forward. So
if you do a full backup once a week, and multiplex and backup the logs,
then even if a backup tape gets destroyed you can still survive. It just
takes longer.

Also, PostgreSQL can't recover from any other type of block corruption,
while the commercial systems can. That's what I meant by the critical
production use comment, which was sort-of unfair.

So I would say they are equally reliable for torn pages (but not bad
blocks), and the commercial systems let you trade potential recovery
time for not having to write the blocks twice. You do need to back-up
the log archives though.

 
  But at the end of the day, unless you have complete understanding of
  the I/O system from write(2) through to the disk system, the only sure
  ways to protect against partial writes are by careful writes (in
  the WAL log or elsewhere, writing pages twice), or by requiring (and
  allowing) users to do log-replay recovery when a file is corrupted by
  a partial write.
 
 I don't understand how, without a copy of the old data that was in the
 torn block, you can restore that block from just log file entries. Can
 you 

Re: [HACKERS] PERFORM effects FOUND patch (Was: [GENERAL] I must be

2002-06-24 Thread Tom Lane

 Jan Wieck [EMAIL PROTECTED] writes:
 Perform has nothing to do with ORACLE. It was added because people tried
 to call other procedures and didn't want any result back.

 Well, in that case we can do what we want with it.

 Does anyone object to making it set FOUND?

Given the lack of objection, I have committed the attached patch for 7.3,
along with a suitable documentation update.

regards, tom lane

*** src/pl/plpgsql/src/pl_exec.c.orig	Mon Mar 25 02:41:10 2002
--- src/pl/plpgsql/src/pl_exec.c	Mon Jun 24 18:23:11 2002
***************
*** 969,977 ****
  	else
  	{
  		/*
! 		 * PERFORM: evaluate query and discard result.  This cannot share
! 		 * code with the assignment case since we do not wish to
! 		 * constraint the discarded result to be only one row/column.
  		 */
  		int			rc;
  
--- 969,979 ----
  	else
  	{
  		/*
! 		 * PERFORM: evaluate query and discard result (but set FOUND
! 		 * depending on whether at least one row was returned).
! 		 *
! 		 * This cannot share code with the assignment case since we do not
! 		 * wish to constrain the discarded result to be only one row/column.
  		 */
  		int			rc;
  
***************
*** 984,989 ****
--- 986,993 ----
  		rc = exec_run_select(estate, expr, 0, NULL);
  		if (rc != SPI_OK_SELECT)
  			elog(ERROR, "query \"%s\" didn't return data", expr->query);
+ 
+ 		exec_set_found(estate, (estate->eval_processed != 0));
  
  		exec_eval_cleanup(estate);
  	}








Re: [HACKERS] ADTs and embedded sql

2002-06-24 Thread Tony Griffiths



Thomas Lockhart wrote:
 
  Ah, I see --- more or less make all of utils/adt/ available to be
  linked into clients.
   That is a Good Idea in principle. In practice, ...
  Yeah, it'd be a huge amount of work.  For starters, all that code
  relies on the backend environment for error handling and memory
  management...
 
 It would be a large amount of work to make *all* of utils/adt available.
 However, the initial work would be to support I/O to get values
 converted to internal storage. Michael M. already has to do some of this
 for ecpg, and presumably we could do this for more types (or maybe *all*
 builtin types are already supported in this way by ecpg, in which case
 MM has already done all of the hard work, and we might just repackage
 it).
 
 A first cut would seem to be appropriate, if someone would like to pick
 up the work. Tony?? ;)

I'd love to get involved in this, BUT... no time at the moment, although
if I get a really good Masters student next semester I could always
do this as their project. If this is still a requirement in about 3
months then I can set someone on to it.

Tony

 
 - Thomas
 

-- 
Tony

-
Dr. Tony Griffiths
Research Fellow
Information Management Group,
Department of Computer Science,
The University of Manchester,
Oxford Road,
Manchester M13 9PL, 
United Kingdom

Tel. +44 (0) 161 275 6139
Fax +44 (0) 161 275 6236
email [EMAIL PROTECTED]
-








Re: [HACKERS] Democracy and organisation : let's make a revolution in the Debian way

2002-06-24 Thread Josh Berkus

Jean-Michel,

 It seems clear that several teams are working without central point 
management 
 and contact:
snip
 - Marketing: MySQL sucks and has a team of marketing sending junk technical 
 emails and writing false benchmarks. Who is in charge of marketing at 
 PostgreSQL? Where can I find a list of PostgreSQL features?
snip
 Some projects, like Debian, have a democratic organisation. The team leader is 
 elected for a year. Why not set up a similar organization? This would help 
 us make decisions ... and not lose time on important issues.
 
 PostgreSQL is a software but it is also a community. If we believe in 
 democracy, I suggest we should organize in a democratic way and elect a 
 leader for a year.

Let me introduce myself.  In addition to being a contributor of supplementary 
documentation and the occasional spec to the PostgreSQL project, I am 
volunteer marketing lead and the primary motivator for governance overhaul in 
the OpenOffice.org project.   

And frankly, I think you're way off base here.   We have leaders:  Tom, Bruce, 
Jan, Stephan, Thomas, Marc and Oliver (did I miss anybody?).  Frankly, if 
OpenOffice.org had the kind of widely trusted, committed, involved in the 
community core developers that PostgreSQL already has, I wouldn't be on my 
fourth draft of an OpenOffice.org Community Council charter.  OpenOffice.org 
will have an election process because we are too big and too dispersed for a 
simple trust network, not because we want one for its own sake.

PostgreSQL is, quite possibly, the smoothest-running Open Source project with 
worldwide adoption.  I find myself saying, at least once a week, if only 
project X were as well-organized as PostgreSQL!   It is perhaps not 
coincidental that Postgres is one of the 15 or 20 oldest Open Source projects 
(older than Linux, I believe).

How would a democratic election improve this?   And why would we want an 
elected person or body who was not a core developer?And if we elected a 
core developer, why bother?  They aready run things.

Regarding your marketing angle:  Feel free to nominate yourself PostgreSQL 
Marketing Czar.Write articles.  Contact journalists.   Generate press 
releases for each new Postgres version.  Apply for a dot-org booth at 
LinuxWorld.  Nobody voted for me (actually, I got stuck with the job by not 
protesting hard enough grin).

Frankly, my feeling is, as a geek-to-geek product, PostgreSQL is already 
adequately marketed through our huge network of DBA users and code 
contributors.   As often as not, the database engine choice is made by the 
DBA, and they will choose PostgreSQL on its merits, not because of some 
Washington Post article.

OpenOffice.org is a different story, as an end-user application.   So we have 
a Marketing Project.

-- 
-Josh Berkus
Project Lead, OpenOffice.org Marketing








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Does anyone know what the major barriers to infinite log replay are in
  PostgreSQL? I'm trying to look for everything that might need to be
  changed outside xlog.c, but surely this has come up before. Searching
  the archives hasn't revealed much.
 
  This has been brought up.  Could we just save WAL files and get replay? 
  I believe some things have to be added to WAL to allow this, but it
  seems possible.
 
 The Red Hat group has been looking at this somewhat; so far there seem
 to be some minor tweaks that would be needed, but no showstoppers.


Good.

  Somehow you would need a tar-type
  backup of the database, and with a running db, it is hard to get a valid
  snapshot of that.
 
 But you don't *need* a valid snapshot, only a correct copy of
 every block older than the first checkpoint in your WAL log series.
 Any inconsistencies in your tar dump will look like repairable damage;
 replaying the WAL log will fix 'em.

Yes, my point was that you need physical file backups, not pg_dump, and
you have to be tricky about the files changing during the backup.  You
_can_ work around changes to the files during backup.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026








Re: [HACKERS] SQL99, CREATE CAST, and initdb

2002-06-24 Thread Tom Lane

Thomas Lockhart [EMAIL PROTECTED] writes:
 I've got another issue with casting which I've run into while testing
 this feature; afaict invoking an explicit CAST() in SQL does not
 guarantee that the function of the expected name would be called, if
 that function does not have the implicit flag set.

[ scratches head ]  Whether the flag is set or not shouldn't matter;
if the cast function is needed it will be called.  Were you perhaps
testing binary-compatible cases?  Note the order of cases specified in
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/typeconv-func.html

I recall we changed what is now case 2 to be higher priority than it
used to be; I do not recall the examples that motivated that change,
but I'm pretty sure moving it down in the priority list would be bad.

regards, tom lane








Re: [HACKERS] ODBC Driver 7.02.0001 (Win32) (Unicode mode): CRLF-LF

2002-06-24 Thread Hiroshi Inoue
Julian Mehnle wrote:
 
 Hiroshi Inoue [EMAIL PROTECTED] wrote:
  Julian Mehnle [EMAIL PROTECTED] wrote:
   Recently I tried to use the new 7.02.0001 Win32 ODBC driver
   in the new (beta) Unicode mode in conjunction with MS Access
   2000 and a "UNICODE" encoded database stored in a PostgreSQL
   7.2.1 database running on a Linux system.
  
   I noticed that when the "LF-CRLF Conversion" option is
   *enabled* in the driver's settings dialog, only a CRLF-LF
   conversion (while writing to the database) is performed,
   but no LF-CRLF conversion (while reading from the database)!
 
  Could you try the snapshot dll at http://w2422.nsk.ne.jp/~inoue/ ?
 
 There are several versions and I don't know which one you mean:
 - psqlodbc.dll (version 7.02.0001),
 - psqlodbc.dll (the multibyte version),
 - psqlodbc30.dll (7.02.0001 ODBC3.0 trial driver),
 - psqlodbc30.dll (the multibyte version),
 - another trial driver(7.02.0001 + ODBC3.0 + Unicode)?

The last one, because you seem to be using the Unicode driver.

regards,
Hiroshi Inoue
http://w2422.nsk.ne.jp/~inoue/





Re: [HACKERS] oids rollover?

2002-06-24 Thread Tom Lane

Daniel Kalchev [EMAIL PROTECTED] writes:
 I have a problem with an 7.1.3 database that has probably overflowed
 the oid counter. The startup halts with these messages

 DEBUG:  database system was interrupted at 2002-06-24 21:19:43 EEST
 DEBUG:  CheckPoint record at (156, 1692817164)
 DEBUG:  Redo record at (156, 1692775580); Undo record at (0, 0);
 Shutdown FALSE
 DEBUG:  NextTransactionId: 859255800; NextOid: 7098
 FATAL 2:  Invalid NextTransactionId/NextOid
 postmaster: Startup proc 4752 exited with status 512 - abort

Looks that way.  This is fixed in 7.2, so you might want to think about
an update sometime soon.

 Can something be done to recover the database?

You could modify contrib/pg_resetxlog to force a value at least 16384 into
the OID counter.  Since the DB was evidently not shut down cleanly, I'd
counsel cutting out the xlog-reset function entirely; just make it read
the pg_control file, set a valid nextOid, update the CRC, and rewrite
pg_control.

regards, tom lane








Re: [HACKERS] SQL99, CREATE CAST, and initdb

2002-06-24 Thread Thomas Lockhart

 I see no real reason why we should not require casting functions to
 follow the Postgres naming convention --- after all, what else would
 you name a casting function?

We do require casting functions to follow the Postgres naming
convention. istm to be a waste of time to have the CREATE CAST() feature
*only* set a bit on an existing function, especially given the SQL99
syntax which implies that it can define a cast operation for an
arbitrarily named function. It also supposes that the only allowed casts
are *implicit casts* (see below for a new issue) which is not quite
right. I've defined alternate forms which draw on the general PostgreSQL
feature set and capabilities, but if we can fit the SQL99 model then we
should go ahead and do that too.

I've got another issue with casting which I've run into while testing
this feature; afaict invoking an explicit CAST() in SQL does not
guarantee that the function of the expected name would be called, if
that function does not have the implicit flag set. Seems that it should
be willing to do the conversion even if the function is not marked as
allowing implicit casts; after all, this is an *explicit* cast!

I'm pretty sure that this is the behavior I've been seeing, but will
publish a test case to confirm it when I have a chance.

 - Thomas








Re: [HACKERS] SQL99, CREATE CAST, and initdb

2002-06-24 Thread Tom Lane

I said:
 Thomas Lockhart [EMAIL PROTECTED] writes:
 I've got another issue with casting which I've run into while testing
 this feature; afaict invoking an explicit CAST() in SQL does not
 guarantee that the function of the expected name would be called, if
 that function does not have the implicit flag set.

 [ scratches head ]  Whether the flag is set or not shouldn't matter;
 if the cast function is needed it will be called.  Were you perhaps
 testing binary-compatible cases?

Another possibility is that you got burnt by some schema-related issue;
cf the updated conversion docs at
http://developer.postgresql.org/docs/postgres/typeconv-func.html

IIRC, a function is only considered to be a cast function if it matches
by name *and schema* with the target type.  So if you, for example,
make a function public.int4(something), it'll never be considered a
cast function for pg_catalog.int4.  I had some doubts about that rule
when I put it in, but so far have not thought of an alternative I like
better.

regards, tom lane








Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread J. R. Nield

On Mon, 2002-06-24 at 17:16, Tom Lane wrote:
 
 I think you have been missing the point...  
Yes, this appears to be the case. Thanks especially to Curt for clearing
things up for me.

-- 
J. R. Nield
[EMAIL PROTECTED]











Re: [HACKERS] Nonrecursive ALTER TABLE ADD/RENAME COLUMN is wrong

2002-06-24 Thread Rod Taylor

   * Give an error, same as if ONLY foo had been written.
 
   * Assume the user really wants recursion, and do it anyway.
 
 The second seems more user-friendly but also seems to violate the
 principle of least surprise.  Anyone have an opinion about what to do?

I really prefer the former.  If for some reason it were to become 
available that they could alter only foo for some strange reason we 
haven't come up with yet (statistics related perhaps?), we would 
certainly need to throw an error on the other 'alter table' statements 
at that point in time.








Re: [HACKERS] [SQL] Request for builtin function: Double_quote

2002-06-24 Thread Josh Berkus


Tom,

 Done; I also added its sister function quote_ident.  See the devel
 docs at
 http://candle.pha.pa.us/main/writings/pgsql/sgml/functions-string.html

Tante Grazie.

-- 
-Josh Berkus









Re: [HACKERS] Index Scans become Seq Scans after VACUUM ANALYSE

2002-06-24 Thread Curt Sampson

On 24 Jun 2002, J. R. Nield wrote:

 All I'm saying is that the entire postgresql block write must be
 converted into exactly one SCSI write command in all cases, and I don't
 know a portable way to ensure this.

No, there's no portable way. All you can do is give the admin who
is able to set things up safely the ability to turn off the now-unneeded
(and expensive) safety-related stuff that postgres does.

 I agree with this. My point was only that you need to know what
 guarantees your operating system/hardware combination provides on a
 case-by-case basis, and there is no standard way for a program to
 discover this. Most system administrators are not going to know this
 either, unless databases are their main responsibility.

Certainly this is true of pretty much every database system out there.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC









Re: [HACKERS] Nonrecursive ALTER TABLE ADD/RENAME COLUMN is wrong

2002-06-24 Thread Thomas Lockhart

 Fernando Nasser of Red Hat reminded me that it really makes no sense
 for ALTER TABLE ADD COLUMN and ALTER TABLE RENAME COLUMN to behave
 non-recursively --- that is, they should *always* affect inheritance
 children of the named table, never just the named table itself.

Hmm. Good point. Anything else would lead to structural breakage.

 The second seems more user-friendly but also seems to violate the
 principle of least surprise.  Anyone have an opinion about what to do?

Same point as for the main issue: the solution should not introduce
structural breakage, especially only on the otherwise benign setting of
a GUC variable.

The case you are worried about already *has* structural inheritance, so
the GUC setting could reasonably have no effect. But if one is mixing a
database with inheritance structures with command settings that hide it,
they shouldn't be too surprised at whatever they get. The Right Thing
imho is to respect the underlying structures and definitions, not the
command facade. But I would not dig in my heels on either choice after
more discussion.

 - Thomas


