[HACKERS] 7.4RC2 regression failure and stats collector process not running on Solaris

2003-11-10 Thread Kiyoshi Sawada
Failed to build on Solaris.

Summary
1. Checking for pstat... no
2. Regression failure: stats ... FAILED
3. Stats buffer process and stats collector process not running.


Environments
 SunOS 5.8 Generic_108528-15 sun4m sparc
 SunOS 5.8 Generic_108529-23 i86pc i386 i86pc
Both sparc and i386
 PostgreSQL 7.4 RC2
 gcc (GCC) 3.3.2
 autoconf (GNU Autoconf) 2.57 
 bison (GNU Bison) 1.875
 GNU Make 3.80


(1) checking for pstat... no
$ ./configure --enable-integer-datetimes \
  --without-readline --with-openssl
---
  : : : :
checking sys/pstat.h usability... no
checking sys/pstat.h presence... no
  : : : :
checking for pstat... no
  : : : :
---

(2) Regression failure: stats ... FAILED
$ make check
 : : : :
 sequence ... ok
 polymorphism ... ok
 stats... FAILED
== shutting down postmaster  ==
===
 1 of 93 tests failed.
===

(3) Stats buffer process and stats collector process not running.
$ pg_ctl start -D /usr/local/pgsql/data
$ ps -ef | grep postmaster
postgres 15912 15899  0 11:32:59 pts/2    0:00 grep postmaster
postgres 15864     1  0 11:17:03 pts/1    0:00 /usr/local/pgsql/bin/postmaster
$

--
Kiyoshi Sawada


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Shridhar Daithankar
On Tuesday 11 November 2003 00:50, Neil Conway wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > We can't resize shared memory because we allocate the whole thing in
> > one big hump - which causes the shmmax problem BTW. If we allocate
> > that in chunks of multiple blocks, we only have to give it a total
> > maximum size to get the hash tables and other stuff right from the
> > beginning. But the vast majority of memory, the buffers themself, can
> > be made adjustable at runtime.
>
> Yeah, writing a palloc()-style wrapper over shm has been suggested
> before (by myself among others). You could do the shm allocation in
> fixed-size blocks (say, 1 MB each), and then do our own memory
> management to allocate and release smaller chunks of shm when
> requested. I'm not sure what it really buys us, though: sure, we can
> expand the shared buffer area to some degree, but

Thinking about it, it can be put as follows: PostgreSQL needs memory 
shared among all the backends. 

If the parent postmaster mmaps anonymous memory segments and shares them with 
its children, PostgreSQL wouldn't be dependent upon the kernel shared memory 
resource anymore.

Furthermore, the parent postmaster can allocate different anonymous mappings 
for different databases. In addition to a PostgreSQL buffer manager overhaul, 
this would make things a lot better.

Note that I am not suggesting using mmap to maintain files on disk, so I 
guess that should be OK. 

I tried searching for mmap on -hackers. The threads seem to be very old; one 
is from 1998. With so many proposals for rewriting core stuff, does this have 
any chance?

 Just a thought.

 Shridhar




Re: [HACKERS] Proposal: psql force prompting on notty

2003-11-10 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> >> Are there further concerns with this patch?
> 
> > No, just getting to it now.  Sorry.
> 
> Peter didn't agree with this patch, and I have to concur with him
> that the need for it is unproven.
> 
> Given that it is certainly not going into 7.4 at this late date,
> I think Michael will be wanting to look for another solution anyway...

Yea, certainly not 7.4.  I am not sure he has any other ideas about a
solution, though.  I will try to make him a test binary soon and see how
that works --- maybe we can make it an option that only shows up on
Win32.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Proposal: psql force prompting on notty

2003-11-10 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
>> Are there further concerns with this patch?

> No, just getting to it now.  Sorry.

Peter didn't agree with this patch, and I have to concur with him
that the need for it is unproven.

Given that it is certainly not going into 7.4 at this late date,
I think Michael will be wanting to look for another solution anyway...

regards, tom lane



Re: [HACKERS] pgsql CVS build failure on Debian GNU/Linux 3.0

2003-11-10 Thread Bruce Momjian

You need bison 1.875.

---

strk wrote:
> I can't build postgresql from CVS. Any help ?
> This is the message I get:
> 
>   bison -y -d  preproc.y
>   preproc.y:6275: fatal error: maximum table size (32767) exceeded
> 
> 
> Bison version:
> 
>   bison (GNU Bison) 1.35
> 
> TIA
> .strk;
> 
> -- 
> pallamondo.net  <-- take a look !
> 




Re: [HACKERS] PostgreSQL Backup problems with tsearch2

2003-11-10 Thread Christopher Kings-Lynne

Is the problem with backing up and restoring a database which has tsearch2
installed and enabled dealt with in Version 7.4 of PostgreSQL?


If it's the problem with restoring the tsearch2-related functions, then no,
and I'm not sure whether it's "fixable" (in the sense that a tsearch2 enabled
database will do a painless dump/restore).
I've had some success by making sure all tsearch2-related functions
are in their own schema, which I don't dump or use for restoring; 
before restoring I recreate the schema from a script, then reload
the other schemas. There's a slight gotcha though which I can't recall
offhand. I'll try and write it up next time I go through the process.
What I did is I edited my dump and removed all the tsearch stuff.  Then 
I copied the tsearch2.sql just after the CREATE DATABASE statement. 
This ensured that all the dependencies work fine.

Since then, I think PostgreSQL's default dump order has just worked.

The main situation that causes complete breakage is:

CREATE TABLE...
CREATE TYPE...
ALTER TABLE / ADD COLUMN newtype

Basically, any object that you can add dependencies to after it has been 
initially created can cause problems, e.g. all the CREATE OR REPLACE 
commands, etc.

Chris





Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Tatsuo Ishii
> Jan Wieck wrote:
> > What bothers me a little is that you keep telling us that you have all 
> > that great code from SRA. Do you have any idea when they intend to share 
> > this with us and contribute the stuff? I mean at least some pieces 
> > maybe? You personally got all the code from NuSphere AKA PeerDirect even 
> > weeks before it got released. Did any PostgreSQL developer other than 
> > you ever look at the SRA code?
> 
> I can get the open/fsync/write/close patch from SRA released, I think. 
> Let me ask them now.

I will ask my boss, then come back with the result.

> Tom has seen the Win32 tarball (with SRA's approval) because he wanted
> to research if threading was something we should pursue.  I haven't
> heard a report back from him yet.  If you would like to see the tarball,
> I can ask them.
> 
> Agreed, I got the PeerDirect/Nusphere code very early and it was a help.
> I am sure I can get some of it released.  I haven't pursued the sync
> Win32 patch because it is based on a threaded backend model, so it is
> different from how it needs to be done in a process model (all shared
> file descriptors).  However, I will need to get approval in the end
> anyway for Win32 because I need that Win32-specific part anyway.
> 
> I just looked at the sync() call in the code and it just did _flushall:
> 
>   
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/html/_crt__flushall.asp
> 
> I can share this because I know it was discussed when someone (SRA?)
> realized _commit() didn't force all buffers to disk.  In fact, _commit
> is fsync().
> 
> I think the only question was whether _flushall() fsyncs file descriptors
> that have been closed.  Perhaps SRA keeps the file descriptors open
> until after the checkpoint, or does it fsync closed files with dirty
> buffers.  Tatsuo?

In SRA's code, the checkpoint thread opens each file that has been written
(if it's not already open, of course), then fsync()s it.
--
Tatsuo Ishii



[HACKERS] PostgreSQL v7.4 Release Candidate 2

2003-11-10 Thread Marc G. Fournier


We have just packaged up our second Release Candidate for v7.4, with the
hopes of producing a full release next week.

A full ChangeLog is available at:

ftp://ftp.postgresql.org/pub/sources/v7.4/ChangeLog.RC1.to.RC2

But, one of the highlights is that support for tcl8.0.x has been
re-introduced.

There are a lot of doc changes, and some what-appear-to-be small fixes,
mostly related to the various ports.

As we are in the home stretch of a full release, we encourage as many as
possible to test and report any bugs they can find, whether as part of the
build process, or running in "real life" scenarios.

If we've heard no reports back before midnight on Thursday, we are looking
at a full code freeze, with a Final Release to happen on the following
Monday.





Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Andrew Sullivan wrote:

On Sun, Nov 09, 2003 at 08:54:25PM -0800, Joe Conway wrote:
two servers, mounted to the same data volume, and some kind of 
coordination between the writer processes. Anyone know if this is 
similar to how Oracle handles RAC?
It is similar, yes, but there's some mighty powerful magic in that
"some kind of co-ordination".  What do you do when one of the
participants crashes, for instance?  
What about "sympathetic crash"?

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Proposal: psql force prompting on notty

2003-11-10 Thread Bruce Momjian

No, just getting to it now.  Sorry.

---

Michael Mauger wrote:
> --- Michael Mauger <[EMAIL PROTECTED]> wrote:
> > --- Peter Eisentraut <[EMAIL PROTECTED]> wrote:
> > > Michael Mauger writes:
> > > >
> > > > Emacs is not a Cygwin (Un*x emulation on Windows) executable (like
> > > > psql is)
> > > 
> > > Why don't you use the native Windows version of psql?
> > > 
> > 
> > The Cygwin setup program makes pre-built binaries of postgres 
> > readily available on Windows.  (In fact, using this version is 
> > encouraged in the Postgres for Windows installation notes.)  A 
> > native Windows version would not work either since the isatty() 
> > implementation there will only recognize a Command Prompt 
> > window as a tty.
> > 
> 
> Are there further concerns with this patch?
> 
> --- pgsql-server/src/bin/psql/startup.c   29 Sep 2003 18:21:33 -  1.80
> +++ pgsql-server/src/bin/psql/startup.c   01 Nov 2003 06:10:42 -
> @@ -322,6 +322,7 @@
>   {"field-separator", required_argument, NULL, 'F'},
>   {"host", required_argument, NULL, 'h'},
>   {"html", no_argument, NULL, 'H'},
> + {"interactive", no_argument, NULL, 'I'},
>   {"list", no_argument, NULL, 'l'},
>   {"no-readline", no_argument, NULL, 'n'},
>   {"output", required_argument, NULL, 'o'},
> @@ -352,7 +353,7 @@
> 
>   memset(options, 0, sizeof *options);
> 
> - while ((c = getopt_long(argc, argv,
> "aAc:d:eEf:F:h:Hlno:p:P:qR:sStT:uU:v:VWxX?",
> + while ((c = getopt_long(argc, argv,
> "aAc:d:eEf:F:h:HIlno:p:P:qR:sStT:uU:v:VWxX?",
>   long_options, &optindex)) != 
> -1)
>   {
>   switch (c)
> @@ -395,7 +396,10 @@
>   case 'H':
>   pset.popt.topt.format = PRINT_HTML;
>   break;
> + case 'I':
> + pset.notty = 0;
> + break;
>   case 'l':
>   options->action = ACT_LIST_DB;
>   break;
> --- pgsql-server/src/bin/psql/help.c  02 Oct 2003 06:39:31 -  1.81
> +++ pgsql-server/src/bin/psql/help.c  01 Nov 2003 06:29:50 -
> @@ -103,6 +103,7 @@
>   puts(_("  -a  echo all input from script"));
>   puts(_("  -e  echo commands sent to server"));
>   puts(_("  -E  display queries that internal commands
> generate"));
> + puts(_("  -I  force interactive prompting for input"));
>   puts(_("  -q  run quietly (no messages, only query
> output)"));
>   puts(_("  -o FILENAME send query results to file (or |pipe)"));
>   puts(_("  -n  disable enhanced command line editing
> (readline)"));
> 
> 
> __
> Do you Yahoo!?
> Protect your identity with Yahoo! Mail AddressGuard
> http://antispam.yahoo.com/whatsnewfree
> 




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> Bruce Momjian wrote:
> 
> > Jan Wieck wrote:
> >> Zeugswetter Andreas SB SD wrote:
> >> 
> >> >> > One problem with O_SYNC would be, that the OS does not group writes any 
> >> >> > more. So the code would need to either do its own sorting and grouping
> >> >> > (256k) or use aio, or you won't be able to get the maximum out of the disks.
> >> >> 
> >> >> Or just run multiple writer processes, which I believe is Oracle's
> >> >> solution.
> >> > 
> >> > That does not help, since for O_SYNC the OS'es (those I know) do not group 
> >> > those 
> >> > writes together. Oracle allows more than one writer to busy more than one 
> >> > disk(subsystem) and circumvent other per process limitations (mainly on 
> >> > platforms without AIO). 
> >> 
> >> Yes, I think the best way would be to let the background process write a 
> >> bunch of pages, then fsync() the files written to. If one tends to have 
> >> many dirty buffers to the same file, this will group them together and 
> >> the OS can optimize that. If one really has completely random access, 
> >> then there is nothing to group.
> > 
> > Agreed.  This might force enough stuff out to disk that the checkpoint/sync()
> > would be OK.  Jan, have you tested this?
> > 
> 
> As said, not using fsync() but sync() at that place. This only makes a 
> real difference when you're not running PostgreSQL on a dedicated 
> server. And yes, it really works well.

I talked to Jan about this.  Basically, for testing, if sync decreases
the checkpoint load, fsync/O_SYNC should do even better, hopefully, once
he has that implemented.




Re: [HACKERS] Lack of RelabelType is causing me pain

2003-11-10 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Joe, do you recall the reasoning for this code in parse_coerce.c?
>> [much snipped]

> Does the RelabelType keep a record of what was relabeled (I presume from 
> your description above it does)?

The RelabelType node itself doesn't, but you can look to its input node
to see the initial type.  The code I was imagining adding to
get_fn_expr_argtype would go like

while (node is a RelabelType)
node := node->input;

to chain down to the first thing that isn't a Relabel.  You can see
examples of this coding pattern in various places in the optimizer that
want to ignore binary-compatible relabelings.

> The original code above predates get_fn_expr_argtype() I think,

Oh, okay, if the coding predates 7.4 then I'm not so concerned about it.
I was afraid we'd done this as of 7.4, in which case there's no field
experience to indicate that it's really safe in corner cases.

I have found a workaround for my immediate problem with indexing
behavior, so I think we can leave parse_coerce.c as-is for the moment,
but I'm planning to keep my eyes open for any evidence that we ought to
reconsider the decision to omit RelabelType here.  When RelabelType was
put in, the intention was that it would appear *anywhere* that the
actual output of one expression didn't match the expected input type of
its parent.

regards, tom lane



Re: [HACKERS] [DOCS] Annotated release notes

2003-11-10 Thread Bruce Momjian

OK, release notes updated to:

Allow polymorphic PL/pgSQL functions (Tom, Joe)
Allow polymorphic SQL functions (Joe)

Allow functions to accept arbitrary data types for input, and return
arbitrary types.

---

Joe Conway wrote:
> Bruce Momjian wrote:
> > http://candle.pha.pa.us/main/writings/pgsql/sgml/release.html#RELEASE-7-4
> > 
> > I need people to check this and help me with the items marked 'bjm'.  I
> > am confused about the proper text for those sections.
> 
>  > Allow polymorphic SQL functions (Joe)
>  > bjm ??
> 
> What isn't clear about this? Should/can we refer to related sections of 
> the manual?
> http://developer.postgresql.org/docs/postgres/xfunc-sql.html#AEN28722
> http://developer.postgresql.org/docs/postgres/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC
> 
>  > Allow user defined aggregates to use polymorphic functions (Joe)
>  > bjm ??
> 
> Same question. From this url:
> http://developer.postgresql.org/docs/postgres/xaggr.html
> see this paragraph:
> 
>   Aggregate functions may use polymorphic state transition functions or 
> final functions, so that the same functions can be used to implement 
> multiple aggregates. See Section 33.2.1  for an explanation of 
> polymorphic functions. Going a step further, the aggregate function 
> itself may be specified with a polymorphic base type and state type, 
> allowing a single aggregate definition to serve for multiple input data 
> types. Here is an example of a polymorphic aggregate:
> 
> CREATE AGGREGATE array_accum (
>  sfunc = array_append,
>  basetype = anyelement,
>  stype = anyarray,
>  initcond = '{}'
> );
> 
> Joe
> 
> 




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> Bruce Momjian wrote:
> 
> > Jan Wieck wrote:
> >> >> > If the background cleaner has to not just write() but write/fsync or
> >> >> > write/O_SYNC, it isn't going to be able to clean them fast enough.  It
> >> >> > creates a bottleneck where we didn't have one before.
> >> >> > 
> >> >> > We are trying to eliminate an I/O storm during checkpoint, but the
> >> >> > solutions seem to be making the non-checkpoint times slower.
> >> >> > 
> >> >> 
> >> >> It looks as if you're assuming that I am making the backends unable to 
> >> >> write on their own, so that they have to wait on the checkpointer. I 
> >> >> never said that.
> >> > 
> >> > Maybe I missed it but are those backend now doing write or write/fsync? 
> >> > If the former, that is fine.  If the later, it does seem slower than it
> >> > used to be.
> >> 
> >> In my all_performance.v4.diff they do write and the checkpointer does 
> >> write+sync.
> > 
> > Again, sorry to be confusing --- it might be good to try write/fsync from
> > the background writer if backends can do writes on their own too without
> > fsync.  The additional fsync from the background writer should reduce
> > disk writing during sync().  (The fsync should happen with the buffer
> > unlocked.)
> 
> No, you're not. But thank you for suggesting what I implemented.

OK, I did IM with Jan and I understand now --- he is using write/sync
for testing, but plans to allow several ways to force writes to disk
occasionally, probably defaulting to fsync on most platforms.  Backend
will still use write only, and a checkpoint will continue using sync().

The question still open is whether we can push most/all writes into the
background writer so we can use fsync/open instead of sync.  My point
has been that this might be hard to do with the same performance we have
now.




Re: [HACKERS] PostgreSQL Backup problems with tsearch2

2003-11-10 Thread Ian Barwick
On Monday 10 November 2003 20:47, Ed Baer wrote:
> To whom it may concern,
>
> Please accept my apology if this is not the correct forum, I am new and was
> unable to find the correct location to send this question.
>
> If you don't wish to answer, could you please direct me to the correct
> place to ask the question.

Try the OpenFTS-General list:
http://sourceforge.net/mailarchive/forum.php?forum=openfts-general

The tsearch2 homepage is here:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

(...)

> The question is:
>
> Is the problem with backing up and restoring a database which has tsearch2
> installed and enabled dealt with in Version 7.4 of PostgreSQL?

If it's the problem with restoring the tsearch2-related functions, then no,
and I'm not sure whether it's "fixable" (in the sense that a tsearch2 enabled
database will do a painless dump/restore).

I've had some success by making sure all tsearch2-related functions
are in their own schema, which I don't dump or use for restoring; 
before restoring I recreate the schema from a script, then reload
the other schemas. There's a slight gotcha though which I can't recall
offhand. I'll try and write it up next time I go through the process.


Ian Barwick
[EMAIL PROTECTED]




Re: [HACKERS] pgsql CVS build failure on Debian GNU/Linux 3.0

2003-11-10 Thread Peter Eisentraut
strk writes:

> I can't build postgresql from CVS. Any help ?

Search the fine archives.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




[HACKERS] pgsql CVS build failure on Debian GNU/Linux 3.0

2003-11-10 Thread strk
I can't build postgresql from CVS. Any help?
This is the message I get:

bison -y -d  preproc.y
preproc.y:6275: fatal error: maximum table size (32767) exceeded


Bison version:

bison (GNU Bison) 1.35

TIA
.strk;

-- 
pallamondo.net  <-- take a look !



Re: [HACKERS] Lack of RelabelType is causing me pain

2003-11-10 Thread Joe Conway
Tom Lane wrote:
Joe, do you recall the reasoning for this code in parse_coerce.c?

if (targetTypeId == ANYOID ||
targetTypeId == ANYARRAYOID ||
targetTypeId == ANYELEMENTOID)
{
/* assume can_coerce_type verified that implicit coercion is okay */
/* NB: we do NOT want a RelabelType here */
return node;
}
I see this in REL7_3_STABLE

 else if (targetTypeId == ANYOID ||
  targetTypeId == ANYARRAYOID)
{
/* assume can_coerce_type verified that implicit coercion is okay */
/* NB: we do NOT want a RelabelType here */
result = node;
}
This was introduced here:
--
Revision 2.80 / (download) - annotate - [select for diffs] , Thu Aug 22 
00:01:42 2002 UTC (14 months, 2 weeks ago) by tgl
Changes since 2.79: +42 -19 lines
Diff to previous 2.79

Add a bunch of pseudo-types to replace the behavior formerly associated
with OPAQUE, as per recent pghackers discussion.  I still want to do 
some more work on the 'cstring' pseudo-type, but I'm going to commit the 
bulk of the changes now before the tree starts shifting under me ...
--

I think I just followed suit when adding ANYELEMENTOID.

This is AFAICT the only case where the parser will generate an
expression tree that is not labeled with the same datatype expected
by the next-higher operator.  That is precisely the sort of mismatch
that RelabelType was invented to avoid, and I'm afraid that we have
broken some things by regressing on the explicit representation of
type coercions.
The particular case that is causing me pain right now is that in my
modified tree with support for cross-datatype index operations, cases
involving anyarray_ops indexes are blowing up.  That's because the
visible input type of an indexed comparison isn't matching the declared
righthand input type of any operator in the opclass.  But RelabelType
was put in to avoid a number of other problems that I can't recall in
detail, so I am suspicious that this shortcut breaks other things too.
I think that the reason we did this was to allow get_fn_expr_argtype()
to see the unrelabeled datatype of the input to an anyarray/anyelement-
accepting function.  Couldn't we fix that locally in that function
instead of breaking a system-wide convention?  I'm thinking that we
could simply make that function "burrow down" through any RelabelTypes
for any/anyarray/anyelement.
Does the RelabelType keep a record of what was relabeled (I presume from 
your description above it does)? The original code above predates 
get_fn_expr_argtype() I think, but it sounds like a reasonable approach 
to me.

Joe



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Bruce Momjian wrote:

Jan Wieck wrote:
>> > If the background cleaner has to not just write() but write/fsync or
>> > write/O_SYNC, it isn't going to be able to clean them fast enough.  It
>> > creates a bottleneck where we didn't have one before.
>> > 
>> > We are trying to eliminate an I/O storm during checkpoint, but the
>> > solutions seem to be making the non-checkpoint times slower.
>> > 
>> 
>> It looks as if you're assuming that I am making the backends unable to 
>> write on their own, so that they have to wait on the checkpointer. I 
>> never said that.
> 
> Maybe I missed it but are those backend now doing write or write/fsync? 
> If the former, that is fine.  If the later, it does seem slower than it
> used to be.

In my all_performance.v4.diff they do write and the checkpointer does 
write+sync.
Again, sorry to be confusing --- it might be good to try write/fsync from
the background writer if backends can do writes on their own too without
fsync.  The additional fsync from the background writer should reduce
disk writing during sync().  (The fsync should happen with the buffer
unlocked.)
No, you're not. But thank you for suggesting what I implemented.

Jan



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Bruce Momjian wrote:

Jan Wieck wrote:
Zeugswetter Andreas SB SD wrote:

>> > One problem with O_SYNC would be, that the OS does not group writes any 
>> > more. So the code would need to either do its own sorting and grouping
>> > (256k) or use aio, or you won't be able to get the maximum out of the disks.
>> 
>> Or just run multiple writer processes, which I believe is Oracle's
>> solution.
> 
> That does not help, since for O_SYNC the OS'es (those I know) do not group those 
> writes together. Oracle allows more than one writer to busy more than one disk(subsystem) and circumvent other per process limitations (mainly on platforms without AIO). 

Yes, I think the best way would be to let the background process write a 
bunch of pages, then fsync() the files written to. If one tends to have 
many dirty buffers to the same file, this will group them together and 
the OS can optimize that. If one really has completely random access, 
then there is nothing to group.
Agreed.  This might force enough stuff out to disk that the checkpoint/sync()
would be OK.  Jan, have you tested this?
As said, not using fsync() but sync() at that place. This only makes a 
real difference when you're not running PostgreSQL on a dedicated 
server. And yes, it really works well.

Jan



[HACKERS] PostgreSQL Backup problems with tsearch2

2003-11-10 Thread Ed Baer
To whom it may concern,

Please accept my apology if this is not the correct forum, I am new and was unable to 
find the correct location to send this question.

If you don't wish to answer, could you please direct me to the correct place to ask 
the question.

Thanks

The question is:

Is the problem with backing up and restoring a database which has tsearch2 installed 
and enabled dealt with in Version 7.4 of PostgreSQL?

If not, is there any timeline for this, or is it not considered important since 
tsearch2 is a contrib component?

I am planning to spend some time rigging up a solution for our implementation, but I 
am not qualified to create one for general use.

Thanks in advance for your consideration.

ewb

=
Edward W. Baer
Always24x7.com
[EMAIL PROTECTED]
=



Re: [HACKERS] Proposal: psql force prompting on notty

2003-11-10 Thread Michael Mauger
--- Michael Mauger <[EMAIL PROTECTED]> wrote:
> --- Peter Eisentraut <[EMAIL PROTECTED]> wrote:
> > Michael Mauger writes:
> > >
> > > Emacs is not a Cygwin (Un*x emulation on Windows) executable (like
> > > psql is)
> > 
> > Why don't you use the native Windows version of psql?
> > 
> 
> The Cygwin setup program makes pre-built binaries of postgres 
> readily available on Windows.  (In fact, using this version is 
> encouraged in the Postgres for Windows installation notes.)  A 
> native Windows version would not work either since the isatty() 
> implementation there will only recognize a Command Prompt 
> window as a tty.
> 

Are there further concerns with this patch?

--- pgsql-server/src/bin/psql/startup.c 29 Sep 2003 18:21:33 -  1.80
+++ pgsql-server/src/bin/psql/startup.c 01 Nov 2003 06:10:42 -
@@ -322,6 +322,7 @@
{"field-separator", required_argument, NULL, 'F'},
{"host", required_argument, NULL, 'h'},
{"html", no_argument, NULL, 'H'},
+   {"interactive", no_argument, NULL, 'I'},
{"list", no_argument, NULL, 'l'},
{"no-readline", no_argument, NULL, 'n'},
{"output", required_argument, NULL, 'o'},
@@ -352,7 +353,7 @@

memset(options, 0, sizeof *options);

-   while ((c = getopt_long(argc, argv, "aAc:d:eEf:F:h:Hlno:p:P:qR:sStT:uU:v:VWxX?",
+   while ((c = getopt_long(argc, argv, "aAc:d:eEf:F:h:HIlno:p:P:qR:sStT:uU:v:VWxX?",
long_options, &optindex)) != -1)
{
switch (c)
@@ -395,7 +396,10 @@
case 'H':
pset.popt.topt.format = PRINT_HTML;
break;
+   case 'I':
+   pset.notty = 0;
+   break;
case 'l':
options->action = ACT_LIST_DB;
break;
--- pgsql-server/src/bin/psql/help.c02 Oct 2003 06:39:31 -  1.81
+++ pgsql-server/src/bin/psql/help.c01 Nov 2003 06:29:50 -
@@ -103,6 +103,7 @@
puts(_("  -a  echo all input from script"));
puts(_("  -e  echo commands sent to server"));
puts(_("  -E  display queries that internal commands
generate"));
+   puts(_("  -I  force interactive prompting for input"));
puts(_("  -q  run quietly (no messages, only query
output)"));
puts(_("  -o FILENAME send query results to file (or |pipe)"));
puts(_("  -n  disable enhanced command line editing
(readline)"));





Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Neil Conway wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Another idea --- if fsync() is slow because it can't find the dirty
> > buffers, use write() to write the buffers, copy the buffer to local
> > memory, mark it as clean, then open the file with O_SYNC and write
> > it again.
> 
> Yuck.

This idea of mine will not even work unless others are prevented from
writing that data block while I am fsync'ing from local memory --- what
if someone modified and wrote that block before my backend did its fsync
write?  I would overwrite their new data.  It was just a crazy idea.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Bug fix for 7.4?

2003-11-10 Thread Michael Meskes
On Mon, Nov 10, 2003 at 10:50:30AM -0500, Tom Lane wrote:
> I looked over it, and the only part that seems odd is that you seem to
> have removed the type info caching behavior in execute.c.  Is that
> intended?  It looks like ECPGis_type_an_array() will now issue a query

Well, actually, no. I removed it for testing reasons but didn't re-add it.

> on every call for a non-built-in type, which seems rather a large loss.
> Also, if cache_head ever becomes non-null then it stops trusting its
> internal knowledge as well, which seems worse.

I re-added it. 

Memo to /me: Just testing it is not enough. Better to look over the code
again after a night of sleep.

Thanks a lot.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!



Re: [HACKERS] [7.3.x] function does not exist ... ?

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Gaetano Mendola wrote:

> Marc G. Fournier wrote:
>
> > 'k, this doesn't look right, but it could be that I'm overlooking
> > something ...
> >
> > The function I created:
> >
> > CREATE FUNCTION month_trunc (timestamp without time zone) RETURNS timestamp 
> > without time zone
> > AS 'SELECT date_trunc(''month'', $1 )'
> > LANGUAGE sql IMMUTABLE;
> >
> >
> > The query that fails:
> >
> > ams=# select * from traffic_logs where month_trunc(runtime) = month_trunc(now());
> > ERROR:  Function month_trunc(timestamp with time zone) does not exist
> > Unable to identify a function that satisfies the given argument types
> > You may need to add explicit typecasts
>
> now() returns a timestamp with time zone and your function
> takes a timestamp without time zone.

d'oh, I knew I was mis-reading something there ... thanks



[HACKERS] Lack of RelabelType is causing me pain

2003-11-10 Thread Tom Lane
Joe, do you recall the reasoning for this code in parse_coerce.c?

if (targetTypeId == ANYOID ||
targetTypeId == ANYARRAYOID ||
targetTypeId == ANYELEMENTOID)
{
/* assume can_coerce_type verified that implicit coercion is okay */
/* NB: we do NOT want a RelabelType here */
return node;
}

This is AFAICT the only case where the parser will generate an
expression tree that is not labeled with the same datatype expected
by the next-higher operator.  That is precisely the sort of mismatch
that RelabelType was invented to avoid, and I'm afraid that we have
broken some things by regressing on the explicit representation of
type coercions.

The particular case that is causing me pain right now is that in my
modified tree with support for cross-datatype index operations, cases
involving anyarray_ops indexes are blowing up.  That's because the
visible input type of an indexed comparison isn't matching the declared
righthand input type of any operator in the opclass.  But RelabelType
was put in to avoid a number of other problems that I can't recall in
detail, so I am suspicious that this shortcut breaks other things too.

I think that the reason we did this was to allow get_fn_expr_argtype()
to see the unrelabeled datatype of the input to an anyarray/anyelement-
accepting function.  Couldn't we fix that locally in that function
instead of breaking a system-wide convention?  I'm thinking that we
could simply make that function "burrow down" through any RelabelTypes
for any/anyarray/anyelement.

Comments?

regards, tom lane



Re: [HACKERS] [7.3.x] function does not exist ... ?

2003-11-10 Thread Andrew Dunstan
Marc G. Fournier wrote:

> 'k, this doesn't look right, but it could be that I'm overlooking
> something ...
>
> The function I created:
>
> CREATE FUNCTION month_trunc (timestamp without time zone) RETURNS timestamp without 
> time zone
>    AS 'SELECT date_trunc(''month'', $1 )'
>    LANGUAGE sql IMMUTABLE;
>
> The query that fails:
>
> ams=# select * from traffic_logs where month_trunc(runtime) = month_trunc(now());
> ERROR:  Function month_trunc(timestamp with time zone) does not exist
>    Unable to identify a function that satisfies the given argument types
>    You may need to add explicit typecasts
>
> The query that succeeds:
>
> ams=# explain select * from traffic_logs where month_trunc(runtime) = '2003-10-01';
>    QUERY PLAN
> ---
> Index Scan using tl_month on traffic_logs  (cost=0.00..30751.90 rows=8211 width=36)
>   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time 
> zone)
> (2 rows)
>
> I haven't mis-spelt anything that I can see ... is this something that is
> known not to be doable?
 

Try casting now() to timestamp without time zone?
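
Concretely, the suggested cast would look something like this (a sketch
against Marc's schema, untested):

```sql
-- match the function's declared argument type explicitly
SELECT *
  FROM traffic_logs
 WHERE month_trunc(runtime) = month_trunc(now()::timestamp without time zone);
```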

cheers

andrew



Re: [HACKERS] [7.3.x] function does not exist ... ?

2003-11-10 Thread Gaetano Mendola
Marc G. Fournier wrote:

> 'k, this doesn't look right, but it could be that I'm overlooking
> something ...
>
> The function I created:
>
> CREATE FUNCTION month_trunc (timestamp without time zone) RETURNS timestamp without 
> time zone
> AS 'SELECT date_trunc(''month'', $1 )'
> LANGUAGE sql IMMUTABLE;
>
> The query that fails:
>
> ams=# select * from traffic_logs where month_trunc(runtime) = month_trunc(now());
> ERROR:  Function month_trunc(timestamp with time zone) does not exist
> Unable to identify a function that satisfies the given argument types
> You may need to add explicit typecasts

now() returns a timestamp with time zone and your function
takes a timestamp without time zone.
Regards
Gaetano Mendola


[HACKERS] [7.3.x] function does not exist ... ?

2003-11-10 Thread Marc G. Fournier

'k, this doesn't look right, but it could be that I'm overlooking
something ...

The function I created:

CREATE FUNCTION month_trunc (timestamp without time zone) RETURNS timestamp without 
time zone
AS 'SELECT date_trunc(''month'', $1 )'
LANGUAGE sql IMMUTABLE;


The query that fails:

ams=# select * from traffic_logs where month_trunc(runtime) = month_trunc(now());
ERROR:  Function month_trunc(timestamp with time zone) does not exist
Unable to identify a function that satisfies the given argument types
You may need to add explicit typecasts

The query that succeeds:

ams=# explain select * from traffic_logs where month_trunc(runtime) = '2003-10-01';
QUERY PLAN
---
 Index Scan using tl_month on traffic_logs  (cost=0.00..30751.90 rows=8211 width=36)
   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time 
zone)
(2 rows)

I haven't mis-spelt anything that I can see ... is this something that is
known not to be doable?



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> >> > If the background cleaner has to not just write() but write/fsync or
> >> > write/O_SYNC, it isn't going to be able to clean them fast enough.  It
> >> > creates a bottleneck where we didn't have one before.
> >> > 
> >> > We are trying to eliminate an I/O storm during checkpoint, but the
> >> > solutions seem to be making the non-checkpoint times slower.
> >> > 
> >> 
> >> It looks as if you're assuming that I am making the backends unable to 
> >> write on their own, so that they have to wait on the checkpointer. I 
> >> never said that.
> > 
> > Maybe I missed it but are those backends now doing write or write/fsync? 
> > If the former, that is fine.  If the latter, it does seem slower than it
> > used to be.
> 
> In my all_performance.v4.diff they do write and the checkpointer does 
> write+sync.

Again, sorry to be confusing --- it might be good to try write/fsync from
the background writer if backends can do writes on their own too without
fsync.  The additional fsync from the background writer should reduce
disk writing during sync().  (The fsync should happen with the buffer
unlocked.)

You stated you didn't see improvement when the background writer did
non-checkpoint writes unless you modified update(8).  Adding fsync might
correct that.




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Bruce Momjian wrote:

> Jan Wieck wrote:
> > Bruce Momjian wrote:
> > 
> > > Jan Wieck wrote:
> > >> Bruce Momjian wrote:
> > >> 
> > >> > Now, O_SYNC is going to force every write to the disk.  If we have a
> > >> > transaction that has to write lots of buffers (has to write them to
> > >> > reuse the shared buffer)
> > >> 
> > >> So make the background writer/checkpointer keeping the LRU head clean. I 
> > >> explained that 3 times now.
> > > 
> > > If the background cleaner has to not just write() but write/fsync or
> > > write/O_SYNC, it isn't going to be able to clean them fast enough.  It
> > > creates a bottleneck where we didn't have one before.
> > > 
> > > We are trying to eliminate an I/O storm during checkpoint, but the
> > > solutions seem to be making the non-checkpoint times slower.
> > 
> > It looks as if you're assuming that I am making the backends unable to 
> > write on their own, so that they have to wait on the checkpointer. I 
> > never said that.
> 
> Maybe I missed it but are those backends now doing write or write/fsync? 
> If the former, that is fine.  If the latter, it does seem slower than it
> used to be.

In my all_performance.v4.diff they do write and the checkpointer does 
write+sync.

Jan



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Neil Conway wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Another idea --- if fsync() is slow because it can't find the dirty
> > buffers, use write() to write the buffers, copy the buffer to local
> > memory, mark it as clean, then open the file with O_SYNC and write
> > it again.
> 
> Yuck.
> 
> Do we have any idea how many kernels are out there that implement
> fsync() as poorly as HPUX apparently does? I'm just wondering if we're
> contemplating spending a whole lot of effort to work around a bug that
> is only present on an (old?) version of HPUX. Do typical BSD derived
> kernels exhibit this behavior? What about Linux? Solaris?

Not sure, but it almost doesn't even matter --- any solution which has
fsync/O_SYNC/sync() in a critical path, even the path of replacing dirty
buffers --- is going to be too slow, I am afraid.  Doesn't matter how
fast fsync() is, it is going to be slow.  

I think Tom's only issue with HPUX is that even if fsync is out of the
critical path (background writer) it is going to consume lots of CPU
time finding those dirty buffers --- not sure how slow that would be.
If it is really slow on HPUX, we could disable the fsync's for the
background writer and just hope the OS writes those buffers aggressively.




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Neil Conway
Jan Wieck <[EMAIL PROTECTED]> writes:
> We can't resize shared memory because we allocate the whole thing in
> one big hump - which causes the shmmax problem BTW. If we allocate
> that in chunks of multiple blocks, we only have to give it a total
> maximum size to get the hash tables and other stuff right from the
> beginning. But the vast majority of memory, the buffers themself, can
> be made adjustable at runtime.

Yeah, writing a palloc()-style wrapper over shm has been suggested
before (by myself among others). You could do the shm allocation in
fixed-size blocks (say, 1 MB each), and then do our own memory
management to allocate and release smaller chunks of shm when
requested. I'm not sure what it really buys us, though: sure, we can
expand the shared buffer area to some degree, but

(a) how do we know what the right size of the shared buffer
area /should/ be? It is difficult enough to avoid running
the machine out of physical memory, let alone figure out
how much memory is being used by the kernel for the buffer
cache and how much we should use ourselves. I think the
DBA needs to configure this anyway.

(b) the amount of shm we can ultimately use is finite, so we
will still need to use a lot of caution when placing
dynamically-sized data structures in shm. A shm_alloc()
might help this somewhat, but I don't see how it would
remove the fundamental problem.

-Neil




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> Zeugswetter Andreas SB SD wrote:
> 
> >> > One problem with O_SYNC would be, that the OS does not group writes any 
> >> > more. So the code would need to either do its own sorting and grouping
> >> > (256k) or use aio, or you won't be able to get the maximum out of the disks.
> >> 
> >> Or just run multiple writer processes, which I believe is Oracle's
> >> solution.
> > 
> > That does not help, since for O_SYNC the OS'es (those I know) do not group those 
> > writes together. Oracle allows more than one writer to busy more than one 
> > disk(subsystem) and circumvent other per process limitations (mainly on platforms 
> > without AIO). 
> 
> Yes, I think the best way would be to let the background process write a 
> bunch of pages, then fsync() the files written to. If one tends to have 
> many dirty buffers to the same file, this will group them together and 
> the OS can optimize that. If one really has completely random access, 
> then there is nothing to group.

Agreed.  This might force enough stuff out to disk the checkpoint/sync()
would be OK.  Jan, have you tested this?




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Neil Conway
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Another idea --- if fsync() is slow because it can't find the dirty
> buffers, use write() to write the buffers, copy the buffer to local
> memory, mark it as clean, then open the file with O_SYNC and write
> it again.

Yuck.

Do we have any idea how many kernels are out there that implement
fsync() as poorly as HPUX apparently does? I'm just wondering if we're
contemplating spending a whole lot of effort to work around a bug that
is only present on an (old?) version of HPUX. Do typical BSD derived
kernels exhibit this behavior? What about Linux? Solaris?

-Neil




Re: [HACKERS] What do you want me to do?

2003-11-10 Thread Robert Treat
On Fri, 2003-11-07 at 18:37, Andrew Dunstan wrote:
> Robert Treat wrote:
> >On Fri, 2003-11-07 at 15:28, Andrew Dunstan wrote:
> >>Marc G. Fournier wrote:
> >>>On Fri, 7 Nov 2003, Robert Treat wrote:
> I know most people have talked about using bugzilla, but is anyone
> familiar with GNATS?  I'm currently rereading Open Sources and there's a
> paragraph or two mentioning its use and the fact that it can be
> interfaced with completely by email.
>    
> >>>FreeBSD uses it almost exclusively and it supports email interaction with
> >>>the database, but I don't think there are very many good GUI front ends
> >>>for it (or, at least, not that I've seen) ...
> >>>
> >>No.
> >
> >personal axe to grind?  
> >
> 
> er, no. I was only agreeing with Marc about GUI interfaces. What axe to 
> grind do you imagine I could have?

sorry, i just wondered because you gave a one word response dismissing
the idea and moved on...

> 
> >I've never used it, but it's been around a long
> >time, allows for interaction completely through email (which is how we
> >do things now), has a web front end for anyone who wants to use it to
> >use, and as i understand it has a tcl based desktop app for folks to use
> >as well.  seems it's being dismissed prematurely imho.
> >
> Every person wishing to submit a bug will have to have send-pr installed 
> or else we'll get lots of reports not broken up into fields. That 
> doesn't sound like a recipe for success to me.
> 

not really... we can still have a web interface to it, so anyone
submitting a bug could use the web interface. now maybe for regular
folks working on bugs this would be an issue.. don't know, i'm not
familiar with send-pr...

> >
> >>A few other thoughts:
> >>. the Samba team have apparently abandoned their own tool and moved to 
> >>bugzilla
> >>. if we used bugzilla this might give some impetus to the bugzilla 
> >>team's efforts to provide pg as a backend (maybe we could help with that)
> >>. it would seem slightly strange to me for an RDBMS project to use a bug 
> >>tracking system that was not RDBMS-backed
> >>
> >>
> >
> >we serve far more static pages on the website than we do database driven
> >ones... 
> >
> *nod* but there has been talk of moving to bricolage, hasn't there?
>

if only because it outputs static content...
 
> >the software we distribute is housed on fileservers and sent via
> >ftp, we dont expect people to store and retrieve it from a database...
> >
> 
> you're reaching now ...
> 
> >our mailing lists software actually uses another db product in fact...
> >let's just get the right tool for the job... 
> > 
> 
> Yes. I agree. Bugs (including enhancements) strike me as a classic case 
> of data that belongs in a database.
> 

i think it's something that needs to be searchable, whether that
requires "the worlds most powerful open source object relational
database management system" is something else entirely ;-)

> >  
> >
> >>. developers are far more likely to be familiar with bugzilla
> >
> >developers are far more likely to be familiar with windows and mysql as
> >well...
> >
> 
> c'mon ...
> 
your strawman meets my strawman... 

> >
> >>. are there any active developers without web access? If not, why is 
> >>pure email interaction important?
> >>
> >
> >for the same reason mailing lists work better than message boards...
> >it's just easier. i'm much more likely to read an email list the scroll
> >through web forms, and if i am going to respond to a bug report, i'm
> >much mroe likely to if i can hit "reply" and start typing than if i have
> >to fire up a browser to do it.
> >
> 
> Tom explicitly said he *didn't* want a system where email poured 
> straight into the bugtrack db.
> 

Which I find odd since thats essentially the system we have now... 

> Yes, it is a different way of doing things, and it takes getting used to.
> 
> >
> >>Bugzilla is far from perfect. But it's getting better.
> >>
> >
> >don't get me wrong, i like bugzilla and all, but theres no need to put
> >blinders on...
> >
> 
> I don't. But I do think the current processes can stand improvement.
> 
right... i think gnats is one way of doing that. bugzilla is decent, it
just doesn't seem as advanced as gnats, which is why i brought it up...
we don't need to beat it to death though, I think tom has fixed more
bugs than me recently so if he is interested in bugzilla i'm all for
giving it a twirl...

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Larry Rosenman


--On Monday, November 10, 2003 13:40:24 -0500 Neil Conway 
<[EMAIL PROTECTED]> wrote:

> Larry Rosenman <[EMAIL PROTECTED]> writes:
>> You might also look at Veritas' advisory stuff.
>
> Thanks for the suggestion -- it looks like we can make use of
> this. For the curious, the cache advisory API is documented here:
>
> http://www.lerctr.org:8458/en/man/html.7/vxfsio.7.html
> http://www.lerctr.org:8458/en/ODM_FSadmin/fssag-9.html#MARKER-9-1
>
> Note that unlike for posix_fadvise(), the docs for this functionality
> explicitly state:
>
>     Some advisories are currently maintained on a per-file, not a
>     per-file-descriptor, basis. This means that only one set of
>     advisories can be in effect for all accesses to the file. If two
>     conflicting applications set different advisories, both use the
>     last advisories that were set.

BTW, if ANY developer wants to play with this, I can make an account for 
them.  I have ODM installed on lerami.lerctr.org (www.lerctr.org is a 
CNAME).

LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Tom Lane wrote:
> "Zeugswetter Andreas SB SD" <[EMAIL PROTECTED]> writes:
> > One problem with O_SYNC would be, that the OS does not group writes any 
> > more. So the code would need to either do its own sorting and grouping
> > (256k) or use aio, or you won't be able to get the maximum out of the disks.
> 
> Or just run multiple writer processes, which I believe is Oracle's
> solution.

Yes, that might need to be the final solution because the O_SYNC will be
slow.  However, that is a lot of "big wrench" solution to removing
sync() --- it would be nice if we could find a more elegant way.

In fact, one goofy idea would be: if the OS does a sync every 30 seconds,
just write() the buffers, wait 30 seconds for the OS to issue the
sync, then recycle the WAL buffers --- again, just a crazy thought.




Re: [pgsql-www] [HACKERS] Changes to Contributor List

2003-11-10 Thread Marc G. Fournier

In fact: http://oosurvey.gratismania.ro/stats

On Mon, 10 Nov 2003, Marc G. Fournier wrote:

>
>
> On Mon, 10 Nov 2003, Josh Berkus wrote:
>
> > Justin,
> >
> > >  From memory, the OpenOffice.org surveys still run from the techdocs
> > > virtual machine too.  That may or may not be the case these days, I just
> > > don't remember.
> >
> > Really?   I thought that they were running from one of Sun's machines.   Will
> > check with Cristian.
> >
> > If we're hosting the surveys, I want a "Powered by PostgreSQL" bug on them,
> > dammit.   Those get 21,000 views a week.
>
> start of current access_log: 217.1.97.253 - - [08/Nov/2003:08:00:28 -0500]
>   end of current access_log: 141.211.97.33 - - [10/Nov/2003:13:28:39 -0500]
>
> jobs# grep http://oosurvey.gratismania.ro/user/index.php access_log | egrep -v 
> "images" | wc -l
> 2966
>
> looks like its still well used ...
>
>



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> Bruce Momjian wrote:
> 
> > Jan Wieck wrote:
> >> Bruce Momjian wrote:
> >> > I would be interested to know if you have the background write process
> >> > writing old dirty buffers to kernel buffers continually if the sync()
> >> > load is diminished.  What this does is to push more dirty buffers into
> >> > the kernel cache in hopes the OS will write those buffers on its own
> >> > before the checkpoint does its write/sync work.  This might allow us to
> >> > reduce sync() load while preventing the need for O_SYNC/fsync().
> >> 
> >> I tried that first. Linux 2.4 does not, as long as you don't tell it by 
> >> reducing the dirty data block aging time with update(8). So you have to 
> >> force it to utilize the write bandwidth in the meantime. For that you 
> >> have to call sync() or fsync() on something.
> >> 
> >> Maybe O_SYNC is not as bad an option as it seems. In my patch, the 
> >> checkpointer flushes the buffers in LRU order, meaning it flushes the 
> >> least recently used ones first. This has the side effect that buffers 
> >> returned for replacement (on a cache miss, when the backend needs to 
> >> read the block) are most likely to be flushed/clean. So it reduces the 
> >> write load of backends and thus the probability that a backend is ever 
> >> blocked waiting on an O_SYNC'd write().
> >> 
> >> I will add some counters and gather some statistics how often the 
> >> backend in comparision to the checkpointer calls write().
> > 
> > OK, new idea.  How about if you write() the buffers, mark them as clean
> > and unlock them, then issue fsync().  The advantage here is that we can
> 
> Not really new, I think in my first mail I wrote that I simplified this 
> new mdfsyncrecent() function by calling sync() instead ... other than 
> that the code I posted worked exactly that way.

I am confused --- I was suggesting we call fsync after we write a few
blocks for a given table, and that was going to happen between
checkpoints.  Is the sync() happening then or only at checkpoint time.

Sorry I am lost but there seems to be an email delay in my receiving the
replies.




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> Bruce Momjian wrote:
> 
> > Jan Wieck wrote:
> >> Bruce Momjian wrote:
> >> 
> >> > Now, O_SYNC is going to force every write to the disk.  If we have a
> >> > transaction that has to write lots of buffers (has to write them to
> >> > reuse the shared buffer)
> >> 
> >> So make the background writer/checkpointer keeping the LRU head clean. I 
> >> explained that 3 times now.
> > 
> > If the background cleaner has to not just write() but write/fsync or
> > write/O_SYNC, it isn't going to be able to clean them fast enough.  It
> > creates a bottleneck where we didn't have one before.
> > 
> > We are trying to eliminate an I/O storm during checkpoint, but the
> > solutions seem to be making the non-checkpoint times slower.
> > 
> 
> It looks as if you're assuming that I am making the backends unable to 
> write on their own, so that they have to wait on the checkpointer. I 
> never said that.

Maybe I missed it but are those backends now doing write or write/fsync? 
If the former, that is fine.  If the latter, it does seem slower than it
used to be.




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> > If the background cleaner has to not just write() but write/fsync or
> > write/O_SYNC, it isn't going to be able to clean them fast enough.  It
> > creates a bottleneck where we didn't have one before.
> > 
> > We are trying to eliminate an I/O storm during checkpoint, but the
> > solutions seem to be making the non-checkpoint times slower.
> > 
> 
> It looks as if you're assuming that I am making the backends unable to 
> write on their own, so that they have to wait on the checkpointer. I 
> never said that.
> 
> If the checkpointer keeps the LRU heads clean, that lifts off write load 
> from the backends. Sure, they will be able to dirty pages faster. 
> Theoretically, because in practice if you have a reasonably good cache 
> hitrate, they will just find already dirty buffers where they just add 
> some more dust.
> 
> If after all the checkpointer (doing write()+whateversync) is not able 
> to keep up with the speed of buffers getting dirtied, the backends will 
> have to do some write()'s again, because they will eat up the clean 
> buffers at the LRU head and pass the checkpointer.

Yes, there are a couple of issues here --- first, have a background
writer to write dirty pages.  This is good, no question.  The bigger
question is removing sync() and using fsync() or O_SYNC for every write
--- if we do that, the backends doing private write will have to fsync
their writes too, meaning if the checkpointer can't keep up, we now have
backends doing slow writes too.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Catching "UPDATE 0"

2003-11-10 Thread Neil Conway
<[EMAIL PROTECTED]> writes:
> I am trying to catch the "UPDATE 0" condition in postgresql.
> I have tried it using triggers, but they are only fired when there
> effectively is an update.

You could try using an AFTER STATEMENT trigger (which will be invoked
even if zero rows are updated), and checking the size of the set of
modified rows ... except that we currently don't provide a way for
per-statement triggers to access the set of modified tuples.

If someone implements that functionality (which is on the TODO list),
that would be a clean solution to your problem.

-Neil


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> Bruce Momjian wrote:
> 
> > Tom Lane wrote:
> >> Andrew Sullivan <[EMAIL PROTECTED]> writes:
> >> > On Sun, Nov 02, 2003 at 01:00:35PM -0500, Tom Lane wrote:
> >> >> real traction we'd have to go back to the "take over most of RAM for
> >> >> shared buffers" approach, which we already know to have a bunch of
> >> >> severe disadvantages.
> >> 
> >> > I know there are severe disadvantages in the current implementation,
> >> > but are there in-principle severe disadvantages?
> >> 
> >> Yes.  For one, since we cannot change the size of shared memory
> >> on-the-fly (at least not portably), there is no opportunity to trade off
> >> memory usage dynamically between processes and disk buffers.  For
> >> another, on many systems shared memory is subject to being swapped out.
> >> Swapping out dirty buffers is a performance killer, because they must be
> >> swapped back in again before they can be written to where they should
> >> have gone.  The only way to avoid this is to keep the number of shared
> >> buffers small enough that they all remain fairly "hot" (recently used)
> >> and so the kernel won't be tempted to swap out any part of the region.
> > 
> > Agreed, we can't resize shared memory, but I don't think most OS's swap
> > out shared memory, and even if they do, they usually have a kernel
> 
> We can't resize shared memory because we allocate the whole thing in one 
> big hump - which causes the shmmax problem BTW. If we allocate that in 
> chunks of multiple blocks, we only have to give it a total maximum size 
> to get the hash tables and other stuff right from the beginning. But the 
> vast majority of memory, the buffers themselves, can be made adjustable at 
> runtime.

That is an interesting idea.
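Jan's chunked idea can be illustrated with multiple small shared-memory segments attached and detached at runtime. A toy sketch using Python's stdlib shared_memory (names and sizes invented; this is not the actual PostgreSQL shmem layer):

```python
from multiprocessing import shared_memory

CHUNK_BLOCKS = 4      # buffers per chunk (illustrative)
BLOCK_SIZE = 8192

class ChunkedBufferPool:
    """Grow or shrink the buffer pool by whole chunks at runtime,
    instead of allocating one big segment sized for the maximum."""
    def __init__(self):
        self.chunks = []

    def grow(self):
        seg = shared_memory.SharedMemory(
            create=True, size=CHUNK_BLOCKS * BLOCK_SIZE)
        self.chunks.append(seg)

    def shrink(self):
        seg = self.chunks.pop()
        seg.close()
        seg.unlink()

    def nblocks(self):
        return len(self.chunks) * CHUNK_BLOCKS

pool = ChunkedBufferPool()
pool.grow()
pool.grow()
peak = pool.nblocks()     # 8 blocks while two chunks are attached
pool.shrink()
pool.shrink()
final = pool.nblocks()    # back to 0 after detaching both
```

The fixed structures (hash tables, etc.) would still be sized for the configured maximum, as Jan notes; only the buffer blocks themselves come and go.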

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Now, if we are sure that writes will happen only in the checkpoint
> > process, O_SYNC would be OK, I guess, but will we ever be sure of that?
> 
> This is a performance issue, not a correctness issue.  It's okay for
> backends to wait for writes as long as it happens very infrequently.
> The question is whether we can design a background dirty-buffer writer
> that works well enough to make it uncommon for backends to have to
> write dirty buffers for themselves.  If we can, then doing all the
> writes O_SYNC would not be a problem.

Agreed.  My concern is that right now we do write() in each backend. 
Those writes are probably pretty fast, probably as fast as a read() when
the buffer is already in the kernel cache.  The current discussion
involves centralizing most of the writes (centralization can be slower),
and having the writes forced to disk.  That seems like it could be a
double-killer.

> (One possibility that could help improve the odds is to allow a certain
> amount of slop in the LRU buffer reuse policy --- that is, if you see
> the buffer at the tail of the LRU list is dirty, allow one of the next
> few buffers to be taken instead, if it's clean.  Or just keep separate
> lists for dirty and clean buffers.)

Yes, I think you almost will have to split the LRU list into
dirty/clean, and that might make dirty buffers stay around longer.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Andrew Sullivan
On Sun, Nov 09, 2003 at 08:54:25PM -0800, Joe Conway wrote:
> two servers, mounted to the same data volume, and some kind of 
> coordination between the writer processes. Anyone know if this is 
> similar to how Oracle handles RAC?

It is similar, yes, but there's some mighty powerful magic in that
"some kind of co-ordination".  What do you do when one of the
participants crashes, for instance?

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> What bothers me a little is that you keep telling us that you have all 
> that great code from SRA. Do you have any idea when they intend to share 
> this with us and contribute the stuff? I mean at least some pieces 
> maybe? You personally got all the code from NuSphere AKA PeerDirect even 
> weeks before it got released. Did any PostgreSQL developer other than 
> you ever look at the SRA code?

I can get the open/fsync/write/close patch from SRA released, I think. 
Let me ask them now.

Tom has seen the Win32 tarball (with SRA's approval) because he wanted
to research if threading was something we should pursue.  I haven't
heard a report back from him yet.  If you would like to see the tarball,
I can ask them.

Agreed, I got the PeerDirect/Nusphere code very early and it was a help.
I am sure I can get some of it released.  I haven't pursued the sync
Win32 patch because it is based on a threaded backend model, so it is
different from how it need to be done in a process model (all shared
file descriptors).  However, I will need to get approval in the end
anyway for Win32 because I need that Win32-specific part anyway.

I just looked at the sync() call in the code and it just did _flushall:


http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/html/_crt__flushall.asp

I can share this because I know it was discussed when someone (SRA?)
realized _commit() didn't force all buffers to disk.  In fact, _commit
is fsync().

I think the only question was whether _flushall() fsync file descriptors
that have been closed.  Perhaps SRA keeps the file descriptors open
until after the checkpoint, or does it fsync closed files with dirty
buffers.  Tatsuo?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Neil Conway
Larry Rosenman <[EMAIL PROTECTED]> writes:
> You might also look at Veritas' advisory stuff.

Thanks for the suggestion -- it looks like we can make use of
this. For the curious, the cache advisory API is documented here:

http://www.lerctr.org:8458/en/man/html.7/vxfsio.7.html
http://www.lerctr.org:8458/en/ODM_FSadmin/fssag-9.html#MARKER-9-1

Note that unlike for posix_fadvise(), the docs for this functionality
explicitly state:

Some advisories are currently maintained on a per-file, not a
per-file-descriptor, basis. This means that only one set of
advisories can be in effect for all accesses to the file. If two
conflicting applications set different advisories, both use the
last advisories that were set.
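For comparison, the posix_fadvise() side looks like this in a minimal sketch (Python wrapper over the same syscall; Linux-only, hence the guard):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 65536)
os.fsync(fd)

# Advise the kernel that this range will not be needed again soon, so
# it may drop the pages from its cache.  Unlike the Veritas advisories
# quoted above, posix_fadvise() applies per file descriptor, not per
# file, so two applications cannot clobber each other's advice.
if hasattr(os, "posix_fadvise"):        # Linux; skip elsewhere
    os.posix_fadvise(fd, 0, 65536, os.POSIX_FADV_DONTNEED)

data = os.pread(fd, 4, 0)   # file contents are unaffected, only caching
os.close(fd)
os.unlink(path)
```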

-Neil


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [pgsql-www] [HACKERS] Changes to Contributor List

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Josh Berkus wrote:

> Justin,
>
> >  From memory, the OpenOffice.org surveys still run from the techdocs
> > virtual machine too.  That may or may not be the case these days, I just
> > don't remember.
>
> Really?   I thought that they were running from one of Sun's machines.   Will
> check with Cristian.
>
> If we're hosting the surveys, I want a "Powered by PostgreSQL" bug on them,
> dammit.   Those get 21,000 views a week.

start of current access_log: 217.1.97.253 - - [08/Nov/2003:08:00:28 -0500]
  end of current access_log: 141.211.97.33 - - [10/Nov/2003:13:28:39 -0500]

jobs# grep http://oosurvey.gratismania.ro/user/index.php access_log | egrep -v 
"images" | wc -l
2966

looks like its still well used ...


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [pgsql-www] [HACKERS] Changes to Contributor List

2003-11-10 Thread Josh Berkus
Justin,

>  From memory, the OpenOffice.org surveys still run from the techdocs
> virtual machine too.  That may or may not be the case these days, I just
> don't remember.

Really?   I thought that they were running from one of Sun's machines.   Will 
check with Cristian.   

If we're hosting the surveys, I want a "Powered by PostgreSQL" bug on them, 
dammit.   Those get 21,000 views a week.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Zeugswetter Andreas SB SD wrote:

> > > One problem with O_SYNC would be, that the OS does not group writes any
> > > more. So the code would need to either do its own sorting and grouping
> > > (256k) or use aio, or you won't be able to get the maximum out of the disks.
> >
> > Or just run multiple writer processes, which I believe is Oracle's
> > solution.
>
> That does not help, since for O_SYNC the OS'es (those I know) do not group
> those writes together. Oracle allows more than one writer to busy more than
> one disk (subsystem) and circumvent other per-process limitations (mainly on
> platforms without AIO).
Yes, I think the best way would be to let the background process write a 
bunch of pages, then fsync() the files written to. If one tends to have 
many dirty buffers to the same file, this will group them together and 
the OS can optimize that. If one really has completely random access, 
then there is nothing to group.
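The write-a-batch-then-fsync pattern Jan describes can be sketched like this (a toy, not the smgr code; page size and file layout invented for illustration):

```python
import os
import tempfile

def flush_batch(fd, pages, blocksz=8192):
    """Write a batch of dirty pages sorted by block number, then issue
    a single fsync(): the kernel is free to coalesce neighbouring
    blocks, which per-write O_SYNC would prevent."""
    for blockno, data in sorted(pages.items()):
        os.pwrite(fd, data, blockno * blocksz)
    os.fsync(fd)

fd, path = tempfile.mkstemp()
# Dirty pages arrive in arbitrary order; sorting groups neighbours.
dirty = {7: b"g" * 8192, 2: b"b" * 8192, 3: b"c" * 8192}
flush_batch(fd, dirty)
size = os.fstat(fd).st_size   # highest block written is 7
os.close(fd)
os.unlink(path)
```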

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Larry Rosenman


--On Monday, November 10, 2003 11:40:45 -0500 Neil Conway 
<[EMAIL PROTECTED]> wrote:

Bruce Momjian <[EMAIL PROTECTED]> writes:
Now, the disadvantages of large kernel cache, small PostgreSQL buffer
cache is that data has to be transfered to/from the kernel buffers, and
second, we can't control the kernel's cache replacement strategy, and
will probably not be able to in the near future, while we do control our
own buffer cache replacement strategy.
The intent of the posix_fadvise() work is to at least provide a
few hints about our I/O patterns to the kernel's buffer
cache. Although only Linux supports it (right now), that should
hopefully improve the status quo for a fairly significant portion of
our user base.
I'd be curious to see a comparison of the cost of transferring data
from the kernel's buffers to the PG bufmgr.
You might also look at Veritas' advisory stuff.  If you want exact doc
pointers, I can provide them, but they are in the Filesystem section
of http://www.lerctr.org:8458/
LER

-Neil

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749




Re: [HACKERS] Dreaming About Redesigning SQL

2003-11-10 Thread Hannu Krosing
Anthony W. Youngman wrote on Wed, 05.11.2003 at 01:15:
> >1) Your database might change over time and say a table that originally 
> >had only a few rows
> >could suddenty grow considerably.  Now an optimiser would insulate you 
> >from these changes
> >or in the worst case all that would need to be done would be to create 
> >an index (and, yes, check
> >that the DBMS starts using it).
> 
> Except that an optimiser is *irrelevant* to MV. What do we need to be
> insulated from? MV doesn't care whether a FILE is 4Kb or 40Gb, the cost
> of accessing a single record, AT RANDOM, from within that FILE is almost
> identical. Where would we gain from an optimiser? In practice, it would
> get in the way and slow us down!

getting a single record from any DB, AT RANDOM, follows the same rules
;)

> >
> >2) You might have a product that runs in a number of sites: large ones 
> >and small
> >ones.  Now you would not have to reoptimise the programs for each type site.
> 
> BUT WE DON'T NEED AN OPTIMISER. IT'S A WASTE OF CPU TIME!!! WE
> *D*O*N*'*T* *N*E*E*D* ONE!!!

on slashdot this would be tagged *funny* ;)

> >3) Complex SQL-queries do quite a lot of things and it might not be very 
> >obvious for
> >the programmer how to optimise best.
> 
> But a large chunk of SQL's complexity is reassembling a view of an
> entity.

perhaps "a large chunk of initial perceived complexity of SQL" is
reassembling a view of an entity. You will get over it in a day or two
;)

that is *if * the thing you are after *is* an entity.

>  MV doesn't have that complexity. An MV program views the
> database the same way as a programmer views the real world.

You mean screenfuls of weird green glowing letters running down the
screen leaving slowly fading tracks ?

> So it's pretty obvious to a MV programmer how to optimise things.

I've never been very good at optimising the real world - the obvious
optimisations have very limited scope.

> >4) depending on input from user (say, a search screen) the optimal 
> >access path may be different. An optimiser
> >could generate a different path depending on this input.
> 
> Again, MV views the entity as a whole, so probably we don't need to
> generate a "different path" - it's just "get me this entity" regardless
> of what we need to know about it.

Not "what we need to know about it" but "what we already know about it".

So it is always a SEQUENTIAL SCAN, non?

or is there some magic by which you have all "entities" automatically
hashed by each and every attribute (or combination of attributes) ?

> >> We're not interested in being able to
> >>improve the speed at which the db can find data to respond to an app
> >>request - with an access factor of 1.05 (actually, it's nearer 1.02 or
> >>1.03) we consider any effort there to be a waste of time ...
> >>
> >But isn't it better to have NO disk reads than one?  I thought disk I/O 
> >was rather expensive?  With
> >that mentality you will always be disk bound.
> 
> I'm assuming we don't have sufficient RAM to cache stuff ...
> 
> Our mentality is to leave disk caching to the OS. The app says "get me
> X". The database knows *exactly* where to look and asks the OS to "get
> me disk sector Y".

How does the database map X to Y, without any extra info (meaning extra
disk accesses) ?

If you can always predict your data needs that well, you dont need a
database, all you need is a file system.

> Any OS worth its salt will have that cached if it's
> been asked for previously recently.

Were you not talking about databases with substantially more data than
fits into RAM ?

> That way, we're only caching stuff
> that's been accessed recently. But because for us the "atomic" chunk is
> an entity, there's a good chance that stuff has been accessed and is in
> cache.

depending on your point of view, anything can be an "entity" (or atomic
chunk) ;)

> SQL optimisation *seems* to be more "efficient" because it tries to
> predict what you're going to want next. 

Where do you get your weird ideas about SQL optimisation from ?

> But whereas SQL *guesses* that
> because you've accessed one order detail, you're likely to want other
> order details from the same invoice (a sensible guess), you cannot
> compare this to MV because it gives you those order details as a side
> effect. In order for MV optimisation to be of any use, it would need to
> guess which INVOICE I'm going to access next, and frankly a random
> number generator is probably as good an optimiser as any!

So you claim that MV is good for problems you already know the best way
to solve ?

> >>Basically, the only way you can beat us in the real world is to throw
> >>hardware at the problem - and like I said with linux and macro/micro
> >>kernels, we can do the same :-)
> >>
> >Well, please do!
> 
> We do. Which is why we can smoke any relational db for speed unless the
> hardware is big enough to store the entire database in RAM (and even
> then we'd beat it for speed :-) (just not that much in absolut

[HACKERS] Catching "UPDATE 0"

2003-11-10 Thread enio
Hello,
I am trying to catch the "UPDATE 0" condition in PostgreSQL.
I have tried using triggers, but they are only fired when there
effectively is an update, so an "UPDATE 0" does not fire triggers.
Is it possible to make the backend raise an exception when
"UPDATE 0" happens? This would be useful for knowing when records
were not updated, such as those being modified in concurrent transactions.
By the way, how do I know if a transaction was really committed
and not rolled back?

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Zeugswetter Andreas SB SD

> > One problem with O_SYNC would be, that the OS does not group writes any 
> > more. So the code would need to either do its own sorting and grouping
> > (256k) or use aio, or you won't be able to get the maximum out of the disks.
> 
> Or just run multiple writer processes, which I believe is Oracle's
> solution.

That does not help, since for O_SYNC the OS'es (those I know) do not group those 
writes together. Oracle allows more than one writer to busy more than one 
disk (subsystem) and circumvent other per-process limitations (mainly on platforms 
without AIO). 

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Neil Conway
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Now, the disadvantages of large kernel cache, small PostgreSQL buffer
> cache is that data has to be transfered to/from the kernel buffers, and
> second, we can't control the kernel's cache replacement strategy, and
> will probably not be able to in the near future, while we do control our
> own buffer cache replacement strategy.

The intent of the posix_fadvise() work is to at least provide a
few hints about our I/O patterns to the kernel's buffer
cache. Although only Linux supports it (right now), that should
hopefully improve the status quo for a fairly significant portion of
our user base.

I'd be curious to see a comparison of the cost of transferring data
from the kernel's buffers to the PG bufmgr.
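A crude probe of that copy cost (a sketch, not a proper benchmark; it compares a read() from an already-cached file, i.e. kernel cache to user space, against a pure user-space copy of the same amount of data):

```python
import os
import tempfile
import time

N = 256
BLK = 8192

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * (N * BLK))
os.fsync(fd)
os.pread(fd, N * BLK, 0)           # warm the kernel cache

t0 = time.perf_counter()
for i in range(N):
    os.pread(fd, BLK, i * BLK)     # syscall + kernel->user copy
kernel_copy = time.perf_counter() - t0

local = bytearray(N * BLK)
src = b"y" * BLK
t0 = time.perf_counter()
for i in range(N):
    local[i * BLK:(i + 1) * BLK] = src   # user-space copy only
user_copy = time.perf_counter() - t0

os.close(fd)
os.unlink(path)
```

The difference between the two timings is roughly the per-page overhead a small PG buffer cache pays for leaning on the kernel cache.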

-Neil


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Performance features the 4th

2003-11-10 Thread scott.marlowe
On Sun, 9 Nov 2003, Jan Wieck wrote:

> scott.marlowe wrote:
> 
> > On Fri, 7 Nov 2003, Matthew T. O'Connor wrote:
> > 
> >> - Original Message - 
> >> From: "Jan Wieck" <[EMAIL PROTECTED]>
> >> > Tom Lane wrote:
> >> > > Gaetano and a couple of other people did experiments that seemed to show
> >> > > it was useful.  I think we'd want to change the shape of the knob per
> >> > > later suggestions (sleep 10 ms every N blocks, instead of N ms every
> >> > > block) but it did seem that there was useful bang for little buck there.
> >> >
> >> > I thought it was "sleep N ms every M blocks".
> >> >
> >> > Have we seen any numbers? Anything at all? Something that gives us a
> >> > clue by what factor one has to multiply the total time a "VACUUM
> >> > ANALYZE" takes, to get what effect in return?
> >> 
> >> I have some time on sunday to do some testing.  Is there a patch that I can
> >> apply that implements either of the two options? (sleep 10ms every M blocks
> >> or sleep N ms every M blocks).
> >> 
> >> I know Tom posted the original patch that sleept N ms every 1 block (where N
> >> is > 10 due to OS limitations).  Jan can you post a patch that has just the
> >> sleep code in it? Or should it be easy enough for me to cull out of the
> >> larger patch you posted?
> > 
> > The reason for the change is that the minimum sleep period on many systems 
> > is 10mS, which meant that vacuum was running 20X slower than normal.  
> > While it might be necessary in certain very I/O starved situations to make 
> > it this slow, it would probably be better to be able to get a vacuum that 
> > ran at about 1/2 to 1/5 speed for most folks.  So, since the delta can't be 
> > less than 10mS on most systems, it's better to just leave it at a fixed 
> > amount and change the number of pages vacuumed per sleep.
> 
> I disagree with that. If you limit yourself to the number of pages being 
> the only knob you have and set the napping time fixed, you can only 
> lower the number of sequentially read pages to slow it down. Making read 
> ahead absurd in an IO starved situation ...
> 
> I'll post a patch doing
> 
>  every N pages nap for M milliseconds
> 
> using two GUC variables and based on a select(2) call later.

I didn't mean "fixed in the code"; I meant in your setup.  I.e., find a 
delay (10mS, 50, 100, etc.), then vary the number of pages processed at a 
time until you start to notice the load, then back it off.

That is, not being forced by the code to one and only one delay value, but 
setting it yourself.
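The "every N pages nap for M milliseconds" loop under discussion can be sketched like this (a toy with invented names; the actual patch uses select(2) rather than sleep, and GUC variables for the two knobs):

```python
import time

def vacuum_relation(pages, group_size=32, nap_ms=10,
                    process_page=lambda p: None):
    """Process pages, napping nap_ms milliseconds after every
    group_size pages -- two independently tunable knobs, so read-ahead
    stays intact even when the nap is long."""
    naps = 0
    for i, page in enumerate(pages, start=1):
        process_page(page)
        if i % group_size == 0:
            time.sleep(nap_ms / 1000.0)
            naps += 1
    return naps

# 100 pages with a nap every 32 pages -> naps after pages 32, 64, 96.
naps = vacuum_relation(range(100), group_size=32, nap_ms=1)
```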


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Question for the developers.

2003-11-10 Thread Jan Wieck
[EMAIL PROTECTED] wrote:

Hi,

 I am Suchindra Katageri and am working as a Software Engineer at
 Linuxlabs, Atlanta, GA.
 
 I am presently working on developing libraries to make postgresql
 run on a cluster. I was wondering if it is possible to force database
 writes to stable storage without messing with the postgres code,
 e.g. to force PostgreSQL to write to stable storage after an INSERT
 command.
I don't see how that would get you very far.

PostgreSQL holds disk buffers in a shared memory cache. And unless you 
modify that cache, it would not read from the file again if it thinks it 
knows the content already. How do you intend to remove pages from that 
shared buffer cache at the time, another cluster member modifies the 
same logical page?

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Tom Lane
"Zeugswetter Andreas SB SD" <[EMAIL PROTECTED]> writes:
> One problem with O_SYNC would be, that the OS does not group writes any 
> more. So the code would need to either do its own sorting and grouping
> (256k) or use aio, or you won't be able to get the maximum out of the disks.

Or just run multiple writer processes, which I believe is Oracle's
solution.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Zeugswetter Andreas SB SD

> that works well enough to make it uncommon for backends to have to
> write dirty buffers for themselves.  If we can, then doing all the
> writes O_SYNC would not be a problem.

One problem with O_SYNC would be, that the OS does not group writes any 
more. So the code would need to either do its own sorting and grouping
(256k) or use aio, or you won't be able to get the maximum out of the disks.
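The sort-and-group step Andreas mentions can be sketched as follows (an illustration with invented names; block size and the 256k cap are taken from his numbers):

```python
def coalesce(blocknos, blocksz=8192, max_run=256 * 1024):
    """Group dirty block numbers into contiguous runs of at most
    max_run bytes, so a writer can issue a few large writes instead of
    many small ones -- the grouping per-write O_SYNC takes away from
    the OS.  Returns (byte_offset, byte_length) pairs."""
    runs = []   # list of (first_block, n_blocks)
    for b in sorted(set(blocknos)):
        if runs:
            start, length = runs[-1]
            if b == start + length and (length + 1) * blocksz <= max_run:
                runs[-1] = (start, length + 1)   # extend current run
                continue
        runs.append((b, 1))                      # start a new run
    return [(start * blocksz, length * blocksz) for start, length in runs]
```

For example, blocks 5, 6, 7 coalesce into one 24 KB write while block 20 stays a lone 8 KB write, and a 64-block sequential stretch is split into two 256 KB runs.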

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Bug fix for 7.4?

2003-11-10 Thread Tom Lane
Michael Meskes <[EMAIL PROTECTED]> writes:
> I just fixed a bug in ecpglib that caused it to misbehave for all
> internal array/vector types. They were treated like the external ones.
> So ecpg expected them to be listed as '{...}' which surely failed. 
> However, this bug fix involves more code changes than I like to just
> commit into 7.4 at this point of the release. So the fixes are so far
> just committed to HEAD. On the other hand I don't like to release 7.4
with this known bug. 
The only solution I see is someone else looking over the patch and/or
testing it. It worked well with my test suite, but then this may not
catch all side effects.

I looked over it, and the only part that seems odd is that you seem to
have removed the type info caching behavior in execute.c.  Is that
intended?  It looks like ECPGis_type_an_array() will now issue a query
on every call for a non-built-in type, which seems rather a large loss.
Also, if cache_head ever becomes non-null then it stops trusting its
internal knowledge as well, which seems worse.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [pgsql-www] [HACKERS] Changes to Contributor List

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Robert Treat wrote:

> On Fri, 2003-11-07 at 13:51, Josh Berkus wrote:
> > > really the most important thing here is that we get some movement on the
> > > site in order to ditch the old VM the site lives on and get it on our new
> > > web VM.
> >
> > On techdocs?   What part of that needs to be migrated?
> >
>
> Last I checked, it was the whole thing... techdocs runs on its own VM, the
> other sites all run on a different VM. We need to kill the old VM, but
> until we move techdocs to its new home, we can't.

And there is no pressure/hurry for this to be done ... it's not a 'simple
move', but a redesign based on new technology ... what is there now,
works, so no pressure

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] Unable to load libsqlpg.so

2003-11-10 Thread Welly



I just installed Kylix 3 and had a problem when trying to connect to
PostgreSQL. I use SQLConnection to connect with PostgreSQL. When I set the
connection property to true, it causes an error "unable to load
libsqlpg.so". Can anyone help me?

thanx
welly


Re: [pgsql-www] [HACKERS] Changes to Contributor List

2003-11-10 Thread Robert Treat
On Fri, 2003-11-07 at 13:51, Josh Berkus wrote:
> > really the most important thing here is that we get some movement on the
> > site in order to ditch the old VM the site lives on and get it on our new
> > web VM.
> 
> On techdocs?   What part of that needs to be migrated?
> 

Last I checked, it was the whole thing... techdocs runs on its own VM, the
other sites all run on a different VM. We need to kill the old VM, but
until we move techdocs to its new home, we can't.

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Tom Lane wrote:

Bruce Momjian <[EMAIL PROTECTED]> writes:
Now, if we are sure that writes will happen only in the checkpoint
process, O_SYNC would be OK, I guess, but will we ever be sure of that?
This is a performance issue, not a correctness issue.  It's okay for
backends to wait for writes as long as it happens very infrequently.
The question is whether we can design a background dirty-buffer writer
that works well enough to make it uncommon for backends to have to
write dirty buffers for themselves.  If we can, then doing all the
writes O_SYNC would not be a problem.
(One possibility that could help improve the odds is to allow a certain
amount of slop in the LRU buffer reuse policy --- that is, if you see
the buffer at the tail of the LRU list is dirty, allow one of the next
few buffers to be taken instead, if it's clean.  Or just keep separate
lists for dirty and clean buffers.)
If the checkpointer is writing in LRU order (which is the order buffers 
normally get replaced), this happening would mean that the backends have 
replaced all clean buffers at the LRU head and this can only happen if 
the currently running checkpointer is working way too slow. If it is 
more than 30 seconds away from its target finish time, it would be a 
good idea to restart by building a (guaranteed long now) new todo list 
and write faster (but starting again at the LRU head). If it's too late 
for that, stop napping, finish this checkpoint NOW and start a new one 
immediately.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Bruce Momjian wrote:

Jan Wieck wrote:
Bruce Momjian wrote:
> I would be interested to know if you have the background write process
> writing old dirty buffers to kernel buffers continually if the sync()
> load is diminished.  What this does is to push more dirty buffers into
> the kernel cache in hopes the OS will write those buffers on its own
> before the checkpoint does its write/sync work.  This might allow us to
> reduce sync() load while preventing the need for O_SYNC/fsync().
I tried that first. Linux 2.4 does not, as long as you don't tell it to by 
reducing the dirty-block aging time with update(8). So you have to force it 
to utilize the write bandwidth in the meantime, and for that you have to 
call sync() or fsync() on something.

Maybe O_SYNC is not as bad an option as it seems. In my patch, the 
checkpointer flushes the buffers in LRU order, meaning it flushes the 
least recently used ones first. This has the side effect that buffers 
returned for replacement (on a cache miss, when the backend needs to 
read the block) are most likely to be flushed/clean. So it reduces the 
write load of backends and thus the probability that a backend is ever 
blocked waiting on an O_SYNC'd write().

I will add some counters and gather statistics on how often the 
backend calls write() in comparison to the checkpointer.
OK, new idea.  How about if you write() the buffers, mark them as clean
and unlock them, then issue fsync().  The advantage here is that we can
Not really new; I think in my first mail I wrote that I simplified this 
new mdfsyncrecent() function by calling sync() instead ... other than 
that, the code I posted worked exactly that way.

Jan



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Bruce Momjian wrote:

Jan Wieck wrote:
Bruce Momjian wrote:

> Now, O_SYNC is going to force every write to the disk.  If we have a
> transaction that has to write lots of buffers (has to write them to
> reuse the shared buffer)
So make the background writer/checkpointer keep the LRU head clean. I 
have explained that three times now.
If the background cleaner has to not just write() but write/fsync or
write/O_SYNC, it isn't going to be able to clean them fast enough.  It
creates a bottleneck where we didn't have one before.
We are trying to eliminate an I/O storm during checkpoint, but the
solutions seem to be making the non-checkpoint times slower.
It looks as if you're assuming that I am making the backends unable to 
write on their own, so that they have to wait on the checkpointer. I 
never said that.

If the checkpointer keeps the LRU heads clean, that lifts write load off 
the backends. Sure, they will be able to dirty pages faster. 
Theoretically, that is, because in practice, with a reasonably good cache 
hit rate, they will mostly just find already-dirty buffers and add 
some more dust to them.

If, after all that, the checkpointer (doing write() plus whatever sync) is 
not able to keep up with the speed at which buffers get dirtied, the 
backends will have to do some write()s again, because they will eat up the 
clean buffers at the LRU head and overtake the checkpointer.

Also please notice another little change in behaviour. The old code just 
went through the buffer cache sequentially, possibly flushing buffers 
that got dirtied after the checkpoint started, which is way ahead of 
time (they need to be flushed for the next checkpoint, not now). That 
means that if the same buffer gets dirtied again after that, we wasted 
a full disk write on it. My new code creates a list of dirty blocks at 
the beginning of the checkpoint and flushes only those that are still 
dirty at the time it gets to them.
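
A minimal sketch of that snapshot-then-flush logic (the Buffer type and all
names here are illustrative stand-ins, not the actual patch code):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Buffer {
    int  id;
    bool dirty;
} Buffer;

/* At checkpoint start: record which buffers are dirty right now. */
static size_t build_todo(const Buffer *pool, size_t n, int *todo)
{
    size_t cnt = 0;
    for (size_t i = 0; i < n; i++)
        if (pool[i].dirty)
            todo[cnt++] = (int) i;
    return cnt;
}

/* Later: flush only entries that are *still* dirty; buffers dirtied
 * after the snapshot are left for the next checkpoint.  Returns the
 * number of buffers actually written. */
static size_t flush_todo(Buffer *pool, const int *todo, size_t cnt)
{
    size_t written = 0;
    for (size_t i = 0; i < cnt; i++) {
        Buffer *b = &pool[todo[i]];
        if (b->dirty) {          /* may have been cleaned meanwhile */
            /* the real write of the block would happen here */
            b->dirty = false;
            written++;
        }
    }
    return written;
}
```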

Jan



[HACKERS] RC2 tag'd and bundled ...

2003-11-10 Thread Marc G. Fournier

Should be on the mirrors now; I will announce it this evening ... things
looked to build cleanly, but I would just like a second opinion on it ...





Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Bruce Momjian wrote:

Tom Lane wrote:
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> On Sun, Nov 02, 2003 at 01:00:35PM -0500, Tom Lane wrote:
>> real traction we'd have to go back to the "take over most of RAM for
>> shared buffers" approach, which we already know to have a bunch of
>> severe disadvantages.
> I know there are severe disadvantages in the current implementation,
> but are there in-principle severe disadvantages?
Yes.  For one, since we cannot change the size of shared memory
on-the-fly (at least not portably), there is no opportunity to trade off
memory usage dynamically between processes and disk buffers.  For
another, on many systems shared memory is subject to being swapped out.
Swapping out dirty buffers is a performance killer, because they must be
swapped back in again before they can be written to where they should
have gone.  The only way to avoid this is to keep the number of shared
buffers small enough that they all remain fairly "hot" (recently used)
and so the kernel won't be tempted to swap out any part of the region.
Agreed, we can't resize shared memory, but I don't think most OS's swap
out shared memory, and even if they do, they usually have a kernel
We can't resize shared memory because we allocate the whole thing in one 
big hump - which causes the shmmax problem, BTW. If we allocate that in 
chunks of multiple blocks, we only have to give it a total maximum size 
to get the hash tables and other stuff right from the beginning. But the 
vast majority of memory, the buffers themselves, can be made adjustable at 
runtime.
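
A sketch of that chunked scheme, with malloc() standing in for shared
memory attachment and all names invented for illustration: the bookkeeping
is sized for a fixed maximum up front, but buffer chunks attach on demand.

```c
#include <stdlib.h>
#include <stddef.h>

#define CHUNK_BUFFERS 128          /* buffers added per chunk */

typedef struct BufferPool {
    size_t max_buffers;            /* hard cap, fixed at startup so   */
                                   /* hash tables can be sized for it */
    size_t nbuffers;               /* currently attached              */
    void  *chunks[64];             /* one entry per attached chunk    */
    size_t nchunks;
} BufferPool;

/* Grow the pool by one chunk, up to the configured maximum.
 * Returns the new buffer count, or 0 on failure. */
static size_t pool_grow(BufferPool *p, size_t buf_size)
{
    if (p->nbuffers + CHUNK_BUFFERS > p->max_buffers)
        return 0;                  /* would exceed the startup cap */
    void *chunk = malloc(CHUNK_BUFFERS * buf_size);
    if (chunk == NULL)
        return 0;
    p->chunks[p->nchunks++] = chunk;
    p->nbuffers += CHUNK_BUFFERS;
    return p->nbuffers;
}
```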

Jan

configuration parameter to lock it into kernel memory.  All the old
unixes locked the shared memory into kernel address space and in fact
this is why many of them required a kernel recompile to increase shared
memory.  I hope the ones that have pagable shared memory have a way to
prevent it --- at least FreeBSD does, not sure about Linux.
Now, the disadvantages of large kernel cache, small PostgreSQL buffer
cache is that data has to be transfered to/from the kernel buffers, and
second, we can't control the kernel's cache replacement strategy, and
will probably not be able to in the near future, while we do control our
own buffer cache replacement strategy.
Looking at the advantages/disadvantages, a large shared buffer cache
looks pretty good to me.




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
What bothers me a little is that you keep telling us that you have all 
that great code from SRA. Do you have any idea when they intend to share 
this with us and contribute the stuff? I mean at least some pieces 
maybe? You personally got all the code from NuSphere AKA PeerDirect even 
weeks before it got released. Did any PostgreSQL developer other than 
you ever look at the SRA code?

Jan

Bruce Momjian wrote:

scott.marlowe wrote:
On Tue, 4 Nov 2003, Tom Lane wrote:

> Jan Wieck <[EMAIL PROTECTED]> writes:
> > What still needs to be addressed is the IO storm caused by checkpoints. I 
> > see it much relaxed when stretching out the BufferSync() over most of 
> > the time until the next one should occur. But the kernel sync at its 
> > end still pushes the system hard against the wall.
> 
> I have never been happy with the fact that we use sync(2) at all.  Quite
> aside from the "I/O storm" issue, sync() is really an unsafe way to do a
> checkpoint, because there is no way to be certain when it is done.  And
> on top of that, it does too much, because it forces syncing of files
> unrelated to Postgres.
> 
> I would like to see us go over to fsync, or some other technique that
> gives more certainty about when the write has occurred.  There might be
> some scope that way to allow stretching out the I/O, too.
> 
> The main problem with this is knowing which files need to be fsync'd.

Wasn't this a problem that the win32 port had to solve by keeping a list 
of all files that need fsyncing since Windows doesn't do sync() in the 
classical sense?  If so, then could we use that code to keep track of the 
files that need fsyncing?
Yes, I have that code from SRA.  They used threading, so they recorded
all the open files in local memory and opened/fsync/closed them for
checkpoints.  We have to store the file names in a shared area, perhaps
an area of shared memory with an overflow to a disk file.
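
A toy sketch of such a shared pending-fsync table (fixed size and
deduplicating; the names and sizes are invented, and the disk-file overflow
is only hinted at by the failure return):

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PENDING 8

/* Stand-in for a shared-memory table of files needing fsync at the
 * next checkpoint; the real thing would spill to a disk file when full. */
static char pending[MAX_PENDING][64];
static int  npending;

/* Remember a file for the next checkpoint, deduplicating.  Returns
 * false when the table is full (caller would spill to the overflow). */
static bool remember_fsync(const char *path)
{
    for (int i = 0; i < npending; i++)
        if (strcmp(pending[i], path) == 0)
            return true;           /* already queued */
    if (npending == MAX_PENDING)
        return false;
    strncpy(pending[npending], path, sizeof pending[0] - 1);
    pending[npending][sizeof pending[0] - 1] = '\0';
    npending++;
    return true;
}
```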




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> Bruce Momjian wrote:
> > I would be interested to know if you have the background write process
> > writing old dirty buffers to kernel buffers continually if the sync()
> > load is diminished.  What this does is to push more dirty buffers into
> > the kernel cache in hopes the OS will write those buffers on its own
> > before the checkpoint does its write/sync work.  This might allow us to
> > reduce sync() load while preventing the need for O_SYNC/fsync().
> 
> I tried that first. Linux 2.4 does not, as long as you don't tell it by 
> reducing the dirty data block aging time with update(8). So you have to 
> force it to utilize the write bandwidth in the meantime. For that you 
> have to call sync() or fsync() on something.
> 
> Maybe O_SYNC is not as bad an option as it seems. In my patch, the 
> checkpointer flushes the buffers in LRU order, meaning it flushes the 
> least recently used ones first. This has the side effect that buffers 
> returned for replacement (on a cache miss, when the backend needs to 
> read the block) are most likely to be flushed/clean. So it reduces the 
> write load of backends and thus the probability that a backend is ever 
> blocked waiting on an O_SYNC'd write().
> 
> I will add some counters and gather statistics on how often the 
> backend calls write() in comparison to the checkpointer.

OK, new idea.  How about if you write() the buffers, mark them as clean
and unlock them, then issue fsync().  The advantage here is that we can
allow the buffer to be reused while we wait for the fsync to complete. 
Obviously, O_SYNC is not going to allow that.  Another idea --- if
fsync() is slow because it can't find the dirty buffers, use write() to
write the buffers, copy the buffer to local memory, mark it as clean,
then open the file with O_SYNC and write it again.  Of course, I am just
throwing out ideas here.  The big thing I am concerned about is that
reusing buffers not take too long.
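
A sketch of that write-then-single-fsync ordering in plain POSIX calls (the
function and its layout are invented for illustration): each buffer can be
marked clean and reused as soon as its write() returns, and durability is
forced once per file at the end instead of per write via O_SYNC.

```c
#include <fcntl.h>
#include <unistd.h>

#define PAGE_SZ 8192

/* Write each page with a plain pwrite() (the caller may mark buffers
 * clean and reuse them as soon as each call returns), then force
 * everything to disk with a single fsync() on the file.
 * Returns 0 on success, -1 on error. */
static int flush_pages(int fd, char pages[][PAGE_SZ],
                       const long offsets[], int npages)
{
    for (int i = 0; i < npages; i++)
        if (pwrite(fd, pages[i], PAGE_SZ, offsets[i]) != PAGE_SZ)
            return -1;
    return fsync(fd);   /* one sync instead of O_SYNC on every write */
}
```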

> > Perhaps sync() is bad partly because the checkpoint runs through all the
> > dirty shared buffers and writes them all to the kernel and then issues
> > sync() almost guaranteeing a flood of writes to the disk.  This method
> > would find fewer dirty buffers in the shared buffer cache, and therefore
> > fewer kernel writes needed by sync().
> 
> I don't understand this. How would that method reduce the number of page 
> buffers the backends modify?

What I was saying is that if we only write() just before a checkpoint,
we never give the kernel a chance to write the buffers on its own.  I
figured if we wrote them earlier, the kernel might write them for us and
sync wouldn't need to do it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Now, if we are sure that writes will happen only in the checkpoint
> process, O_SYNC would be OK, I guess, but will we ever be sure of that?

This is a performance issue, not a correctness issue.  It's okay for
backends to wait for writes as long as it happens very infrequently.
The question is whether we can design a background dirty-buffer writer
that works well enough to make it uncommon for backends to have to
write dirty buffers for themselves.  If we can, then doing all the
writes O_SYNC would not be a problem.

(One possibility that could help improve the odds is to allow a certain
amount of slop in the LRU buffer reuse policy --- that is, if you see
the buffer at the tail of the LRU list is dirty, allow one of the next
few buffers to be taken instead, if it's clean.  Or just keep separate
lists for dirty and clean buffers.)
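
A sketch of that slop policy (illustrative types and names; index 0 here is
the tail of the LRU list):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Buffer {
    int  id;
    bool dirty;
} Buffer;

/* Pick the buffer to reuse: prefer a clean buffer among the `slop`
 * entries nearest the LRU tail; otherwise fall back to the tail itself
 * (which may be dirty and force the backend to write it). */
static size_t choose_victim(const Buffer *lru, size_t n, size_t slop)
{
    size_t limit = slop < n ? slop : n;
    for (size_t i = 0; i < limit; i++)
        if (!lru[i].dirty)
            return i;
    return 0;            /* no clean buffer nearby; evict the LRU tail */
}
```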

regards, tom lane



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Bruce Momjian
Jan Wieck wrote:
> Bruce Momjian wrote:
> 
> > Now, O_SYNC is going to force every write to the disk.  If we have a
> > transaction that has to write lots of buffers (has to write them to
> > reuse the shared buffer)
> 
> So make the background writer/checkpointer keep the LRU head clean. I 
> have explained that three times now.

If the background cleaner has to not just write() but write/fsync or
write/O_SYNC, it isn't going to be able to clean them fast enough.  It
creates a bottleneck where we didn't have one before.

We are trying to eliminate an I/O storm during checkpoint, but the
solutions seem to be making the non-checkpoint times slower.




Re: [HACKERS] what could cause this PANIC on enterprise 7.3.4 db?

2003-11-10 Thread Tom Lane
Andriy Tkachuk <[EMAIL PROTECTED]> writes:
> On Fri, 7 Nov 2003, Tom Lane wrote:
>> Andriy Tkachuk <[EMAIL PROTECTED]> writes:
>>> Nov  5 20:22:42 monstr postgres[16071]: [3] PANIC:  open of 
>>> /usr/local/pgsql/data/pg_clog/0040 failed: No such file or directory
>> 
>> Could we see ls -l /usr/local/pgsql/data/pg_clog/

> [10:49]/2:[EMAIL PROTECTED]:~>sudo ls -al /usr/local/pgsql/data/pg_clog
total 40
drwx------    2 pgsql    postgres     4096 Nov  7 03:28 .
drwx------    6 pgsql    root         4096 Oct 23 10:45 ..
-rw-------    1 pgsql    postgres    32768 Nov 10 10:47 000D

Okay, given that the file the code was trying to access is nowhere near
the current or past set of valid transaction numbers, it's pretty clear
that what you have is a corrupted transaction number in some tuple's
header.  The odds are that not only the transaction number is affected;
usually when we see something like this, anywhere from dozens to
hundreds of bytes have been replaced by garbage data.

In the cases I've been able to study in the past, the cause seemed to
be faulty hardware or possibly kernel bugs --- for instance someone
recently reported a case where a whole kilobyte of a Postgres file had
been replaced with what seemed to be part of a mail message.  I'd
ascribe that to either a disk drive writing a sector at the wrong place,
or the kernel getting confused about which buffer held which file.
So I'd recommend running some hardware diagnostics and checking to see
if there are errata available for your kernel.

As far as cleaning up the immediate damage is concerned, you'll probably
want to use pg_filedump or some such tool to get a better feeling for
the extent of the damage.  There are descriptions of this process in the
archives --- try looking for recent references to pg_filedump.

regards, tom lane



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Bruce Momjian wrote:

Now, O_SYNC is going to force every write to the disk.  If we have a
transaction that has to write lots of buffers (has to write them to
reuse the shared buffer)
So make the background writer/checkpointer keep the LRU head clean. I 
have explained that three times now.

Jan



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-10 Thread Jan Wieck
Bruce Momjian wrote:
I would be interested to know if you have the background write process
writing old dirty buffers to kernel buffers continually if the sync()
load is diminished.  What this does is to push more dirty buffers into
the kernel cache in hopes the OS will write those buffers on its own
before the checkpoint does its write/sync work.  This might allow us to
reduce sync() load while preventing the need for O_SYNC/fsync().
I tried that first. Linux 2.4 does not, as long as you don't tell it to by 
reducing the dirty-block aging time with update(8). So you have to force it 
to utilize the write bandwidth in the meantime, and for that you have to 
call sync() or fsync() on something.

Maybe O_SYNC is not as bad an option as it seems. In my patch, the 
checkpointer flushes the buffers in LRU order, meaning it flushes the 
least recently used ones first. This has the side effect that buffers 
returned for replacement (on a cache miss, when the backend needs to 
read the block) are most likely to be flushed/clean. So it reduces the 
write load of backends and thus the probability that a backend is ever 
blocked waiting on an O_SYNC'd write().

I will add some counters and gather statistics on how often the 
backend calls write() in comparison to the checkpointer.

Perhaps sync() is bad partly because the checkpoint runs through all the
dirty shared buffers and writes them all to the kernel and then issues
sync() almost guaranteeing a flood of writes to the disk.  This method
would find fewer dirty buffers in the shared buffer cache, and therefore
fewer kernel writes needed by sync().
I don't understand this. How would that method reduce the number of page 
buffers the backends modify?

Jan

---

Jan Wieck wrote:
Tom Lane wrote:

> Jan Wieck <[EMAIL PROTECTED]> writes:
> 
>> How I can see the background writer operating is that he's keeping the 
>> buffers in the order of the LRU chain(s) clean, because those are the 
>> buffers that most likely get replaced soon. In my experimental ARC code 
>> it would traverse the T1 and T2 queues from LRU to MRU, write out n1 and 
>> n2 dirty buffers (n1+n2 configurable), then fsync all files that have 
>> been involved in that, nap depending on where he got down the queues (to 
>> increase the write rate when running low on clean buffers), and do it 
>> all over again.
> 
> You probably need one more knob here: how often to issue the fsyncs.
> I'm not convinced "once per outer loop" is a sufficient answer.
> Otherwise this is sounding pretty good.

This is definitely heading into the right direction.

I currently have a crude and ugly hacked system that does checkpoints 
every minute but stretches them out over the whole time. It writes out 
the dirty buffers in T1+T2 LRU order intermixed, stretches the flush out 
over the whole checkpoint interval, and does sync()+usleep() every 32 
blocks (if it has time to do this).

This is clearly the wrong way to implement it, but ...

The same system has ARC and delayed vacuum. With normal, unmodified 
checkpoints every 300 seconds, the transaction response time for 
new_order still peaks at over 30 seconds (5 is already too much), so the 
system basically comes to a freeze during a checkpoint.

Now, with this high-frequency sync()ing and checkpointing by the minute, 
the entire system load levels out really nicely. Basically it's constantly 
checkpointing. So maybe the thing we're looking for is to make the 
checkpoint process the background buffer writer process and let it 
checkpoint 'round the clock. Of course, with a bit more selectivity on 
what to fsync, and without doing a system-wide sync() every 10-500 milliseconds :-)
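
The pacing described here reduces to a small computation; an illustrative
sketch (not taken from the patch):

```c
/* How long to nap after each group of writes so that `total_blocks`
 * writes, done `blocks_per_group` at a time, are spread evenly over
 * an interval of `interval_usec` microseconds. */
static long nap_per_group(long total_blocks, long blocks_per_group,
                          long interval_usec)
{
    long groups = (total_blocks + blocks_per_group - 1) / blocks_per_group;
    if (groups <= 0)
        return 0;
    return interval_usec / groups;
}
```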

Jan







Re: [GENERAL] [ADMIN] [HACKERS] retrieve statement from catalogs

2003-11-10 Thread Jaime Casanova
you could put a view on every table that called a function?
Maybe, but how can I retrieve the select statement?

_
MSN 8 with e-mail virus protection service: 2 months FREE* 
http://join.msn.com/?page=features/virus



Re: [HACKERS] what could cause this PANIC on enterprise 7.3.4 db?

2003-11-10 Thread Andriy Tkachuk
Sure, Tom.

[10:49]/2:[EMAIL PROTECTED]:~>sudo ls -al /usr/local/pgsql/data/pg_clog
total 40
drwx------    2 pgsql    postgres     4096 Nov  7 03:28 .
drwx------    6 pgsql    root         4096 Oct 23 10:45 ..
-rw-------    1 pgsql    postgres    32768 Nov 10 10:47 000D

[10:49]/2:[EMAIL PROTECTED]:~>date
Mon Nov 10 10:49:50 EET 2003

[10:49]/2:[EMAIL PROTECTED]:~>uname -sr
Linux 2.4.18-5custom


On Fri, 7 Nov 2003, Tom Lane wrote:

> Andriy Tkachuk <[EMAIL PROTECTED]> writes:
> > Nov  5 20:22:42 monstr postgres[16071]: [3] PANIC:  open of 
> > /usr/local/pgsql/data/pg_clog/0040 failed: No such file or directory
>
> Could we see ls -l /usr/local/pgsql/data/pg_clog/
>
>   regards, tom lane
>

regards,
 Andriy Tkachuk
--
http://www.imt.com.ua
