date:20010315

Re: Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein


* Xu Yifeng <[EMAIL PROTECTED]> [010315 22:25] wrote:
> Hello Tom,
> 
> Friday, March 16, 2001, 6:54:22 AM, you wrote:
> 
> TL> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> >> How many files need to be fsync'd?
> 
> TL> Only one.
> 
> >> If it's more than one, what might work is using mmap() to map the
> >> files in adjacent areas, then calling msync() on the entire range,
> >> this would allow you to batch fsync the data.
> 
> TL> Interesting thought, but mmap to a prespecified address is most
> TL> definitely not portable, whether or not you want to assume that
> TL> plain mmap is ...
> 
> TL> regards, tom lane
> 
> Could anyone consider fork a syncer process to sync data to disk ?
> build a shared sync queue, when a daemon process want to do sync after
> write() is called, just put a sync request to the queue. this can release
> process from blocked on writing as soon as possible. multipile sync
> request for one file can be merged when the request is been inserting to
> the queue.

I suggested this about a year ago. :)

The problem is that you need that process to potentially open and close
many files over and over.

I still think it's somewhat of a good idea.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Xu Yifeng

Hello Tom,

Friday, March 16, 2001, 6:54:22 AM, you wrote:

TL> Alfred Perlstein <[EMAIL PROTECTED]> writes:
>> How many files need to be fsync'd?

TL> Only one.

>> If it's more than one, what might work is using mmap() to map the
>> files in adjacent areas, then calling msync() on the entire range,
>> this would allow you to batch fsync the data.

TL> Interesting thought, but mmap to a prespecified address is most
TL> definitely not portable, whether or not you want to assume that
TL> plain mmap is ...

TL> regards, tom lane

Could anyone consider fork a syncer process to sync data to disk ?
build a shared sync queue, when a daemon process want to do sync after
write() is called, just put a sync request to the queue. this can release
process from blocked on writing as soon as possible. multipile sync
request for one file can be merged when the request is been inserting to
the queue.

-- 
Regards,
Xu Yifeng

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane


Bruce Momjian <[EMAIL PROTECTED]> writes:
> It is hard for me to imagine O_* being slower than fsync(),

Not hard at all --- if we're writing multiple xlog blocks per
transaction, then O_* constrains the sequence of operations more
than we really want.  Changing xlog.c to combine writes as much
as possible would reduce this problem, but not eliminate it.

Besides, the entire object of this exercise is to work around
an unexpected inefficiency in some kernels' implementations of
fsync/fdatasync (viz, scanning over lots of not-dirty buffers).
Who's to say that there might not be inefficiencies in other
platforms' implementations of the O_* options?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > My question was what are we needing to test?  If we can do only single writes
> > to the log, don't we prefer O_* to fsync, and the O_D* options over
> > plain O_*?  Am I confused?
> 
> I don't think we have enough data to conclude that with any certainty.

I just figured we knew the answers to above issues, that that the only
issue was multiple writes vs. fsync().

It is hard for me to imagine O_* being slower than fsync(), or fdatasync
being slower than fsync.  Are we not able to assume that?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane


Bruce Momjian <[EMAIL PROTECTED]> writes:
> My question was what are we needing to test?  If we can do only single writes
> to the log, don't we prefer O_* to fsync, and the O_D* options over
> plain O_*?  Am I confused?

I don't think we have enough data to conclude that with any certainty.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[HACKERS] PostgreSQL Search Engine - searchraw.php3 (1/1)

2001-03-15 Thread ryan


begin 644 searchraw.php3
M/#]P:'`*"2\O'!L;V1E*"<@)RP@)&-L96%N7W%U97)Y*3L*"2\O82!L:7-T(&]F('-T
M;W`@=V]R9',@8F5C;VUE2`](&%R2D["@DO+W)E2!I
M;F1E>`H)"7-O7=O&5C
M*"1C;VYN+"`D7=O7=O7=O7=O&5C*"1C;VYN+"`Dhttp://www.postgresql.org/users-lounge/docs/faq.html

[HACKERS] PostgreSQL Search Engine - searchraw.php3 (0/1)

2001-03-15 Thread ryan


I have written a simple search engine that utilizes pgsql.  I would
like to make it a stored procedure, although I am still learning
plpgsql.  Attached to this message is the basic search logic (in PHP).
This script will be in production on a major e-commerce site in about
a week.  I think it would be much faster as a stored procedure.

This implemetation searches product data, but it could be used for
anything.

The script does a few things, gets the id's of valid words, finds
which products these words are mapped to, finds which products have
the most keyword mappings out of the result set, and outputs a result
set joined to the product table and ordered by products with the most
keyword hits.

It accomplishes this by keeping an indexed list of words, maintaing a
mapping table to products.  Then when someone does a search, a
temporary table called 'hits' is created which stores the product_id
that was matched to a word.  Then an additional temporary table is
created which consists of the product id and hit count from the
search.  The search then retrns the product details ordered by the
product that had the most hits.

If you are interested in seeing it in action, I can send you a url, if
you'd like to implement it I can help you out, and if you can help me
covert it to a stored procedure I'd be very apreciative!

It uses five tables:

CREATE TABLE "pa_search_keyword" (
   "keyword_id" int4 DEFAULT
nextval('"pa_search_keywor_keyword_id_seq"'::text) NOT NULL,
   "keyword_value" varchar(30),
   CONSTRAINT "pa_search_keyword_pkey" PRIMARY KEY ("keyword_id")
);

 and

CREATE TABLE "pa_search_map" (
   "keyword_id" int4 NOT NULL,
   "product_id" int4 NOT NULL,
   CONSTRAINT "pa_search_map_pkey" PRIMARY KEY ("keyword_id",
"product_id")
);

two temporary tables:

CREATE TEMPORARY TABLE hits 
(product_id integer not null);

and

CREATE TEMPORARY TABLE prod_hit_count (
product_id integer not null, 
hit_count smallint not null
);

the fifth table would be the table you are joining to to get the
product data, or details, or whatever.

-Ryan Mahoney

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > OK, but the point of adding all those configuration options was to allow
> > us to figure out which was faster.  If you can do the code so we no
> > longer need to know the answer of which is best, why bother adding the
> > config options.
> 
> How in the world did you arrive at that idea?  I don't see anyone around
> here but you claiming that we don't need any experimentation ...

I am trying to understand what testing we need to do.   I know we need
configure tests to check to see what exists in the OS.

My question was what are we needing to test?  If we can do only single writes
to the log, don't we prefer O_* to fsync, and the O_D* options over
plain O_*?  Am I confused?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > I was wondering if the multiple writes performed to the XLOG could be
> > grouped into one write().
> 
> That would require fairly major restructuring of xlog.c, which I don't
> want to undertake at this point in the cycle (we're trying to push out
> a release candidate, remember?).  I'm not convinced it would be a huge
> win anyway.  It would be a win if your average transaction writes
> multiple blocks' worth of XLOG ... but if your average transaction
> writes less than a block then it won't help.
> 
> I think it probably is a good idea to restructure xlog.c so that it can
> write more than one page at a time --- but it's not such a great idea
> that I want to hold up the release any more for it.

OK, but the point of adding all those configuration options was to allow
us to figure out which was faster.  If you can do the code so we no
longer need to know the answer of which is best, why bother adding the
config options.  Just ship our best guess and fix it when we can.  Does
that make sense?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane


Bruce Momjian <[EMAIL PROTECTED]> writes:
> OK, but the point of adding all those configuration options was to allow
> us to figure out which was faster.  If you can do the code so we no
> longer need to know the answer of which is best, why bother adding the
> config options.

How in the world did you arrive at that idea?  I don't see anyone around
here but you claiming that we don't need any experimentation ...

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] FATAL 2: XLogFlush: request is not satisfied

2001-03-15 Thread Tom Lane

Christopher Sawtell <[EMAIL PROTECTED]> writes:
>   I'm getting errors as a result of making queries. The attached log from 
> the last two queries gives an idea as to what is happening.

Hmmm ... you were the one who did the pg_resetxlog bit today, right?
I have a feeling I missed something in that.  Back to the drawing
board...

>   Does this mean that I should do an initdb?

Afraid so.  Sorry about that.  You should be able to do a clean dump at
least.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> I was wondering if the multiple writes performed to the XLOG could be
> grouped into one write().

That would require fairly major restructuring of xlog.c, which I don't
want to undertake at this point in the cycle (we're trying to push out
a release candidate, remember?).  I'm not convinced it would be a huge
win anyway.  It would be a win if your average transaction writes
multiple blocks' worth of XLOG ... but if your average transaction
writes less than a block then it won't help.

I think it probably is a good idea to restructure xlog.c so that it can
write more than one page at a time --- but it's not such a great idea
that I want to hold up the release any more for it.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


I was wondering if the multiple writes performed to the XLOG could be
grouped into one write().  Seems everyone agrees:

fdatasync/O_DSYNC is better then plain fsync/O_SYNC

and the O_* flags are better than fsync() if we are doing only one write
before every fsync.  It seems the only open question is now often we do
multiple writes before fsync, and if that is ever faster than putting
the O_* on the file for all writes.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

[HACKERS] FATAL 2: XLogFlush: request is not satisfied

2001-03-15 Thread Christopher Sawtell


Greetings,

  I'm getting errors as a result of making queries. The attached log from 
the last two queries gives an idea as to what is happening.

  I suspect I must have done something bad when I upgraded via cvsup.

  Does this mean that I should do an initdb?

-- 
Sincerely etc.,

 NAME   Christopher Sawtell
 CELL PHONE 021 257 4451
 ICQ UIN45863470
 EMAIL  csawtell @ xtra . co . nz
 CNOTES ftp://ftp.funet.fi/pub/languages/C/tutorials/sawtell_C.tar.gz

 -->> Please refrain from using HTML or WORD attachments in e-mails to me 
<<--



chris=# select * from pupil where number=127;
-[ RECORD 1 ]-+---
number| 127
first_name| Savannah
last_name | Palme
date_of_birth | 1991-10-17
mother_name   | Lesley
mother_phone  | 384-5151
father_name   | Philip
father_phone  | 338-1857
home_phone| 366-3588
doctor_name   | Carl Denny
doctor_phone  | 377-6014
class_number  |
lessons   | {0,0}

chris=# select first_name, last_name from pupil;
FATAL 2:  XLogFlush: request is not satisfied
FATAL 2:  XLogFlush: request is not satisfied
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Server process (pid 10700) 
exited with status 512 at Fri Mar 16 15:38:49 2001
Terminating any active server processes...
Server processes were terminated at Fri Mar 16 15:38:49 2001
Reinitializing shared memory and semaphores
The Data Base System is starting up
Failed.
!# DEBUG:  database system was interrupted at 2001-03-16 15:10:46 NZDT
DEBUG:  CheckPoint record at (0, 108)
DEBUG:  Redo record at (0, 108); Undo record at (0, 0); Shutdown FALSE
DEBUG:  NextTransactionId: 117853; NextOid: 201268
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (0, 172)
DEBUG:  ReadRecord: record with zero len at (0, 376)
DEBUG:  redo done at (0, 340)
FATAL 2:  XLogFlush: request is not satisfied
./bin/postmaster: Startup proc 10797 exited with status 512 - abort

!#
!#
!# \q
15:39:44 postgres@berty:~ $ ./bin/pg_ctl -o -i start
postmaster successfully started
15:40:18 postgres@berty:~ $ DEBUG:  database system shutdown was interrupted at 
2001-03-16 15:38:49 NZDT
DEBUG:  CheckPoint record at (0, 108)
DEBUG:  Redo record at (0, 108); Undo record at (0, 0); Shutdown FALSE
DEBUG:  NextTransactionId: 117853; NextOid: 201268
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (0, 172)
DEBUG:  ReadRecord: record with zero len at (0, 376)
DEBUG:  redo done at (0, 340)
FATAL 2:  XLogFlush: request is not satisfied
./bin/postmaster: Startup proc 10808 exited with status 512 - abort

15:40:20 postgres@berty:~ $


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Beta6 for Tomorrow?

2001-03-15 Thread The Hermit Hacker


On Thu, 15 Mar 2001, Tom Lane wrote:

> The Hermit Hacker <[EMAIL PROTECTED]> writes:
> > Is there anything *major* left, other then the fsync issue, that needs to
> > be resolved?
>
> Don't believe so.
>
> I'm testing xlog fsync revisions now, should be ready to commit in an
> hour or so.  (I'm just curious to see what it does to the pgbench
> results...)

Okay, I'll wrap up beta6 tomorrow, give a weekend for ppl to test, and
*finally* roll out RC1 if nobody has anything major that creeps up ...:)


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Beta6 for Tomorrow?

2001-03-15 Thread Tom Lane

The Hermit Hacker <[EMAIL PROTECTED]> writes:
> Is there anything *major* left, other then the fsync issue, that needs to
> be resolved?

Don't believe so.

I'm testing xlog fsync revisions now, should be ready to commit in an
hour or so.  (I'm just curious to see what it does to the pgbench
results...)

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

[HACKERS] Beta6 for Tomorrow?

2001-03-15 Thread The Hermit Hacker



I know there are still discussions going on concerning the whole fsync
issue, but, from what I've been following, its purely a performance issue
then anything ...

Now that Tom's patch is in place for the XLOG stuff, I'd like to put out a
Beta6 tomorrow for testing, with an RC1 schedualed for next week ...

Is there anything *major* left, other then the fsync issue, that needs to
be resolved?

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED]   secondary: scrappy@{freebsd|postgresql}.org


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Philip Warner


At 16:55 15/03/01 -0800, Alfred Perlstein wrote:
>* Philip Warner <[EMAIL PROTECTED]> [010315 16:46] wrote:
>> At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
>> >
>> >Lost data is probably better than incorrect data.  Either use locks
>> >or a copying mechanism.  People will depend on the data returned
>> >making sense.
>> >
>> 
>> But with per-backend data, there is only ever *one* writer to a given set
>> of counters. Everyone else is a reader.
>
>This doesn't prevent a reader from getting an inconsistant view.
>
>Think about a 64bit counter on a 32bit machine.  If you charged per
>megabyte, wouldn't it upset you to have a small chance of loosing
>4 billion units of sale?
>
>(ie, doing a read after an addition that wraps the low 32 bits
>but before the carry is done to the top most signifigant 32bits?)

I assume this means we can not rely on the existence of any kind of
interlocked add on 64 bit machines?


>Ok, what what if everything can be read atomically by itself?
>
>You're still busted the minute you need to export any sort of
>compound stat.

Which is why the backends should not do anything other than maintain the
raw data. If there is atomic data than can cause inconsistency, then a
dropped UDP packet will do the same.





Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein


* Philip Warner <[EMAIL PROTECTED]> [010315 17:08] wrote:
> At 16:55 15/03/01 -0800, Alfred Perlstein wrote:
> >* Philip Warner <[EMAIL PROTECTED]> [010315 16:46] wrote:
> >> At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
> >> >
> >> >Lost data is probably better than incorrect data.  Either use locks
> >> >or a copying mechanism.  People will depend on the data returned
> >> >making sense.
> >> >
> >> 
> >> But with per-backend data, there is only ever *one* writer to a given set
> >> of counters. Everyone else is a reader.
> >
> >This doesn't prevent a reader from getting an inconsistant view.
> >
> >Think about a 64bit counter on a 32bit machine.  If you charged per
> >megabyte, wouldn't it upset you to have a small chance of loosing
> >4 billion units of sale?
> >
> >(ie, doing a read after an addition that wraps the low 32 bits
> >but before the carry is done to the top most signifigant 32bits?)
> 
> I assume this means we can not rely on the existence of any kind of
> interlocked add on 64 bit machines?
> 
> 
> >Ok, what what if everything can be read atomically by itself?
> >
> >You're still busted the minute you need to export any sort of
> >compound stat.
> 
> Which is why the backends should not do anything other than maintain the
> raw data. If there is atomic data than can cause inconsistency, then a
> dropped UDP packet will do the same.

The UDP packet (a COPY) can contain a consistant snapshot of the data.
If you have dependancies, you fit a consistant snapshot into a single
packet.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Testing structure (was) Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Justin Clift

Is someone able to put together a testing-type script or sequence so
people can run this on the various platforms and then report the
results?

For example, I can setup benchmarking, (or automated testing) on various
Solaris platforms to run overnight and report the results in the
morning.  I suspect that quite a few people can do similar.

Would this be a good thing for someone to spend some time and effort on,
in generating testing-type scripts/structures?  It might be a useful
tool to use in the future when making performance/related decisions like
this.

Regards and best wishes,

Justin Clift

Tom Lane wrote:
> 
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > I later read Vadim's comment that fsync() of two blocks may be faster
> > than two O_* writes, so I am now confused about the proper solution.
> > However, I think we need to pick one and make it invisible to the user.
> > Perhaps a compiler/config.h flag for testing would be a good solution.
> 
> I believe that we don't know enough yet to nail down a hard-wired
> decision.  Vadim's idea of preferring O_DSYNC if it appears to be
> different from O_SYNC is a good first cut, but I think we'd better make
> it possible to override that, at least for testing purposes.
> 
> So I think it should be configurable at *some* level.  I don't much care
> whether it's a config.h entry or a GUC variable.
> 
> But consider this: we'll be more likely to get some feedback from the
> field (allowing us to refine the policy in future releases) if it is a
> GUC variable.  Not many people will build two versions of the software,
> but people might take the trouble to play with a run-time configuration
> setting.
> 
> regards, tom lane
> 
> ---(end of broadcast)---
> TIP 4: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> For example, Tom had a nice fsync test program.  Why can't we run that
> on various platforms and collect the results, then make a decision on
> the best default.

Mainly because (a) there's not enough time before release, and (b) that
test program was far too stupid to give trustworthy results anyway.
(It was assuming exactly one commit per XLOG block, for example.)

> Trying to test the affects of fsync() with a database wrapped around it
> really makes for difficult measurement anyway.

Exactly.  What I'm doing now is providing some infrastructure with which
we can hope to see some realistic tests.  For example, I'm gonna be
leaning on Great Bridge's lab guys to rerun their TPC tests with a bunch
of combinations, just as soon as the dust settles.  But I'm not planning
to put my faith in only that one benchmark.

I'm all for improving the intelligence of the defaults once we know
enough to pick better defaults.  But we don't yet, and there's no way
that we *will* know enough until after we've shipped a release that has
these tuning knobs and gotten some real-world results from the field.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

> Bruce Momjian wrote:
> > 
> 
> > No one will ever do the proper timing tests to know which is better except us.
> 
> Hi Bruce,
> 
> I believe in the future that anyone doing serious benchmark tests before
> large-scale implementation will indeed be testing things like this. 
> There will also be people/companies out there who will specialize in
> "tuning" PostgreSQL systems and they will definitely test stuff like
> this... different variations, different database structures, different
> OS's, etc.

But I don't want to go the Informix/Oracle way where we have so many
tuning options that no one understands them all.  I would like us to
find the best options and only give users choices when there is a real
tradeoff.

For example, Tom had a nice fsync test program.  Why can't we run that
on various platforms and collect the results, then make a decision on
the best default.

Trying to test the affects of fsync() with a database wrapped around it
really makes for difficult measurement anyway.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010315 16:46] wrote:
> At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
> >
> >Lost data is probably better than incorrect data.  Either use locks
> >or a copying mechanism.  People will depend on the data returned
> >making sense.
> >
> 
> But with per-backend data, there is only ever *one* writer to a given set
> of counters. Everyone else is a reader.

This doesn't prevent a reader from getting an inconsistant view.

Think about a 64bit counter on a 32bit machine.  If you charged per
megabyte, wouldn't it upset you to have a small chance of loosing
4 billion units of sale?

(ie, doing a read after an addition that wraps the low 32 bits
but before the carry is done to the top most signifigant 32bits?)

Ok, what what if everything can be read atomically by itself?

You're still busted the minute you need to export any sort of
compound stat.

If A, B and C need to add up to 100 you have a read race.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Justin Clift

Bruce Momjian wrote:
> 

> No one will ever do the proper timing tests to know which is better except us.

Hi Bruce,

I believe in the future that anyone doing serious benchmark tests before
large-scale implementation will indeed be testing things like this. 
There will also be people/companies out there who will specialise in
"tuning" PostgreSQL systems and they will definitely test stuff like
this... different variations, different database structures, different
OS's, etc.

Regards and best wishes,

Justin Clift

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Philip Warner


At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
>
>Lost data is probably better than incorrect data.  Either use locks
>or a copying mechanism.  People will depend on the data returned
>making sense.
>

But with per-backend data, there is only ever *one* writer to a given set
of counters. Everyone else is a reader.



Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Philip Warner

At 06:57 15/03/01 -0500, Jan Wieck wrote:
>
>And  shared  memory has all the interlocking problems we want
>to avoid.

I suspect that if we keep per-backend data in a separate area, then we
don;t need locking since there is only one writer. It does not matter if a
reader gets an inconsistent view, the same as if you drop a few UDP packets.

>What about a collector deamon, fired up by the postmaster and
>receiving UDP packets from the backends. 

This does sound appealing; it means that individual backend data (IO etc)
will survive past the termination of the backend. I'd like to see the stats
survive the death of the collector if possible, possibly even survive a
stop/start of the postmaster.

>Now whatever the backend has to tell the collector, it simply
>throws  a UDP packet into his direction. If the collector can
>catch it or not, not the backends problem.

If we get the backends to keep the stats they are sending in local counters
as well, then they can send the counter value (not delta) each time, which
would mean that the collector would not 'miss' anything - just it's
operations/sec might see a hiccough. This could have a sidebenefit that(if
wewanted to?) we could allow a client to query their own counters to get an
idea of the costs of their queries.

When we need to reset the counters that should be done explicitly, I think.

Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein


* Philip Warner <[EMAIL PROTECTED]> [010315 16:14] wrote:
> At 06:57 15/03/01 -0500, Jan Wieck wrote:
> >
> >And  shared  memory has all the interlocking problems we want
> >to avoid.
> 
> I suspect that if we keep per-backend data in a separate area, then we
> don;t need locking since there is only one writer. It does not matter if a
> reader gets an inconsistent view, the same as if you drop a few UDP packets.

No, this is completely different.

Lost data is probably better than incorrect data.  Either use locks
or a copying mechanism.  People will depend on the data returned
making sense.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

[HACKERS] Re: Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Thomas Lockhart


> Well, that's exactly *why* we need an overridable default.  Or would you
> like to try to do some performance measurements in configure?

At this point I'm more comfortable with a compile-time option
(determined statically or in a configure compilation test, not a
performance test), rather than a GUC variable. But imho 7.1 will be nice
with either choice, and if you think that a variable will make it easier
for developers to do tuning from a distance (as opposed to having it
just confuse new users) then... ;)

- Thomas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein


* Tom Lane <[EMAIL PROTECTED]> [010315 14:54] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > How many files need to be fsync'd?
> 
> Only one.
> 
> > If it's more than one, what might work is using mmap() to map the
> > files in adjacent areas, then calling msync() on the entire range,
> > this would allow you to batch fsync the data.
> 
> Interesting thought, but mmap to a prespecified address is most
> definitely not portable, whether or not you want to assume that
> plain mmap is ...

Yeah... :(

Evil thought though (for reference):

mmap(anon memory) returns addr1
addr2 = addr1 + maplen
split addr1<->addr2 on points A B and C
mmap(file1 over addr1 to A)
mmap(file2 over A to B)
mmap(file3 over B to C)
mmap(file4 over C to addr2)

It _should_ work, but there's probably some corner cases where it
doesn't.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Alfred Perlstein <[EMAIL PROTECTED]> writes:
> How many files need to be fsync'd?

Only one.

> If it's more than one, what might work is using mmap() to map the
> files in adjacent areas, then calling msync() on the entire range,
> this would allow you to batch fsync the data.

Interesting thought, but mmap to a prespecified address is most
definitely not portable, whether or not you want to assume that
plain mmap is ...

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut <[EMAIL PROTECTED]> writes:
> switch(lower(string[0]) + lower(string[5]))
> {
>   case 'f':   /* fsync */
>   case 'f' + 's': /* fdatasync */
>   case 'o' + 's': /* open_sync */
>   case 'o' + 'd': /* open_datasync */
> }

> Although ugly, it should serve as a readable solution for now.

Ugly is the word ...

>> Do you object if I add an "assign_hook" to guc.c that's called when an
>> actual assignment is made?

> Something like this is on my wish list, but I'm not sure if it's wise to
> start this now.

I'm not particularly concerned about changing the interface later if
that proves necessary.  We're not likely to have so many of the things
that an API change is burdensome, and they will all be strictly backend
internal.

What I have in mind for now is just

void (*assign_hook) (const char *newval);

(obviously this is for string variables only, for now) called just
before actually changing the variable value.  This lets the hook see
the old value if it needs to.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Mikheev, Vadim <[EMAIL PROTECTED]> [010315 13:52] wrote:
> > I believe that we don't know enough yet to nail down a hard-wired
> > decision.  Vadim's idea of preferring O_DSYNC if it appears to be
> > different from O_SYNC is a good first cut, but I think we'd 
> > better make it possible to override that, at least for testing purposes.
> 
> So let's leave fsync as default and add option to open log files
> with O_DSYNC/O_SYNC.

I have a weird and untested suggestion:

How many files need to be fsync'd?

If it's more than one, what might work is using mmap() to map the
files in adjacent areas, then calling msync() on the entire range,
this would allow you to batch fsync the data.

The only problem is that I'm not sure:

1) how portable msync() is.
2) if msync garauntees metadata consistancy.

Another benifit of mmap() is the 'zero' copy nature of it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

> wal_sync_method = fsync | fdatasync | open_sync | open_datasync

> A small problem is that I don't want to be doing multiple strcasecmp's
> to figure out what to do in xlog.c.

This should be efficient:

switch(lower(string[0]) + lower(string[5]))
{
case 'f':   /* fsync */
case 'f' + 's': /* fdatasync */
case 'o' + 's': /* open_sync */
case 'o' + 'd': /* open_datasync */
}

Although ugly, it should serve as a readable solution for now.

> Do you object if I add an "assign_hook" to guc.c that's called when an
> actual assignment is made?

Something like this is on my wish list, but I'm not sure if it's wise to
start this now.  There are a few issues that need some thought, like how
to make the interface for non-string options, and how to keep it in sync
with the parse hook of string options, ...

> That would provide a place to set up the flag variables that xlog.c
> would actually look at.  Furthermore, having an assign_hook would let
> us support changing this value at SIGHUP, not only at postmaster
> start. (The assign hook would just need to fsync whatever WAL file is
> currently open and possibly close/reopen the file, to ensure that no
> blocks miss getting synced when we change conventions.)

... and possibly here you need to pass the context to the assign hook as
well.  This application strikes me as a bit too esoteric for a first try.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[HACKERS] Contribute to the PL/pgSQL CookBook !!

2001-03-15 Thread Roberto Mello



I have started the "PL/pgSQL CookBook" project. The goal is to
create a cookbook of PL/pgSQL functions that will be catalogued and made
available for others to use and learn from.
Come to http://www.brasileiro.net/postgres and contribute your own 
PL/pgSQL (or PL/Tcl, PL/Perl) function or trigger! This will help many
Postgres users, both novice and experienced, to use its procedural
languages.
The CookBook has several sections, and you can add your own. No login
is required, just come and contribute.

Once again http://www.brasileiro.net/postgres 

Oh, did I mention that you get your own "PostgreSQL Powered" button
when you contribute a function/trigger? :)

-Roberto

-- 
+| http://fslc.usu.edu USU Free Software & GNU/Linux Club|--+
  Roberto Mello - Computer Science, USU - http://www.brasileiro.net 
  http://www.sdl.usu.edu - Space Dynamics Lab, Web Developer
Tetris tagline: @@  o@o    @oo  oo@  

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Jan Wieck

Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > What about a collector deamon, fired up by the postmaster and
> > receiving UDP packets from the backends. Under heavy load, it
> > might miss some statistic messages, well, but that's  not  as
> > bad as having locks causing backends to loose performance.
>
> Interesting thought, but we don't want UDP I think; that just opens
> up a whole can of worms about checking access permissions and so forth.
> Why not a simple pipe?  The postmaster creates the pipe and the
> collector daemon inherits one end, while all the backends inherit the
> other end.

I don't think so - though I haven't tested the following yet,
but AFAIR it's correct.

Have the postmaster creating two UDP sockets before it  forks
off the collector. It can examine the peer addresses of both,
so they don't need well known port numbers,  it  can  be  the
randomly  ones  assigned  by  the kernel. Thus, we don't need
SO_REUSE on them either.

Now, since the collector is forked off by the postmaster,  it
knows  the  peer  address  of the other socket. And since all
backends get forked off from the postmaster as well,  they'll
all  use  the  same  peer  address,  don't  they?  So all the
collector has to look at is the sender address including port
number  of  the  packets.  It needs to be what the postmaster
examined, anything else is from someone else and goes to  bit
heaven.  The  same  way the backends know where to send their
statistics.

If I'm right that in the case of fork()  all  children  share
the  same  socket  with the same peer address, then it's even
safe in the case the collector dies. The postmaster can still
hold the collectors socket and will notice that the collector
died (due to a wait() returning it's PID)  and  can  fire  up
another one. Again some packets got lost (plus all the so far
collected statistics, hmmm - aint that a cool  way  to  reset
statistic  counters - killing the collector?), but it did not
disturb any live backend in any way. They will never get  any
signal,  don't  care  about what's done with their statistics
and such. They just do their work...

Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

_
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Jan Wieck

Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > What about a collector deamon, fired up by the postmaster and
> > receiving UDP packets from the backends. Under heavy load, it
> > might miss some statistic messages, well, but that's  not  as
> > bad as having locks causing backends to loose performance.
>
> Interesting thought, but we don't want UDP I think; that just opens
> up a whole can of worms about checking access permissions and so forth.
> Why not a simple pipe?  The postmaster creates the pipe and the
> collector daemon inherits one end, while all the backends inherit the
> other end.

I don't think so - though I haven't tested the following yet,
but AFAIR it's correct.

Have the postmaster creating two UDP sockets before it  forks
off the collector. It can examine the peer addresses of both,
so they don't need well known port numbers,  it  can  be  the
randomly  ones  assigned  by  the kernel. Thus, we don't need
SO_REUSE on them either.

Now, since the collector is forked off by the postmaster,  it
knows  the  peer  address  of the other socket. And since all
backends get forked off from the postmaster as well,  they'll
all  use  the  same  peer  address,  don't  they?  So all the
collector has to look at is the sender address including port
number  of  the  packets.  It needs to be what the postmaster
examined, anything else is from someone else and goes to  bit
heaven.  The  same  way the backends know where to send their
statistics.

If I'm right that in the case of fork()  all  children  share
the  same  socket  with the same peer address, then it's even
safe in the case the collector dies. The postmaster can still
hold the collectors socket and will notice that the collector
died (due to a wait() returning it's PID)  and  can  fire  up
another one. Again some packets got lost (plus all the so far
collected statistics, hmmm - aint that a cool  way  to  reset
statistic  counters - killing the collector?), but it did not
disturb any live backend in any way. They will never get  any
signal,  don't  care  about what's done with their statistics
and such. They just do their work...

Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

_
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut <[EMAIL PROTECTED]> writes:
> We postulate that one of those has to exist.  Alternatively, you make the
> option read
> wal_sync_method = fsync | open_sync
> In the "parse_hook" for the parameter you if #ifdef out 'open_sync' as a
> valid option if none of those exist, so a user will get "'open_sync' is
> not a valid option value".

I like this a lot.  In fact, I am mightily tempted to make it

wal_sync_method = fsync | fdatasync | open_sync | open_datasync

where fdatasync would only be valid if configure found fdatasync() and
open_datasync would only be valid if we found O_DSYNC exists and isn't
O_SYNC.  This would let people try all the available methods under
realistic test conditions, for hardly any extra work.

Furthermore, the documentation could say something like "The default is
the first available method in the order open_datasync, fdatasync, fsync,
open_sync" (assuming that Vadim's preferences are right).

A small problem is that I don't want to be doing multiple strcasecmp's
to figure out what to do in xlog.c.  Do you object if I add an
"assign_hook" to guc.c that's called when an actual assignment is made?
That would provide a place to set up the flag variables that xlog.c
would actually look at.  Furthermore, having an assign_hook would let us
support changing this value at SIGHUP, not only at postmaster start.
(The assign hook would just need to fsync whatever WAL file is currently
open and possibly close/reopen the file, to ensure that no blocks miss
getting synced when we change conventions.)

Creeping featurism strikes again ;-) ... but this feels right ...

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut <[EMAIL PROTECTED]> writes:
> I haven't followed the jungle of numbers too closely.
> Is it not the case that WAL + fsync is still faster than 7.0 + fsync and
> WAL/no fsync is still faster than 7.0/no fsync?

I believe the first is true in most cases.  I wouldn't swear to the
second though, since WAL requires more I/O and doesn't save any fsyncs
if you've got 'em all turned off anyway ...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> "The default is 'on' if your system defines one of the macros O_SYNC,
> O_DSYNC, O_FSYNC, and if O_SYNC and O_DSYNC are distinct, otherwise the
> default is 'off'."
> 
> The net result of this would be that the average user would have
> absolutely no clue what the default on his machine is.
> 
> Additionally consider that maybe O_SYNC and O_DSYNC have different values
> but the kernel treats them the same anyway.  We really shouldn't try to
> guess that far.

Good point.  I think Tom already found dfsync points to fsync in his
libc, or something like that.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

> I've been mentally working through the code, and see only one reason why
> it might be necessary to go with a compile-time choice: suppose we see
> that none of O_DSYNC, O_SYNC, O_FSYNC, [others] are defined?

We postulate that one of those has to exist.  Alternatively, you make the
option read

wal_sync_method = fsync | open_sync

In the "parse_hook" for the parameter you if #ifdef out 'open_sync' as a
valid option if none of those exist, so a user will get "'open_sync' is
not a valid option value".

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

RE: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Mikheev, Vadim


> I believe that we don't know enough yet to nail down a hard-wired
> decision.  Vadim's idea of preferring O_DSYNC if it appears to be
> different from O_SYNC is a good first cut, but I think we'd 
> better make it possible to override that, at least for testing purposes.

So let's leave fsync as default and add option to open log files
with O_DSYNC/O_SYNC.

Vadim

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[HACKERS] pgmonitor completed

2001-03-15 Thread Bruce Momjian


I have completed all the features I want in the first release of
pgmonitor.  It is available at:

ftp://candle.pha.pa.us/pub/postgresql/pgmonitor.tar.gz

I am going to send this over soon to announce/general to encourage its
use.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut <[EMAIL PROTECTED]> writes:
>> Peter, what do you think about configuration-dependent defaults for
>> GUC variables?

> We have plenty of those already, but we should avoid a variable whose
> specification is:

> "The default is 'on' if your system defines one of the macros O_SYNC,
> O_DSYNC, O_FSYNC, and if O_SYNC and O_DSYNC are distinct, otherwise the
> default is 'off'."

Unfortunately, I think that's just about what the default would need to
be.  What alternative do you have to offer?

> The net result of this would be that the average user would have
> absolutely no clue what the default on his machine is.

Sure he would.  Fire up the software and do "SHOW wal_use_fsync"
(or whatever we call it).  I think the documentation could just say
"the default is platform-dependent".

> Additionally consider that maybe O_SYNC and O_DSYNC have different values
> but the kernel treats them the same anyway.  We really shouldn't try to
> guess that far.

Well, that's exactly *why* we need an overridable default.  Or would you
like to try to do some performance measurements in configure?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

> However, I can actually make a case for this: we are flushing out
> performance bugs in a new feature, ie WAL.

I haven't followed the jungle of numbers too closely.

Is it not the case that WAL + fsync is still faster than 7.0 + fsync and
WAL/no fsync is still faster than 7.0/no fsync?

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

> "Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> > ... I would either
> > use fsync as default or don't deal with O_SYNC at all.
> > But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
> > use O_DSYNC by default.
>
> Hm.  We could do that reasonably painlessly as a compile-time test in
> xlog.c, but I'm not clear on how it would play out as a GUC option.
> Peter, what do you think about configuration-dependent defaults for
> GUC variables?

We have plenty of those already, but we should avoid a variable whose
specification is:

"The default is 'on' if your system defines one of the macros O_SYNC,
O_DSYNC, O_FSYNC, and if O_SYNC and O_DSYNC are distinct, otherwise the
default is 'off'."

The net result of this would be that the average user would have
absolutely no clue what the default on his machine is.

Additionally consider that maybe O_SYNC and O_DSYNC have different values
but the kernel treats them the same anyway.  We really shouldn't try to
guess that far.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Larry Rosenman


I'd actually vote for it to remain for a release or two or more, as 
we get more experience with stuff, the defaults may be different for 
different workloads. 

LER
-- 
Larry Rosenman
 http://www.lerctr.org/~ler/
Phone: +1 972 414 9812
 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 US

>> Original Message <<

On 3/15/01, 2:46:20 PM, Bruce Momjian <[EMAIL PROTECTED]> wrote 
regarding Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC:


> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > I later read Vadim's comment that fsync() of two blocks may be faster
> > > than two O_* writes, so I am now confused about the proper solution.
> > > However, I think we need to pick one and make it invisible to the user.
> > > Perhaps a compiler/config.h flag for testing would be a good solution.
> >
> > I believe that we don't know enough yet to nail down a hard-wired
> > decision.  Vadim's idea of preferring O_DSYNC if it appears to be
> > different from O_SYNC is a good first cut, but I think we'd better make
> > it possible to override that, at least for testing purposes.
> >
> > So I think it should be configurable at *some* level.  I don't much care
> > whether it's a config.h entry or a GUC variable.
> >
> > But consider this: we'll be more likely to get some feedback from the
> > field (allowing us to refine the policy in future releases) if it is a
> > GUC variable.  Not many people will build two versions of the software,
> > but people might take the trouble to play with a run-time configuration
> > setting.

> Yes, I can imagine.  Can we remove it once we know the answer?

> --
>   Bruce Momjian|  http://candle.pha.pa.us
>   [EMAIL PROTECTED]   |  (610) 853-3000
>   +  If your life is a hard drive, |  830 Blythe Avenue
>   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?

> http://www.postgresql.org/users-lounge/docs/faq.html

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > I later read Vadim's comment that fsync() of two blocks may be faster
> > than two O_* writes, so I am now confused about the proper solution. 
> > However, I think we need to pick one and make it invisible to the user. 
> > Perhaps a compiler/config.h flag for testing would be a good solution.
> 
> I believe that we don't know enough yet to nail down a hard-wired
> decision.  Vadim's idea of preferring O_DSYNC if it appears to be
> different from O_SYNC is a good first cut, but I think we'd better make
> it possible to override that, at least for testing purposes.
> 
> So I think it should be configurable at *some* level.  I don't much care
> whether it's a config.h entry or a GUC variable.
> 
> But consider this: we'll be more likely to get some feedback from the
> field (allowing us to refine the policy in future releases) if it is a
> GUC variable.  Not many people will build two versions of the software,
> but people might take the trouble to play with a run-time configuration
> setting.

Yes, I can imagine.  Can we remove it once we know the answer?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> I later read Vadim's comment that fsync() of two blocks may be faster
> than two O_* writes, so I am now confused about the proper solution. 
> However, I think we need to pick one and make it invisible to the user. 
> Perhaps a compiler/config.h flag for testing would be a good solution.

I believe that we don't know enough yet to nail down a hard-wired
decision.  Vadim's idea of preferring O_DSYNC if it appears to be
different from O_SYNC is a good first cut, but I think we'd better make
it possible to override that, at least for testing purposes.

So I think it should be configurable at *some* level.  I don't much care
whether it's a config.h entry or a GUC variable.

But consider this: we'll be more likely to get some feedback from the
field (allowing us to refine the policy in future releases) if it is a
GUC variable.  Not many people will build two versions of the software,
but people might take the trouble to play with a run-time configuration
setting.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


[ Charset ISO-8859-1 unsupported, converting... ]
> > Based on the tests we did last week, it seems clear than on many
> > platforms it's a win to sync the WAL log by writing it with open()
> > option O_SYNC (or O_DSYNC where available) rather than 
> > issuing explicit fsync() (resp. fdatasync()) calls.
> 
> I don't remember big difference in using fsync or O_SYNC in tfsync
> tests. Both depend on block size and keeping in mind that fsync
> allows us syncing after writing *multiple* blocks I would either
> use fsync as default or don't deal with O_SYNC at all.

I see what you are saying.  That the OS may be faster at fsync'ing two
blocks in one operation rather than doing to O_SYNC operations.

Seems we should just pick a default and leave the rest for a later
release.  Marc wants RC1 tomorrow, I think.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> > I've been mentally working through the code, and see only one reason why
> > it might be necessary to go with a compile-time choice: suppose we see
> > that none of O_DSYNC, O_SYNC, O_FSYNC, [others] are defined?  With the
> > compile-time choice it's easy: #define USE_FSYNC_FOR_WAL, and sail on.
> > If it's a GUC variable then we need a way to prevent the GUC option from
> > becoming unset (which would disable the fsync() calls, leaving nothing
> > to replace 'em).  Doable, perhaps, but seems kind of ugly ... any
> > thoughts about that?
> 
> I don't think having something a run-time option is always a good idea. 
> Giving people too many choices is often confusing.  
> 
> I think we should just check at compile time, and choose O_* if we have
> it, and if not, use fsync().  No one will ever do the proper timing
> tests to know which is better except us.  Also, it seems O_* should be
> faster because you are fsync'ing the buffer you just wrote, so there is
> no looking around for dirty buffers like fsync().

I later read Vadim's comment that fsync() of two blocks may be faster
than two O_* writes, so I am now confused about the proper solution. 
However, I think we need to pick one and make it invisible to the user. 
Perhaps a compiler/config.h flag for testing would be a good solution.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Can someone explain why configure/platform-specific flags are allowed to
> > be added at this stage in the release, but my pgmonitor patch was
> > rejected?
> 
> Possibly just because Marc hasn't stomped on me quite yet ;-)
> 
> However, I can actually make a case for this: we are flushing out
> performance bugs in a new feature, ie WAL.


You did a masterful job of making my pgmonitor patch sound like a debug
aid instead of a feature too.  :-)

Have you considered a career in law.  :-)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane


Bruce Momjian <[EMAIL PROTECTED]> writes:
> Can someone explain why configure/platform-specific flags are allowed to
> be added at this stage in the release, but my pgmonitor patch was
> rejected?

Possibly just because Marc hasn't stomped on me quite yet ;-)

However, I can actually make a case for this: we are flushing out
performance bugs in a new feature, ie WAL.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> Based on the tests we did last week, it seems clear than on many
> platforms it's a win to sync the WAL log by writing it with open()
> option O_SYNC (or O_DSYNC where available) rather than issuing explicit
> fsync() (resp. fdatasync()) calls.  In theory fsync ought to be faster,
> but it seems that too many kernels have inefficient implementations of
> fsync.

Can someone explain why configure/platform-specific flags are allowed to
be added at this stage in the release, but my pgmonitor patch was
rejected?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian


> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > As a general rule, if something can be a run time option, as opposed to a
> > compile time option, then it should be.  At the very least you keep the
> > installation simple and allow for easier experimenting.
> 
> I've been mentally working through the code, and see only one reason why
> it might be necessary to go with a compile-time choice: suppose we see
> that none of O_DSYNC, O_SYNC, O_FSYNC, [others] are defined?  With the
> compile-time choice it's easy: #define USE_FSYNC_FOR_WAL, and sail on.
> If it's a GUC variable then we need a way to prevent the GUC option from
> becoming unset (which would disable the fsync() calls, leaving nothing
> to replace 'em).  Doable, perhaps, but seems kind of ugly ... any
> thoughts about that?

I don't think having something a run-time option is always a good idea. 
Giving people too many choices is often confusing.  

I think we should just check at compile time, and choose O_* if we have
it, and if not, use fsync().  No one will ever do the proper timing
tests to know which is better except us.  Also, it seems O_* should be
faster because you are fsync'ing the buffer you just wrote, so there is
no looking around for dirty buffers like fsync().

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein


* Tom Lane <[EMAIL PROTECTED]> [010315 11:45] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > And since we're sorta on the topic of IO, I noticed that it looks
> > like (at least in 7.0.3) that vacuum and certain other routines
> > read files in reverse order.
> 
> Vacuum does that because it's trying to push tuples down from the end
> into free space in earlier blocks.  I don't see much way around that
> (nor any good reason to think that it's a critical part of vacuum's
> performance anyway).  Where else have you seen such behavior?

Just vacuum, but the source is large, and I'm sort of lacking
on database-foo so I guessed that it may be done elsewhere.

You can optimize this out by implementing the read behind yourselves
sorta like this:

struct sglist *
read(fd, len)
{

if (fd.lastpos - fd.curpos <= THRESHOLD) {
fd.curpos = fd.lastpos - THRESHOLD;
len = THRESHOLD;
}

return (do_read(fd, len));
}

of course this is entirely wrong, but illustrates what
would/could help.

I would fix FreeBSD, but it's sort of a mess and beyond what
I've got time to do ATM.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Alfred Perlstein <[EMAIL PROTECTED]> writes:
> And since we're sorta on the topic of IO, I noticed that it looks
> like (at least in 7.0.3) that vacuum and certain other routines
> read files in reverse order.

Vacuum does that because it's trying to push tuples down from the end
into free space in earlier blocks.  I don't see much way around that
(nor any good reason to think that it's a critical part of vacuum's
performance anyway).  Where else have you seen such behavior?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Peter Eisentraut <[EMAIL PROTECTED]> [010315 11:33] wrote:
> Alfred Perlstein writes:
> 
> > Sorry, what's a GUC? :)
> 
> Grand Unified Configuration system
> 
> It's basically a cute name for the achievement that there's now a single
> name space and interface for (almost) all postmaster run time
> configuration variables,

Oh, thanks.

Well considering that, a runtime check for doing_sync_wal_writes
== 1 shouldn't be that expensive.  Sort of the inverse of -F,
meaning that we're using O_SYNC for WAL writes, we don't need to
fsync it.

Btw, if you guys want to get some speed with WAL, I'd implement a
write-behind process if it was possible to do the O_SYNC writes.

...

And since we're sorta on the topic of IO, I noticed that it looks
like (at least in 7.0.3) that vacuum and certain other routines
read files in reverse order.

The problem (at least in FreeBSD) is that we haven't tuned
the system to detect reverse reading and hence don't do
much readahead.  There may be some going on as a function
of the read clustering, but I'm not entirely sure.

I'd suspect that other OSs might have neglected to check
for reverse reading of files as well, but I'm not sure.

Basically, if there was a way to do this another way, or
anticipate the backwards motion and do large reads, it
may add latency, but it should improve performance.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Lamar Owen

Alfred Perlstein wrote:
> * Tom Lane <[EMAIL PROTECTED]> [010315 11:07] wrote:
> > Peter, what do you think about configuration-dependent defaults for
> > GUC variables?

> Sorry, what's a GUC? :)

Grand Unified Configuration, Peter E.'s baby.

See the thread starting at
http://www.postgresql.org/mhonarc/pgsql-hackers/2000-03/msg00107.html
for details.

(And the search is working :-)).
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut


Alfred Perlstein writes:

> Sorry, what's a GUC? :)

Grand Unified Configuration system

It's basically a cute name for the achievement that there's now a single
name space and interface for (almost) all postmaster run time
configuration variables,

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein


* Tom Lane <[EMAIL PROTECTED]> [010315 11:07] wrote:
> "Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> > ... I would either
> > use fsync as default or don't deal with O_SYNC at all.
> > But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
> > use O_DSYNC by default.
> 
> Hm.  We could do that reasonably painlessly as a compile-time test in
> xlog.c, but I'm not clear on how it would play out as a GUC option.
> Peter, what do you think about configuration-dependent defaults for
> GUC variables?

Sorry, what's a GUC? :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> ... I would either
> use fsync as default or don't deal with O_SYNC at all.
> But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
> use O_DSYNC by default.

Hm.  We could do that reasonably painlessly as a compile-time test in
xlog.c, but I'm not clear on how it would play out as a GUC option.
Peter, what do you think about configuration-dependent defaults for
GUC variables?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

RE: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Mikheev, Vadim


> Based on the tests we did last week, it seems clear than on many
> platforms it's a win to sync the WAL log by writing it with open()
> option O_SYNC (or O_DSYNC where available) rather than 
> issuing explicit fsync() (resp. fdatasync()) calls.

I don't remember big difference in using fsync or O_SYNC in tfsync
tests. Both depend on block size and keeping in mind that fsync
allows us syncing after writing *multiple* blocks I would either
use fsync as default or don't deal with O_SYNC at all.
But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
use O_DSYNC by default.
(BTW, we didn't compare fdatasync and O_SYNC yet).

Vadim

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] rtrim giving weird result

2001-03-15 Thread Ross J. Reedstrom


On Thu, Mar 15, 2001 at 01:18:57PM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <[EMAIL PROTECTED]> writes:
> >> Is there a way to just remove the "_opto" from the end of the string?
> 
> > If you have exactly one known string to (optionally) remove, this works
> > (and even works if the string is missing. Watch out for the early
> > occurance of substring problem, though!):
> 
> > test=# select substr('center_out_opto',1,(strpos('center_out_opto','_opto')-1)); 
> 
> My first thought for any moderately complicated string-bashing problem
> is to write a function in pltcl or plperl ... they are much stronger in
> string manipulation than SQL itself is.

Agreed, hence the caveats about 'exactly one string, that you know ahead of
time, and never appears as a substring ...'

But it _can_ be done, it's just not pretty. And it _is_ standard SQL:
here's the SQL92 spelling of the above:

SELECT SUBSTRING ('center_out_opto' FROM 1 FOR (POSITION ('_opto' IN 
'center_out_opto') - 1));

Ross

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Sheduling in SQL

2001-03-15 Thread Vincent AE Scott

On Thu, 15 Mar 2001, Paul wrote:

> Sorry, if I used not corresponding mailing list, but I really dont
> know where to send such email.
> 
> Is that possible to add one more SQL command to Postgres? The problem
> that IMHO no one RDBMS allows SQL command for sheduling. To support
> sheduling SQL programmers have to use outer tools to periodically
> check database if event commit. But IMHO it's much better to add one
> more SQL command to allow sheduling in same SQL.
> 
> My thoughts about such command follow:
> =

one option for doing this, ( in a fairly non-portable way ), is to create
a 'C' function contained in a shared library.  on most unixen you can put
in _init and _fini functions such that when the library is dlopened/closed
the functions execute.  simply create a thread in the _init, that sits
arround on a timer, then does some stuff.  not ideal, but an option

PGP key:  http://codex.net/pgp/pgp.asc

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] rtrim giving weird result

2001-03-15 Thread Tom Lane


"Ross J. Reedstrom" <[EMAIL PROTECTED]> writes:
>> Is there a way to just remove the "_opto" from the end of the string?

> If you have exactly one known string to (optionally) remove, this works
> (and even works if the string is missing. Watch out for the early
> occurance of substring problem, though!):

> test=# select substr('center_out_opto',1,(strpos('center_out_opto','_opto')-1)); 

My first thought for any moderately complicated string-bashing problem
is to write a function in pltcl or plperl ... they are much stronger in
string manipulation than SQL itself is.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut <[EMAIL PROTECTED]> writes:
> As a general rule, if something can be a run time option, as opposed to a
> compile time option, then it should be.  At the very least you keep the
> installation simple and allow for easier experimenting.

I've been mentally working through the code, and see only one reason why
it might be necessary to go with a compile-time choice: suppose we see
that none of O_DSYNC, O_SYNC, O_FSYNC, [others] are defined?  With the
compile-time choice it's easy: #define USE_FSYNC_FOR_WAL, and sail on.
If it's a GUC variable then we need a way to prevent the GUC option from
becoming unset (which would disable the fsync() calls, leaving nothing
to replace 'em).  Doable, perhaps, but seems kind of ugly ... any
thoughts about that?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

> I think we need to make both O_SYNC and fsync() choices available in
> 7.1.  Two important questions need to be settled:
>
> 1. Is a compile-time flag (in config.h.in) good enough, or do we need
> to make it configurable via a GUC variable?  (A variable would have to
> be postmaster-start-time changeable only, so you'd still need a
> postmaster restart to change it.)

As a general rule, if something can be a run time option, as opposed to a
compile time option, then it should be.  At the very least you keep the
installation simple and allow for easier experimenting.

> There's also the lesser question of what to call the config symbol
> or variable.

I suggest "wal_use_fsync" as a GUC variable, assuming the default would be
off.  Otherwise "wal_use_open_sync".  (Use a general-to-specific naming
scheme to allow for easier grouping.  Having defaults be "off"
consistently is more intuitive.)

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] rtrim giving weird result

2001-03-15 Thread Ross J. Reedstrom

On Thu, Mar 15, 2001 at 09:34:04AM -0800, G. Anthony Reina wrote:
> Ken Hirsch wrote:
> 
> > So rtrim("center_out_opto", "_opto") returns
> > "center_ou"
> > because "u" is not in the set {o, p, t, _} but all the characters after it
> > are.
> > rtrim("center_out_opto", "pot_") will produce the same thing.
> >

Modulo the correct quoting conventions for strings, of course.

> 
> That seems like an odd definition (although as Tom points out, it is
> consistent with Oracle).

Yup, I got bit by it, trying to remove 'The ' from the front of a set of
words, in order to get an approximation of 'library sort'.

> 
> Is there a way to just remove the "_opto" from the end of the string?

If you have exactly one known string to (optionally) remove, this works
(and even works if the string is missing. Watch out for the early
occurance of substring problem, though!):

test=# select substr('center_out_opto',1,(strpos('center_out_opto','_opto')-1)); 
   substr   

 center_out
(1 row)

test=#  select substr('center_out_opto',1,(strpos('center_out_opto','foo')-1));
 substr  
-
 center_out_opto
(1 row)

test=# 

Ross

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Alfred Perlstein <[EMAIL PROTECTED]> writes:
> * Tom Lane <[EMAIL PROTECTED]> [010315 09:35] wrote:
>> BTW, are there any platforms where O_DSYNC exists but has a different
>> spelling?

> Yes, FreeBSD only has: O_FSYNC
> it doesn't have O_SYNC nor O_DSYNC.

Okay ... we can fall back to O_FSYNC if we don't see either of the
others.  No problem.  Any other weird cases out there?  I think Andreas
might've muttered something about AIX but I'm not sure now.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein


* Tom Lane <[EMAIL PROTECTED]> [010315 09:35] wrote:
> 
> BTW, are there any platforms where O_DSYNC exists but has a different
> spelling?

Yes, FreeBSD only has: O_FSYNC
it doesn't have O_SYNC nor O_DSYNC.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

[HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane


Based on the tests we did last week, it seems clear than on many
platforms it's a win to sync the WAL log by writing it with open()
option O_SYNC (or O_DSYNC where available) rather than issuing explicit
fsync() (resp. fdatasync()) calls.  In theory fsync ought to be faster,
but it seems that too many kernels have inefficient implementations of
fsync.

I think we need to make both O_SYNC and fsync() choices available in
7.1.  Two important questions need to be settled:

1. Is a compile-time flag (in config.h.in) good enough, or do we need
to make it configurable via a GUC variable?  (A variable would have to
be postmaster-start-time changeable only, so you'd still need a
postmaster restart to change it.)

2. Which way should be the default?

There's also the lesser question of what to call the config symbol
or variable.

My inclination is to go with a compile-time flag named USE_FSYNC_FOR_WAL
and have the default be off (ie, use O_SYNC by default) but I'm not
strongly set on that.  Opinions anyone?

In any case the code should automatically prefer O_DSYNC over O_SYNC if
available, and should prefer fdatasync() over fsync() if available;
I doubt we need to provide a knob to alter those choices.

BTW, are there any platforms where O_DSYNC exists but has a different
spelling?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] rtrim giving weird result

2001-03-15 Thread G. Anthony Reina


Ken Hirsch wrote:

> So rtrim("center_out_opto", "_opto") returns
> "center_ou"
> because "u" is not in the set {o, p, t, _} but all the characters after it
> are.
> rtrim("center_out_opto", "pot_") will produce the same thing.
>

That seems like an odd definition (although as Tom points out, it is
consistent with Oracle).

Is there a way to just remove the "_opto" from the end of the string?

-Tony



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] Sheduling in SQL

2001-03-15 Thread Tom Lane


Hannu Krosing <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> (some of the protocol changes in 6.4 were done to make it easier ;-)).

> I may misremember, but IIRC some older protocol (or at least libpq) 
> returned 0 as backend pid to listening client if it was notified by itself.

> Currently it returns the actual pid for any backend. Is this what you 
> changed?

That was one of the smaller items.  The bigger problem was that the
backend wouldn't forward you NOTIFY events unless you issued a constant
stream of dummy queries.

> Anyhow we need some _documented_ way to get backend pid

PQbackendPID() seems adequately documented to me ...

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Sheduling in SQL

2001-03-15 Thread Hannu Krosing

Tom Lane wrote:

> 
> I've built applications that do roughly this sort of thing in Postgres
> (some of the protocol changes in 6.4 were done to make it easier ;-)).

I may misremember, but IIRC some older protocol (or at least libpq) 
returned 0 as backend pid to listening client if it was notified by itself.

Currently it returns the actual pid for any backend. Is this what you 
changed?

Anyhow we need some _documented_ way to get backend pid (there is one 
actually received and stored with "cookie" for Ctrl-C processing, but 
AFAIK it is neither documented as being the backend id nor is there a 
function to get at it).

For my own use I created a C function pid() but perhaps there should be 
something mainstream for this.

---
Hannu

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] CeBit

2001-03-15 Thread Jan Wieck

Michael Meskes wrote:
> Is anyone on this list in Hannover for CeBit? Maybe we could arrange a
> meeting.

Looks pretty much that I'll be still in Hamburg by then. What
are the days you planned?

Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

_
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Tom Lane

Jan Wieck <[EMAIL PROTECTED]> writes:
> What about a collector deamon, fired up by the postmaster and
> receiving UDP packets from the backends. Under heavy load, it
> might miss some statistic messages, well, but that's  not  as
> bad as having locks causing backends to loose performance.

Interesting thought, but we don't want UDP I think; that just opens
up a whole can of worms about checking access permissions and so forth.
Why not a simple pipe?  The postmaster creates the pipe and the
collector daemon inherits one end, while all the backends inherit the
other end.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Sheduling in SQL

2001-03-15 Thread Tom Lane

Paul <[EMAIL PROTECTED]> writes:
> CREATE SHEDULER name
> ON table.field
> [FOR [EACH]|[LAST]]
> EXECUTE PROCEDURE func(arguments)

> When the current time becomes equal or more than minimal time in
> the _table.field_, the event happens and the _func_ will be executed,
> and after that all records in this _table_ that in the _field_ have
> time equal or less than current time will be deleted.

This strikes me as way too problem-specific to be reasonable as a
general-purpose system extension.

You can actually build this sort of facility in Postgres as it stands,
using a background process that executes the items from the "todo"
table.  You'd put rules or triggers on the todo table to send out a
NOTIFY event, which the background guy would listen for; that would cue
him to re-select the minimum timestamp in the table.  Then he'd just
sleep until the next NOTIFY or time to do something.

The primary advantage of doing things this way is that you have an
actual client process executing the todo actions, so it could perform
outside-the-database actions as well as any database updates that might
be needed.  In the scheme you describe, the "func" would have to be
executed in some disembodied backend context --- it wouldn't even have
a client to talk to, let alone any chance of doing outside-the-database
actions.

I've built applications that do roughly this sort of thing in Postgres
(some of the protocol changes in 6.4 were done to make it easier ;-)).
Unfortunately that was proprietary code and I can't show it to you,
but it's not really difficult.  Perhaps you'd like to do up a simple
example and contribute it as a "contrib" module?

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Database corruption in 7.0.3

2001-03-15 Thread Tom Lane


Tim Allen <[EMAIL PROTECTED]> writes:
> Are there any known database-corrupting bugs in 7.0.3?

None that aren't also in earlier releases, AFAIR, so your report is
fairly troubling.  However there's not enough here to venture a guess
about the source of the problem.

Do you see any backend crashes or other misbehavior before the VACUUM
error pops up, or is that the only symptom?

It would be a good idea to rebuild the system with assert checks on
(configure --enable-cassert), in hopes that some Assert a little closer
to the source of the problem will fire.  Also, if you can spare some
disk space for logging, running the postmaster with -d2 to log all
queries might provide useful historical context when the problem
reappears.

I would like to be able to study the corrupted table, as well.  Can you
see your way to either giving me access to your machine, or (if the
database isn't too large) sending me a tar dump of the whole $PGDATA
directory next time it happens?

Please contact me off-list so we can figure out how best to pursue this
problem.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] PostgreSQL on multi-CPU systems

2001-03-15 Thread Tom Lane

[EMAIL PROTECTED] (D'Arcy J.M. Cain) writes:
> One thing I notice is that a single query can seem to block other queries,
> at least to some extent.

It's not supposed to, except with certain specific features (for
example, I don't think any of the index types other than btree allow
concurrent insertions).  Can you give a concrete example?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Unicode in postgresql

2001-03-15 Thread Tatsuo Ishii


> Hi all pgsql-hackers,
> 
> Iam a new hacker of this list.
> 
> I and a few others have started an Linux localization project for Indian
> Languages called - Project Tuxila (http://inapp.com/tuxila), and are
> currently doing the localization for an Indian language called
> "Malayalam". The utilities that will be developed as part of the
> project will be under GNU GPL.
> 
> We are trying to develop all the required utilities for linux
> localization. And as part of that we are trying to implement
> "Malayalam" into postgreSQL with the Unicode support available in it.

What kind of encoding is Malayalam? Is it ISO 2022 compatible? Or yet
another local encoding?

> Could anyone tell me, whether there is any research going in
> postgreSQL-Unicode areas, so that i can communicate with them and try to
> find solutions to my problems.

PostgreSQL 7.1 will have a feature that does an automatic encoding
conversion between Unicode(UTF-8) and other encodings including ISO
8859-1 to 5, EUC(Extended Unix Code) in the database engine.

> Is there any list for postgreSQL-Unicode?
> 
> Is there any Unicode sorting engine present in postgreSQL?

No.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

[HACKERS] Problem build python extension on Unixware

2001-03-15 Thread Joel Quinet


Hello all,

I try to build postgresql 7.1 beta 5 on UnixWare 7.1.1. I have problem to
build the Python extension,
there is a problem to build the shared library, I have see some information
on that but don't help me.
Sorry

Can you help me by providing the correct command

Thanks
Joel






---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] PostgreSQL on multi-CPU systems

2001-03-15 Thread D'Arcy J.M. Cain

Thus spake Tom Lane
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > I have tested PostgreSQL with 2-4 CPU linux boxes. In summary, 2 CPU
> > was a big win, but 4 was not. I'm not sure where the bottle neck is
> > though.
> 
> Our not-very-good implementation of spin locking (using select() to
> wait) might have something to do with this.  Sometime soon I'd like to
> look at using POSIX semaphores where available, instead of spinlocks.

One thing I notice is that a single query can seem to block other queries,
at least to some extent.  It makes me wonder if we effectively have a
single threaded system.  In fact, I have some simple queries that if
I send a bunch at once, the first one can take 15 seconds while the
others zip through.  Is this related to what you are talking about?

-- 
D'Arcy J.M. Cain|  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

[HACKERS] Unicode in postgresql

2001-03-15 Thread Suraj Kumar S.


Hi all pgsql-hackers,

Iam a new hacker of this list.

I and a few others have started an Linux localization project for Indian
Languages called - Project Tuxila (http://inapp.com/tuxila), and are
currently doing the localization for an Indian language called
"Malayalam". The utilities that will be developed as part of the
project will be under GNU GPL.

We are trying to develop all the required utilities for linux
localization. And as part of that we are trying to implement
"Malayalam" into postgreSQL with the Unicode support available in it.

Could anyone tell me, whether there is any research going in
postgreSQL-Unicode areas, so that i can communicate with them and try to
find solutions to my problems.

Is there any list for postgreSQL-Unicode?

Is there any Unicode sorting engine present in postgreSQL?

I also invite interested hackers to participate in this FreeSoftware 
movement.

regards,
Suraj Kumar S.
--
GNU/Linux rulz!


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Jan Wieck

Bruce Momjian wrote:
>
> Yes, it seems storing info in shared memory and having a system table to
> access it is the way to go.

Depends,

first  of all we need to know WHAT we want to collect.  If we
talk about block read/write statistics  and  such  on  a  per
table  base, which is IMHO the most accurate thing for tuning
purposes, then we're talking about an infinite size of shared
memory perhaps.

And  shared  memory has all the interlocking problems we want
to avoid.

What about a collector deamon, fired up by the postmaster and
receiving UDP packets from the backends. Under heavy load, it
might miss some statistic messages, well, but that's  not  as
bad as having locks causing backends to loose performance.

The  postmaster  could already provide the UDP socket for the
backends, so the collector can know  the  peer  address  from
which  to  accept  statistics messages only. Any message from
another peer address is  simply  ignored.   For  getting  the
statistics  out  of  it,  the  collector  has  his own server
socket, using TCP and providing some lookup protocol.

Now whatever the backend has to tell the collector, it simply
throws  a UDP packet into his direction. If the collector can
catch it or not, not the backends problem.

Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

_
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: AW: [HACKERS] Re: Week number

2001-03-15 Thread Karel Zak


On Wed, Mar 14, 2001 at 01:23:30PM -0500, Roland Roberts wrote:
> > "AZ" == Zeugswetter Andreas SB <[EMAIL PROTECTED]> writes:
> 
> >> Unix day-of-week starts on Sunday, not Monday, which is what
> >> date_trunc('dow',...) returns. Presumably this is modeled on
> >> the traditional notion (at least in the US; I suspect this is
> >> true in most European countries at least) of Sunday being "the
> >> first day of week".
> 
> AZ> Germany and Austria have Monday as first day of week, I think
> AZ> most of Europe also.
> 
> I believe the goal was to have a to_char() that was complete and
> Oracle-compatible.  Perhaps we need to also have a trunc() which is

 Yes, an Oracle-compatiblity is important for masks (format pictures)
used in both (Ora and PG). But our PG's implementation has some extensions,
for example 'ID' ISO-day-of-week in 7.2 where Monday = first day of week.
I hope all countries will glad :-)


 for 'WW' and 'D' are results same:

Ora:

SVRMGR> select to_char( to_date('2001/03/12', '/MM/DD'), 'WW Day D
/MM/DD') from dual;
TO_CHAR(TO_DATE('2001/03/
-
11 Monday2 2001/03/12
1 row selected.

PG:

select to_char( to_date('2001/03/12', '/MM/DD'), 'WW Day D /MM/DD');
  to_char
---
 11 Monday2 2001/03/12
(1 row)

Karel

-- 
 Karel Zak  <[EMAIL PROTECTED]>
 http://home.zf.jcu.cz/~zakkr/
 
 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

 PGP signature


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

[HACKERS] Sheduling in SQL

2001-03-15 Thread Paul


Sorry, if I used not corresponding mailing list, but I really dont
know where to send such email.

Is that possible to add one more SQL command to Postgres? The problem
that IMHO no one RDBMS allows SQL command for sheduling. To support
sheduling SQL programmers have to use outer tools to periodically
check database if event commit. But IMHO it's much better to add one
more SQL command to allow sheduling in same SQL.

My thoughts about such command follow:
=

The SQL command for sheduler creating:

CREATE SHEDULER name
ON table.field
[FOR [EACH]|[LAST]]
EXECUTE PROCEDURE func(arguments)

When the current time becomes equal or more than minimal time in
the _table.field_, the event happens and the _func_ will be executed,
and after that all records in this _table_ that in the _field_ have
time equal or less than current time will be deleted.

The other fields of this _table_ could be used as _arguments_ (or
agregates of the other fields when _FOR EACH_ is absent).

_FOR LAST_ - only for the record(s) of the _table_ that has(ve) the
maximum time (that equal or less the current time) the event(s) will
be processed.

_FOR EACH_ - if there is such parameter for each corresponding record
the event could be processed, not for all at once.

For each _CREATE SHEDULER_ will be created:
1. B-tree index on _table.field_.
2. Inner trigger on insert/delete/update _table.field_ to have up to
date min(_table.field_) for nearest event processing.


The SQL command for sheduler deleting:

DELETE SHEDULER name



-- 
Best regards,
 Paul Mamin  mailto:[EMAIL PROTECTED]



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] Database corruption in 7.0.3

2001-03-15 Thread Denis Perchine


Can confirm this. Get this just yesterday time ago...

Messages:

NOTICE:  Rel acm: TID 1697/217: OID IS INVALID. TUPGONE 1.

And lots of such lines...
And

pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.


In the end :-((( I lost a library of our institute... :-((( But I have a 
backup!!! :- This table even have NO indices!!!

Program received signal SIGSEGV, Segmentation fault.
0x813837f in PageRepairFragmentation (page=0x82840b0 "") at bufpage.c:311
311 alignedSize = MAXALIGN((*lp).lp_len);
(gdb) bt
#0  0x813837f in PageRepairFragmentation (page=0x82840b0 "") at bufpage.c:311
#1  0x80a9b07 in vc_scanheap (vacrelstats=0x82675b0, onerel=0x8273428, 
vacuum_pages=0xbfffe928, fraged_pages=0xbfffe918) at vacuum.c:1022
#2  0x80a8e8b in vc_vacone (relid=27296, analyze=0 '\000', va_cols=0x0) at 
vacuum.c:599
#3  0x80a8217 in vc_vacuum (VacRelP=0xbfffe9b4, analyze=0 '\000', 
va_cols=0x0) at vacuum.c:299
#4  0x80a818b in vacuum (vacrel=0x8267400 "", verbose=1 '\001', analyze=0 
'\000', va_spec=0x0) at vacuum.c:223
#5  0x813fba5 in ProcessUtility (parsetree=0x8267418, dest=Remote) at 
utility.c:694
#6  0x813c16e in pg_exec_query_dest (query_string=0x820aaa0 "vacuum verbose 
acm;", dest=Remote, aclOverride=0 '\000') at postgres.c:617
#7  0x813c08e in pg_exec_query (query_string=0x820aaa0 "vacuum verbose acm;") 
at postgres.c:562
#8  0x813d4c3 in PostgresMain (argc=9, argv=0xb068, real_argc=9, 
real_argv=0xba3c) at postgres.c:1588
#9  0x811ace5 in DoBackend (port=0x8223068) at postmaster.c:2009
#10 0x811a639 in BackendStartup (port=0x8223068) at postmaster.c:1776
#11 0x811932f in ServerLoop () at postmaster.c:1037
#12 0x8118b0e in PostmasterMain (argc=9, argv=0xba3c) at postmaster.c:725
#13 0x80d5e5e in main (argc=9, argv=0xba3c) at main.c:93
#14 0x40111fee in __libc_start_main () from /lib/libc.so.6

This is plain 7.0.3.

On Thursday 15 March 2001 14:52, Tim Allen wrote:
> We have an application that we were running quite happily using pg6.5.3
> in various customer sites. Now we are about to roll out a new version of
> our application, and we are going to use pg7.0.3. However, in testing
> we've come across a couple of isolated incidents of database
> corruption. They are sufficiently rare that I can't reproduce the problem,
> nor can I put my finger on just what application behaviour causes the
> problems.
>
> The symptoms most often involve some sort of index corruption, which is
> reported by vacuum and it seems that vacuum can fix it. On occasion vacuum
> reports "invalid OID" or similar (sorry, don't have exact wording of
> message). On one occasion the database has been corrupted to the point of
> unusability (ie vacuum admitted that it couldn't fix the problem), and a
> dump/restore was required (thankfully that at least worked). The index
> corruption also occasionally manifests itself in the form of spurious
> uniqueness constraint violation errors.
>
> The previous version of our app using 6.5.3 has never shown the slightest
> symptom of database misbehaviour, to the best of my knowledge, despite
> fairly extensive use. So our expectations are fairly high :-).
>
> One thing that is different about the new version of our app is that we
> now use multiple connections to the database (previously we only had
> one). We can in practice have transactions in progress on several
> connections at once, and it is possible for some transactions to be rolled
> back under application control (ie explicit ROLLBACK; statement).
>
> I realise I haven't really provided an awful lot of information that would
> help identify the problem, so I shall attempt to be understanding if
> no-one can offer any useful suggestions. But I hope someone can :-). Has
> anyone seen this sort of problem before? Are there any known
> database-corrupting bugs in 7.0.3? I don't recall anyone mentioning any in
> the mailing lists. Is using multiple connections likely to stimulate any
> known areas of risk?
>
> BTW we are using plain vanilla SQL, no triggers, no new types defined, no
> functions, no referential integrity checks, nothing more ambitious than a
> multi-column primary key.
>
> The platform is x86 Red Hat Linux 6.2. Curiously enough, on one of our
> testing boxes and on my development box we have never seen this, but we
> have seen it several times on our other test box and at least one customer
> site, so there is some possibility it's related to dodgy hardware. The
> customer box with the problem is a multi-processor box, all the other
> boxes we've tested on are single-processor.
>
> TIA for any help,
>
> Tim

-- 
Sincerely Yours,
Denis Perchine

--
E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
--

---(end of broadcast)-

[HACKERS] Database corruption in 7.0.3

2001-03-15 Thread Tim Allen


We have an application that we were running quite happily using pg6.5.3
in various customer sites. Now we are about to roll out a new version of
our application, and we are going to use pg7.0.3. However, in testing
we've come across a couple of isolated incidents of database
corruption. They are sufficiently rare that I can't reproduce the problem,
nor can I put my finger on just what application behaviour causes the
problems.

The symptoms most often involve some sort of index corruption, which is
reported by vacuum and it seems that vacuum can fix it. On occasion vacuum
reports "invalid OID" or similar (sorry, don't have exact wording of
message). On one occasion the database has been corrupted to the point of
unusability (ie vacuum admitted that it couldn't fix the problem), and a
dump/restore was required (thankfully that at least worked). The index
corruption also occasionally manifests itself in the form of spurious
uniqueness constraint violation errors.

The previous version of our app using 6.5.3 has never shown the slightest
symptom of database misbehaviour, to the best of my knowledge, despite
fairly extensive use. So our expectations are fairly high :-).

One thing that is different about the new version of our app is that we
now use multiple connections to the database (previously we only had
one). We can in practice have transactions in progress on several
connections at once, and it is possible for some transactions to be rolled
back under application control (ie explicit ROLLBACK; statement).

I realise I haven't really provided an awful lot of information that would
help identify the problem, so I shall attempt to be understanding if
no-one can offer any useful suggestions. But I hope someone can :-). Has
anyone seen this sort of problem before? Are there any known
database-corrupting bugs in 7.0.3? I don't recall anyone mentioning any in
the mailing lists. Is using multiple connections likely to stimulate any
known areas of risk?

BTW we are using plain vanilla SQL, no triggers, no new types defined, no
functions, no referential integrity checks, nothing more ambitious than a
multi-column primary key.

The platform is x86 Red Hat Linux 6.2. Curiously enough, on one of our
testing boxes and on my development box we have never seen this, but we
have seen it several times on our other test box and at least one customer
site, so there is some possibility it's related to dodgy hardware. The
customer box with the problem is a multi-processor box, all the other
boxes we've tested on are single-processor.

TIA for any help,

Tim

-- 
---
Tim Allen  [EMAIL PROTECTED]
Proximity Pty Ltd  http://www.proximity.com.au/
  http://www4.tpg.com.au/users/rita_tim/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

88 matches

Mail list logo