Re: [HACKERS] bugzilla (Was: What do you want me to do?)

2003-11-09 Thread Christopher Kings-Lynne
Seriously, I have wondered if it might be a good idea to assemble a 
small hit team that would take some high profile open source projects 
and make sure they worked with Postgres. Bugzilla would be the most 
obvious candidate, but there are certainly others. I suspect that could 
be quite productive, though.

Thoughts?
Count me out - I spend way too much of my time working on phpPgAdmin as 
it is :)

Chris



Re: [HACKERS] bugzilla (Was: What do you want me to do?)

2003-11-09 Thread Alvaro Herrera
On Sun, Nov 09, 2003 at 04:12:50PM +0800, Christopher Kings-Lynne wrote:
 Seriously, I have wondered if it might be a good idea to assemble a 
 small hit team that would take some high profile open source projects 
 and make sure they worked with Postgres. Bugzilla would be the most 
 obvious candidate, but there are certainly others. I suspect that could 
 be quite productive, though.
 
 Count me out - I spend way too much of my time working on phpPgAdmin as 
 it is :)

Count me out too.  I already ported MagicPoint to work well with
Postgres, and I'm already a little too overwhelmed to do anything else.

err... so, how does MagicPoint use Postgres?  I suppose it's only
because some of us use it to give Postgres talks and such ;-)

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
On the other flipper, one wrong move and we're Fatal Exceptions
(T.U.X.: Term Unit X  - http://www.thelinuxreview.com/TUX/)



Re: [HACKERS] Coding help

2003-11-09 Thread Christopher Kings-Lynne

The only idea I have is to call oidin() to do the conversion from string
to oid.  I see this in copy.c:
loaded_oid = DatumGetObjectId(DirectFunctionCall1(oidin,
                                  CStringGetDatum(string)));
if (loaded_oid == InvalidOid)
    ereport(ERROR,
            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
             errmsg("invalid OID in COPY data")));
I would give that a try.
Yes but in this case, the Node is parsed like this:

| COMMENT ON LARGE OBJECT NumericOnly IS comment_text
{
 ...

 n->objname = makeList1(makeAConst($5));

 ...

 $$ = (Node *) n;
}
So there is no CString to convert to a Datum...

Chris





Re: [HACKERS] Coding help

2003-11-09 Thread Bruce Momjian
Christopher Kings-Lynne wrote:
 
  The only idea I have is to call oidin() to do the conversion from string
  to oid.  I see this in copy.c:
  
  loaded_oid = DatumGetObjectId(DirectFunctionCall1(oidin,
                                    CStringGetDatum(string)));
  if (loaded_oid == InvalidOid)
      ereport(ERROR,
              (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
               errmsg("invalid OID in COPY data")));
  
  I would give that a try.
 
 Yes but in this case, the Node is parsed like this:
 
 | COMMENT ON LARGE OBJECT NumericOnly IS comment_text
 {
 
   ...
 
 
   n->objname = makeList1(makeAConst($5));
 
   ...
 
   $$ = (Node *) n;
 }
 
 So there is no CString to convert to a Datum...

But I see makeAConst as:

static Node *
makeAConst(Value *v)
{
Node *n;

switch (v->type)
{
    case T_Float:
        n = makeFloatConst(v->val.str);
        break;

    case T_Integer:
        n = makeIntConst(v->val.ival);
        break;

    case T_String:
    default:
        n = makeStringConst(v->val.str, NULL);
        break;
}

return n;
}

which uses makeStringConst().

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Coding help

2003-11-09 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 | COMMENT ON LARGE OBJECT NumericOnly IS comment_text
   n->objname = makeList1(makeAConst($5));

Forget the makeAConst step --- it's just wasted cycles.  In the
execution routine, you can use code comparable to define.c's defGetInt64()
to convert the Value node into a numeric OID, ie, either do intVal() or
a call of oidin().
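
For illustration, a conversion routine along those lines might look like
this.  This is a minimal sketch modeled on defGetInt64(); the function
name get_loid_from_value is made up, not from any actual patch:

static Oid
get_loid_from_value(Value *v)
{
    switch (nodeTag(v))
    {
        case T_Integer:
            /* value fit in a signed int, so intVal() is safe */
            return (Oid) intVal(v);

        case T_Float:
            /*
             * Values too large for int4 were kept as strings by the
             * lexer; let oidin() parse and range-check the unsigned
             * value.
             */
            return DatumGetObjectId(DirectFunctionCall1(oidin,
                                        CStringGetDatum(strVal(v))));

        default:
            elog(ERROR, "large object OID must be a number");
            return InvalidOid;  /* keep compiler quiet */
    }
}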

regards, tom lane



[HACKERS] New database access library, libpqxx-object

2003-11-09 Thread Roger Leigh

Hello,

I've been using libpqxx to access a Postgres database for the last few
months, and I wrote a set of classes to wrap around libpqxx, which
gradually evolved into a small set of classes and class templates.
These allow database tables and rows of tables to be represented as
real C++ objects, which you can interact with using normal class
methods.

Basically, it's an abstraction layer to separate the SQL SELECT,
INSERT, UPDATE and DELETE commands from the user of the classes, so
the classes become database aware: the user of the class doesn't
need to be aware of the database interaction going on under the hood.
Database relationships such as joins using foreign keys and table
inheritance may be expressed using C++ inheritance.

The table classes take a pqxx::Connection in their constructors and
an optional pqxx::Transaction (they can run in either transaction or
auto-commit modes).


The current version of the library is available from:
http://www.whinlatter.uklinux.net/libpqxx-object-0.1.0.tar.bz2
This includes an API reference and a short tutorial explaining how it
is intended to be used.  It's not yet completely perfect, but it does
its job.  I'm now looking at additional ways to wrap the functionality
of libpqxx.

Are there any similar libraries in existence (either for Postgres or
any other DBMS)?  (I'm looking to pinch some ideas ;-)


If it would be of interest, you are welcome to include this in your
libpqxx library, or as a separate project on GBorg (I'll relicense it
under the standard Postgres licence).


Regards,
Roger

-- 
Roger Leigh

Printing on GNU/Linux?  http://gimp-print.sourceforge.net/
GPG Public Key: 0x25BFB848.  Please sign and encrypt your mail.



Re: [pgsql-www] [HACKERS] Changes to Contributor List

2003-11-09 Thread Andreas Grabmüller
Hi all,

well, most of the programming itself is done; what's currently missing is

1) Some fine-tuning of the layout
2) Adding the static pages
3) Translating the pages, news, events etc. into German and maybe some other languages 
if we find someone to translate them
4) Creating something that handles old links, as the new ones are not compatible
5) Reformatting the PHP code, as some parts currently look horrible ;)
6) All the things I have forgotten...

It's currently not easy to work together on the code, as it's on my Subversion server 
and nobody else uses Subversion ;) If someone wants to help me with this, it would be 
good if we could get a CVS repository on GBorg for it...

Well, before doing that we should maybe decide whether to combine www and advocacy (and 
maybe even developer?) into one big site... I like that idea; I think I will 
create a little sample page later to see how it could look :)

Kind regards
Andreas Grabmüller

- Original Message -
From: Dave Page [EMAIL PROTECTED]
To: Peter Eisentraut [EMAIL PROTECTED], Robert Treat [EMAIL PROTECTED]
CC: Alvaro Herrera [EMAIL PROTECTED], Josh Berkus [EMAIL PROTECTED], [EMAIL 
PROTECTED], [EMAIL PROTECTED]
Date: Friday, November 07, 2003 09:34 AM
Subject: [pgsql-www] [HACKERS] Changes to Contributor List

[...]

  How can we help?
 
 Andreas?
 
 Regards, Dave.
 

--
LetzPlay.de
| Freemail:   http://www.letzplay.de/mail
| Forenhosting: http://www.letzplay.de/foren


[HACKERS] possible replace() bug - postgres 7.3.1

2003-11-09 Thread Ryan Mahoney
I am running PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96

the following query fails:

SELECT 'UPDATE pa_sales_lead SET details = COALESCE(details, \'\') ||
\'' || replace(data, '\'', '\'\'') || '\' WHERE foreign_sales_lead_id =
\'' || id || '\';' FROM temp_sales_lead ORDER BY id, row LIMIT 10;

with the error:

ERROR:  repalloc: invalid request size 1073741824

but, the following query (without replace):
SELECT 'UPDATE pa_sales_lead SET details = COALESCE(details, \'\') ||
\'' || data || '\' WHERE foreign_sales_lead_id = \'' || id || '\';' FROM
temp_sales_lead ORDER BY id, row LIMIT 10;

works just fine, producing output like:

UPDATE pa_sales_lead SET details = COALESCE(details, '') || 'EMPLOY: 50'
WHERE foreign_sales_lead_id = '1004';

any ideas?

-r


On Thu, 2003-11-06 at 12:52, Carlos wrote:
 Hello Forum,
 
 Could someone please send me the link and the names of the files to
 install the version 7.4 in Windows 2000 server and XP Pro?  I cannot
 discern the files that one is supposed to download..
 
 Thanks in advance for your response.
 
 Carlos Oliva
 Senior Programmer/Analyst
 Positive Business Solutions, Inc.
 Cincinnati, OH 45240-1640
 (513) 772 - 2255 ext 129
 [EMAIL PROTECTED]
 Yahoo ID: ramboid_1997
-- 
Ryan Mahoney [EMAIL PROTECTED]




Re: [HACKERS] [pgsql-advocacy] Changes to Contributor List

2003-11-09 Thread Ryan Mahoney
I think it is part of the incentive for corporations to contribute - not
just an impressive list for PHBs.  It's nice to get the recognition for
their time/money contributions and a good way for the PGDG to show their
appreciation.

-r

On Thu, 2003-11-06 at 14:34, Josh Berkus wrote:
 Peter,
 
   I was discussing specifically the Recognized Corporate Contributors 
 which
   is, AFAIK, strictly a PHB thing, no?
  
  No.
 
 Please explain.
-- 
Ryan Mahoney [EMAIL PROTECTED]




Re: [HACKERS] Erroneous PPC spinlock code

2003-11-09 Thread Marcus Meissner
On Thu, Nov 06, 2003 at 12:08:56AM +0100, Reinhard Max wrote:
 On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote:
 
  Peter Eisentraut [EMAIL PROTECTED] writes:
 
   The SuSE PPC guru said that the PPC spinlock code we currently use
   may behave erroneously on multiprocessor systems.
 
  What's his evidence for that claim?
 
 Let's ask him.
 
  The code we have is based directly on the recommendations in the PPC
  manuals, and has been tested on multi-CPU systems.
 
 Marcus, can you explain the details, please?

I reviewed the documentation again (at:
http://www-1.ibm.com/servers/esdd/articles/powerpc.html
) and it seems to agree with your opinion.

I retract my comment, leave your code as-is.

Ciao, Marcus




[HACKERS] Setting client encoding with jdbc

2003-11-09 Thread Brad
I'm interested in changing the client encoding from within a Java
program.  I'm using connection pooling and need to change it depending
on the client's language preference.  Is there a function I can call
which changes the client encoding?  Something like:

  pStmt = con.prepareStatement("select pg_set_client_encoding('LATIN5');");
  rs = pStmt.executeQuery();

I can set the client encoding from within psql by \encoding LATIN5 but
not sure how to execute something like this from java.  Any help would
be greatly appreciated.

Thanks,

Brad



Re: [HACKERS] Very poor estimates from planner

2003-11-09 Thread Rod Taylor
On Thu, 2003-11-06 at 10:35, Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  ->  Hash Join  (cost=1230.79..60581.82 rows=158 width=54) (actual time=1262.35..151200.29 rows=1121988 loops=1)
        Hash Cond: ("outer".account_id = "inner".account_id)
        ->  Hash Join  (cost=1226.78..52863.43 rows=1542558 width=50) (actual time=1261.63..100418.30 rows=1573190 loops=1)
            (join of bsod, tsb, tss)
 
 (btw, would you mind turning off MIME encoding in your mails to the PG
 lists?  It's a real PITA to quote.)

I can, though I would ask which email client you use that doesn't pull
content out of mime encoded emails.

  So yes, since this is a full table scan all values will be joined since
  the foreign key enforces them all to exist.
 
 Well, no, because only 1121988 rows come out of the join when 1573190
 went in.  So the actual selectivity of the join is about 70%.  The
 question is why the planner is estimating the selectivity at 0.01%
 (158/1542558).
 
 Could we see the pg_stats rows for service.account_id and
 account.account_id?

-[ RECORD 1 ]-----------------------------------------------------------
relname     | service
attname     | account_id
stanullfrac | 0
stawidth    | 4
stadistinct | 10
stakind1    | 1
stakind2    | 3
stakind3    | 0
stakind4    | 0
staop1      | 96
staop2      | 97
staop3      | 0
staop4      | 0
stanumbers1 | {0.388393,0.0825893,0.078125,0.0758929,0.0703125,0.0647321,0.0647321,0.0636161,0.0625,0.0491071}
stanumbers2 | {0.591672}
stavalues1  | {1,8221,8223,8226,8222,8218,8220,8219,8224,8225}
-[ RECORD 2 ]-----------------------------------------------------------
relname     | account
attname     | account_id
stanullfrac | 0
stawidth    | 4
stadistinct | -1
stakind1    | 2
stakind2    | 3
stakind3    | 0
stakind4    | 0
staop1      | 97
staop2      | 97
staop3      | 0
staop4      | 0
stanumbers2 | {0.97034}
stavalues1  | {1,10178,12136,14099,16054,18011,19966,21924,23881,26018,27995}
(2 rows)






Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Greg Stark
Manfred Spraul [EMAIL PROTECTED] writes:
> Greg Stark wrote:
>> I'm assuming fsync syncs writes issued by other processes on the same file,
>> which isn't necessarily true though.
>
> It was already pointed out that we can't rely on that assumption.

>> So the NetBSD and Sun developers I checked with both asserted fsync does in
>> fact guarantee this. And SUSv2 seems to back them up:

> At least Linux had one problem: fsync() syncs the inode to disk, but not the
> directory entry: if you rename a file, open it, write to it, fsync, and the
> computer crashes, then it's not guaranteed that the file rename is on the disk.
> I think only the old ext2 is affected, not the journaling filesystems.

That's true. But why would postgres ever have to worry about files being
renamed being synced? Tables and indexes don't get their files renamed
typically. WAL logs?

-- 
greg




Re: [pgsql-www] [HACKERS] Changes to Contributor List

2003-11-09 Thread elein
emeritus is a perfectly good Latin word.  No need to
dumb things down.

--elein

On Wed, Nov 05, 2003 at 05:26:29PM -0500, Bruce Momjian wrote:
 Tom Lane wrote:
  Josh Berkus [EMAIL PROTECTED] writes:
   B) What contributors are listed under Major Developers who haven't
   contributed any code since 7.1.0?
  
  I think we had agreed that formerly-listed contributors would not be
  deleted, but would be moved to a new section titled Contributors
  Emeritus or some such.  Please make sure that Tom Lockhart and Vadim
  get listed that way, at least.
 
 
 I think the Emeritus word might be too hard for non-native English
 speakers, and even for less educated English speakers.
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
 



Re: [HACKERS] Performance features the 4th

2003-11-09 Thread Stephen
Yes, I would like to see the vacuum delay patch go into 7.4.1 if possible.
It's really useful. I don't think there is any major risk in adding the
delay patch into a minor revision given the small amount of code change.

Stephen


Matthew T. O'Connor [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
> Tom Lane wrote:
>> Jan Wieck [EMAIL PROTECTED] writes:
>>> As a matter of fact, people who have performance problems are likely to
>>> be the same who have upgrade problems. And as Gaetano pointed out
>>> correctly, we will see wildforms with one or the other feature applied.
>>
>> I'd believe that for patches of the size of my original VACUUM-delay
>> hack (or even a production-grade version of same, which'd probably be
>> 10x larger).  The kind of wholesale rewrite you are currently proposing
>> is much too large to consider folding back into 7.4.*, IMHO.
>
> Do people think that the VACUUM-delay patch by itself would be useful
> enough on its own to consider working it into 7.4.1 or something?  From
> the little feedback I have read on the VACUUM-delay patch used in
> isolation, it certainly does help.  I would love to see it put into 7.4
> somehow.
>
> The far more rigorous changes that Jan is working on will be welcome
> improvements for 7.5.








Re: [HACKERS] Dreaming About Redesigning SQL

2003-11-09 Thread Anthony W. Youngman
In article [EMAIL PROTECTED], Lauri Pietarinen
[EMAIL PROTECTED] writes
Anthony W. Youngman wrote:
In article [EMAIL PROTECTED], Lauri Pietarinen lauri.pie
[EMAIL PROTECTED] writes
Anthony W. Youngman wrote:
In article [EMAIL PROTECTED], Lauri Pietarinen
[EMAIL PROTECTED] writes

Well, if it is normalised, how easy is it for you to change the 
customer_id of an order?  Anyway,

Incredibly easy. Just update the customer_id field of the invoice
record. A single change to a single row

And I presume the system will automatically move all related stuff 
(order details etc.) into
the same block as the new customer?  How long will that take? What if 
there is no room for it there?

Well, I'd view an order as an entity. As such, I would give it its own
FILE, and your question doesn't make sense. 

But then your formula for disk head movements does not make sense either!

Why not? The order is a real-world thing, and as such I would have an
ORDERS file, in which each order is a single entry, with customer_id
as one of its attributes. order detail is an attribute of order, so
if I change customer_id it's the relational equivalent of just
changing one cell in one row. The chances of me having to move the
record are pretty near nil, and if I do, it won't change buckets, so at most
it involves two frames (or disk blocks, if that's what you want to call
them).

But if the system did move
the stuff, it would be four disk accesses - read/write to delete the old
entry, read/write to save the new. As for enough room - well - it'll
fall over if we have a disk full (or it might not).

"Not enough room" here means not enough room in the block of the 
customer (from which you were supposed to get all data in one read, or 
disk head movement).  That would mean that your order information would 
be moved, perhaps to another block, resulting in an extra head movement, 
or am I right?

Which I've taken into account - if there isn't enough room in the
original bucket, I need to either overflow into the next bucket which
might exist, or to create it if it doesn't. Ie two head movements to
delete from the first bucket, and two head movements to add to the
second.

And it will only fall over if I need to create a new bucket and there's
no space left on the disk (or if (and this is very unlikely in this
scenario) it triggers a split which again needs space and there's none
left on disk).

Or have you not sussed that we view order detail as an attribute of
order (which is therefore stored as part of the same thing), but
customer is separate from order, is stored separately, and is linked
by a relationship. (Whereas order detail is NOT related to order,
because they are part of the same thing :-)


Well, it does result in data being stored multiple times ;-)

What on earth is wrong with that?  Do you know how much 160GB of disk 
costs today?  I could ask: does your system work in, say, 4KB?  That's 
how much memory the first computer I used (a Wang 2000) had.  Probably 
it would not work at all.  In the 50's they did amazing things with 
hardly any compilers and very little memory.  I am referring to 
Whirlwind.  See http://www.cedmagic.com/history/whirlwind-computer.html.
Could you have done that with MV?  My point?  Why are we discussing 
restrictions to memory and CPU speed of the 70's and 80's?  If an SQL 
DBMS uses more memory and disk, and it is available, why complain about 
*that*?  I'm not implying that you cannot complain about other matters, 
e.g. ease of development etc., and you might even be right.  Be that as 
it may, I am not trying to make you abandon your MV database.

As always, you're relying on hardware to help :-) You know what I think
of that :-) And 160Gb of disk is only cheap if you're using IDE on a
desktop PC - it costs a hell of a lot more for a laptop or SCSI for a
server. And if it's embedded, it may be that the *room* is expensive, not
the capacity ...

And: what if I was just reading customer-data.  Would the same formula
apply (= (2+N)*ST*1.05)?
  
Nope. If I understand you correctly, you want attributes that belong to
the entity customer, not the entity invoice. T = ST * 1.05. (By the
way, billing and/or invoice address (for example) are invoice
attributes, not company attributes.)

No,  I want you to give me a list of all your customers.  How many disk 
reads?

T = N * 1.05 where N is the number of customers. What do you want to
know about those customers? Address? Phone number*s*? Anything else?
That's *all* at no extra cost.

Well, no thanks.  I just wanted their names this time.
The relational alternative, with an index on customer_name, would be 
again an order of magnitude fewer disk reads.

Well, if you let me use an index here, I'm sorry, GAME OVER! The best
you can do would be a photo finish.

Assuming an overhead of, say, 4 bytes per index entry, the entire index
would be

Size = 4 * N + sigma(name_length) + sigma(key_length)

Okay, I've probably got some padding there as well, but so will you. And
note I 

Re: [HACKERS] Erroneous PPC spinlock code

2003-11-09 Thread Reinhard Max
On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote:

 Peter Eisentraut [EMAIL PROTECTED] writes:

  The SuSE PPC guru said that the PPC spinlock code we currently use
  may behave erroneously on multiprocessor systems.

 What's his evidence for that claim?

Let's ask him.

 The code we have is based directly on the recommendations in the PPC
 manuals, and has been tested on multi-CPU systems.

Marcus, can you explain the details, please?


cu
Reinhard




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] writes:
>> You want to find, open, and fsync() every file in the database cluster
>> for every checkpoint?  Sounds like a non-starter to me.
>
> Except a) this is outside any critical path, and b) only done every few
> minutes and c) the fsync calls on files with no dirty buffers ought to be
> cheap, at least as far as i/o.

The directory search and opening of the files is in itself nontrivial
overhead ... particularly on systems where open(2) isn't speedy, such
as Solaris.  I also disbelieve your assumption that fsync'ing a file
that doesn't need it will be free.  That depends entirely on what sort
of indexes the OS keeps on its buffer cache.  There are Unixen where
fsync requires a scan through the entire buffer cache because there is
no data structure that permits finding associated buffers any more
efficiently than that.  (IIRC, the HPUX system I'm typing this on is
like that.)  On those sorts of systems, we'd be way better off to use
O_SYNC or O_DSYNC on all our writes than to invoke multiple fsyncs.
Check the archives --- this was all gone into in great detail when we
were testing alternative methods for fsyncing the WAL files.

> So the NetBSD and Sun developers I checked with both asserted fsync does in
> fact guarantee this. And SUSv2 seems to back them up:
>
>     The fsync() function can be used by an application to indicate that all
>     data for the open file description named by fildes is to be transferred to
>     the storage device associated with the file described by fildes in an
>     implementation-dependent manner.

The question here is what is meant by "data for the open file
description".  If it said "all data for the file referenced by the open
FD" then I would agree that the spec says what you claim.  As is, I
think it would be entirely within the spec for the OS to dump only
buffers that had been dirtied through that particular FD.  Notice that
the last part of the sentence is careful to respect the distinction
between the FD and the file; why isn't the first part?
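
The ambiguous case, as a self-contained sketch (the file name is
illustrative, error handling omitted): the file is dirtied through one
descriptor and fsync'd through another, and the spec's wording arguably
leaves the outcome open:

#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
    int fd_a = open("datafile", O_WRONLY | O_CREAT, 0600);
    int fd_b = open("datafile", O_WRONLY);

    write(fd_a, "dirty", 5);   /* dirtied via one open file description */
    fsync(fd_b);               /* synced via another: must the "dirty"
                                * write now be on stable storage? */
    close(fd_a);
    close(fd_b);
    return 0;
}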

regards, tom lane



[HACKERS] Darwin Startup Script Patch

2003-11-09 Thread David Wheeler
Clearly I can't spell.

Regards,

David

--- postgresql-7.4RC1/contrib/start-scripts/PostgreSQL.darwin   Mon Dec 
 9 13:26:07 2002
+++ /Users/david/Desktop/PostgreSQL.darwin  Wed Nov  5 17:53:23 2003
@@ -18,9 +18,9 @@
 #
 # POSTGRESQLSERVER=-YES-
 #
-# The startup bundle will now be read to go. To prevent this script 
from
+# The startup bundle will now be ready to go. To prevent this script 
from
 # starting PostgreSQL at system startup, simply change that line in
-# /etc/hostconfig to:
+# /etc/hostconfig back to:
 #
 # POSTGRESQLSERVER=-NO-
 #

--
David Wheeler AIM: dwTheory
[EMAIL PROTECTED]  ICQ: 15726394
http://www.kineticode.com/ Yahoo!: dew7e
   Jabber: [EMAIL PROTECTED]
Kineticode. Setting knowledge in motion.[sm]


[HACKERS] BTree index

2003-11-09 Thread vjanand

I am trying to find information regarding creation of B-tree index in postgres 
for variable length character data (Char/varchar type). Specifically, what 
pagination policy is used, does it use prefix BTree, or any other form of 
compression (encoding)? 


Regards,
VJ Anand





Re: [HACKERS] UPPER()/LOWER() and UTF-8

2003-11-09 Thread Alexey Mahotkin
 TL == Tom Lane [EMAIL PROTECTED] writes:

TL writes: upper/lower aren't
TL going to work desirably in any multi-byte character set
TL encoding.

 Can you please point me at their implementation?  I do not
 understand why that's impossible.

TL Because they use ctype.h's toupper() and tolower()
TL functions, which only work on single-byte characters.

Aha, that's in src/backend/utils/adt/formatting.c, right?

Yes, I see, it goes byte by byte and uses toupper().  I believe we
could look at the locale, and if it is UTF-8, then use (or copy)
e.g. g_utf8_strup/strdown, right?

 
http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strup

I believe that patch could be written in a matter of hours.
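
For illustration, a minimal sketch of that idea using GLib's
g_utf8_strup() (GLib is not actually linked into the backend; this only
shows the call, and the wrapper name is made up):

#include <glib.h>

/* Uppercase a NUL-terminated UTF-8 string with a real Unicode case
 * mapping instead of byte-wise toupper().  Caller must g_free() the
 * result. */
char *
utf8_upper(const char *src)
{
    return g_utf8_strup(src, -1);   /* -1: take length from NUL terminator */
}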


TL There has been some discussion of using wctype.h where
TL available, but this has a number of issues, notably figuring
TL out the correct mapping from the server string encoding (eg
TL UTF-8) to unpacked wide characters.  At minimum we'd need to
TL know which charset the locale setting is expecting, and there
TL doesn't seem to be a portable way to find that out.

TL IIRC, Peter thinks we must abandon use of libc's locale
TL functionality altogether and write our own locale layer before
TL we can really have all the locale-specific functionality we
TL want.

I believe that native Unicode strings (together with human-language
handling) should be introduced as an (almost) separate data type (which
has nothing to do with locale), but that's blue-sky, maybe.

--alexm



Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Manfred Spraul
Greg Stark wrote:

I'm assuming fsync syncs writes issued by other processes on the same file,
which isn't necessarily true though.
 

It was already pointed out that we can't rely on that assumption.
   

So the NetBSD and Sun developers I checked with both asserted fsync does in
fact guarantee this. And SUSv2 seems to back them up:
 

At least Linux had one problem: fsync() syncs the inode to disk, but not 
the directory entry: if you rename a file, open it, write to it, fsync, 
and the computer crashes, then it's not guaranteed that the file rename 
is on the disk.
I think only the old ext2 is affected, not the journaling filesystems.
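
The usual workaround for that ext2 behavior, sketched (not from this
thread's code; error handling omitted): fsync the file and then its
containing directory, so the directory entry holding the new name
reaches disk too.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void
durable_rename(const char *oldpath, const char *newpath, const char *dir)
{
    rename(oldpath, newpath);

    int fd = open(newpath, O_WRONLY);
    fsync(fd);                 /* flush the file's data and inode */
    close(fd);

    int dfd = open(dir, O_RDONLY);
    fsync(dfd);                /* flush the directory entry itself */
    close(dfd);
}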

--
   Manfred


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Stephen
The delay patch worked so well, I couldn't resist asking if a similar patch
could be added for the COPY command (pg_dump). It's just an extension of the
same idea. On a large DB, backups can take very long while consuming a lot
of I/O, slowing down other select and write operations. We operate on a backup
window during a low-traffic period at night. It'd be nice to be able to run
pg_dump *anytime* and no longer need to worry about the backup window.
Backups will take longer to run, but as in the case of VACUUM, it's a
win for many people to be able to let it run in the background through the
whole day. The delay should be optional and default to zero so those who
wish to back up immediately can still do it. The way I see it, routine
backups and vacuums should be ubiquitous once properly configured.

Regards,

Stephen


Tom Lane [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 Jan Wieck [EMAIL PROTECTED] writes:
  I am currently looking at implementing ARC as a replacement strategy. I
  don't have anything that works yet, so I can't really tell what the
  result would be and it might turn out that we want both features.

 It's likely that we would.  As someone (you?) already pointed out,
 VACUUM has bad side-effects both in terms of cache flushing and in
 terms of sheer I/O load.  Those effects require different fixes AFAICS.

 One thing that bothers me here is that I don't see how adjusting our
 own buffer replacement strategy is going to do much of anything when
 we cannot control the kernel's buffer replacement strategy.  To get any
 real traction we'd have to go back to the "take over most of RAM for
 shared buffers" approach, which we already know to have a bunch of
 severe disadvantages.

 regards, tom lane







Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Jan Wieck
Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:

How I can see the background writer operating is that he's keeping the 
buffers in the order of the LRU chain(s) clean, because those are the 
buffers that most likely get replaced soon. In my experimental ARC code 
it would traverse the T1 and T2 queues from LRU to MRU, write out n1 and 
n2 dirty buffers (n1+n2 configurable), then fsync all files that have 
been involved in that, nap depending on where he got down the queues (to 
increase the write rate when running low on clean buffers), and do it 
all over again.
You probably need one more knob here: how often to issue the fsyncs.
I'm not convinced once per outer loop is a sufficient answer.
Otherwise this is sounding pretty good.
This is definitely heading into the right direction.

I currently have a crude and ugly hacked system that does checkpoints 
every minute but stretches them out over the whole time. It writes out 
the dirty buffers in T1+T2 LRU order intermixed, stretches the flush out 
over the whole checkpoint interval and does sync()+usleep() every 32 
blocks (if it has time to do this).

This is clearly the wrong way to implement it, but ...

The same system has ARC and delayed vacuum. With normal, unmodified 
checkpoints every 300 seconds, the transaction response time for 
new_order still peaks at over 30 seconds (5 is already too much), so the 
system basically comes to a freeze during a checkpoint.

Now with this high-frequency sync()ing and checkpointing by the minute, 
the entire system load levels out really nicely. Basically it's constantly 
checkpointing. So maybe the thing we're looking for is to make the 
checkpoint process the background buffer writer process and let it 
checkpoint 'round the clock. Of course, with a bit more selectivity on 
what to fsync, and not doing a system-wide sync() every 10-500 milliseconds :-)
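
As a toy illustration of that "constant checkpointing" loop (all of this
is hypothetical sketch code, not the experimental patch; a real
implementation would use smgrwrite() and fsync only the files it
touched):

#include <stdbool.h>
#include <unistd.h>

#define NBUFFERS 64

typedef struct { bool dirty; } Buffer;

static Buffer t1[NBUFFERS], t2[NBUFFERS];   /* ARC queues, LRU first */

static int
write_dirty(Buffer *queue, int limit)
{
    int written = 0;
    for (int i = 0; i < NBUFFERS && written < limit; i++)
        if (queue[i].dirty)
        {
            queue[i].dirty = false;         /* stands in for a real write */
            written++;
        }
    return written;
}

int
main(void)
{
    for (;;)
    {
        int n = write_dirty(t1, 8) + write_dirty(t2, 8);
        /* fsync the files written above, then nap: shorter when many
         * dirty buffers sat near the LRU end (clean buffers scarce) */
        usleep(n > 0 ? 100000 : 500000);
    }
}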

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] UPPER()/LOWER() and UTF-8

2003-11-09 Thread Alexey Mahotkin
 TL == Tom Lane [EMAIL PROTECTED] writes:

TL Alexey Mahotkin [EMAIL PROTECTED] writes:
 I'm running Postgresql 7.3.4 with ru_RU.UTF-8 locale (with
 UNICODE database encoding), and all is almost well, except that
 UPPER() and LOWER() seem to ignore locale.

TL upper/lower aren't going to work desirably in any multi-byte
TL character set encoding.  

Can you please point me at their implementation?  I do not understand
why that's impossible.

TL I think Peter E. is looking into what
TL it would take to fix this for 7.5, but at present you are
TL going to need to use a single-byte encoding within the server.
TL (Nothing to stop you from using UTF-8 on the client side
TL though.)


Thanks,

--alexm



[HACKERS] NetBSD/acorn32

2003-11-09 Thread Patrick Welche
I may have missed 7.4:

PostgreSQL 7.5devel on arm-unknown-netbsdelf1.6ZE, compiled by GCC gcc (GCC) 3.3.2-nb1

but all tests pass there. I didn't manage to compile bison 1.875 under
NetBSD-1.6P/acorn32, as gcc 2.95.3 really didn't like it. Upgrading to
NetBSD-1.6ZE/acorn32 brought gcc 3.3.2, which let me build bison.
(bison 1.75 was fine for both.) This doesn't say anything about postgresql
though, and I'm guessing that postgresql itself would be fine with the
older NetBSD/acorn32...

Cheers,

Patrick



Re: [HACKERS] NetBSD/acorn32

2003-11-09 Thread Peter Eisentraut
Patrick Welche writes:

 PostgreSQL 7.5devel on arm-unknown-netbsdelf1.6ZE, compiled by GCC gcc (GCC) 
 3.3.2-nb1

Can you test 7.4?

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] possible replace() bug - postgres 7.3.1

2003-11-09 Thread Hannu Krosing
Ryan Mahoney kirjutas N, 06.11.2003 kell 23:03:
 I am running PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96
 
 the following query fails:
 
 SELECT 'UPDATE pa_sales_lead SET details = COALESCE(details, \'\') ||
 \'' || replace(data, '\'', '\'\'') || '\' WHERE foreign_sales_lead_id =
 \'' || id || '\';' FROM temp_sales_lead ORDER BY id, row LIMIT 10;
 
 with the error:
 
 ERROR:  repalloc: invalid request size 1073741824
 
 but, the following query (without replace):
 SELECT 'UPDATE pa_sales_lead SET details = COALESCE(details, \'\') ||
 \'' || data || '\' WHERE foreign_sales_lead_id = \'' || id || '\';' FROM
 temp_sales_lead ORDER BY id, row LIMIT 10;
 
 works just fine, producing output like:
 
 UPDATE pa_sales_lead SET details = COALESCE(details, '') || 'EMPLOY: 50'
 WHERE foreign_sales_lead_id = '1004';
 
 any ideas?

what does just

SELECT replace(data, '\'', '\'\'')
  FROM temp_sales_lead
 ORDER BY id, row LIMIT 10; 

produce ?


Hannu




Re: [HACKERS] UPPER()/LOWER() and UTF-8

2003-11-09 Thread Hannu Krosing
Alexey Mahotkin kirjutas K, 05.11.2003 kell 17:11:
 Aha, that's in src/backend/utils/adt/formatting.c, right?
 
 Yes, I see, it goes byte by byte and uses toupper().  I believe we
 could look at the locale, and if it is UTF-8, then use (or copy)
 e.g. g_utf8_strup/strdown, right?
 
  
 http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strup
 
 I belive that patch could be written in a matter of hours.
 
 
 TL There has been some discussion of using wctype.h where
 TL available, but this has a number of issues, notably figuring
 TL out the correct mapping from the server string encoding (eg
 TL UTF-8) to unpacked wide characters.  At minimum we'd need to
 TL know which charset the locale setting is expecting, and there
 TL doesn't seem to be a portable way to find that out.
 
 TL IIRC, Peter thinks we must abandon use of libc's locale
 TL functionality altogether and write our own locale layer before
 TL we can really have all the locale-specific functionality we
 TL want.
 
 I believe that native Unicode strings (together with human language
 handling) should be introduced as (almost) separate data type (which
 have nothing to do with locale), but that's bluesky maybe.

They should have nothing to do with the _system_ locale, but you can
neither UPPER()/LOWER() nor ORDER BY unless you know the locale. It is
just that the locale should be either a property of the column or given
in the SQL statement.

I guess one could write UCHAR, UVARCHAR, UTEXT types based on ICU.

-
Hannu




Re: [HACKERS] BTree index

2003-11-09 Thread Alvaro Herrera
On Wed, Nov 05, 2003 at 09:08:31AM -0600, [EMAIL PROTECTED] wrote:

 I am trying to find information regarding creation of B-tree index in postgres 
 for variable length character data (Char/varchar type). Specifically, what 
 pagination policy is used, does it use prefix BTree, or any other form of 
 compression (encoding)? 

I was very surprised while writing this answer:

The whole key is stored (no prefix, no pagination).  If the key is too
big, it won't fit into the index and the insertion will be rejected:

regression=# create table test (a text);
CREATE TABLE
regression=# create index test_idx on test(a);
CREATE INDEX
regression=# insert into test values ('hello world');
INSERT 17115 1
regression=# insert into test select repeat(a,10) from test;
INSERT 17116 1
regression=# insert into test select repeat(a,10) from test;
INSERT 0 2
regression=# insert into test select repeat(a,10) from test;
INSERT 0 4
regression=# insert into test select repeat(a,10) from test;
INSERT 0 8
regression=# insert into test select repeat(a,10) from test;
ERROR:  fila de índice requiere 12624 bytes, tamaño máximo es 8191
-- oops
regression=# set lc_messages to 'C';
SET
regression=# insert into test select repeat(a,10) from test;
ERROR:  index row requires 12624 bytes, maximum size is 8191

So, what size were the tuples inserted?
regression=# select max(length(a)) from test;
  max   
--------
 110000
(1 fila)

What!? 110000 bytes?  I have always had the idea that the tuples were
stored uncompressed, so how can 110000 bytes be stored in 8191 bytes?
After digging into the source code, I found that in
src/backend/access/common/indextuple.c the index_formtuple routine seems
to compress the key before insertion.  In src/include/tuptoaster.h there
is a symbol for activation of this feature that is set at least on my
sources (TOAST_INDEX_HACK).

So, there you are: the compression used is the same lousy fast LZ
algorithm used elsewhere in the TOAST code (toast_compress_datum()).

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
FOO MANE PADME HUM



Re: [HACKERS] OSR5: Passes without the setsockopt() calls...

2003-11-09 Thread Larry Rosenman
Can I get a comment from someone on this please...

LER

--On Saturday, November 08, 2003 18:03:45 -0600 Larry Rosenman 
[EMAIL PROTECTED] wrote:

From my SCO contacts re: the failure on OSR5:


I haven't had time and won't have time to fully diagnose this until
after i get back from vacation. In the mean time, i am attaching a
patch which gets OpenServer 5.0.7 to pass 92 of 93 tests with one
test ignored. Basically, this patch just says if you're on OpenServer
then don't do the setsockopt calls. Not very satisfactory as setsockopt
should work on OSR5.
I know this isn't an acceptable patch, and I'm not sure when my contact
is back from vacation.
Do we want to hold the release?

LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749




Re: [HACKERS] possible replace() bug - postgres 7.3.1

2003-11-09 Thread Joe Conway
Hannu Krosing wrote:
Ryan Mahoney kirjutas N, 06.11.2003 kell 23:03:
I am running PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96
what does just

SELECT replace(data, '\'', '\'\'')
  FROM temp_sales_lead
 ORDER BY id, row LIMIT 10; 

produce ?
There was a bug in replace(), fixed by Tom in 7.3.3 I think (for sure by 
7.3.4). If you have any embedded "%" characters in "data", it would 
trigger the bug. Try upgrading to 7.3.4.

HTH,

Joe





Re: [HACKERS] OSR5: Passes without the setsockopt() calls...

2003-11-09 Thread Marc G. Fournier


On Sun, 9 Nov 2003, Larry Rosenman wrote:

 Can I get a comment from someone on this please...

no, the release will not be held up



 LER


 --On Saturday, November 08, 2003 18:03:45 -0600 Larry Rosenman
 [EMAIL PROTECTED] wrote:

  From my SCO contacts re: the failure on OSR5:
 
 
  I haven't had time and won't have time to fully diagnose this until
  after i get back from vacation. In the mean time, i am attaching a
  patch which gets OpenServer 5.0.7 to pass 92 of 93 tests with one
  test ignored. Basically, this patch just says if you're on OpenServer
  then don't do the setsockopt calls. Not very satisfactory as setsockopt
  should work on OSR5.
 
 
  I know this isn't an acceptable patch, and I'm not sure when my contact
  is back from Vacation.
 
  Do we want to hold the release?
 
  LER
 
  --
  Larry Rosenman http://www.lerctr.org/~ler
  Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
  US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



 --
 Larry Rosenman http://www.lerctr.org/~ler
 Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
 US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749




Re: [HACKERS] Performance features the 4th

2003-11-09 Thread Gaetano Mendola
Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:

However, I have not seen much evidence yet that the vacuum delay alone 
does that much.


Gaetano and a couple of other people did experiments that seemed to show
it was useful.  I think we'd want to change the shape of the knob per
later suggestions (sleep 10 ms every N blocks, instead of N ms every
block) but it did seem that there was useful bang for little buck there.
Right, I'd like to try the patch now: sleep N ms every M blocks.
Can you please post this patch?
BTW, I'll see if I'm able to apply it also to 7.3.X (our production
DB).
Regards
Gaetano Mendola


Re: [HACKERS] Performance features the 4th

2003-11-09 Thread Jan Wieck
scott.marlowe wrote:

On Fri, 7 Nov 2003, Matthew T. O'Connor wrote:

- Original Message - 
From: Jan Wieck [EMAIL PROTECTED]
 Tom Lane wrote:
  Gaetano and a couple of other people did experiments that seemed to show
  it was useful.  I think we'd want to change the shape of the knob per
  later suggestions (sleep 10 ms every N blocks, instead of N ms every
  block) but it did seem that there was useful bang for little buck there.

 I thought it was sleep N ms every M blocks.

 Have we seen any numbers? Anything at all? Something that gives us a
 clue by what factor one has to multiply the total time a VACUUM
 ANALYZE takes, to get what effect in return?

I have some time on sunday to do some testing.  Is there a patch that I can
apply that implements either of the two options? (sleep 10ms every M blocks
or sleep N ms every M blocks).
I know Tom posted the original patch that slept N ms every 1 block (where N
is >= 10 due to OS limitations).  Jan can you post a patch that has just the
sleep code in it? Or should it be easy enough for me to cull out of the
larger patch you posted?
The reason for the change is that the minimum sleep period on many systems 
is 10ms, which meant that vacuum was running 20X slower than normal.  
While it might be necessary in certain very I/O-starved situations to make 
it this slow, it would probably be better to be able to get a vacuum that 
ran at about 1/2 to 1/5 speed for most folks.  So, since the delta can't be 
less than 10ms on most systems, it's better to just leave it at a fixed 
amount and change the number of pages vacuumed per sleep.
I disagree with that. If you limit yourself to the number of pages being 
the only knob you have and set the napping time fixed, you can only 
lower the number of sequentially read pages to slow it down, making 
read-ahead absurd in an I/O-starved situation ...

I'll post a patch doing

every N pages nap for M milliseconds

using two GUC variables and based on a select(2) call later.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Performance features the 4th

2003-11-09 Thread Jan Wieck
Matthew T. O'Connor wrote:

- Original Message - 
From: Jan Wieck [EMAIL PROTECTED]
Tom Lane wrote:
 Gaetano and a couple of other people did experiments that seemed to show
 it was useful.  I think we'd want to change the shape of the knob per
 later suggestions (sleep 10 ms every N blocks, instead of N ms every
 block) but it did seem that there was useful bang for little buck there.
I thought it was sleep N ms every M blocks.

Have we seen any numbers? Anything at all? Something that gives us a
clue by what factor one has to multiply the total time a VACUUM
ANALYZE takes, to get what effect in return?
I have some time on sunday to do some testing.  Is there a patch that I can
apply that implements either of the two options? (sleep 10ms every M blocks
or sleep N ms every M blocks).
I know Tom posted the original patch that slept N ms every 1 block (where N
is >= 10 due to OS limitations).  Jan can you post a patch that has just the
sleep code in it? Or should it be easy enough for me to cull out of the
larger patch you posted?
Sorry for the delay, had to finish some other concept yesterday (will be 
published soon).

The attached patch adds

vacuum_group_delay_size = 10 (range 1-1000)
vacuum_group_delay_msec = 0  (range 0-1000)
and does the sleeping via select(2). It does it only at the same places 
where Tom had done the usleep() in his hack, so I guess there is still 
some more to do besides the documentation, before it can be added to 
7.4.1. But it should be enough to get some testing done.
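
The PG_DELAY() macro used in the diff below isn't itself shown; a
select(2)-based millisecond sleep along the lines described might look
like this (an assumption for illustration, not the actual patch text):

#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define PG_DELAY(msec) \
    do { \
        struct timeval delay; \
        delay.tv_sec = (msec) / 1000; \
        delay.tv_usec = ((msec) % 1000) * 1000; \
        /* select() with no fds just sleeps for the timeout */ \
        (void) select(0, NULL, NULL, NULL, &delay); \
    } while (0)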

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
Index: src/backend/access/nbtree/nbtree.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.106
diff -c -b -r1.106 nbtree.c
*** src/backend/access/nbtree/nbtree.c  2003/09/29 23:40:26 1.106
--- src/backend/access/nbtree/nbtree.c  2003/11/09 23:39:36
***
*** 27,32 
--- 27,40 
  #include storage/smgr.h
  
  
+ /*
+  * Variables for vacuum_group_delay option (in commands/vacuumlazy.c)
+  */
+ extern int    vacuum_group_delay_size;    /* vacuum N pages */
+ extern int    vacuum_group_delay_msec;    /* then sleep M msec */
+ extern int    vacuum_group_delay_count;
+ 
+ 
  /* Working state for btbuild and its callback */
  typedef struct
  {
***
*** 610,615 
--- 618,632 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_group_delay_msec > 0)
+   {
+       if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+       {
+           PG_DELAY(vacuum_group_delay_msec);
+           vacuum_group_delay_count = 0;
+       }
+   }
+ 
ndeletable = 0;
page = BufferGetPage(buf);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
***
*** 736,741 
--- 753,769 
Buffer  buf;
Pagepage;
BTPageOpaque opaque;
+ 
+   CHECK_FOR_INTERRUPTS();
+ 
+   if (vacuum_group_delay_msec > 0)
+   {
+       if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+       {
+           PG_DELAY(vacuum_group_delay_msec);
+           vacuum_group_delay_count = 0;
+       }
+   }
  
buf = _bt_getbuf(rel, blkno, BT_READ);
page = BufferGetPage(buf);
Index: src/backend/commands/vacuumlazy.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.32
diff -c -b -r1.32 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c   2003/09/25 06:57:59 1.32
--- src/backend/commands/vacuumlazy.c   2003/11/09 23:40:13
***
*** 88,93 
--- 88,100 
  static TransactionId OldestXmin;
  static TransactionId FreezeLimit;
  
+ /*
+  * Variables for vacuum_group_delay option (in commands/vacuumlazy.c)
+  */
+ int   vacuum_group_delay_size = 10;   /* vacuum N pages */
+ int   vacuum_group_delay_msec = 0;/* then sleep M msec */
+ int   vacuum_group_delay_count = 0;
+ 
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
***
*** 228,233 
--- 235,249 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_group_delay_msec > 0)
+   {
+

Re: [HACKERS] Coding help

2003-11-09 Thread Christopher Kings-Lynne

| COMMENT ON LARGE OBJECT NumericOnly IS comment_text
 n->objname = makeList1(makeAConst($5));


Forget the makeAConst step --- it's just wasted cycles.  In the
execution routine, you can use code comparable to define.c's defGetInt64()
to convert the Value node into a numeric OID, ie, either do intVal() or
a call of oidin().
I thought the whole problem here is that OIDs are unsigned ints, hence 
intVal() won't allow the highest OIDs?  If that's not the case, then the 
easy solution is to just use an IConst or whatever it's called. (Which 
I had working before).

Chris



Re: [HACKERS] Coding help

2003-11-09 Thread Christopher Kings-Lynne
I thought the whole problem here is that OIDs are unsigned ints, hence 
intVal() won't allow the highest OIDs?


Exactly.  That's why you need to handle T_Float too.  See the int8
example, which has just the same problem.
It occurs to me then that I could just use FloatOnly in the grammar 
and then attempt to coerce that to an Oid using oidin?  Floats 
are stored as strings, so that'd be OK, and the oidin function would 
barf if it was an illegal uint?

Chris





Re: [HACKERS] Coding help

2003-11-09 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 It occurs to me then that I could just then use FloatOnly in the grammar 

I don't think so, unless you are proposing something very odd in the
lexer to force it to treat an integer literal as a float in this one
context.

What is the problem with following the lead of the existing code?

regards, tom lane



Re: [HACKERS] Setting client encoding with jdbc

2003-11-09 Thread Kris Jurka


On 7 Nov 2003, Brad wrote:

 I'm interested in changing the client encoding from within a java
 program.  I'm using connection pooling and need to change it depending
 on the client's language preference. Is there a function I can call
 which changes the client encoding.  Something like

   pStmt = con.prepareStatement("select pg_set_client_encoding('LATIN5');");
   rs = pStmt.executeQuery();

 I can set the client encoding from within psql by \encoding LATIN5 but
 not sure how to execute something like this from java.  Any help would
 be greatly appreciated.


Please direct JDBC questions to the pgsql-jdbc list.

You may not set the client encoding from JDBC.  The JDBC driver requires a
Unicode client encoding to work properly.  You may then re-encode the data
in your charset of preference using the standard Java APIs.

Kris Jurka




Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Bruce Momjian
Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] writes:
  You want to find, open, and fsync() every file in the database cluster
  for every checkpoint?  Sounds like a non-starter to me.
 
  Except a) this is outside any critical path, and b) only done every few
  minutes and c) the fsync calls on files with no dirty buffers ought to be
  cheap, at least as far as i/o.
 
 The directory search and opening of the files is in itself nontrivial
 overhead ... particularly on systems where open(2) isn't speedy, such
 as Solaris.  I also disbelieve your assumption that fsync'ing a file
 that doesn't need it will be free.  That depends entirely on what sort
 of indexes the OS keeps on its buffer cache.  There are Unixen where
 fsync requires a scan through the entire buffer cache because there is
 no data structure that permits finding associated buffers any more
 efficiently than that.  (IIRC, the HPUX system I'm typing this on is
 like that.)  On those sorts of systems, we'd be way better off to use
 O_SYNC or O_DSYNC on all our writes than to invoke multiple fsyncs.
 Check the archives --- this was all gone into in great detail when we
 were testing alternative methods for fsyncing the WAL files.

Not sure on this one --- let's look at our options
O_SYNC
fsync
sync

Now, O_SYNC is going to force every write to the disk.  If we have a
transaction that has to write lots of buffers (has to write them to
reuse the shared buffer), it will have to wait for every buffer to hit
disk before the write returns --- this seems terrible to me and gives
the drive no way to group adjacent writes.  Even on HPUX, which has poor
fsync dirty buffer detection, if the fsync is outside the main
processing loop (checkpoint process), isn't fsync better than O_SYNC? 
Now, if we are sure that writes will happen only in the checkpoint
process, O_SYNC would be OK, I guess, but will we ever be sure of that?
I can't imagine a checkpoint process keeping up with lots of active
backends, especially if the writes use O_SYNC.  The problem is that
instead of having backends write everything to kernel buffers, we are all
of a sudden forcing all writes of dirty buffers to disk.  sync() starts
to look very attractive compared to that option.

fsync is better in that we can force it after a number of writes, and
can delay it, so we can write a buffer and reuse it, then later issue
the fsync.  That is a win, though it doesn't allow the drive to group
adjacent writes in different files.  Sync of course allows grouping of
all writes by the drive, but writes all non-PostgreSQL dirty buffers
too.  Ideally, we would have an fsync() where we could pass it a list of
our files and it would do all of them optimally.

From what I have heard so far, sync() still seems like the most
efficient method.  I know it only schedules the writes, but with a sleep
after it, it seems like maybe the best bet.
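
To make the contrast concrete, here is a minimal C sketch of the two
approaches (not PostgreSQL code --- the file name, block count, and
block size are invented, the file is assumed to exist, and error
checking is omitted):

#include <fcntl.h>
#include <unistd.h>

#define NBLOCKS 64
#define BLCKSZ  8192

static char page[NBLOCKS][BLCKSZ];

/* O_SYNC: every write() blocks until the data is physically on disk. */
void
write_with_osync(void)
{
    int fd = open("datafile", O_WRONLY | O_SYNC);

    for (int i = 0; i < NBLOCKS; i++)
        write(fd, page[i], BLCKSZ);     /* each call waits for the disk */
    close(fd);
}

/* fsync: writes land in kernel buffers; one flush at the end lets the
 * drive group and reorder adjacent writes. */
void
write_with_fsync(void)
{
    int fd = open("datafile", O_WRONLY);

    for (int i = 0; i < NBLOCKS; i++)
        write(fd, page[i], BLCKSZ);     /* returns once copied to kernel */
    fsync(fd);                          /* single wait for all the blocks */
    close(fd);
}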

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Bruce Momjian
Tom Lane wrote:
 Jan Wieck [EMAIL PROTECTED] writes:
  What still needs to be addressed is the IO storm caused by checkpoints. I 
  see it much relaxed when stretching out the BufferSync() over most of 
  the time until the next one should occur. But the kernel sync at its 
  end still pushes the system hard against the wall.
 
 I have never been happy with the fact that we use sync(2) at all.  Quite
 aside from the I/O storm issue, sync() is really an unsafe way to do a
 checkpoint, because there is no way to be certain when it is done.  And
 on top of that, it does too much, because it forces syncing of files
 unrelated to Postgres.
 
 I would like to see us go over to fsync, or some other technique that
 gives more certainty about when the write has occurred.  There might be
 some scope that way to allow stretching out the I/O, too.
 
 The main problem with this is knowing which files need to be fsync'd.
 The only idea I have come up with is to move all buffer write operations
 into a background writer process, which could easily keep track of
 every file it's written into since the last checkpoint.  This could cause
 problems though if a backend wants to acquire a free buffer and there's
 none to be had --- do we want it to wait for the background process to
 do something?  We could possibly say that backends may write dirty
 buffers for themselves, but only if they fsync them immediately.  As
 long as this path is seldom taken, the extra fsyncs shouldn't be a big
 performance problem.
 
 Actually, once you build it this way, you could make all writes
 synchronous (open the files O_SYNC) so that there is never any need for
 explicit fsync at checkpoint time.  The background writer process would
 be the one incurring the wait in most cases, and that's just fine.  In
 this way you could directly control the rate at which writes are issued,
 and there's no I/O storm at all.  (fsync could still cause an I/O storm
 if there's lots of pending writes in a single file.)

This outlines the same issue --- a very active backend might dirty 5k
buffers, and if those 5k buffers have to be written using O_SYNC, it
will take much longer than doing 5k buffer writes and then a single
fsync() or sync() at the end.

Having another process do the writing does allow some parallelism, but
people don't seem to mind buffers having to be read in from the kernel
buffer cache.  So what big benefit do we get from having someone else
write into the kernel buffer cache, other than a central place to fsync?
And is it worth it, considering that it might be impossible to configure
a system where the writer process can keep up with all the backends?
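
For what it's worth, the bookkeeping could be as simple as the following
sketch (the names are invented and this is not actual backend code ---
locking and error handling are omitted):

#include <sys/types.h>
#include <unistd.h>

#define MAX_DIRTY_FILES 1024

static int dirty_fds[MAX_DIRTY_FILES];
static int ndirty = 0;

/* Writer process: flush one page and remember which file it touched. */
void
writer_write_page(int fd, const void *page, size_t len, off_t offset)
{
    pwrite(fd, page, len, offset);

    for (int i = 0; i < ndirty; i++)
        if (dirty_fds[i] == fd)
            return;                 /* file already on the list */
    if (ndirty < MAX_DIRTY_FILES)
        dirty_fds[ndirty++] = fd;
}

/* Checkpoint: fsync only the files written since the last checkpoint. */
void
checkpoint_fsync_files(void)
{
    for (int i = 0; i < ndirty; i++)
        fsync(dirty_fds[i]);
    ndirty = 0;
}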

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Bruce Momjian
scott.marlowe wrote:
 On Tue, 4 Nov 2003, Tom Lane wrote:
 
  Jan Wieck [EMAIL PROTECTED] writes:
   What still needs to be addressed is the IO storm caused by checkpoints. I 
   see it much relaxed when stretching out the BufferSync() over most of 
   the time until the next one should occur. But the kernel sync at its 
   end still pushes the system hard against the wall.
  
  I have never been happy with the fact that we use sync(2) at all.  Quite
  aside from the I/O storm issue, sync() is really an unsafe way to do a
  checkpoint, because there is no way to be certain when it is done.  And
  on top of that, it does too much, because it forces syncing of files
  unrelated to Postgres.
  
  I would like to see us go over to fsync, or some other technique that
  gives more certainty about when the write has occurred.  There might be
  some scope that way to allow stretching out the I/O, too.
  
  The main problem with this is knowing which files need to be fsync'd.
 
 Wasn't this a problem that the win32 port had to solve by keeping a list 
 of all files that need fsyncing since Windows doesn't do sync() in the 
 classical sense?  If so, then could we use that code to keep track of the 
 files that need fsyncing?

Yes, I have that code from SRA.  They used threading, so they recorded
all the open files in local memory and opened/fsynced/closed them for
checkpoints.  We have to store the file names in a shared area, perhaps
an area of shared memory with an overflow to a disk file.
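
A rough sketch of what that shared area might look like (names and sizes
are invented, the caller is assumed to hold a lock, and the disk-file
overflow is only hinted at by a flag):

#include <stdio.h>

#define FSYNC_TABLE_SIZE 256
#define MAX_PATH_LEN     1024

typedef struct
{
    int  nfiles;
    int  overflowed;    /* table filled; names spilled to a disk file */
    char path[FSYNC_TABLE_SIZE][MAX_PATH_LEN];
} FsyncFileTable;       /* would live in shared memory */

void
remember_dirty_file(FsyncFileTable *tab, const char *relpath)
{
    if (tab->nfiles >= FSYNC_TABLE_SIZE)
    {
        tab->overflowed = 1;    /* checkpoint must consult the spill file */
        return;
    }
    snprintf(tab->path[tab->nfiles++], MAX_PATH_LEN, "%s", relpath);
}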

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Bruce Momjian

I would be interested to know whether the sync() load is diminished if
you have the background writer process writing old dirty buffers to
kernel buffers continually.  What this does is push more dirty buffers
into the kernel cache in the hope that the OS will write them on its own
before the checkpoint does its write/sync work.  This might allow us to
reduce the sync() load while avoiding the need for O_SYNC/fsync().

Perhaps sync() is bad partly because the checkpoint runs through all the
dirty shared buffers, writes them all to the kernel, and then issues
sync(), almost guaranteeing a flood of writes to the disk.  This method
would leave fewer dirty buffers in the shared buffer cache, and
therefore fewer kernel writes would be needed when sync() runs.
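
As a sketch, such a writer loop might look like this (next_dirty_buffer(),
write_buffer_to_kernel(), and both knobs are invented for illustration):

#include <unistd.h>

/* hypothetical hooks into the buffer manager, LRU-ordered */
extern int  next_dirty_buffer(void);         /* -1 when none are left */
extern void write_buffer_to_kernel(int buf); /* write(), no fsync */

/*
 * Continually push the oldest dirty buffers into the kernel cache so
 * the OS can write them at its leisure; the checkpoint then finds far
 * fewer dirty buffers left to write and sync.
 */
void
background_trickle_writer(void)
{
    for (;;)
    {
        for (int n = 0; n < 32; n++)         /* knob: buffers per round */
        {
            int buf = next_dirty_buffer();

            if (buf < 0)
                break;
            write_buffer_to_kernel(buf);
        }
        usleep(100000);                      /* knob: 100 ms between rounds */
    }
}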

---

Jan Wieck wrote:
 Tom Lane wrote:
 
  Jan Wieck [EMAIL PROTECTED] writes:
  
  How I can see the background writer operating is that he's keeping the 
  buffers in the order of the LRU chain(s) clean, because those are the 
  buffers that most likely get replaced soon. In my experimental ARC code 
  it would traverse the T1 and T2 queues from LRU to MRU, write out n1 and 
  n2 dirty buffers (n1+n2 configurable), then fsync all files that have 
  been involved in that, nap depending on where he got down the queues (to 
  increase the write rate when running low on clean buffers), and do it 
  all over again.
  
  You probably need one more knob here: how often to issue the fsyncs.
  I'm not convinced once per outer loop is a sufficient answer.
  Otherwise this is sounding pretty good.
 
 This is definitely heading into the right direction.
 
 I currently have a crude and ugly hacked system that does checkpoints 
 every minute but stretches them out over the whole time. It writes out 
 the dirty buffers in T1+T2 LRU order intermixed, stretches out the flush 
 over the whole checkpoint interval, and does sync()+usleep() every 32 
 blocks (if it has time to do this).
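
(For concreteness, the loop Jan describes might look roughly like the
following --- write_buffer() and the dirty-buffer list are stand-ins,
and this is a sketch of the idea, not his actual patch:)

#include <unistd.h>

extern void write_buffer(int buf);      /* stand-in for the real flush */

/* Stretch one checkpoint's writes across its whole interval. */
void
stretched_checkpoint(const int *dirty, int ndirty, long pause_usec)
{
    for (int i = 0; i < ndirty; i++)
    {
        write_buffer(dirty[i]);
        if ((i + 1) % 32 == 0)          /* every 32 blocks, as described */
        {
            sync();                     /* system-wide; admittedly crude */
            usleep(pause_usec);         /* pace writes over the interval */
        }
    }
}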
 
 This is clearly the wrong way to implement it, but ...
 
 The same system has ARC and delayed vacuum. With normal, unmodified 
 checkpoints every 300 seconds, the transaction response time for 
 new_order still peaks at over 30 seconds (5 is already too much), so the 
 system basically comes to a freeze during a checkpoint.
 
 Now with this high-frequency sync()ing and checkpointing by the minute, 
 the entire system load levels out really nicely. Basically it's constantly 
 checkpointing. So maybe the thing we're looking for is to make the 
 checkpoint process the background buffer writer process and let it 
 checkpoint 'round the clock. Of course, with a bit more selectivity on 
 what to fsync and not doing system wide sync() every 10-500 milliseconds :-)
 
 
 Jan
 
 -- 
 #==#
 # It's easier to get forgiveness for being wrong than for being right. #
 # Let's break this rule - forgive me.  #
 #== [EMAIL PROTECTED] #
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faqs/FAQ.html
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Bruce Momjian
Tom Lane wrote:
 Jan Wieck [EMAIL PROTECTED] writes:
  That is part of the idea. The whole idea is to issue physical writes 
  at a fairly steady rate without increasing their number substantially 
  or interfering with the drive's opinion about their order too much. 
  I think O_SYNC for random access can be in conflict with write 
  reordering.
 
 Good point.  But if we issue lots of writes without fsync then we still
 have the problem of a write storm when the fsync finally occurs, while
 if we fsync too often then we constrain the write order too much.  There
 will need to be some tuning here.

I know the BSDs have trickle sync --- if we write the dirty buffers to
kernel buffers many seconds before our checkpoint, the kernel might
write them to disk for us and sync() will not need to do it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Bruce Momjian
Tom Lane wrote:
 Andrew Sullivan [EMAIL PROTECTED] writes:
  On Sun, Nov 02, 2003 at 01:00:35PM -0500, Tom Lane wrote:
  real traction we'd have to go back to the "take over most of RAM for
  shared buffers" approach, which we already know to have a bunch of
  severe disadvantages.
 
  I know there are severe disadvantages in the current implementation,
  but are there in-principle severe disadvantages?
 
 Yes.  For one, since we cannot change the size of shared memory
 on-the-fly (at least not portably), there is no opportunity to trade off
 memory usage dynamically between processes and disk buffers.  For
 another, on many systems shared memory is subject to being swapped out.
 Swapping out dirty buffers is a performance killer, because they must be
 swapped back in again before they can be written to where they should
 have gone.  The only way to avoid this is to keep the number of shared
 buffers small enough that they all remain fairly hot (recently used)
 and so the kernel won't be tempted to swap out any part of the region.

Agreed, we can't resize shared memory, but I don't think most OSes swap
out shared memory, and even if they do, they usually have a kernel
configuration parameter to lock it into kernel memory.  All the old
Unixes locked the shared memory into kernel address space, and in fact
this is why many of them required a kernel recompile to increase shared
memory.  I hope the ones that have pageable shared memory have a way to
prevent it --- at least FreeBSD does, not sure about Linux.
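
For example, on SysV systems the segment can be pinned with SHM_LOCK ---
a sketch, assuming the shmid comes from the usual shmget() call and the
process has the necessary privilege (root, or a sufficient memlock
limit, depending on the platform):

#include <sys/ipc.h>
#include <sys/shm.h>

/* Pin a SysV shared memory segment in RAM so it cannot be paged out. */
int
pin_shared_segment(int shmid)
{
    return shmctl(shmid, SHM_LOCK, NULL);   /* 0 on success, -1 on error */
}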

Now, the disadvantages of a large kernel cache and small PostgreSQL
buffer cache are that data has to be transferred to/from the kernel
buffers, and that we can't control the kernel's cache replacement
strategy (and probably won't be able to in the near future), while we do
control our own buffer cache replacement strategy.

Looking at the advantages/disadvantages, a large shared buffer cache
looks pretty good to me.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Joe Conway
Bruce Momjian wrote:
Having another process do the writing does allow some parallelism, but
people don't seem to mind buffers having to be read in from the kernel
buffer cache.  So what big benefit do we get from having someone else
write into the kernel buffer cache, other than a central place to fsync?
And is it worth it, considering that it might be impossible to configure
a system where the writer process can keep up with all the backends?
This might be far-fetched, but I wonder if having a writer process opens 
up the possibility of running PostgreSQL in a cluster? I'm thinking of 
two servers, mounted to the same data volume, and some kind of 
coordination between the writer processes. Anyone know if this is 
similar to how Oracle handles RAC?

Joe

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-09 Thread Joe Conway
Bruce Momjian wrote:
Agreed, we can't resize shared memory, but I don't think most OSes swap
out shared memory, and even if they do, they usually have a kernel
configuration parameter to lock it into kernel memory.  All the old
Unixes locked the shared memory into kernel address space, and in fact
this is why many of them required a kernel recompile to increase shared
memory.  I hope the ones that have pageable shared memory have a way to
prevent it --- at least FreeBSD does, not sure about Linux.
I'm pretty sure at least Linux, Solaris, and HPUX all work this way -- 
otherwise Oracle would have the same problem with their SGA, which is 
kept in shared memory.

Joe

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] [ADMIN] [HACKERS] retrieve statement from catalogs

2003-11-09 Thread David Busby
Can you just use the SQL logging and read that file afterwards?
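
If memory serves, the 7.4-era postgresql.conf settings for this look
roughly like the following (names vary between releases, so check the
documentation for your version):

log_statement = true    # log the text of every statement
log_duration = true     # optionally log execution times too

You can then grep the server log for the selects you care about.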

/djb

- Original Message - 
From: Jaime Casanova [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Friday, November 07, 2003 14:59
Subject: Re: [GENERAL] [ADMIN] [HACKERS] retrieve statement from catalogs



 Hi everybody, can anyone tell me if there's a way to retrieve the select
 instruction executed from the catalogs, or maybe via some structure in a
 trigger?

 The reason is that I have some selects constructed on-the-fly (just part
 of it) and I want to save them in a table in order to know which are the
 most used, so I can optimize them.

 Another reason for doing that is that I don't know all the selects in the
 program, nor the page, and this is an easier solution than examining all
 the code.

 thanx in advance, el_vigia


 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])