Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Richard Troy

On Thu, 8 Feb 2007, Joshua D. Drake wrote:
>
> Well how deep are we talking here? My understanding of what Jan wants to
> do is simple.
>
> Be able to declare which triggers are fired depending on the state of
> the cluster.
>
> In Jan's terms, the Origin or Subscriber. In Replicator terms the Master
> or Slave.
>
> This is useful because I may have a trigger on the Master and the same
> trigger on the Slave. You do not want the trigger to fire on the Slave
> because we are doing data replication. In short, we replicate the
> result, not the action.
>
> However, you may want triggers that are on the Slave to fire separately.
> A reporting server that generates materialized views is a good example.
> Don't tie up the Master with what a Slave can do.
>

It'd be great if Jan considered blending replication roles; any given DB
instance shouldn't be only a master/originator or only a slave/subscriber.
A solution that lets you blend replication strategies in a single database
is, from my point of view, very important.

> > I have no clue what got you into what you are doing here.

Jan, some sleep now and then might be helpful to your public disposition.
-smile-

peace,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Full page writes improvement

2007-02-08 Thread Koichi Suzuki
Full_page_compress is not intended for use with a PITR slave, but for the
case where you keep both an online backup and archive logs for archive
recovery, which is a very common PostgreSQL operation now.

I've just posted my evaluation of the patch as a reply in another
thread on the same proposal (sorry, I created a new thread because the
old one seemed not good).

It compares log compression with the gzip case.  Also, our proposal can
be combined with gzip.   Its overall overhead is slightly less than just
copying WAL using cat.   As a result, my proposal does not introduce
serious overhead.

Please refer to the thread "Archive log compression keeping physical log
available in the crash recovery".  I'd appreciate further opinions and
comments on this, and suggestions about which further evaluations would
be useful.

I've posted two commands (archive and restore) and a small patch.
The two commands can be treated as contrib, and the patch itself works
if WAL is simply copied to the archive directory.

Regards;
Koichi Suzuki

Tom Lane wrote:
> Koichi Suzuki <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Doesn't this break crash recovery on PITR slaves?
> 
>> Compressed archive log contains the same data as the full_page_writes off
>> case.   So the influence on PITR slaves is the same as full_page_writes off.
> 
> Right.  So what is the use-case for running your primary database with
> full_page_writes on and the slaves with it off?  It doesn't seem like
> a very sensible combination to me.
> 
> Also, it seems to me that some significant performance hit would be
> taken by having to grovel through the log files to remove and re-add the
> full-page data.  Plus you are actually writing *more* WAL data out of
> the primary, not less, because you have to save both the full-page
> images and the per-tuple data they normally replace.  Do you have
> numbers showing that there's actually any meaningful savings overall?
> 
>   regards, tom lane
> 


-- 
Koichi Suzuki

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] How can I use 2GB of shared buffers on Windows?

2007-02-08 Thread Takayuki Tsunakawa
From: "Magnus Hagander" <[EMAIL PROTECTED]>
> hnetcfg.dll is a part of Windows. "Home Networking Configuration
> Manager". LPK.DLL is also a part of Windows - it's the language
> pack.

Thank you for the information.


> On Thu, Feb 08, 2007 at 09:50:26PM +0900, Takayuki Tsunakawa wrote:
>> When I try to start PostgreSQL 8.2.1 on Windows 2003 Server with
>> shared_buffers=1024MB, I get the following error messages in the Event
>> Log (with log_min_messages=debug5) and can't start PostgreSQL:
>
> Is this for testing, or for production? From what I've heard, you would
> normally never want that much shared memory - I've seen more reports
> that you should keep it as low as possible, really. For performance
> reasons.

For testing.  I wanted to place all data in shared buffers to
eliminate reads from disk while I run pgbench repeatedly (actually
most reads should come from the kernel cache, though.)

Does PostgreSQL for Windows have any problem when using a large
database cache, unlike the UNIX versions?  I'm excited about your
current great work to enable building all of PostgreSQL with MSVC.  I
thought you were aiming at making PostgreSQL 64-bit on Windows in the
near future (though you may not have said so on the list.)  I'm afraid
MinGW will not extend to 64-bit (for x64 and Itanium), at least not
reliably and immediately, due to the difference in data models --
'long' is still 32-bit in 64-bit applications on Windows.  I thought
Magnus-san was worried about this and started the effort to switch
completely to MSVC.

BTW, the current PostgreSQL for Windows is very slow, isn't it?  I
compared the performance of PostgreSQL 8.2.x on Linux (RHEL4 for x86,
kernel 2.6.x) and on Windows Server 2003.  I ran 'pgbench -c32 -t500' on
the same machine with the same disk layout for data files and WAL,
i.e. they are stored on separate disks.  The settings in
postgresql.conf are the same, except for wal_sync_method -- it is set
to open_sync on Linux and open_datasync on Windows, because those are
the best for each platform.
The Linux version shows 1100 tps, but the Windows version shows only 450 tps.






---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Archive log compression keeping physical log available in the crash recovery

2007-02-08 Thread Koichi Suzuki

Sorry for the late response.

Gzip can reduce the archive log size to about one fourth.   My point is
that it can still be large enough.   Removing physical log records (by
replacing them with logical log records) from the archive log will shrink
its size to about one twentieth, in the case of a pgbench test of about
ten hours (3,600,000 transactions) with a database size of about 2GB.
In the case of gzip, maybe because of the higher CPU load, total
throughput is less than just copying WAL to the archive.   In our case,
throughput seems to be slightly higher than just copying (preserving the
physical log) or gzip.   I'll gather the measurement results and try to
post them.


The size of the archive log seems to be affected not by the size of the
database, but just by the number of transactions.  In the case of
full_page_writes=on and full_page_compress=on, the compressed archive log
size seems to depend only on the number of transactions and their
characteristics.


Our evaluation result is as follows:
Database size: 2GB
WAL size (after a 10-hour pgbench run): 48.3GB
gzipped size: 8.8GB
after removal of the physical log: 2.36GB
full_page_writes=off log size: 2.42GB

The reason why the archive log size in our case is slightly smaller than
with full_page_writes=off is that we remove not only the physical logs
but also each page header and the dummy part at the tail of each log
segment.


Further, we can apply gzip to this archive (2.36GB).   The final size is
0.75GB, less than one sixtieth of the original WAL.


The overall duration to gzip the WAL (48.3GB to 8.8GB) was about 4000sec,
and our compression to 2.36GB needed about 1010sec, slightly less than
the plain cat command (1386sec).   When gzip is combined with our
compression (48.3GB to 0.75GB), the total duration was about 1330sec.
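As a quick sanity check, the ratios implied by the figures above can be
recomputed directly. This is just arithmetic over the measured sizes
quoted in this mail, not part of the patch:

```python
# Recompute the compression ratios from the measured sizes quoted above.
# All sizes are in GB, taken from the evaluation in this mail.
wal_gb = 48.3  # raw WAL after the 10-hour pgbench run

sizes = {
    "gzip only": 8.8,
    "physical log removal": 2.36,
    "full_page_writes=off": 2.42,
    "removal + gzip": 0.75,
}

for name, gb in sizes.items():
    # Each archive variant expressed as a fraction of the original WAL.
    print(f"{name}: {gb} GB -> 1/{wal_gb / gb:.1f} of the original WAL")

# The claims in the text check out:
assert wal_gb / 2.36 > 20   # "about one twentieth"
assert wal_gb / 0.75 > 60   # "less than one sixtieth"
```

The durations quoted (removal ~1010s vs. cat ~1386s vs. gzip ~4000s) are
consistent with the claim that physical-log removal costs less than even
a plain copy of the WAL.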


This shows that physical log removal is a good choice for the following
case:

1) You need the same crash recovery capability as full_page_writes=on, and
2) You need to shrink the archive log for a longer storage period.

Of course, if we care about crash recovery on a PITR slave, we still need
physical log records in the archive log.   In that case, because the
archive log is not intended to be kept long, its size will not be an issue.


I'm planning to do the archive log size evaluation with other benchmarks
such as DBT-2 as well.


Materials for this have already been posted to HACKERS and PATCHES.   I
hope you will try it.



Jim Nasby wrote:
I thought the drive behind full_page_writes = off was to reduce the 
amount of data being written to pg_xlog, not to shrink the size of a 
PITR log archive.


ISTM that if you want to shrink a PITR log archive you'd be able to get
good results by (b|g)zip'ing the WAL files in the archive. A quick test
on my laptop shows over a 4x reduction in size. Presumably that'd be
even larger if you increased the size of WAL segments.


On Jan 29, 2007, at 2:15 AM, Koichi Suzuki wrote:

This is a proposal for archive log compression keeping physical log in 
WAL.


In PostgreSQL 8.2, the full_page_writes option came back to cut physical
log both from WAL and the archive log.   To deal with partial writes
during the online backup, the physical log is written only during the
online backup.

Although this dramatically reduces the log size, it can put crash
recovery at risk.   If any page is inconsistent because of a fault, crash
recovery doesn't work, because full-page images are necessary to recover
the page in such a case.  For critical use, especially commercial use,
we don't want to risk crash recovery, while reducing the archive log
size will be crucial too for larger databases.   WAL size itself may be
less critical, because WAL segments are reused cyclically.

Here, I have a simple idea to reduce archive log size while keeping
physical log in xlog:

1. Create new GUC: full_page_compress,

2. Turn on both the full_page_writes and full_page_compress: physical
log will be written to WAL at the first write to a page after the
checkpoint, just as conventional full_page_writes ON.

3. Unless physical log is written during the online backup, this can be
removed from the archive log.   One bit in XLR_BKP_BLOCK_MASK
(XLR_BKP_REMOVABLE) is available to indicate this (out of four, only
three of them are in use) and this mark can be set in XLogInsert().
With both full_page_writes and full_page_compress on, both the logical
log and the physical log will be written to WAL with the XLR_BKP_REMOVABLE
flag on.  Having both physical and logical log in the same WAL is not
harmful to crash recovery.  In crash recovery, the physical log is
used if it's available.  The logical log is used in archive recovery,
as the corresponding physical log will be removed.

4. The archive command (a separate binary) removes physical logs if the
XLR_BKP_REMOVABLE flag is on.   Physical logs will be replaced by a
minimum amount of information, of very small size, which is used to
restore the physical log so as to keep other log records' LSNs consistent.

5. The restore command (sep

Re: [HACKERS] Variable length varlena headers redux

2007-02-08 Thread Bruce Momjian

Uh, I thought the approach was to create type-specific in/out functions,
and add casting so that every time they were referenced, they would expand
to a varlena structure in memory.

---

Gregory Stark wrote:
> 
> I've been looking at this again and had a few conversations about it. This may
> be easier than I had originally thought but there's one major issue that's
> bugging me. Do you see any way to avoid having every user function everywhere
> use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?
> 
> The two approaches I see are either 
> 
> a) To have two sets of macros, one of which, VARATT_DATA and VARATT_SIZEP are
> for constructing new tuples and behaves exactly as it does now. So you always
> construct a four-byte header datum. Then in heap_form*tuple we check if you
> can use a shorter header and convert. VARDATA/VARSIZE would be for looking at
> existing datums and would interpret the header bits.
> 
> This seems very fragile since one stray call site using VARATT_DATA to find
> the data in an existing datum would cause random bugs that only occur rarely
> in certain circumstances. It would even work as long as the size is filled in
> with VARATT_SIZEP first which it usually is, but fail if someone changes the
> order of the statements.
> 
> or 
> 
> b) throw away VARATT_DATA and VARATT_SIZEP and make all user functions
> everywhere change over to a new macro API. That seems like a pretty big
> burden. It's safer but means every contrib module would have to be updated
> and so on.
> 
> I'm hoping I'm missing something and there's a way to do this without breaking
> the api for every user function.
> 
> 

-- Start of included mail From: Tom Lane <[EMAIL PROTECTED]>

> To: Gregory Stark <[EMAIL PROTECTED]>
> cc: Gregory Stark <[EMAIL PROTECTED]>, Bruce Momjian <[EMAIL PROTECTED]>, 
>   Peter Eisentraut <[EMAIL PROTECTED]>, 
> pgsql-hackers@postgresql.org, 
>   Martijn van Oosterhout 
> Subject: Re: [HACKERS] Fixed length data types issue 
> Date: Mon, 11 Sep 2006 13:15:43 -0400
> Lines: 64
> Xref: stark.xeocode.com work.enterprisedb:683

> Gregory Stark <[EMAIL PROTECTED]> writes:
> > In any case it seems a bit backwards to me. Wouldn't it be better to
> > preserve bits in the case of short length words where they're precious
> > rather than long ones? If we make 0xxx the 1-byte case it means ...
> 
> Well, I don't find that real persuasive: you're saying that it's
> important to have a 1-byte not 2-byte header for datums between 64 and
> 127 bytes long.  Which is by definition less than a 2% savings for those
> values.  I think it's more important to pick bitpatterns that reduce
> the number of cases heap_deform_tuple has to think about while decoding
> the length of a field --- every "if" in that inner loop is expensive.
> 
> I realized this morning that if we are going to preserve the rule that
> 4-byte-header and compressed-header cases can be distinguished from the
> data alone, there is no reason to be very worried about whether the
> 2-byte cases can represent the maximal length of an in-line datum.
> If you want to do 16K inline (and your page is big enough for that)
> you can just fall back to the 4-byte-header case.  So there's no real
> disadvantage if the 2-byte headers can only go up to 4K or so.  This
> gives us some more flexibility in the bitpattern choices.
> 
> Another thought that occurred to me is that if we preserve the
> convention that a length word's value includes itself, then for a
> 1-byte header the bit pattern 1000 is meaningless --- the count
> has to be at least 1.  So one trick we could play is to take over
> this value as the signal for "toast pointer follows", with the
> assumption that the tuple-decoder code knows a-priori how big a
> toast pointer is.  I am not real enamored of this, because it certainly
> adds one case to the inner heap_deform_tuple loop and it'll give us
> problems if we ever want more than one kind of toast pointer.  But
> it's a possibility.
> 
> Anyway, a couple of encodings that I'm thinking about now involve
> limiting uncompressed data to 1G (same as now), so that we can play
> with the first 2 bits instead of just 1:
> 
> 00xx  4-byte length word, aligned, uncompressed data (up to 1G)
> 01xx  4-byte length word, aligned, compressed data (up to 1G)
> 100x  1-byte length word, unaligned, TOAST pointer
> 1010  2-byte length word, unaligned, uncompressed data (up to 4K)
> 1011  2-byte length word, unaligned, compressed data (up to 4K)
> 11xx  1-byte length word, unaligned, uncompressed data (up to 63b)
> 
> or
> 
> 00xx  4-byte length word, aligned, uncompressed data (up to 1G)
> 010x  2-byte length word, unaligned, uncompressed data (up to 8K)
> 011x  2-byte length word, unaligned, compressed data (up to 8K)
> 1000  1-byte length word, u
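To make the first proposed encoding concrete, here is an illustrative
sketch of the branching a decoder would do on the first header byte.
This is not PostgreSQL code -- the function name and return strings are
hypothetical, and it only mirrors the first bitpattern table above:

```python
def classify_varlena_header(first_byte: int) -> str:
    """Classify a datum by the leading bits of its first header byte,
    following the first bitpattern proposal quoted above."""
    top2 = first_byte >> 6
    if top2 == 0b00:
        return "4-byte length word, uncompressed (up to 1G)"
    if top2 == 0b01:
        return "4-byte length word, compressed (up to 1G)"
    if top2 == 0b11:
        return "1-byte length word, uncompressed (up to 63b)"
    # Remaining patterns all start with 10: check one more bit for TOAST.
    if first_byte >> 5 == 0b100:
        return "TOAST pointer"
    if first_byte >> 4 == 0b1010:
        return "2-byte length word, uncompressed (up to 4K)"
    return "2-byte length word, compressed (up to 4K)"  # 1011xxxx

print(classify_varlena_header(0b00000001))
print(classify_varlena_header(0b10000001))
print(classify_varlena_header(0b11000011))
```

Note how the 00/01/11 prefixes resolve with a single two-bit test, which
is the property Tom argues for: few "if"s in the heap_deform_tuple inner
loop.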

Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom CommandIDs,updated patch

2007-02-08 Thread Simon Riggs
On Thu, 2007-02-08 at 22:15 -0500, Tom Lane wrote:

> I'm testing the patch currently.  I was a bit surprised to find the
> without_oid test failing, but it makes sense because I'm using a
> MAXALIGN=8 machine.  I suppose Heikki tested on MAXALIGN=4.

That's definitely strange. The patch has been performance tested on a
MAXALIGN=8 system on tables without OIDs on, but I'm not sure we ran the
make check itself on that server. Will do that in future.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Bruce Momjian

I just want an outline of what each option is supposed to control.  If
that information is in a documentation patch, then fine, he can just
post that and tell people to read the patch documentation.

---

Joshua D. Drake wrote:
> Jan Wieck wrote:
> > On 2/8/2007 3:32 PM, Bruce Momjian wrote:
> >> Alvaro Herrera wrote:
> >>> > > Is this a new policy that after discussion, all patches must be
> >>> > > resubmitted with a summary and conclusions of the discussion? I can
> >>> > > certainly do that for you, but just tell me if you are going to ask
> >>> > > the same from everyone.
> >>> >
> >>> > No, I am asking only this time because I feel there was too much
> >>> > disconnect between the patch and the extensive replication discussion
> >>> > that few community members would see the connection.
> >>>
> >>> FYI, in my opinion the trigger addition is clearly useful to Mammoth
> >>> Replicator as well.  In fact, it's so obviously useful that I didn't see
> >>> a need to state that in the original thread where it was discussed.
> >>
> >> Right, I know it is useful too, but I would like a layout of what it
> >> does and why so everyone is clear on it.
> 
> Well how deep are we talking here? My understanding of what Jan wants to
> do is simple.
> 
> Be able to declare which triggers are fired depending on the state of
> the cluster.
> 
> In Jan's terms, the Origin or Subscriber. In Replicator terms the Master
> or Slave.
> 
> This is useful because I may have a trigger on the Master and the same
> trigger on the Slave. You do not want the trigger to fire on the Slave
> because we are doing data replication. In short, we replicate the
> result, not the action.
> 
> However, you may want triggers that are on the Slave to fire separately.
> A reporting server that generates materialized views is a good example.
> Don't tie up the Master with what a Slave can do.
> 
> Sincerely,
> 
> Joshua D. Drake
> 
> 
> 
> > 
> > I have no clue what got you into what you are doing here. But that shall
> > not be my real concern. If you feel the need to do this sort of thing,
> > be my guest. I will add the remaining changes to pg_rewrite, including
> > the new support commands and changes to psql as well as pg_dump and
> > resubmit the new patch with explanations that will hopefully help you to
> > comprehend what and how this relatively small and fully backward
> > compatible change in the trigger and rule firing mechanism will work and
> > what existing problems it will solve.
> > 
> > 
> > Regards,
> > Jan
> > 
> 
> 
> -- 
> 
>   === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive  PostgreSQL solutions since 1997
>  http://www.commandprompt.com/
> 
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
> PostgreSQL Replication: http://www.commandprompt.com/products/
> 
> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Joshua D. Drake
Jan Wieck wrote:
> On 2/8/2007 3:32 PM, Bruce Momjian wrote:
>> Alvaro Herrera wrote:
>>> > > Is this a new policy that after discussion, all patches must be
>>> > > resubmitted with a summary and conclusions of the discussion? I can
>>> > > certainly do that for you, but just tell me if you are going to ask
>>> > > the same from everyone.
>>> >
>>> > No, I am asking only this time because I feel there was too much
>>> > disconnect between the patch and the extensive replication discussion
>>> > that few community members would see the connection.
>>>
>>> FYI, in my opinion the trigger addition is clearly useful to Mammoth
>>> Replicator as well.  In fact, it's so obviously useful that I didn't see
>>> a need to state that in the original thread where it was discussed.
>>
>> Right, I know it is useful too, but I would like a layout of what it
>> does and why so everyone is clear on it.

Well how deep are we talking here? My understanding of what Jan wants to
do is simple.

Be able to declare which triggers are fired depending on the state of
the cluster.

In Jan's terms, the Origin or Subscriber. In Replicator terms the Master
or Slave.

This is useful because I may have a trigger on the Master and the same
trigger on the Slave. You do not want the trigger to fire on the Slave
because we are doing data replication. In short, we replicate the
result, not the action.

However, you may want triggers that are on the Slave to fire separately.
A reporting server that generates materialized views is a good example.
Don't tie up the Master with what a Slave can do.

Sincerely,

Joshua D. Drake



> 
> I have no clue what got you into what you are doing here. But that shall
> not be my real concern. If you feel the need to do this sort of thing,
> be my guest. I will add the remaining changes to pg_rewrite, including
> the new support commands and changes to psql as well as pg_dump and
> resubmit the new patch with explanations that will hopefully help you to
> comprehend what and how this relatively small and fully backward
> compatible change in the trigger and rule firing mechanism will work and
> what existing problems it will solve.
> 
> 
> Regards,
> Jan
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs,updated patch

2007-02-08 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Simon Riggs wrote:
>> Combo is OK, because it's a *combination* of two CommandIds. 
>> 
>> That means they are ComboCommandIds or CCIs.

> CCI is CommandCounterIncrement to me, so let's not use that
> abbreviation.

Agreed.  I looked for a bit at adding a separate typedef ComboCommandId
to be used where a cid was definitely a combo cid, but it didn't seem to
be very useful to do that...

I'm testing the patch currently.  I was a bit surprised to find the
without_oid test failing, but it makes sense because I'm using a
MAXALIGN=8 machine.  I suppose Heikki tested on MAXALIGN=4.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom CommandIDs,updated patch

2007-02-08 Thread Simon Riggs
On Fri, 2007-02-09 at 00:08 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > On Thu, 2007-02-08 at 15:32 -0500, Tom Lane wrote: 
> > > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > > Tom Lane wrote:
> > > >> Packed doesn't seem to have quite the right connotation either --- it
> > > >> sounds like it means there are two separable fields in the CID value.
> > > >> 
> > > >> Maybe "composite cid"?
> > > 
> > > > At one point I was thinking "combo". but "composite" sounds good.
> > 
> > Combo is OK, because it's a *combination* of two CommandIds. 
> > 
> > That means they are ComboCommandIds or CCIs.
> 
> CCI is CommandCounterIncrement to me, so let's not use that
> abbreviation.

True; given the similar context that would be a mistake.

Just ComboIds then? I was worried that would abbreviate to "cid" as well.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs,updated patch

2007-02-08 Thread Alvaro Herrera
Simon Riggs wrote:
> On Thu, 2007-02-08 at 15:32 -0500, Tom Lane wrote: 
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > Tom Lane wrote:
> > >> Packed doesn't seem to have quite the right connotation either --- it
> > >> sounds like it means there are two separable fields in the CID value.
> > >> 
> > >> Maybe "composite cid"?
> > 
> > > At one point I was thinking "combo". but "composite" sounds good.
> 
> Combo is OK, because it's a *combination* of two CommandIds. 
> 
> That means they are ComboCommandIds or CCIs.

CCI is CommandCounterIncrement to me, so let's not use that
abbreviation.

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs,updated patch

2007-02-08 Thread Simon Riggs
On Thu, 2007-02-08 at 15:32 -0500, Tom Lane wrote: 
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> Packed doesn't seem to have quite the right connotation either --- it
> >> sounds like it means there are two separable fields in the CID value.
> >> 
> >> Maybe "composite cid"?
> 
> > At one point I was thinking "combo". but "composite" sounds good.

Combo is OK, because it's a *combination* of two CommandIds. 

That means they are ComboCommandIds or CCIs.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Bad bug in fopen() wrapper code

2007-02-08 Thread Bruce Momjian
Tom Lane wrote:
> "Magnus Hagander" <[EMAIL PROTECTED]> writes:
> > That is part of the original open() code that Claudio did back for 8.0,
> > so it has definitely been working since then.
> 
> Hm, maybe best not to touch it, but still...
> 
> > I haven't really read into
> > the code, though... But a quick look doesn't show me any place where it
> > does ignore O_EXCL - which combination would that be?
> What's bugging me is that 0 and O_EXCL give the same answer, and
> O_TRUNC and O_TRUNC | O_EXCL give the same answer, but O_CREAT and
> O_CREAT | O_EXCL give different answers, as do O_CREAT | O_TRUNC
> and O_CREAT | O_TRUNC | O_EXCL.  I'm also pretty suspicious of
> both O_CREAT | O_EXCL and O_CREAT | O_TRUNC | O_EXCL giving the
> same answer.  However, I have no idea what the semantics are of
> the symbols the function is mapping into, so maybe it's OK.

I am CC'ing Claudio Natoli on this question about
open.c::openFlagsToCreateFileFlags(), but in looking at the code, it
seems OK because:

o O_EXCL doesn't have any meaning unless O_CREAT is used

o O_TRUNC has no meaning when O_CREAT | O_EXCL are used
because you are guaranteed to be creating a new file

Claudio, the function is here, at the top of the file:


http://developer.postgresql.org/cvsweb.cgi/pgsql/src/port/open.c?rev=1.18;content-type=text%2Fx-cvsweb-markup
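The POSIX semantics behind those two bullets can be demonstrated from
userland. This is a hedged sketch using Python's os.open, which passes
these flags straight through to the platform open(); note POSIX leaves
O_EXCL without O_CREAT undefined, so that case is only noted in a
comment rather than demonstrated:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo")

# O_CREAT | O_EXCL guarantees we create a brand-new file, which is
# why O_TRUNC adds nothing to that combination: a new file is empty.
fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
os.close(fd)

# A second O_CREAT | O_EXCL open of the same path must fail outright.
try:
    os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    raise AssertionError("should not reach here")
except FileExistsError:
    print("second exclusive create failed, as required")

# O_EXCL without O_CREAT has no defined meaning in POSIX; typical
# platforms simply open the existing file, matching the first bullet.
```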

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Hierarchical Queries--Status

2007-02-08 Thread Bruce Momjian

Who is working on this item?

---

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Mon, Sep 04, 2006 at 05:15:57PM +0100, Mark Cave-Ayland wrote:
> > 3) Add planner support so that WITH clauses are mapped to a new type of
> > node that utilises two tuplestores - an output tuplestore and a working
> > tuplestore. The output tuple store will in effect be the contents of the
> > table expression while the working tuplestore holds the results of the
> > last iteration if recursive. Also implement some kind of WithState node
> > which keeps track of the recursion state.
> 
> That's basically what I came up with. Basically you have a sort of loop
> in the execution plan where tuples that come out are copied into a
> tuplestore and run through a particular part of the executor again. The
> top-down approach of the executor makes it a bit trickier...
> 
> > Having spent some more time today looking at 1) and also at the SQL 2003
> > spec, it would seem that other databases offer the WITH clause within
> > subqueries and also as part of a view. I'd be interested to hear
> > feedback from other developers as to whether it would be possible to
> > achieve this level of support within PostgreSQL.
> 
> Absolutly possible. The question is how much work :)
> 
> Incidently, if you find a way to support common subplans (where a part
> of the executor is shared between two executions) that might provide a
> way to solve some of the trickier multiple evaluation problems with
> rules. Again, it would just be a tuplestore the stored the results for
> multiple executions.
> 
> Have a nice day,
> -- 
> Martijn van Oosterhout  http://svana.org/kleptog/
> > From each according to his ability. To each according to his ability to 
> > litigate.
-- End of PGP section, PGP failed!

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Bruce Momjian
Jan Wieck wrote:
> On 2/8/2007 3:32 PM, Bruce Momjian wrote:
> > Alvaro Herrera wrote:
> >> > > Is this a new policy that after discussion, all patches must be 
> >> > > resubmitted with a summary and conclusions of the discussion? I can 
> >> > > certainly do that for you, but just tell me if you are going to ask
> >> > > the same from everyone.
> >> > 
> >> > No, I am asking only this time because I feel there was too much
> >> > disconnect between the patch and the extensive replication discussion
> >> > that few community members would see the connection.
> >> 
> >> FYI, in my opinion the trigger addition is clearly useful to Mammoth
> >> Replicator as well.  In fact, it's so obviously useful that I didn't see
> >> a need to state that in the original thread where it was discussed.
> > 
> > Right, I know it is useful too, but I would like a layout of what it
> > does and why so everyone is clear on it.
> 
> I have no clue what got you into what you are doing here. But that shall 
> not be my real concern. If you feel the need to do this sort of thing, 
> be my guest. I will add the remaining changes to pg_rewrite, including 
> the new support commands and changes to psql as well as pg_dump and 
> resubmit the new patch with explanations that will hopefully help you to 
> comprehend what and how this relatively small and fully backward 
> compatible change in the trigger and rule firing mechanism will work and 
> what existing problems it will solve.

Yep, that's what I want everyone to see.  This is standard procedure for
everyone in the community, and core is not immune.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Jan Wieck

On 2/8/2007 3:32 PM, Bruce Momjian wrote:

Alvaro Herrera wrote:
> > Is this a new policy that after discussion, all patches must be 
> > resubmitted with a summary and conclusions of the discussion? I can 
> > certainly do that for you, but just tell me if you are going to ask the 
> > same from everyone.
> 
> No, I am asking only this time because I feel there was too much

> disconnect between the patch and the extensive replication discussion
> that few community members would see the connection.

FYI, in my opinion the trigger addition is clearly useful to Mammoth
Replicator as well.  In fact, it's so obviously useful that I didn't see
a need to state that in the original thread where it was discussed.


Right, I know it is useful too, but I would like a layout of what it
does and why so everyone is clear on it.


I have no clue what got you into what you are doing here. But that shall 
not be my real concern. If you feel the need to do this sort of thing, 
be my guest. I will add the remaining changes to pg_rewrite, including 
the new support commands and changes to psql as well as pg_dump and 
resubmit the new patch with explanations that will hopefully help you to 
comprehend what and how this relatively small and fully backward 
compatible change in the trigger and rule firing mechanism will work and 
what existing problems it will solve.



Regards,
Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] HOT for PostgreSQL 8.3

2007-02-08 Thread Simon Riggs
On Wed, 2007-02-07 at 14:17 -0500, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > The basic idea is that when a tuple is UPDATEd we can, in certain
> > circumstances, avoid inserting index tuples for a tuple. Such tuples are
> > marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to
> > other tuples.
> 
> What is VACUUM FULL going to do when it wants to move one of these things?

In addition to others suggested, one option is to rework VACUUM FULL:

The use case for VACUUM FULL is rare these days. It has an appallingly
long execution time. This can be sped up by dropping and re-creating
indexes, says the manual, but it is still lengthy. It is even faster to
drop the indexes, do a CREATE TABLE AS SELECT * FROM table, drop the old
table and then rebuild the indexes. When moving into the new relation
the space requirement is not double, since we only copy useful
data/space and we also do this using the space the indexes occupied, so
the actual space overhead isn't that high. VACUUM FULL also generates
lots of WAL and ties up lots of memory while it operates - and it uses
memory without any constraint. The CTAS technique doesn't carry across
all of the visible tuple chains, but then to be brutally frank, by the
time VACUUM FULL has actually finished executing, it is very frequently
the oldest transaction anyway, so we needn't really have gone to all the
trouble of moving the tuple chains.

So the main use case for VACUUM FULL is when the space to be freed
inside the table is low enough to make defragging the table quicker than
a CTAS, yet still high enough that we were worried enough to do a VACUUM
FULL. That's a very narrow use case, and if it exists at all there's a
narrow time window associated with it - only a heavily updated/deleted
table needs vacuuming anyway - which means we'll most often be performing
the VF when we're already into the zone where the CTAS approach is
quicker. VACUUM FULL also forces us to handle various failure cases that
leave half-moved tuples scattered across tables. (Incomplete VACUUM FULLs
are actually fairly common because of its incredibly long run time.)

So the radical suggestion is to continue to support the VACUUM FULL
command, but using a newly coded technique that is both higher
performance and more robust. We can either make VACUUM FULL wait until
it actually is the oldest transaction, or we can mark the table in some
way so that a lock cannot be obtained upon it by an earlier Xid. We
should be able to compact the files of a large table one physical file
at a time, so the space overhead is only ever MAX_PHYSICAL_FILESIZE and
the space overhead may become a net space gain as the VF continues. This
is of course (almost) identical to the approach already in use for
CLUSTER, so it seems like that should be acceptable. As a result, it's
really not that much code and can still be accomplished on time.

Note also that we would not have to drop and re-add Foreign Keys, since
nothing can have changed while we have the table locked.

Doing this also frees up two heap info bits and simplifies much of the
HeapTupleSatisfies code, which could probably use the helping hand.
Tuples moved to the new files would retain their info bit settings as
they are copied across.

So overall, it seems a lot easier to completely replace VF than to fight
through its complexities and failure cases. If we do the above, then
we'll speed up VACUUM FULL and we'll be able to handle HOT tuples
easily.

> > CREATE INDEX requires some careful work to allow it to identify and
> > correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as
> > a result of the new index.
> 
> I think you've glossed over the CREATE INDEX problem much too easily.
> The difficulty with that is going to be that de-HOT-ifying a tuple
> is going to require multiple updates that can't possibly be put into
> a single WAL record, and I don't think that WAL replay can clean up
> after an incomplete update (since it can't run user-defined functions
> and hence cannot be expected to compute index entries for itself).
> So I don't think you can do that while preserving crash safety.

No intention to gloss, just wanted to get past first post and onto the
really complex stuff. You're absolutely right that this is where much of
the thinking/work needs to take place.

> > Removing the root tuple will require a VACUUM *FULL*.
> 
> That seems unacceptable ... it won't take too long for your table to
> fill up with stubs, and we don't want to return to the bad old days
> when periodic VACUUM FULL was unavoidable.

Completely agree. I wanted to start right at the very beginning, so
everybody would understand the issues, rather than jump straight in
again with additional complexity.

> ISTM we could fix that by extending the index VACUUM interface to
> include two concepts: aside from "remove these TIDs when you find them",
> there could be "replace these TIDs with those TIDs when you find them".
> This would allow

Re: [HACKERS] [PATCHES] plpgsql, return can contains any expression

2007-02-08 Thread Bruce Momjian

OK, where are we on this patch?

---

Pavel Stehule wrote:
> 
> 
> >
> >"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> > >> This patch doesn't seem to cope with cases where the supplied tuple has
> > >> the wrong number of columns, and it doesn't look like it's being 
> >careful
> > >> about dropped columns either.  Also, that's a mighty bizarre-looking
> > >> choice of cache memory context in coerce_to_tuple ... but then again,
> > >> why are you bothering with a cache at all for temporary arrays?
> >
> > > I am sorry, Tom. But I don't understand. I can check number of columns,
> > > ofcourse and I'll do it. What cache for temporary arrays do you mean?
> >
> >I thought that making coerce_to_tuple depend on estate->err_func was
> >pretty bizarre, and that there was no need for any "cache" at all for
> >arrays that need only live as long as the function runs.  All you are
> >saving here is a palloc/pfree cycle, which is not worth the obscurantism
> >and risk of bugs (are you sure natts can never change?).
> 
> No, the cache there is ugly, but I don't have an idea for a more effective 
> implementation of it :-(. The first version of this patch was cleaner, and 
> a little bit slower. This cache saves 10%.
> 
> >
> >BTW, if you want this patch to make it into 8.2, it needs to be fixed
> >and resubmitted *very* soon.
> 
I understand, but I am not able to work on it in the next four days. And I 
need help with it from Neil. It will be for 8.3.
> 
> Thank you
> Pavel
> 
> 
> 

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [PATCHES] Updatable views

2007-02-08 Thread Bruce Momjian

Here is the more recent email I have on this feature work.

---

Bernd Helmle wrote:
> --On Freitag, September 01, 2006 11:41:16 -0400 Tom Lane 
> <[EMAIL PROTECTED]> wrote:
> 
> >
> > So in other words, views on serial columns don't work?  I don't think
> > that's going to be acceptable.
> >
> 
They work, with the restriction that no one is allowed to put a volatile
function in an update query.
> 
> >
> > Not really worse than what the rewriter is doing already --- in fact,
> > I think it's isomorphic to what would happen to the rule qual
> > expressions in your existing patch.
> >
> 
Currently you don't have to rewrite the rule conditions themselves every
time you apply them to the query tree, since they are stored in pg_rewrite
matching all the various (reversed) varattnos, resnos and whatever.
If I understand correctly, you need to do that with constraints every time
you fire an update query on a view, for each underlying relation.
> 
> >
> > I'm about to propose that we should try to go beta next week (see
> > forthcoming message).  If you can strip down your patch to avoid the
> > multi-eval problems in the next couple of days, I'm still willing to
> 
Depends on how many days "couple of days" are. If you mean the next
three days, then definitely not, since I'm afk for the whole upcoming
weekend. Bad timing, but it's not deferrable :(
> 
> > consider it, but at the moment I'm assuming that it needs to be held
> > for 8.3.
> 
Well, I'll see what I can do next week... it's a little bit disappointing
that these problems arise so late, but that's no one's fault since there
are many side effects of the rewriting system involved.
> 
> Many thanks for your comments.
> 
> Bernd
> 
> 

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [PATCHES] Updatable views

2007-02-08 Thread Bruce Momjian

Where are we on this feature?

---

Bernd Helmle wrote:
> --On Mittwoch, August 30, 2006 12:01:25 -0400 Tom Lane <[EMAIL PROTECTED]> 
> wrote:
> 
> > Bernd Helmle <[EMAIL PROTECTED]> writes:
> >> [ latest views patch ]
> >
> > This is the first time I've actually looked at this patch, and I am
> > dismayed.  viewUpdate.c looks like nothing so much as a large program
> > with a small program struggling to get out.  What is all the stuff about
> > handling multiple base rels?  SQL92, at least, does not say that a join
> > is updatable, and AFAICT this patch is rejecting that too ... though
> > it's hard to tell with the conditions for allowing the join to be
> > updatable scattered through a lot of different functions.  And some of
> > the code seems to be expecting multiple implicit rules and other parts
> > not.  I get the impression that a lot of this code is left over from a
> > more ambitious first draft and ought to be removed in the name of
> > readability/maintainability.
> >
> 
> I'm not sure what parts of the code you are referring to exactly, but I
> admit that there are code parts that could deal with multiple base
> relations and rules. get_base_base_relation() is an example; it is used
> to create lookup tables for reversed columns so we can break them down
> to the correct position in their base tables. Restricting that to only
> one base relation wouldn't make any difference. Furthermore, SQL99
> allows at least updatable views with joined relations which preserve
> their keys in the view definition. So I don't think it's that bad to
> leave parts of the code that way for future improvements.
> 
> > I'm unclear as to why you've got DO INSTEAD NOTHING rules in there ---
> > the spec says that a WITH CHECK OPTION violation results in an error,
> > not in nothing happening, so it doesn't seem to me that we should need
> > any NOTHING rules to implement the spec.  It would probably help if
> 
> Well, instead of something like
> 
> "ERROR:  cannot insert into a view
> HINT:  You need an unconditional ON INSERT DO INSTEAD rule."
> 
> you will get
> 
> "ERROR:  view update commands violates rule condition"
> 
> with the correct error code set, because the view update check function is 
> fired before.
> The first one isn't very useful for someone who simply wants to insert data 
> into the
> view which isn't allowed to get in. You never get the view update check 
> function fired
> without the DO INSTEAD rule applied to a view created with a check option.
> 
> > there were some header documentation that explained exactly how the
> > module intends to transform a SELECT to create the various action rules.
> >
> 
> I agree with you; maybe it's a good idea to add a README to src/backend/rewrite?
> 
> > The pg_dump changes seem pretty odd too.  Why wouldn't you just
> > ignore implicit rules during a dump, expecting the system to
> > regenerate them when the view is reloaded?
> 
> Uhm, you're right. It's easier to exclude them in the SELECT query
> directly instead of selecting them, iterating over them and filtering
> them out. I'll fix that. (Looks like this is a "cannot see the wood for
> the trees" mistake.)
> 
> 
> -- 
>   Thanks
> 
> Bernd
> 

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[HACKERS] Re-ordering of OR conditions

2007-02-08 Thread Jim Nasby
If I run the following with the a < 2900 condition first, the more
expensive EXISTS only gets executed when needed, but if I change the
order of the ORs, the EXISTS is always executed. It would be good if
the optimizer could re-order the OR conditions based on estimated
cost (granted, this wouldn't work very well if you've got functions
in the OR, but it'd still be useful):


select * from a where a < 2900 or exists (select * from b where b.a =  
a.a);


Here's a full example. Note the loops count for the Subplan between  
both cases:


decibel=# create table a as select * from generate_series(1,3000) a;
SELECT
decibel=# create table b as select a,b from a, generate_series(1,100)  
b where a > 10;

SELECT
decibel=# create index b__a on b(a);
CREATE INDEX
decibel=# explain analyze select * from a where a < 2900 or exists  
(select * from b where b.a = a.a);

                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on a  (cost=0.00..8080.41 rows=1997 width=4) (actual  
time=0.014..1.784 rows=3000 loops=1)

   Filter: ((a < 2900) OR (subplan))
   SubPlan
 ->  Index Scan using b__a on b  (cost=0.00..4006.44 rows=1495  
width=8) (actual time=0.009..0.009 rows=1 loops=101)

   Index Cond: (a = $0)
Total runtime: 2.151 ms
(6 rows)

decibel=# explain analyze select * from a where exists (select * from  
b where b.a = a.a) or a < 2000;

                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on a  (cost=0.00..8080.41 rows=1997 width=4) (actual  
time=0.067..37.011 rows=3000 loops=1)

   Filter: ((subplan) OR (a < 2000))
   SubPlan
 ->  Index Scan using b__a on b  (cost=0.00..4006.44 rows=1495  
width=8) (actual time=0.011..0.011 rows=1 loops=3000)

   Index Cond: (a = $0)
Total runtime: 37.497 ms
(6 rows)

decibel=#

(This is on HEAD as of a few minutes ago)
--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)






[HACKERS] TODO: Allow SQL functions to reference parameters by name

2007-02-08 Thread Jim Nasby

Can someone add this to the TODO?



patch
Description: Binary data


--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > The way combo cid is supposed to work is that you are deleting a row
> > created in your same transaction by a previous command id, so you look
> > in the combo cid array to see if a match for that pair exists --- if
> > not, you create a new entry and put the two cids on it.
> 
> > So, with the combo lock cid, you do the same process, and lookups of who
> > holds the lock looks at the cid combo, and if the second subtransaction
> > was aborted, the first one is the lock holder.  If you again lock the
> > row, you create a new combo cid and use the original cid there because
> > the second cid was aborted.
> 
> No, because no process other than the originator can see the combo-cid
> data structure, and for locking situations you really need other
> backends to be able to know whether the tuple is locked and how.

Oh, OK, I forgot pg_subtrans is visible to all backends.

> But I think my proposal of extending MultiXact would fix it; please look
> at that.

Sounds good.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> The way combo cid is supposed to work is that you are deleting a row
> created in your same transaction by a previous command id, so you look
> in the combo cid array to see if a match for that pair exists --- if
> not, you create a new entry and put the two cids on it.

> So, with the combo lock cid, you do the same process, and lookups of who
> holds the lock looks at the cid combo, and if the second subtransaction
> was aborted, the first one is the lock holder.  If you again lock the
> row, you create a new combo cid and use the original cid there because
> the second cid was aborted.

No, because no process other than the originator can see the combo-cid
data structure, and for locking situations you really need other
backends to be able to know whether the tuple is locked and how.

But I think my proposal of extending MultiXact would fix it; please look
at that.

regards, tom lane



Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Humm, sorry, obviously this makes no sense at all because I mentally
> mixed the Xid locker and the Cids.

After thinking a bit, I have a sketch of a solution.

Assume that we extend the MultiXact infrastructure so that it can track
whether each member of a MultiXact holds shared or exclusive lock.
(There are a couple ways you could do that --- add a parallel bit-array,
or separate the members into two groups.  Details not important for now.)
The only way you could have both shared- and exclusive-lock members is
if they are subtransactions of the same backend, but that fact isn't
really relevant to the MultiXact code.

Then extend MultiXactIdWait() so that you can tell it to wait for all
members to die, or just the exclusive members.

Then the representation of the problem situation would be that a locked
tuple would have as XMAX a MultiXact containing the upper XID as shared
locker and the subtransaction as exclusive locker.  Onlookers could wait
for one or both to die as appropriate depending on what kind of lock
they needed to get.  HEAP_XMAX_EXCL_LOCK would have to be a hint rather
than the truth, ie, once all the exclusive-lock members of the MultiXact
are dead it's really only a shared lock, but I don't see that this poses
any real difficulty.

I don't particularly want to go implement this now; I just want a
proof-of-concept sketch proving that we don't need separate cmin and
cmax to support this.

As for what I think we *should* do near-term, I'm pretty strongly
tempted to suggest that we just throw an error if a subtransaction tries
to upgrade an upper transaction's shared lock to exclusive.  When and if
we get a ton of complaints about that, it'd be time to put forth effort
to fix it.  I suspect the situation doesn't really arise much in
practice, else we'd have heard complaints from the field about the fact
that the shared lock can become lost.

Comments?

regards, tom lane



Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Bruce Momjian
Tom Lane wrote:
> > At one point I was thinking "combo". but "composite" sounds good.
> 
> I like "combo" --- nice and short.
> 
> >> Another issue that we need to think about before we go too far with this
> >> is the problem that we punted on before 8.2 release: how to deal with
> >> rolling back an upgrade of a row-level lock from shared to exclusive
> >> within a subtransaction.  I'm a bit nervous about committing to merging
> >> cmin and cmax before we have an idea how we're going to solve that ---
> >> it might foreclose a solution.  Or maybe we could piggyback on phantom/
> >> composite/whatever CIDs to solve it, which would be great, but let's
> >> try to sketch out a solution now.
> 
> > Good point.  Right now we put our new cid on top of the old lock cid,
> > making rollback impossible to the old lock.  What if instead of
> > overwriting our old cid with a new one, we create a composite cid, and
> > if we roll back, we look up the composite pair and put the old cid back.
> > It would only work with two cids, but that seems sufficient.
> 
> Yeah, that's more or less what I was thinking.  The problem is that the
> composite CID isn't going to be enough info to tell you *where* you have
> to put things back.  And we don't want to try to remember per-row state
> in memory.  Is there a way to generalize either the composite CID or the
> MultiXact mechanism to support this situation without that?

Uh, well, hmmm.

The way combo cid is supposed to work is that you are deleting a row
created in your same transaction by a previous command id, so you look
in the combo cid array to see if a match for that pair exists --- if
not, you create a new entry and put the two cids on it.

So, with the combo lock cid, you do the same process, and lookups of who
holds the lock looks at the cid combo, and if the second subtransaction
was aborted, the first one is the lock holder.  If you again lock the
row, you create a new combo cid and use the original cid there because
the second cid was aborted.

I don't see how any of this is per-row for locks anymore than it is
per-row for insert/delete.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Alvaro Herrera
Alvaro Herrera wrote:

> This starts to look awfully similar to MultiXactIds.  And probably using
> such a mechanism would allow you to "rollback" any number of row locks:
> take the current members of the "multicid", subtract the one that
> rolled back and use that as new multicid.  The main difference is that
> you'd need to store both the locker Cid and the mode (shared/exclusive).

Humm, sorry, obviously this makes no sense at all because I mentally
mixed the Xid locker and the Cids.

> The other difference is that multicids can be stored locally to a
> backend, no need to have SLRUs etc.

-- 
Alvaro Herrera            http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Alvaro Herrera
Bruce Momjian wrote:
> Tom Lane wrote:
> > 
> > Another issue that we need to think about before we go too far with this
> > is the problem that we punted on before 8.2 release: how to deal with
> > rolling back an upgrade of a row-level lock from shared to exclusive
> > within a subtransaction.  I'm a bit nervous about committing to merging
> > cmin and cmax before we have an idea how we're going to solve that ---
> > it might foreclose a solution.  Or maybe we could piggyback on phantom/
> > composite/whatever CIDs to solve it, which would be great, but let's
> > try to sketch out a solution now.
> 
> Good point.  Right now we put our new cid on top of the old lock cid,
> making rollback impossible to the old lock.  What if instead of
> overwriting our old cid with a new one, we create a composite cid, and
> if we roll back, we look up the composite pair and put the old cid back.
> It would only work with two cids, but that seems sufficient.

This starts to look awfully similar to MultiXactIds.  And probably using
such a mechanism would allow you to "rollback" any number of row locks:
take the current members of the "multicid", subtract the one that
rolled back and use that as new multicid.  The main difference is that
you'd need to store both the locker Cid and the mode (shared/exclusive).

The other difference is that multicids can be stored locally to a
backend, no need to have SLRUs etc.

-- 
Alvaro Herrera            http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Packed doesn't seem to have quite the right connotation either --- it
>> sounds like it means there are two separable fields in the CID value.
>> 
>> Maybe "composite cid"?

> At one point I was thinking "combo". but "composite" sounds good.

I like "combo" --- nice and short.

>> Another issue that we need to think about before we go too far with this
>> is the problem that we punted on before 8.2 release: how to deal with
>> rolling back an upgrade of a row-level lock from shared to exclusive
>> within a subtransaction.  I'm a bit nervous about committing to merging
>> cmin and cmax before we have an idea how we're going to solve that ---
>> it might foreclose a solution.  Or maybe we could piggyback on phantom/
>> composite/whatever CIDs to solve it, which would be great, but let's
>> try to sketch out a solution now.

> Good point.  Right now we put our new cid on top of the old lock cid,
> making rollback impossible to the old lock.  What if instead of
> overwriting our old cid with a new one, we create a composite cid, and
> if we roll back, we look up the composite pair and put the old cid back.
> It would only work with two cids, but that seems sufficient.

Yeah, that's more or less what I was thinking.  The problem is that the
composite CID isn't going to be enough info to tell you *where* you have
to put things back.  And we don't want to try to remember per-row state
in memory.  Is there a way to generalize either the composite CID or the
MultiXact mechanism to support this situation without that?

regards, tom lane



Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Bruce Momjian
Alvaro Herrera wrote:
> > > Is this a new policy that after discussion, all patches must be 
> > > resubmitted with a summary and conclusions of the discussion? I can 
> > > certainly do that for you, but just tell me if you are going to ask the 
> > > same from everyone.
> > 
> > No, I am asking only this time because I feel there was too much
> > disconnect between the patch and the extensive replication discussion
> > that few community members would see the connection.
> 
> FYI, in my opinion the trigger addition is clearly useful to Mammoth
> Replicator as well.  In fact, it's so obviously useful that I didn't see
> a need to state that in the original thread where it was discussed.

Right, I know it is useful too, but I would like a layout of what it
does and why so everyone is clear on it.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Marc Munro
On Thu, 2007-02-08 at 12:24 -0800, Stephan Szabo wrote:
> On Thu, 8 Feb 2007, Marc Munro wrote:
> 
> > I don't think this does stop the second from continuing before the
> > first.  What will stop it, is the eventual lock that is taken on the
> > child (triggering) record.
> 
> But at that point, you've already had to compose the new row in order to
> call the trigger for the ri check, right? So, one of those new rows will
> be out of date by the time it actually gets the lock. Presumably that
> means that you need to recalculate the new row, but you've already done a
> check and gotten a lock based on the old new row.

Yes.  That is tricky.  For my proposed scheme to work, I guess we'd have
to be able to drop those locks which were just acquired by the RI
triggers.  Not too simple, I guess.

> Also, another big problem is the fact that SQL requires that the action
> already have happened before the check in cases where the constraint
> references the same table.  The row being updated or inserted might
> reference a row that will be updated or inserted by a later action of the
> same statement.

Hmmm.  That does seem to be the final nail in the coffin.  Consider the
proposal withdrawn, and thanks for explaining it all to me.

__
Marc




Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Bruce Momjian
Tom Lane wrote:
> [ time to move this thread to -hackers ]
> 
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> >>> Heikki Linnakangas wrote:
> >>>> Tom Lane wrote:
> >>>>> BTW, I don't care much for the terminology "phantom cid" ... there's
> >>>>> nothing particularly "phantom" about them, seeing they get onto disk.
> >>>>> Can anyone think of a better name?  The best I can do offhand is
> >>>>> "merged cid" or "cid pair", which aren't inspiring.
> 
> >>>> MultiCid, like the MultiXacts? Maybe not, they're quite different
> >>>> beasts...
> 
> >>> Dual cid?  Double cid?
> 
> >> "Double cid" doesn't sound too bad.  Another thought that just came to
> >> mind is "cid interval" or some variant of that.
> 
> > I don't like "double cid" because it is really just one cid, but
> > represents two.  I am thinking "packed cid" is the right wording.  It
> > doesn't have the same impact as "phantom", but it is probably better.
> 
> Packed doesn't seem to have quite the right connotation either --- it
> sounds like it means there are two separable fields in the CID value.
> 
> Maybe "composite cid"?

At one point I was thinking "combo", but "composite" sounds good.

> Another issue that we need to think about before we go too far with this
> is the problem that we punted on before 8.2 release: how to deal with
> rolling back an upgrade of a row-level lock from shared to exclusive
> within a subtransaction.  I'm a bit nervous about committing to merging
> cmin and cmax before we have an idea how we're going to solve that ---
> it might foreclose a solution.  Or maybe we could piggyback on phantom/
> composite/whatever CIDs to solve it, which would be great, but let's
> try to sketch out a solution now.

Good point.  Right now we put our new cid on top of the old lock cid,
making rollback impossible to the old lock.  What if instead of
overwriting our old cid with a new one, we create a composite cid, and
if we roll back, we look up the composite pair and put the old cid back.
It would only work with two cids, but that seems sufficient.
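
Bruce's pairing idea can be sketched abstractly (purely my illustration, in Python rather than backend code; the names `make_composite` and `rollback` are invented here): a side table maps each (old, new) cid pair to a single composite value, so the pre-upgrade cid stays recoverable on rollback.

```python
# Toy sketch of the "composite cid" rollback idea -- not actual backend code.
pairs = {}    # composite id -> (old_cid, new_cid)
by_pair = {}  # (old_cid, new_cid) -> composite id

def make_composite(old_cid, new_cid):
    # Register the pair once and hand back a single composite value
    # that can be stored in place of the two cids.
    key = (old_cid, new_cid)
    if key not in by_pair:
        cid = len(pairs)
        pairs[cid] = key
        by_pair[key] = cid
    return by_pair[key]

def rollback(composite_cid):
    # Look the pair up and recover the cid that was in place
    # before the lock upgrade.
    old_cid, _ = pairs[composite_cid]
    return old_cid

c = make_composite(old_cid=4, new_cid=9)
print(rollback(c))  # 4
```

As the paragraph above notes, this only handles a pair of cids, but for a single shared-to-exclusive upgrade that seems sufficient.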

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Stephan Szabo
On Thu, 8 Feb 2007, Marc Munro wrote:

> On Thu, 2007-02-08 at 10:06 -0800, Stephan Szabo wrote:
> > On Thu, 8 Feb 2007, Marc Munro wrote:
> . . .
> > >
> > > That other transaction, T1, would have run the same RI triggers and so
> > > would have the same parent records locked.
> >
> > That's not true in the case of delete, since the referencing table
> > triggers are on insert and update. . . .
>
> Let me see if I have this scenario right:
>
> Transaction T1 updates child record C1, with RI causing the parent P1 to
> be locked before the child.
>
> In the meantime transaction T2, successfully deletes C1 as it has not
> yet been locked.
>
> (Please tell me if I have misunderstood what you are saying)
>
> Yes in this case, T1 must abort because the record it was going to
> update has disappeared from underneath it.  I don't see how this is
> significantly different from the same race for the record if the table
> had no RI constraints.  The only difference that I can see, is that T1
> now has some locks that it must relinquish as the transaction aborts.
>
> > . . .  Second, the parent record locks are not
> > exclusive which means that both can be granted, so I don't see how this
> > stops the second from continuing before the first.
>
> I don't think this does stop the second from continuing before the
> first.  What will stop it, is the eventual lock that is taken on the
> child (triggering) record.

But at that point, you've already had to compose the new row in order to
call the trigger for the ri check, right? So, one of those new rows will
be out of date by the time it actually gets the lock. Presumably that
means that you need to recalculate the new row, but you've already done a
check and gotten a lock based on the old new row.

Also, another big problem is the fact that SQL requires that the action
already have happened before the check in cases where the constraint
references the same table.  The row being updated or inserted might
reference a row that will be updated or inserted by a later action of the
same statement.



Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Alvaro Herrera
Bruce Momjian wrote:
> Jan Wieck wrote:
> > On 2/7/2007 11:12 PM, Bruce Momjian wrote:
> > > Jan Wieck wrote:
> > >> On 2/7/2007 10:35 PM, Bruce Momjian wrote:
> > >>
> > >> > Also, I saw the trigger patch with no explanation of why it was
> > >> > important or who would use it --- that also isn't going to fly well.
> > >> 
> > >> You didn't respond to my explanation how the current Slony 
> > >> implementation could improve and evolve using it. Are you missing 
> > >> something? I am discussing this very issue with our own QA department, 
> > >> and thus far, I think I have a majority of "would use a pg_trigger 
> > >> backpatched PostgreSQL" vs. "No, I prefer a system that knows exactly 
> > >> how it corrupted my system catalog".
> > > 
> > > No, I _now_ understand the use case, but when the patch was posted, the
> > > use case was missing.  I would like to see a repost with the patch, and
> > > a description of its use so we can all move forward on that.
> > 
> > Is this a new policy that after discussion, all patches must be 
> > resubmitted with a summary and conclusions of the discussion? I can 
> > certainly do that for you, but just tell me if you are going to ask the 
> > same from everyone.
> 
> No, I am asking only this time because I feel there was too much
> disconnect between the patch and the extensive replication discussion
> that few community members would see the connection.

FYI, in my opinion the trigger addition is clearly useful to Mammoth
Replicator as well.  In fact, it's so obviously useful that I didn't see
a need to state that in the original thread where it was discussed.

Not sure about the timestamp stuff, because Replicator is not
multi-master, so there's no conflict resolution to take care of.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch

2007-02-08 Thread Tom Lane
[ time to move this thread to -hackers ]

Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>>> Heikki Linnakangas wrote:
>>>> Tom Lane wrote:
>>>>> BTW, I don't care much for the terminology "phantom cid" ... there's
>>>>> nothing particularly "phantom" about them, seeing they get onto disk.
>>>>> Can anyone think of a better name?  The best I can do offhand is
>>>>> "merged cid" or "cid pair", which aren't inspiring.

>>>> MultiCid, like the MultiXacts? Maybe not, they're quite different beasts...

>>> Dual cid?  Double cid?

>> "Double cid" doesn't sound too bad.  Another thought that just came to
>> mind is "cid interval" or some variant of that.

> I don't like "double cid" because it is really just one cid, but
> represents two.  I am thinking "packed cid" is the right wording.  It
> doesn't have the same impact as "phantom", but it is probably better.

Packed doesn't seem to have quite the right connotation either --- it
sounds like it means there are two separable fields in the CID value.

Maybe "composite cid"?

Another issue that we need to think about before we go too far with this
is the problem that we punted on before 8.2 release: how to deal with
rolling back an upgrade of a row-level lock from shared to exclusive
within a subtransaction.  I'm a bit nervous about committing to merging
cmin and cmax before we have an idea how we're going to solve that ---
it might foreclose a solution.  Or maybe we could piggyback on phantom/
composite/whatever CIDs to solve it, which would be great, but let's
try to sketch out a solution now.

regards, tom lane



Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Marc Munro
On Thu, 2007-02-08 at 14:33 -0500, Tom Lane wrote:
> Marc Munro <[EMAIL PROTECTED]> writes:
> > Yes in this case, T1 must abort because the record it was going to
> > update has disappeared from underneath it.  I don't see how this is
> > significantly different from the same race for the record if the table
> > had no RI constraints.  The only difference that I can see, is that T1
> > now has some locks that it must relinquish as the transaction aborts.
> 
> No, the difference is there would have been no error at all before;
> if the record were deleted before T1 got to it then it wouldn't have
> attempted to update it.  I really don't think you can make it work
> to perform updates or deletes on a record you have not yet locked.

The record would be locked before the update or delete is attempted,
however it would not be locked until the referential integrity
constraints have succeeded in acquiring their locks.

It is becoming clear to me that I am missing something but I still don't
know what it is.  If anyone can see it and explain it I'd really
appreciate it.

__
Marc




Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Bruce Momjian
Jan Wieck wrote:
> On 2/7/2007 11:12 PM, Bruce Momjian wrote:
> > Jan Wieck wrote:
> >> On 2/7/2007 10:35 PM, Bruce Momjian wrote:
> >> > I find the term "logical proof of its correctness" too restrictive.  It
> >> > sounds like some formal academic process that really doesn't work well
> >> > for us.
> >> 
> >> Thank you.
> >> 
> >> > Also, I saw the trigger patch with no explanation of why it was
> >> > important or who would use it --- that also isn't going to fly well.
> >> 
> >> You didn't respond to my explanation how the current Slony 
> >> implementation could improve and evolve using it. Are you missing 
> >> something? I am discussing this very issue with our own QA department, 
> >> and thus far, I think I have a majority of "would use a pg_trigger 
> >> backpatched PostgreSQL" vs. "No, I prefer a system that knows exactly 
> >> how it corrupted my system catalog".
> > 
> > No, I _now_ understand the use case, but when the patch was posted, the
> > use case was missing.  I would like to see a repost with the patch, and
> > a description of its use so we can all move forward on that.
> 
> Is this a new policy that after discussion, all patches must be 
> resubmitted with a summary and conclusions of the discussion? I can 
> certainly do that for you, but just tell me if you are going to ask the 
> same from everyone.

No, I am asking only this time because I feel there was too much
disconnect between the patch and the extensive replication discussion
that few community members would see the connection.

I would also like to know what your new feature does for each supported
option.  I have not seen that spelled out yet at all.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball

2007-02-08 Thread Andrew Hammond
On Feb 8, 11:28 am, [EMAIL PROTECTED] (Alvaro Herrera) wrote:
> Tom Lane wrote:
> > I thought it might be coming from your --with-libs switch somehow,
> > but when I add that to my configure command it does not change this
> > output at all.  Is it possible you've got environment variables
> > (like CFLAGS) that configure might be picking up?
>
> A stout I'll bet it's $LIBS

Bingo. I'll pay up next time I see you guys. Or you're welcome to drop
by Santa Barbara area to collect.

Andrew





Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Tom Lane
Marc Munro <[EMAIL PROTECTED]> writes:
> Yes in this case, T1 must abort because the record it was going to
> update has disappeared from underneath it.  I don't see how this is
> significantly different from the same race for the record if the table
> had no RI constraints.  The only difference that I can see, is that T1
> now has some locks that it must relinquish as the transaction aborts.

No, the difference is there would have been no error at all before;
if the record were deleted before T1 got to it then it wouldn't have
attempted to update it.  I really don't think you can make it work
to perform updates or deletes on a record you have not yet locked.

regards, tom lane



Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball

2007-02-08 Thread Alvaro Herrera
Tom Lane wrote:

> I thought it might be coming from your --with-libs switch somehow,
> but when I add that to my configure command it does not change this
> output at all.  Is it possible you've got environment variables
> (like CFLAGS) that configure might be picking up?

A stout I'll bet it's $LIBS

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball

2007-02-08 Thread Tom Lane
"Andrew Hammond" <[EMAIL PROTECTED]> writes:
> I'm trying to build PostgreSQL 8.2.2 outside the ports system on a
> FreeBSD 6.2 amd64 box.

Note you *definitely* want 8.2.3 not 8.2.2.

> configure:2209: checking for C compiler default output file name
> configure:2212: gcc -O3 -funroll-loops -m64   conftest.c /usr/local/lib >&5
> /usr/local/lib: file not recognized: File format not recognized

That's really strange.  For reference, I get this:

configure:2209: checking for C compiler default output file name
configure:2212: gcc    conftest.c  >&5
configure:2215: $? = 0
configure:2261: result: a.out

I thought it might be coming from your --with-libs switch somehow,
but when I add that to my configure command it does not change this
output at all.  Is it possible you've got environment variables
(like CFLAGS) that configure might be picking up?

regards, tom lane



Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball

2007-02-08 Thread Alvaro Herrera
Andrew Hammond wrote:

> This file contains any messages produced by compilers while
> running configure, to aid debugging if configure makes a mistake.
> 
> It was created by PostgreSQL configure 8.2.2, which was
> generated by GNU Autoconf 2.59.  Invocation command line was
> 
>  $ ./configure --prefix=/usr/local/pgsql822_sl126_amd64_FreeBSD62
> --with-includes=/usr/local/include --with-libs=/usr/local/lib
> --enable-thread-saftey --with-openssl --enable-debug CC=gcc CFLAGS=-O3
> -funroll-loops -m64

[...]

> configure:2209: checking for C compiler default output file name
> configure:2212: gcc -O3 -funroll-loops -m64   conftest.c /usr/local/lib >&5
> /usr/local/lib: file not recognized: File format not recognized

Strange.  I tried the same args on 8.2.3, and that config.log extract
shows this:

configure:2209: checking for C compiler default output file name
configure:2212: gcc -O3 -funroll-loops -m64   conftest.c  >&5
[works]

Note no misplaced libs there.  The configure args were

./configure --prefix=/tmp --with-includes=/usr/local/include 
--with-libs=/usr/local/lib --enable-thread-saftey --with-openssl --enable-debug 
CC=gcc CFLAGS="-O3 -funroll-loops -m64"
(including the thread-saftey typo)

I noticed that config.log has unquoted CFLAGS -- I'd say that's a bug,
but maybe it's of no consequence.
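
A quick way to see the word-splitting Alvaro is pointing at (a toy illustration, using Python's `shlex` to mimic POSIX shell tokenization; the command strings are just examples): without quotes, only `-O3` lands in CFLAGS and the remaining flags become stray arguments to configure.

```python
import shlex

# Unquoted: the shell splits the value at whitespace, so configure
# receives three separate words instead of one CFLAGS assignment.
unquoted = "./configure CFLAGS=-O3 -funroll-loops -m64"
# Quoted: the whole multi-word value stays attached to CFLAGS.
quoted = './configure CFLAGS="-O3 -funroll-loops -m64"'

print(shlex.split(unquoted))
# ['./configure', 'CFLAGS=-O3', '-funroll-loops', '-m64']
print(shlex.split(quoted))
# ['./configure', 'CFLAGS=-O3 -funroll-loops -m64']
```

That would explain why config.log shows the assignment wrapped across lines while Alvaro's quoted invocation works.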

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Jan Wieck

On 2/7/2007 11:12 PM, Bruce Momjian wrote:
> Jan Wieck wrote:
>> On 2/7/2007 10:35 PM, Bruce Momjian wrote:
>> > I find the term "logical proof of its correctness" too restrictive.  It
>> > sounds like some formal academic process that really doesn't work well
>> > for us.
>> 
>> Thank you.
>> 
>> > Also, I saw the trigger patch with no explanation of why it was
>> > important or who would use it --- that also isn't going to fly well.
>> 
>> You didn't respond to my explanation how the current Slony
>> implementation could improve and evolve using it. Are you missing
>> something? I am discussing this very issue with our own QA department,
>> and thus far, I think I have a majority of "would use a pg_trigger
>> backpatched PostgreSQL" vs. "No, I prefer a system that knows exactly
>> how it corrupted my system catalog".
> 
> No, I _now_ understand the use case, but when the patch was posted, the
> use case was missing.  I would like to see a repost with the patch, and
> a description of its use so we can all move forward on that.

Is this a new policy that after discussion, all patches must be
resubmitted with a summary and conclusions of the discussion? I can
certainly do that for you, but just tell me if you are going to ask the
same from everyone.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball

2007-02-08 Thread Andrew Dunstan


--enable-thread-saftey is surely misspelled.

not sure about the rest.

cheers

andrew



Andrew Hammond wrote:
> I'm trying to build PostgreSQL 8.2.2 outside the ports system on a
> FreeBSD 6.2 amd64 box. The databases/postgresql81-server port builds
> 8.1.7 just fine on the same box. My configure fails. I'm new to
> FreeBSD so I expect I'm missing something pretty obvious. config.log
> follows. Line 2212 is very odd since it appears to be giving gcc the
> library prefix as a file name.
>
> This file contains any messages produced by compilers while
> running configure, to aid debugging if configure makes a mistake.
>
> It was created by PostgreSQL configure 8.2.2, which was
> generated by GNU Autoconf 2.59.  Invocation command line was
>
>  $ ./configure --prefix=/usr/local/pgsql822_sl126_amd64_FreeBSD62
> --with-includes=/usr/local/include --with-libs=/usr/local/lib
> --enable-thread-saftey --with-openssl --enable-debug CC=gcc CFLAGS=-O3
> -funroll-loops -m64






Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-08 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> I'm honestly looking for some practical use of this.  We have debated 
> other NOTICE messages over the years, but they at least tell you 
> something you can use after the command.

The objection I had to the original patch (which didn't return a notice)
was that this seemed actively misleading:

foo=> DROP TABLE IF EXISTS not_there;
DROP TABLE
foo=>

I would be satisfied if the returned command tag were something else,
maybe "NO OPERATION".

regards, tom lane



Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Marc Munro
On Thu, 2007-02-08 at 10:06 -0800, Stephan Szabo wrote:
> On Thu, 8 Feb 2007, Marc Munro wrote:
. . .
> >
> > That other transaction, T1, would have run the same RI triggers and so
> > would have the same parent records locked.
> 
> That's not true in the case of delete, since the referencing table
> triggers are on insert and update. . . .

Let me see if I have this scenario right: 

Transaction T1 updates child record C1, with RI causing the parent P1 to
be locked before the child.

In the meantime transaction T2, successfully deletes C1 as it has not
yet been locked.

(Please tell me if I have misunderstood what you are saying)

Yes in this case, T1 must abort because the record it was going to
update has disappeared from underneath it.  I don't see how this is
significantly different from the same race for the record if the table
had no RI constraints.  The only difference that I can see, is that T1
now has some locks that it must relinquish as the transaction aborts.

> . . .  Second, the parent record locks are not
> exclusive which means that both can be granted, so I don't see how this
> stops the second from continuing before the first.

I don't think this does stop the second from continuing before the
first.  What will stop it, is the eventual lock that is taken on the
child (triggering) record.  I am not proposing reducing the number of
locks taken, but rather changing the order in which the locks are taken.

 What am I missing? 

__
Marc





Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?

2007-02-08 Thread Larry Rosenman
I can set up build farm on it if yall need it.  Its running 6.2/amd64

--- Original Message ---
From: "Andrew Hammond" <[EMAIL PROTECTED]>
To: " Larry Rosenman " 
Sent: 2/8/07, 12:19:07 PM
Subject: Re: BuildFarm: Do we need another FreeBSD/amd64 member?

On Jan 19 2006, 9:36 pm, [EMAIL PROTECTED] ("Larry Rosenman") wrote:
> Tom Lane wrote:
> > "Larry Rosenman" <[EMAIL PROTECTED]> writes:
> >> I've got a fast FreeBSD/amd64 server available to run Buildfarm
> >> on.
>
> >> However, I see we already have a couple of others running it.
>
> >> My questions are:
> >> 1) do we need another one?
> >> 2) if yes, what options need coverage?
>
> > Looks like we're fairly well covered on FreeBSD already.  Are you
> > willing to consider running some less-popular OS on it?
>
> Not particularly, as I want it to do other things.  However, I might be
> willing
> To get VMWare running on it, and run something else in a VM.
>
> What were you thinking?


FreeBSD 6.2 was just released. I don't see any coverage for it yet.





[HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball

2007-02-08 Thread Andrew Hammond

I'm trying to build PostgreSQL 8.2.2 outside the ports system on a
FreeBSD 6.2 amd64 box. The databases/postgresql81-server port builds
8.1.7 just fine on the same box. My configure fails. I'm new to
FreeBSD so I expect I'm missing something pretty obvious. config.log
follows. Line 2212 is very odd since it appears to be giving gcc the
library prefix as a file name.


This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by PostgreSQL configure 8.2.2, which was
generated by GNU Autoconf 2.59.  Invocation command line was

 $ ./configure --prefix=/usr/local/pgsql822_sl126_amd64_FreeBSD62
--with-includes=/usr/local/include --with-libs=/usr/local/lib
--enable-thread-saftey --with-openssl --enable-debug CC=gcc CFLAGS=-O3
-funroll-loops -m64

## - ##
## Platform. ##
## - ##

hostname = ahammond.ecoffice.experclick.com
uname -m = amd64
uname -r = 6.2-RELEASE
uname -s = FreeBSD
uname -v = FreeBSD 6.2-RELEASE #0: Fri Jan 12 08:43:30 UTC 2007
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP

/usr/bin/uname -p = amd64
/bin/uname -X = unknown

/bin/arch  = unknown
/usr/bin/arch -k   = unknown
/usr/convex/getsysinfo = unknown
hostinfo   = unknown
/bin/machine   = unknown
/usr/bin/oslevel   = unknown
/bin/universe  = unknown

PATH: /sbin
PATH: /bin
PATH: /usr/sbin
PATH: /usr/bin
PATH: /usr/games
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /usr/X11R6/bin
PATH: /home/ahammond/bin


## --- ##
## Core tests. ##
## --- ##

configure:1408: checking build system type
configure:1426: result: x86_64-unknown-freebsd6.2
configure:1434: checking host system type
configure:1448: result: x86_64-unknown-freebsd6.2
configure:1458: checking which template to use
configure:1558: result: freebsd
configure:1700: checking whether to build with 64-bit integer date/time support
configure:1732: result: no
configure:1739: checking whether NLS is wanted
configure:1774: result: no
configure:1782: checking for default port number
configure:1812: result: 5432
configure:2170: checking for C compiler version
configure:2173: gcc --version </dev/null >&5
gcc (GCC) 3.4.6 [FreeBSD] 20060305
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

configure:2176: $? = 0
configure:2178: gcc -v </dev/null >&5
Using built-in specs.
Configured with: FreeBSD/amd64 system compiler
Thread model: posix
gcc version 3.4.6 [FreeBSD] 20060305
configure:2181: $? = 0
configure:2183: gcc -V </dev/null >&5
gcc: `-V' option must have argument
configure:2186: $? = 1
configure:2209: checking for C compiler default output file name
configure:2212: gcc -O3 -funroll-loops -m64   conftest.c /usr/local/lib >&5
/usr/local/lib: file not recognized: File format not recognized
configure:2215: $? = 1



Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?

2007-02-08 Thread Andrew Hammond
On Jan 19 2006, 9:36 pm, [EMAIL PROTECTED] ("Larry Rosenman") wrote:
> Tom Lane wrote:
> > "Larry Rosenman" <[EMAIL PROTECTED]> writes:
> >> I've got a fast FreeBSD/amd64 server available to run Buildfarm
> >> on.
>
> >> However, I see we already have a couple of others running it.
>
> >> My questions are:
> >> 1) do we need another one?
> >> 2) if yes, what options need coverage?
>
> > Looks like we're fairly well covered on FreeBSD already.  Are you
> > willing to consider running some less-popular OS on it?
>
> Not particularly, as I want it to do other things.  However, I might be
> willing
> To get VMWare running on it, and run something else in a VM.
>
> What were you thinking?


FreeBSD 6.2 was just released. I don't see any coverage for it yet.




Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Stephan Szabo
On Thu, 8 Feb 2007, Marc Munro wrote:

> Oops, forgot to include pgsql-hackers when I responded to this the first
> time.
>
>  On Tue, 2007-02-06 at 20:53 -0500, Tom Lane wrote:
> > Marc Munro <[EMAIL PROTECTED]> writes:
> > > The RI triggers currently fire when a record is updated.  Under my
> > > proposal they would fire in the same way but before the record is
> locked
> > > rather than after.  Or am I missing your point?
> >
> > IOW, some other transaction could update or delete the tuple
> meanwhile?
> > Doesn't seem very promising.
> >
>
> That other transaction, T1, would have run the same RI triggers and so
> would have the same parent records locked.

That's not true in the case of delete, since the referencing table
triggers are on insert and update. Second, the parent record locks are not
exclusive which means that both can be granted, so I don't see how this
stops the second from continuing before the first.




Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-08 Thread Andrew Dunstan

Peter Eisentraut wrote:
> The downside is that while I wanted to use the IF EXISTS form to reduce
> the chatter at the beginning of schema loading scripts, this just gives
> me a different spelling of that same chatter.

There is possibly a good case for dropping the message level.

cheers

andrew



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-08 Thread Peter Eisentraut
Andrew Dunstan wrote:
> Peter Eisentraut wrote:
> > What is the practical purpose of the notices emitted by DROP
> > SOMETHING IF EXISTS when the object in fact does not exist?
>
> It was asked for ...

The argument was that MySQL does the same.  Which is valid but not 
overriding.

I'm honestly looking for some practical use of this.  We have debated 
other NOTICE messages over the years, but they at least tell you 
something you can use after the command.  In this case, it just tells 
you that the object which you wanted removed no matter what didn't 
exist in the first place, but the state after the command (which is the 
interesting side) is always the same: "gone".  The only use case I see 
is informing about typos, but the system generally doesn't cater to 
that.

The downside is that while I wanted to use the IF EXISTS form to reduce 
the chatter at the beginning of schema loading scripts, this just gives 
me a different spelling of that same chatter.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-08 Thread Marc Munro
On Thu, 2007-02-08 at 18:06 +0100, Csaba Nagy wrote:

> The problem is that eliminating the deadlock is still not the complete
> cake... the interlocking still remains, possibly leading to degraded
> performance on high contention on very common parent rows. The real
> solution would be when an update on the parent table's non-referenced
> fields is not interlocking at all with updates of the child rows... and
> I think there were some proposals to do that.

Agreed.  There are two issues here, unnecessary blocking and deadlock.
These can be tackled separately.  My proposal deals only with the
deadlock issue. 

Even if if contention is reduced, for instance by implementing
column-level locking, there will still be the potential for deadlock
arising from inconsistent ordering of locks.  I continue to stand by my
proposal.

__
Marc




Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-08 Thread Csaba Nagy
On Thu, 2007-02-08 at 17:47, Marc Munro wrote:
> [snip] One of the causes of deadlocks in Postgres is that its referential
> integrity triggers can take locks in inconsistent orders.  Generally a
> child record will be locked before its parent, but not in all cases.
[snip]

The problem is that eliminating the deadlock is still not the complete
cake... the interlocking still remains, possibly leading to degraded
performance on high contention on very common parent rows. The real
solution would be when an update on the parent table's non-referenced
fields is not interlocking at all with updates of the child rows... and
I think there were some proposals to do that.

In fact one possibility to avoid this problem is vertical partitioning,
i.e. separating the non-key columns in a parallel table and updating
them there. However this is a choice only when you know it beforehand
and you're not inheriting the schema from other DBs...
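The vertical-partitioning workaround can be sketched roughly like this (table names are invented for illustration, and SQLite stands in for brevity; the schema shape is what matters — child foreign keys reference only a narrow key table, so updates to the wide non-key data never touch the referenced row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical split: the parent's key lives in parent_key, the
# frequently-updated non-key columns live in a parallel parent_data
# table. Children reference parent_key only.
conn.execute("CREATE TABLE parent_key (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE parent_data
                (id INTEGER REFERENCES parent_key(id), note TEXT)""")
conn.execute("""CREATE TABLE child
                (id INTEGER PRIMARY KEY,
                 parent_id INTEGER REFERENCES parent_key(id))""")

conn.execute("INSERT INTO parent_key VALUES (1)")
conn.execute("INSERT INTO parent_data VALUES (1, 'initial')")
conn.execute("INSERT INTO child VALUES (10, 1)")

# Updating the non-key data row touches parent_data only -- not the
# parent_key row that the child's RI check needs to lock.
conn.execute("UPDATE parent_data SET note = 'updated' WHERE id = 1")
note = conn.execute("SELECT note FROM parent_data WHERE id = 1").fetchone()[0]
print(note)  # updated
```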

Cheers,
Csaba.




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-08 Thread Marc Munro
I am going to restate my earlier proposal, to clarify it and in the hope
of stimulating more discussion.

One of the causes of deadlocks in Postgres is that its referential
integrity triggers can take locks in inconsistent orders.  Generally a
child record will be locked before its parent, but not in all cases.

My proposal modifies the order in which locks are taken by referential
integrity triggers, so that parents are always locked before their
children.
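The reason a single global lock order removes the deadlock can be shown with a toy sketch (not PostgreSQL code; plain mutexes stand in for the row locks, and the names are invented):

```python
import threading

# Two concurrent "transactions" each need the parent's lock and the
# child's lock. Because both acquire parent before child -- one fixed
# global order -- the classic AB/BA deadlock cycle cannot form.
parent_lock = threading.Lock()
child_lock = threading.Lock()
log = []

def update_child(name):
    with parent_lock:       # lock the parent first, as the proposal suggests
        with child_lock:    # then the child row
            log.append(name)

threads = [threading.Thread(target=update_child, args=(f"tx{i}",))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(log))  # ['tx0', 'tx1'] - both complete, no deadlock
```

If one worker instead took child before parent, the two could each hold one lock and wait forever on the other; that inconsistent ordering is exactly what the proposal eliminates.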

The proposal is, that referential integrity triggers should fire before
locking the tuple from which they are triggered.  I guess a new sort of
trigger is required for this, a before-lock trigger.

If this is a dumb idea, please tell me why.  If it would cause more
problems than it solves, ditto.  If it would be difficult to implement,
let's discuss and try to find solutions.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Marc Munro
Oops, forgot to include pgsql-hackers when I responded to this the first
time.

 On Tue, 2007-06-02 at 20:53 -0500, Tom Lane wrote:
> Marc Munro <[EMAIL PROTECTED]> writes:
> > The RI triggers currently fire when a record is updated.  Under my
> > proposal they would fire in the same way but before the record is locked
> > rather than after.  Or am I missing your point?
> 
> IOW, some other transaction could update or delete the tuple meanwhile?
> Doesn't seem very promising.
> 

That other transaction, T1, would have run the same RI triggers and so
would have the same parent records locked.  The blocked transaction, T2,
once T1 has committed, would fail.

I don't see this as being much different from the current case, where T1
locks and deletes or updates a row, and T2 then tries to manipulate the
same row.  In both cases, locks manage the race for the row, and MVCC
ensures that T2 fails.

__
Marc



signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Bitmap index thoughts

2007-02-08 Thread Gavin Sherry
On Thu, 8 Feb 2007, Heikki Linnakangas wrote:

> Gavin Sherry wrote:
> > I will update the code tomorrow. The focus will be cleaning up the
> > executor modifications. Please look else where for now.
>
> I'm getting a segfault with this test script:
>
> 
> CREATE TABLE bmtest (i int);
>
> INSERT INTO bmtest SELECT 1 FROM generate_series(1,10) a;
> INSERT INTO bmtest SELECT 10 FROM generate_series(1,100) a;
> DELETE FROM bmtest WHERE i = 1;
> VACUUM bmtest;
>
> CREATE INDEX i_bmtest ON bmtest USING bitmap (i);
>
> INSERT INTO bmtest SELECT 10 FROM generate_series(1,10) a;
> 
>

Hmm... this triggers a bug in the newly rewritten update code I think.
I'll post a fix soon.

Thanks for testing!

Gavin

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

2007-02-08 Thread Stefan Kaltenbrunner

Alvaro Herrera wrote:

Heikki Linnakangas wrote:

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
---
Reduce WAL activity for page splits:


Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.


this patch killed both of my OpenBSD buildfarm members


Please note that zebra died for a different reason the second time (not
enough shared memory).


yeah that's because initdb is leaving the shm-segments there on a failure.
This resulted in the out-of-shm failure on the following run which I 
have manually cleared now ...


Stefan

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Bitmap index thoughts

2007-02-08 Thread Heikki Linnakangas

Gavin Sherry wrote:

I will update the code tomorrow. The focus will be cleaning up the
executor modifications. Please look else where for now.


I'm getting a segfault with this test script:


CREATE TABLE bmtest (i int);

INSERT INTO bmtest SELECT 1 FROM generate_series(1,10) a;
INSERT INTO bmtest SELECT 10 FROM generate_series(1,100) a;
DELETE FROM bmtest WHERE i = 1;
VACUUM bmtest;

CREATE INDEX i_bmtest ON bmtest USING bitmap (i);

INSERT INTO bmtest SELECT 10 FROM generate_series(1,10) a;



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] HOT for PostgreSQL 8.3

2007-02-08 Thread Heikki Linnakangas

Tom Lane wrote:

"Simon Riggs" <[EMAIL PROTECTED]> writes:

The basic idea is that when a tuple is UPDATEd we can, in certain
circumstances, avoid inserting index tuples for a tuple. Such tuples are
marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to
other tuples.


What is VACUUM FULL going to do when it wants to move one of these things?


I suppose it could move the whole chain to the same target page, if 
there is one with enough space to accommodate the whole chain. Or we 
could just cop out and not move tuples marked with HEAP_ONLY_TUPLE. I 
think that would be acceptable; after the last transaction that can see 
the old tuple is finished, the old tuple is dead. After that, VACUUM 
FULL can remove the old tuple and move the remaining tuple as usual.



CREATE INDEX requires some careful work to allow it to identify and
correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as
a result of the new index.


I think you've glossed over the CREATE INDEX problem much too easily.
The difficulty with that is going to be that de-HOT-ifying a tuple
is going to require multiple updates that can't possibly be put into
a single WAL record, and I don't think that WAL replay can clean up
after an incomplete update (since it can't run user-defined functions
and hence cannot be expected to compute index entries for itself).
So I don't think you can do that while preserving crash safety.


Yeah, chilling tuples from HOT state to normal tuples is not easy.

One solution I thought of is to add another flag to heap tuples, 
CHILL_IN_PROGRESS. To chill a tuple, you would:


1. Mark heap tuple with CHILL_IN_PROGRESS
2. Insert missing index entries
3. Clear CHILL_IN_PROGRESS and HEAP_ONLY_TUPLE flags

Index scans would ignore tuples marked CHILL_IN_PROGRESS that are 
pointed to directly from the index. That way if we crash in the middle of step 2, 
scans and updates would work normally after replay, as if the index 
entries weren't there. CREATE INDEX would have to fail if there's any 
CHILL_IN_PROGRESS tuples, because we wouldn't know which index entries 
need to be inserted; some might already be there. To clear the 
CHILL_IN_PROGRESS flag, a vacuum would be needed. Vacuum would remove 
all index entries for those tuples, but instead of removing the heap 
tuple in the 2nd scan it would just clear the CHILL_IN_PROGRESS flag, 
bringing us back to where we started.
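The three-step protocol and its crash behavior can be modeled in a few lines (a toy model only: the flag names come from the sketch above, everything else is invented and nothing here is executor code):

```python
# Model of the CHILL_IN_PROGRESS protocol. Flags are a set on a fake
# tuple; "crash_after" simulates a crash between the numbered steps.
HEAP_ONLY = "HEAP_ONLY_TUPLE"
CHILLING = "CHILL_IN_PROGRESS"

def chill(flags, index, tid, crash_after=None):
    flags.add(CHILLING)            # step 1: mark the heap tuple
    if crash_after == 1:
        return
    index.add(tid)                 # step 2: insert missing index entries
    if crash_after == 2:
        return
    flags.discard(CHILLING)        # step 3: clear both flags
    flags.discard(HEAP_ONLY)

def visible_via_index(flags, index, tid):
    # Index scans ignore direct pointers to tuples still being chilled,
    # so a crash after step 2 behaves as if the entries weren't there.
    return tid in index and CHILLING not in flags

# Crash between steps 2 and 3: the entry exists but scans ignore it.
flags, index = {HEAP_ONLY}, set()
chill(flags, index, 42, crash_after=2)
assert not visible_via_index(flags, index, 42)

# Completed chill: the tuple is now a normal, index-visible tuple.
flags, index = {HEAP_ONLY}, set()
chill(flags, index, 42)
assert visible_via_index(flags, index, 42) and HEAP_ONLY not in flags
```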



However, the easiest solution would be to make CREATE INDEX wait until 
the old tuple is dead. That should be ok at least for concurrent CREATE 
INDEX, because it already has that kind of a wait between 1st and 2nd 
phase.



Removing the root tuple will require a VACUUM *FULL*.


That seems unacceptable ... it won't take too long for your table to
fill up with stubs, and we don't want to return to the bad old days
when periodic VACUUM FULL was unavoidable.


Agreed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

2007-02-08 Thread Alvaro Herrera
Heikki Linnakangas wrote:
> Stefan Kaltenbrunner wrote:
> >Bruce Momjian wrote:
> >>Log Message:
> >>---
> >>Reduce WAL activity for page splits:
> >>
> >>>Currently, an index split writes all the data on the split page to
> >>>WAL. That's a lot of WAL traffic. The tuples that are copied to the
> >>>right page need to be WAL logged, but the tuples that stay on the
> >>>original page don't.
> >
> >
> >this patch killed both of my OpenBSD buildfarm members

Please note that zebra died for a different reason the second time (not
enough shared memory).

> lopaque was referenced after pfreeing the temp page it pointed to. Also 
> later in the function the LSN of the left page was set, but again using 
> a pointer to the pfreed temp copy instead of the real shared memory buffer.

Applied, thanks.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] compilation of pg_config fails

2007-02-08 Thread Peter Eisentraut
Markus Schiltknecht wrote:
> since str(n?)cat got replaced with strlcat, I fail to build

Since the buildfarm thinks it's OK, I'd first of all check whether you 
have a complete and consistent checkout.  Note that the default cvs 
options will probably not get you one.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] compilation of pg_config fails

2007-02-08 Thread Heikki Linnakangas

Markus Schiltknecht wrote:

Hi,

since str(n?)cat got replaced with strlcat, I fail to build PostgreSQL 
(current CVS HEAD). HAVE_DECL_STRLCAT is not set, so AFAIK, the 
strlcat() function from src/port should be used. However, I've read the 
README there, but still don't quite know what's wrong.


The linker throws:

gcc -O1 -Wall -pg -Wall -Wmissing-prototypes -Wpointer-arith -Winline 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g 
pg_config.o  -L../../../src/port  -Wl,-rpath,'/usr/local/pgsql/lib' 
-lpgport -lz -lreadline -lcrypt -ldl -lm  -o pg_config

pg_config.o: In function `show_pgxs':
/home/markus/projects/pgsql/sources/trunk/src/bin/pg_config/pg_config.c:216:
undefined reference to `strlcat'

Even if objdump confirms that strlcat.o is compiled in 
../../../src/port/libpgport.a:


In archive ../../../src/port/libpgport.a:

strlcpy.o: file format elf32-i386

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 strlcpy.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .debug_abbrev   00000000 .debug_abbrev
00000000 l    d  .debug_info     00000000 .debug_info
00000000 l    d  .debug_line     00000000 .debug_line
00000000 l    d  .debug_frame    00000000 .debug_frame
00000000 l    d  .debug_loc      00000000 .debug_loc
00000000 l    d  .debug_pubnames 00000000 .debug_pubnames
00000000 l    d  .debug_aranges  00000000 .debug_aranges
00000000 l    d  .debug_str      00000000 .debug_str
00000000 l    d  .note.GNU-stack 00000000 .note.GNU-stack
00000000 l    d  .comment        00000000 .comment
00000000 g     F .text  00000045 strlcpy
00000000         *UND*  00000000 mcount


That dump shows strl*cpy*, not strlcat.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[HACKERS] compilation of pg_config fails

2007-02-08 Thread Markus Schiltknecht

Hi,

since str(n?)cat got replaced with strlcat, I fail to build PostgreSQL 
(current CVS HEAD). HAVE_DECL_STRLCAT is not set, so AFAIK, the 
strlcat() function from src/port should be used. However, I've read the 
README there, but still don't quite know what's wrong.


The linker throws:

gcc -O1 -Wall -pg -Wall -Wmissing-prototypes -Wpointer-arith -Winline 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g 
pg_config.o  -L../../../src/port  -Wl,-rpath,'/usr/local/pgsql/lib' 
-lpgport -lz -lreadline -lcrypt -ldl -lm  -o pg_config

pg_config.o: In function `show_pgxs':
/home/markus/projects/pgsql/sources/trunk/src/bin/pg_config/pg_config.c:216:
undefined reference to `strlcat'

Even if objdump confirms that strlcat.o is compiled in 
../../../src/port/libpgport.a:


In archive ../../../src/port/libpgport.a:

strlcpy.o: file format elf32-i386

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 strlcpy.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .debug_abbrev   00000000 .debug_abbrev
00000000 l    d  .debug_info     00000000 .debug_info
00000000 l    d  .debug_line     00000000 .debug_line
00000000 l    d  .debug_frame    00000000 .debug_frame
00000000 l    d  .debug_loc      00000000 .debug_loc
00000000 l    d  .debug_pubnames 00000000 .debug_pubnames
00000000 l    d  .debug_aranges  00000000 .debug_aranges
00000000 l    d  .debug_str      00000000 .debug_str
00000000 l    d  .note.GNU-stack 00000000 .note.GNU-stack
00000000 l    d  .comment        00000000 .comment
00000000 g     F .text  00000045 strlcpy
00000000         *UND*  00000000 mcount

...

# uname -a:

Linux grml 2.6.20 #1 SMP PREEMPT Tue Feb 6 14:48:26 PST 2007 i686 GNU/Linux

Any idea?

Regards

Markus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

2007-02-08 Thread Stefan Kaltenbrunner

Heikki Linnakangas wrote:

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
---
Reduce WAL activity for page splits:


Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.



this patch killed both of my OpenBSD buildfarm members


lopaque was referenced after pfreeing the temp page it pointed to. Also 
later in the function the LSN of the left page was set, but again using 
a pointer to the pfreed temp copy instead of the real shared memory buffer.


Here's a fix.


confirmed - with that patch -HEAD passes a full regression test run at 
least on emu.



Stefan

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

2007-02-08 Thread Heikki Linnakangas

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
---
Reduce WAL activity for page splits:


Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.



this patch killed both of my OpenBSD buildfarm members


lopaque was referenced after pfreeing the temp page it pointed to. Also 
later in the function the LSN of the left page was set, but again using 
a pointer to the pfreed temp copy instead of the real shared memory buffer.


Here's a fix.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/nbtree/nbtinsert.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.150
diff -c -r1.150 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c	8 Feb 2007 05:05:53 -	1.150
--- src/backend/access/nbtree/nbtinsert.c	8 Feb 2007 09:36:05 -
***************
*** 953,959 ****
  		xlrec.rightsib = BufferGetBlockNumber(rbuf);
  		xlrec.firstright = firstright;
  		xlrec.rnext = ropaque->btpo_next;
! 		xlrec.level = lopaque->btpo.level;
  
  		rdata[0].data = (char *) &xlrec;
  		rdata[0].len = SizeOfBtreeSplit;
--- 953,959 ----
  		xlrec.rightsib = BufferGetBlockNumber(rbuf);
  		xlrec.firstright = firstright;
  		xlrec.rnext = ropaque->btpo_next;
! 		xlrec.level = ropaque->btpo.level;
  
  		rdata[0].data = (char *) &xlrec;
  		rdata[0].len = SizeOfBtreeSplit;
***************
*** 962,968 ****
  		lastrdata = &rdata[0];
  
  		/* Log downlink on non-leaf pages. */
! 		if (lopaque->btpo.level > 0)
  		{
  			lastrdata->next = lastrdata + 1;
  			lastrdata++;
--- 962,968 ----
  		lastrdata = &rdata[0];
  
  		/* Log downlink on non-leaf pages. */
! 		if (ropaque->btpo.level > 0)
  		{
  			lastrdata->next = lastrdata + 1;
  			lastrdata++;
***************
*** 1040,1047 ****
  
  		recptr = XLogInsert(RM_BTREE_ID, xlinfo, rdata);
  
! 		PageSetLSN(leftpage, recptr);
! 		PageSetTLI(leftpage, ThisTimeLineID);
  		PageSetLSN(rightpage, recptr);
  		PageSetTLI(rightpage, ThisTimeLineID);
  		if (!P_RIGHTMOST(ropaque))
--- 1040,1047 ----
  
  		recptr = XLogInsert(RM_BTREE_ID, xlinfo, rdata);
  
! 		PageSetLSN(origpage, recptr);
! 		PageSetTLI(origpage, ThisTimeLineID);
  		PageSetLSN(rightpage, recptr);
  		PageSetTLI(rightpage, ThisTimeLineID);
  		if (!P_RIGHTMOST(ropaque))

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Proposal: Commit timestamp

2007-02-08 Thread Zeugswetter Andreas ADI SD

> Yes, yes, and yes ... but aside from the problem that you use the very
> ambiguous word "timestamp" (which somehow suggests using a "clock" of
> some sort), isn't the "begin" timestamp of a long running transaction

imho a begin timestamp is near useless

> worse than the "commit" timestamp, when all its work got visible to the
> outside world instantaneously?

This is one of the areas I am still worried about. Is one commit lamport
timestamp enough? I think for some conflict resolutions we need to look
at the row level, and resolve conflicts per row and not per transaction
(yes, this means that a tx might get partially replicated).

What I am trying to lead at is: maybe an infrastructure to produce
Wieck lamport timestamps, that can be used in different places like
commit hooks and column defaults, would be of more general use. Maybe
such a column could be a system column that is not visible with
"select *" for those cases where commit is not enough. And a commit
hook could insert it into clog-like storage.
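For concreteness, the kind of lamport timestamp infrastructure under discussion can be sketched like this (a toy model only; the class and method names are invented here and come from no proposal):

```python
class LamportClock:
    """Minimal Lamport logical clock: a per-node counter that is
    bumped on local events and fast-forwarded past any timestamp
    observed from a peer, giving a causal (not wall-clock) order."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event, e.g. a commit on this node.
        self.time += 1
        return self.time

    def observe(self, remote_time):
        # Merge a timestamp received from a peer, then advance.
        self.time = max(self.time, remote_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t1 = a.tick()        # commit on node A
t2 = b.observe(t1)   # node B applies A's change, then commits
t3 = a.observe(t2)   # A sees B's commit in turn
print(t1, t2, t3)    # 1 2 3 - causally ordered across nodes
```

Two causally unrelated commits can still draw equal timestamps, which is why conflict resolution schemes typically break ties with a node id.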

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Ooops ... seems we need a re-release pronto

2007-02-08 Thread Jim C. Nasby
On Tue, Feb 06, 2007 at 02:16:05PM -0500, Tom Lane wrote:
> Jim Nasby <[EMAIL PROTECTED]> writes:
> > On Feb 6, 2007, at 12:40 PM, Tom Lane wrote:
> >> ... massive expansion of the tests doesn't seem justified
> 
> > What about the idea that's been floated in the past about a -- 
> > extensive mode for regression testing that would (generally) only be  
> > used by the build farm. That would mean others wouldn't have to  
> > suffer through extremely long make check's.
> 
> > Or is there another reason not to expand the tests?
> 
> I'm not concerned so much about the runtime as the development and
> maintenance effort...

I can see development... but are there enough changes where the
maintenance would be an issue?
-- 
Jim Nasby                                      [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

2007-02-08 Thread Heikki Linnakangas

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
---
Reduce WAL activity for page splits:


Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.



this patch killed both of my OpenBSD buildfarm members


Sorry about that.. I'll take a look at it.



a backtrace of the crash shows:

(gdb) bt
#0  0x1c055504 in _bt_split (rel=0x835fcab0, buf=53, firstright=50, 
newitemoff=54, newitemsz=76,

newitem=0x8210a364, newitemonleft=0 '\0') at nbtinsert.c:956
#1  0x1c05468b in _bt_insertonpg (rel=0x835fcab0, buf=53, 
stack=0x8210a078, keysz=2, scankey=0x8210a3f0,
itup=0x8210a364, afteritem=0, split_only_page=0 '\0') at 
nbtinsert.c:538
#2  0x1c053fae in _bt_doinsert (rel=0x835fcab0, itup=0x8210a364, 
index_is_unique=1 '\001', heapRel=0x7ec0ee7c)

at nbtinsert.c:141
#3  0x1c05a1c4 in btinsert (fcinfo=0x2) at nbtree.c:224
#4  0x1c2028fc in FunctionCall6 (flinfo=0x8bc6d1f0, arg1=2, arg2=2, 
arg3=2, arg4=2, arg5=2, arg6=2) at fmgr.c:1267
#5  0x1c052d20 in index_insert (indexRelation=0x835fcab0, 
values=0xcf7defd0, isnull=0xcf7df050 "",
heap_t_ctid=0x7df53e50, heapRelation=0x7ec0ee7c, check_uniqueness=1 
'\001') at indexam.c:196
#6  0x1c08f87e in CatalogIndexInsert (indstate=0x2, 
heapTuple=0x8210a224) at indexing.c:124
#7  0x1c08f902 in CatalogUpdateIndexes (heapRel=0x7ec0ee7c, 
heapTuple=0x7df53e4c) at indexing.c:149
#8  0x1c17255c in SetRelationRuleStatus (relationId=10951, relHasRules=1 
'\001', relIsBecomingView=0 '\0')

at rewriteSupport.c:80
#9  0x1c16f074 in DefineQueryRewrite (stmt=0x82ca7e78) at 
rewriteDefine.c:432
#10 0x1c18c822 in PortalRunUtility (portal=0x82cb201c, query=0x81897484, 
dest=0x3c0c8ff4,

completionTag=0xcf7df2a0 "") at pquery.c:1063
#11 0x1c18cb12 in PortalRunMulti (portal=0x82cb201c, dest=0x3c0c8ff4, 
altdest=0x3c0c8ff4,

completionTag=0xcf7df2a0 "") at pquery.c:1131
#12 0x1c18c28a in PortalRun (portal=0x82cb201c, count=2147483647, 
dest=0x3c0c8ff4, altdest=0x3c0c8ff4,

completionTag=0xcf7df2a0 "") at pquery.c:700
#13 0x1c187d13 in exec_simple_query (
query_string=0x84a2501c "/*\n * PostgreSQL System Views\n *\n * 
Copyright (c) 1996-2007, PostgreSQL Global Development Group\n *\n * 
$PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.35 
2007/01/05 22:19:25 momjian Exp $\n"...) at postgres.c:939
#14 0x1c18abb2 in PostgresMain (argc=1, argv=0xcf7df480, 
username=0x80ba55b0 "mastermind") at postgres.c:3423

#15 0x1c122102 in main (argc=10, argv=0xcf7df47c) at main.c:186


Stefan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

2007-02-08 Thread Stefan Kaltenbrunner

Bruce Momjian wrote:

Log Message:
---
Reduce WAL activity for page splits:


Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.



this patch killed both of my OpenBSD buildfarm members

a backtrace of the crash shows:

(gdb) bt
#0  0x1c055504 in _bt_split (rel=0x835fcab0, buf=53, firstright=50, 
newitemoff=54, newitemsz=76,

newitem=0x8210a364, newitemonleft=0 '\0') at nbtinsert.c:956
#1  0x1c05468b in _bt_insertonpg (rel=0x835fcab0, buf=53, 
stack=0x8210a078, keysz=2, scankey=0x8210a3f0,
itup=0x8210a364, afteritem=0, split_only_page=0 '\0') at 
nbtinsert.c:538
#2  0x1c053fae in _bt_doinsert (rel=0x835fcab0, itup=0x8210a364, 
index_is_unique=1 '\001', heapRel=0x7ec0ee7c)

at nbtinsert.c:141
#3  0x1c05a1c4 in btinsert (fcinfo=0x2) at nbtree.c:224
#4  0x1c2028fc in FunctionCall6 (flinfo=0x8bc6d1f0, arg1=2, arg2=2, 
arg3=2, arg4=2, arg5=2, arg6=2) at fmgr.c:1267
#5  0x1c052d20 in index_insert (indexRelation=0x835fcab0, 
values=0xcf7defd0, isnull=0xcf7df050 "",
heap_t_ctid=0x7df53e50, heapRelation=0x7ec0ee7c, check_uniqueness=1 
'\001') at indexam.c:196
#6  0x1c08f87e in CatalogIndexInsert (indstate=0x2, 
heapTuple=0x8210a224) at indexing.c:124
#7  0x1c08f902 in CatalogUpdateIndexes (heapRel=0x7ec0ee7c, 
heapTuple=0x7df53e4c) at indexing.c:149
#8  0x1c17255c in SetRelationRuleStatus (relationId=10951, relHasRules=1 
'\001', relIsBecomingView=0 '\0')

at rewriteSupport.c:80
#9  0x1c16f074 in DefineQueryRewrite (stmt=0x82ca7e78) at 
rewriteDefine.c:432
#10 0x1c18c822 in PortalRunUtility (portal=0x82cb201c, query=0x81897484, 
dest=0x3c0c8ff4,

completionTag=0xcf7df2a0 "") at pquery.c:1063
#11 0x1c18cb12 in PortalRunMulti (portal=0x82cb201c, dest=0x3c0c8ff4, 
altdest=0x3c0c8ff4,

completionTag=0xcf7df2a0 "") at pquery.c:1131
#12 0x1c18c28a in PortalRun (portal=0x82cb201c, count=2147483647, 
dest=0x3c0c8ff4, altdest=0x3c0c8ff4,

completionTag=0xcf7df2a0 "") at pquery.c:700
#13 0x1c187d13 in exec_simple_query (
query_string=0x84a2501c "/*\n * PostgreSQL System Views\n *\n * 
Copyright (c) 1996-2007, PostgreSQL Global Development Group\n *\n * 
$PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.35 
2007/01/05 22:19:25 momjian Exp $\n"...) at postgres.c:939
#14 0x1c18abb2 in PostgresMain (argc=1, argv=0xcf7df480, 
username=0x80ba55b0 "mastermind") at postgres.c:3423

#15 0x1c122102 in main (argc=10, argv=0xcf7df47c) at main.c:186


Stefan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match