Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
On Thu, Apr 26, 2007 at 01:35:42PM -0400, Tom Lane wrote:
> AFAICS you do not need to inline create_statement.  The risk factor
> is where you call a routine that does something with a va_list, and
> then you want to do something else (other than va_end) with that va_list
> after it returns.  The one use of create_statement doesn't do that,
> hence no problem.  (We know this works, because stringinfo.c does it.)

Given all the problems I had so far with this code I'd prefer to keep it
inlined. :-)

> BTW, I think there is a small bug there: you fail to call va_end() in the
> failure path.  I'm not sure if there are any modern machines where
> va_end() isn't a no-op, but nonetheless the code isn't meeting the spec.

Fixed. Thanks.

Michael

-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] pgsql crollable cursor doesn't support one form ofpostgresql's cu

2007-04-26 Thread Pavel Stehule

Hello,

it's true. There is bug. I'll send actualised version tomorrow.

Regards
Pavel


I haven't read the rest of the thread yet, but is this hunk not buggy?
yylex() is side-effecting, so the two calls to yylex() do not do what
the comment suggests.

>
> ! /* check FROM or IN keyword after direction's specification */
> ! if (check_FROM && (yylex() != K_FROM && yylex() != K_IN))
> ! yyerror("expected \"FROM/IN\"");
>
>   return fetch;

-Neil




_
Emotikony a pozadi programu MSN Messenger ozivi vasi konverzaci. 
http://messenger.msn.cz/



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pgsql crollable cursor doesn't support one form of postgresql's cu

2007-04-26 Thread Neil Conway
I haven't read the rest of the thread yet, but is this hunk not buggy?
yylex() is side-effecting, so the two calls to yylex() do not do what
the comment suggests.

> *** 2083,2091 
>   check_FROM = false;
>   }
> 
> ! /* check FROM keyword after direction's specification */
> ! if (check_FROM && yylex() != K_FROM)
> ! yyerror("expected \"FROM\"");
> 
>   return fetch;
>   }
> --- 2089,2097 
>   check_FROM = false;
>   }
> 
> ! /* check FROM or IN keyword after direction's specification */
> ! if (check_FROM && (yylex() != K_FROM && yylex() != K_IN))
> ! yyerror("expected \"FROM/IN\"");
> 
>   return fetch;

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] too much WAL volume

2007-04-26 Thread Greg Smith

On Thu, 26 Apr 2007, Zeugswetter Andreas ADI SD wrote:


I am not sure that shrinking per WAL record size (other than the full
page images), e.g. by only logging changed bytes and not whole tuples,
would have a huge impact on OLTP tx/sec, since the limiting factor is
IO's per second and not Mb per second.


With the kind of caching controller that's necessary for any serious OLTP 
work with Postgres, number of I/Os per second isn't really an important 
number.  Total volume of writes to the WAL volume can be though.  It's 
difficult but not impossible to encounter a workload that becomes 
bottlenecked by WAL volume on a good OLTP server, particularly because 
that's often going to a single or RAID-1 disk.  Whether those workloads 
also have the appropriate properties such that their WAL could be shrunk 
usefully in real-time is a good question.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

2007-04-26 Thread Tom Lane
Dave Page <[EMAIL PROTECTED]> writes:
> I've been seeing this failure intermittently on Narwhal HEAD, and once
> on 8.1. Other branches have been OK, as have other animals running on
> the same physical box. Narwhal-HEAD is run more often than any other
> builds however.

Oh, this is interesting:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=baiji&dt=2007-04-26%2022:00:02

Different compiler, different OS, not quite the same block number (109,
whereas IIRC all the previous examples have complained of block 104).
Is this the same physical machine as narwhal?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [COMMITTERS] pgsql: Remove some of the most blatant brain-fade in the recent guc

2007-04-26 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Is anyone working on this fix?

I dunno, but that patch is gonna get reverted altogether if someone
doesn't fix the fact that it broke PGCLIENTENCODING ...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Feature freeze progress report

2007-04-26 Thread Bruce Momjian
Now that we are half-way though the scheduled feature freeze, I wanted
to share my thoughts about this period.

Having just pushed all open items into the patches queue or 8.4 hold
queue, I am seeing that we have many more in-process patches than we
normally do at this stage of the process.  I think there are three
reasons for this:

1)  The patches are not necessarily larger, but are more complex because
much most of the easy TODO items have already been written for previous
PostgreSQL releases.

2)  We have a number of new developers who took on some of these complex
TODO items, and some of the TODO items were significantly beyond the
developer capabilities at the start of the process.

3)  Many of the complex patches are hard to review because they deal
with very complex areas of the code, like reliability or transaction
semantics.

Our community could probably handle a few of these complex patches, but
the volume for this release is significantly higher than previous
releases.  The community is doing a good job of giving patch writers
feedback and new patch versions are getting generated.  However, this
amount of patch churn is not normal.

There are a few possible results that might come out of this:

1)  Feature freeze will be much longer.
2)  More patches will be postponed for later releases than usual.
3)  Most patches will be included but beta will be longer because
of bug fixing.
4)  Most patches will be included but beta will not be any longer.

I think we all hope for #4, but right now, I don't know the probability
of that.  We are going to have to think creatively in the coming weeks
to increase the likelihood of a #4 result.  However, right now, I can't
think of what we can do to improve the odds.  I think the community has
to come up with ideas on how to accomplish this.

[ FYI, I leave on a 2-week trip tomorrow/Friday.]

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [HACKERS] autovacuum does not start in HEAD

2007-04-26 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


ITAGAKI Takahiro wrote:
> I wrote:
> > I found that autovacuum launcher does not launch any workers in HEAD.
> 
> The attached autovacuum-fix.patch could fix the problem. I changed
> to use 'greater or equal' instead of 'greater' at the decision of
> next autovacuum target.
> 
> The point was in the resolution of timer; There is a platform that timer
> has only a resolution of milliseconds. We initialize adl_next_worker with
> current_time in rebuild_database_list(), but we could use again the same
> value in do_start_worker(), because there is no measurable difference
> in those low-resolution-platforms.
> 
> 
> Another attached patch, autovacuum-debug.patch, is just for printf-debug.
> I got the following logs without fix -- autovacuum never works.
> 
> # SELECT oid, datname FROM pg_database ORDER BY oid;
>   oid  |  datname  
> ---+---
>  1 | template1
>  11494 | template0
>  11495 | postgres
>  16384 | bench
> (4 rows)
> 
> # pgbench bench -s1 -c1 -t10
> [with configurations of autovacuum_naptime = 10s and log_min_messages = 
> debug1]
> 
> LOG:  do_start_worker skip : 230863399.25, 230863399.25, 
> 230863409.25
> LOG:  rebuild_database_list: db=11495, time=230863404.25
> LOG:  rebuild_database_list: db=16384, time=230863409.25
> DEBUG:  autovacuum: processing database "bench"
> LOG:  do_start_worker skip : 230863404.25, 230863404.25, 
> 230863414.25
> LOG:  do_start_worker skip : 230863404.25, 230863409.25, 
> 230863414.25
> LOG:  rebuild_database_list: db=11495, time=230863409.25
> LOG:  rebuild_database_list: db=16384, time=230863414.25
> LOG:  do_start_worker skip : 230863409.25, 230863409.25, 
> 230863419.25
> LOG:  do_start_worker skip : 230863409.25, 230863414.25, 
> 230863419.25
> LOG:  rebuild_database_list: db=11495, time=230863414.25
> LOG:  rebuild_database_list: db=16384, time=230863419.25
> ...
> (no autovacuum activities forever)
> 
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
> 

[ Attachment, skipping... ]

[ Attachment, skipping... ]

> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Re: [COMMITTERS] pgsql: Remove some of the most blatant brain-fade in the recent guc

2007-04-26 Thread Bruce Momjian

Is anyone working on this fix?

---

Tom Lane wrote:
> Log Message:
> ---
> Remove some of the most blatant brain-fade in the recent guc patch
> (it's so nice to have a buildfarm member that actively rejects naked
> uses of strcasecmp).  This coding is still pretty awful, though, since
> it's going to be O(N^2) in the number of guc variables.  May I direct
> your attention to bsearch?
> 
> Modified Files:
> --
> pgsql/src/backend/utils/misc:
> guc.c (r1.387 -> r1.388)
> 
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/misc/guc.c.diff?r1=1.387&r2=1.388)
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pgsql crollable cursor doesn't support one form of postgresql's cu

2007-04-26 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Pavel Stehule wrote:
> >
> >"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> > > I found one unsupported form plpgsql's fetch statement which is  
> >supported
> > > by postgresql.
> >
> > > PostgreSQL knows
> > > FETCH 3 FROM 
> >
> > > but plpgsql needs everytime direction's keyword.
> >
> >No, I think that's OK, because that form specifies fetching 3 rows,
> >which plpgsql's FETCH doesn't support.
> >
> 
> it's true. There is same question for move statement too. Other difference 
> is unsupported keyword IN.
> 
> It can be fixed:
> 
> *** ./gram.y.orig 2007-04-19 20:27:17.0 +0200
> --- ./gram.y  2007-04-19 20:41:16.0 +0200
> ***
> *** 2059,2071 
>   else if (pg_strcasecmp(yytext, "absolute") == 0)
>   {
>   fetch->direction = FETCH_ABSOLUTE;
> ! fetch->expr = plpgsql_read_expression(K_FROM, "FROM");
>   check_FROM = false;
>   }
>   else if (pg_strcasecmp(yytext, "relative") == 0)
>   {
>   fetch->direction = FETCH_RELATIVE;
> ! fetch->expr = plpgsql_read_expression(K_FROM, "FROM");
>   check_FROM = false;
>   }
>   else if (pg_strcasecmp(yytext, "forward") == 0)
> --- 2059,2071 
>   else if (pg_strcasecmp(yytext, "absolute") == 0)
>   {
>   fetch->direction = FETCH_ABSOLUTE;
> ! fetch->expr = read_sql_construct(K_FROM, K_IN, "FROM/IN", 
> "SELECT ", 
> true, true, NULL);
>   check_FROM = false;
>   }
>   else if (pg_strcasecmp(yytext, "relative") == 0)
>   {
>   fetch->direction = FETCH_RELATIVE;
> ! fetch->expr = read_sql_construct(K_FROM, K_IN, "FROM/IN", 
> "SELECT ", 
> true, true, NULL);
>   check_FROM = false;
>   }
>   else if (pg_strcasecmp(yytext, "forward") == 0)
> ***
> *** 2076,2081 
> --- 2076,2087 
>   {
>   fetch->direction = FETCH_BACKWARD;
>   }
> + else if (tok != T_SCALAR)
> + {
> + plpgsql_push_back_token(tok);
> + fetch->expr = read_sql_construct(K_FROM, K_IN, "FROM/IN", 
> "SELECT ", 
> true, true, NULL);
> + check_FROM = false;
> + }
>   else
>   {
>   /* Assume there's no direction clause */
> ***
> *** 2083,2091 
>   check_FROM = false;
>   }
> 
> ! /* check FROM keyword after direction's specification */
> ! if (check_FROM && yylex() != K_FROM)
> ! yyerror("expected \"FROM\"");
> 
>   return fetch;
>   }
> --- 2089,2097 
>   check_FROM = false;
>   }
> 
> ! /* check FROM or IN keyword after direction's specification */
> ! if (check_FROM && (yylex() != K_FROM && yylex() != K_IN))
> ! yyerror("expected \"FROM/IN\"");
> 
>   return fetch;
>   }
> 
> Regards
> Pavel Stehule
> 
> _
> Najdete si svou lasku a nove pratele na Match.com. http://www.msn.cz/
> 
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] elog(FATAL) vs shared memory

2007-04-26 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Where are we on this?
> 
> Still trying to think of a less messy solution...

OK, put in the patches hold queue for 8.4.

---


> 
> >> What it essentially says is that trying to clean up shared-memory
> >> state in a PG_TRY block is unsafe: you can't be certain you'll
> >> get to do it.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] elog(FATAL) vs shared memory

2007-04-26 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Where are we on this?

Still trying to think of a less messy solution...

>> What it essentially says is that trying to clean up shared-memory
>> state in a PG_TRY block is unsafe: you can't be certain you'll
>> get to do it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCH] A crash and subsequent recovery of the master can cause the slave to get out-of-sync

2007-04-26 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Florian G. Pflug wrote:
> Hi
> 
> I believe I have discovered the following problem in pgsql 8.2 and HEAD,
> concerning warm-standbys using WAL log shipping.
> 
> The problem is that after a crash, the master might complete incomplete
> actions via rm_cleanup() - but since it won't wal-log those changes,
> the slave won't know about this. This will at least prevent the creation
> of any further restart points on the slave (because safe_restartpoint)
> will never return true again - it it might even cause data corruption,
> if subsequent wal records are interpreted wrongly by the slave because
> it sees other data than the master did when it generated them.
> 
> Attached is a patch that lets RecoveryRestartPoint call all
> rm_cleanup() methods and create a restart point whenever it encounters
> a shutdown checkpoint in the wal (because those are generated after
> recovery). This ought not cause a performance degradation, because
> shutdown checkpoints will occur very infrequently.
> 
> The patch is per discussion with Simon Riggs.
> 
> I've not yet had a chance to test this patch, I only made sure
> that it compiles. I'm sending this out now because I hope this
> might make it into 8.2.4.
> 
> greetings, Florian Pflug


> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Background LRU Writer/free list

2007-04-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Greg Smith wrote:
> I'm mostly done with my review of the "Automatic adjustment of 
> bgwriter_lru_maxpages" patch.  In addition to issues already brought up 
> with that code, there are some small things that need to be done to merge 
> it with the recent pg_stat_bgwriter patch, and I have some concerns about 
> its unbounded scanning of the buffer pool; I'll write that up in more 
> detail or just submit an improved patch as I get time this week.
> 
> But there's a fundamental question that has been bugging me, and I think 
> it impacts the direction that code should take.  Unless I'm missing 
> something in my reading, buffers written out by the LRU writer aren't ever 
> put onto the free list.  I assume this is to stop from prematurely 
> removing buffers that contain useful data.  In cases where a substantial 
> percentage of the buffer cache is dirty, the LRU writer has to scan a 
> significant portion of the pool looking for one of the rare clean buffers, 
> then write it out.  When a client goes to grab a free buffer afterward, it 
> has to scan the same section of the pool to find the now clean buffer, 
> which seems redundant.
> 
> With the new patch, the LRU writer is fairly well bounded in that it 
> doesn't write out more than it thinks it will need; you shouldn't get into 
> a situation where many more pages are written than will be used in the 
> near future.  Given that mindset, shouldn't pages the LRU scan writes just 
> get moved onto the free list?
> 
> --
> * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] Reviewers Guide to Deferred Transactions/TransactionGuarantee

2007-04-26 Thread Bruce Momjian
Simon Riggs wrote:
> > That should go away entirely; to me the main point of the separate
> > wal-writer process is to take over responsibility for not letting too
> > many dirty wal buffers accumulate.
> 
> Yes
> 
> 
> I'll make the agreed changes by next Wed/Thurs. 

I have seen no patch yet with the agreed changes.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [HACKERS] CIC and deadlocks

2007-04-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Pavan Deolasee wrote:
> On 4/11/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> 
> >
> > [ itch... ]  The problem is with time-extended execution of
> > GetSnapshotData; what happens if the other guy lost the CPU for a good
> > long time while in the middle of GetSnapshotData?  He might set his
> > xmin based on info you saw as long gone.
> >
> > You might be correct that it's safe, but the argument would have to
> > hinge on the OldestXmin process being unable to commit because of
> > someone holding shared ProcArrayLock; a point you are definitely not
> > making above.  (Study the comments in GetSnapshotData for awhile,
> > also those in xact.c's commit-related code.)
> >
> >
> My argument was based on what you said above, but I obviously did not
> state it well :)
> 
> Anyways, I think its better to be safe and we agree that its not such a
> bad thing to take exclusive lock on procarray because CIC is not something
> that happens very often. Attached is a revised patch which takes exclusive
> lock on the procarray, rest remaining the same.
> 
> Thanks,
> Pavan
> 
> -- 
> 
> EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Fix for large file support

2007-04-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:
> [ redirecting to -hackers for wider comment ]
> 
> Zdenek Kotala <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> > LET_OS_MANAGE_FILESIZE is good way. I think one problem of this option I 
> > fixed. It is size of offset. I went thru the code and did not see any 
> > other problem there. However, how you mentioned it need more testing. I 
> > going to take server with large disk array and I will test it.
> 
> > I would like to add --enable-largefile switch to configure file to 
> > enable access to wide group of users. What you think about it?
> 
> Yeah, I was going to suggest the same thing --- but not with that switch
> name.  We already use enable/disable-largefile to control whether 64-bit
> file access is built at all (this mostly affects pg_dump at the moment).
> 
> I think the clearest way might be to flip the sense of the variable.
> I never found "LET_OS_MANAGE_FILESIZE" to be a good name anyway.  I'd
> suggest "USE_SEGMENTED_FILES", which defaults to "on", and you can
> turn it off via --disable-segmented-files if configure confirms your
> OS has largefile support (thus you could not specify both this and
> --disable-largefile).
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-26 Thread Koichi Suzuki

Josh,

Josh Berkus wrote:

Koichi, Andreas,


1) To deal with partial/inconsisitent write to the data file at crash
recovery, we need full page writes at the first modification to pages
after each checkpoint.   It consumes much of WAL space.


We need to find a way around this someday.  Other DBs don't do this; it may be 
becuase they're less durable, or because they fixed the problem.


Maybe both.   Fixing the problem may need some means to detect 
partial/inconsistent writes to the data files, which may needs 
additional CPU resource.





I don't think there should be only one setting.   It depend on how
database is operated.   Leaving wal_add_optiomization_info = off default
does not bring any change in WAL and archive log handling.   I
understand some people may not be happy with additional 3% or so
increase in WAL size, especially people who dosn't need archive log at
all.   So I prefer to leave the default off.


Except that, is there any reason to turn this off if we are archiving?  Maybe 
it should just be slaved to archive_command ... if we're not using PITR, it's 
off, if we are, it's on.


Hmm, this sounds to work.  On the other hand, existing users, who are 
happy with the current archiving and would not like to change current 
archiving command to pg_compresslog or archive log size will increase a 
bit.  I'd like to hear some more on this.





1) is there any throughput benefit for platforms with fast CPU but
contrained I/O (e.g. 2-drive webservers)?  Any penalty for servers with
plentiful I/O?

I've only run benchmarks with archive process running, because
wal_add_optimization_info=on does not make sense if we don't archive
WAL.   In this situation, total I/O decreases because writes to archive
log decreases.   Because of 3% or so increase in WAL size, there will be
increase in WAL write, but decrease in archive writes makes it up.


Yeah, I was just looking for a way to make this a performance feature.  I see 
now that it can't be.  ;-)


As to the performance feature, I tested the patch against 8.3HEAD. 
With pgbench, throughput was as follows:

Case1. Archiver: cp command, wal_add_optimization_info = off,
   full_page_writes=on
Case2. Archiver: pg_compresslog, wal_add_optimization_info = on,
   full_page_writes=on
DB Size: 1.65GB, Total transaction:1,000,000

Throughput was:
Case1: 632.69TPS
Case2: 653.10TPS ... 3% gain.

Archive Log Size:
Case1: 1.92GB
Case2: 0.57GB (about 30% of the Case1)... Before compression, the size 
was 1.92GB.  Because this is based on the number of WAL segment file 
size, there will be at most 16MB error in the measurement.  If we count 
this, the increase in WAL I/O will be less than 1%.





3) How is this better than command-line compression for log-shipping? 
e.g. why do we need it in the database?

I don't fully understand what command-line compression means.   Simon
suggested that this patch can be used with log-shipping and I agree.
If we compare compression with gzip or other general purpose
compression, compression ratio, CPU usage and I/O by pg_compresslog are
all quite better than those in gzip.


OK, that answered my question.


This is why I don't like Josh's suggested name of wal_compressable
eighter.
WAL is compressable eighter way, only pg_compresslog would need to be
more complex if you don't turn off the full page optimization. I think a
good name would tell that you are turning off an optimization.
(thus my wal_fullpage_optimization on/off)


Well, as a PG hacker I find the name wal_fullpage_optimization quite baffling 
and I think our general user base will find it even more so.  Now that I have 
Koichi's explanation of the problem, I vote for simply slaving this to the 
PITR settings and not having a separate option at all.


Could I have more specific suggestion on this?

Regards;


--
-
Koichi Suzuki

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Modifying TOAST thresholds

2007-04-26 Thread Bruce Momjian

I have seen no one do peroformance testing of this, so it seems it will
have to wait for 8.4.

---

Gregory Stark wrote:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
> 
> > What I would definitely like to see for 8.3 is some performance testing
> > done to determine whether we ought to change the current defaults.
> > (Both TOAST_TUPLES_PER_PAGE and EXTERN_TUPLES_PER_PAGE ought to be looked
> > at.)
> 
> It will take some thinking before it's even clear what we should be measuring.
> The "optimal" value will depend heavily on the usage pattern so the best value
> for the default will be something hand-wavy like "the smallest tuple size
> where the cost of a select including the column is greater than the time saved
> on a select not including the column" or something like that.
> 
> -- 
>   Gregory Stark
>   EnterpriseDB  http://www.enterprisedb.com
> 
> 
> ---(end of broadcast)---
> TIP 7: You can help support the PostgreSQL project by donating at
> 
> http://www.postgresql.org/about/donate

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] elog(FATAL) vs shared memory

2007-04-26 Thread Bruce Momjian

Where are we on this?

---

Tom Lane wrote:
> In this thread:
> http://archives.postgresql.org/pgsql-bugs/2007-03/msg00145.php
> we eventually determined that the reported lockup had three components:
> 
> (1) something (still not sure what --- Martin and Mark, I'd really like
> to know) was issuing random SIGTERMs to various postgres processes
> including autovacuum.
> 
> (2) if a SIGTERM happens to arrive while btbulkdelete is running,
> the next CHECK_FOR_INTERRUPTS will do elog(FATAL), causing elog.c
> to do proc_exit(0), leaving the vacuum still recorded as active in
> the shared memory array maintained by _bt_start_vacuum/_bt_end_vacuum.
> The PG_TRY block in btbulkdelete doesn't get a chance to clean up.
> 
> (3) eventually, either we try to re-vacuum the same index or
> accumulation of bogus active entries overflows the array.
> Either way, _bt_start_vacuum throws an error, which btbulkdelete
> PG_CATCHes, leading to_bt_end_vacuum trying to re-acquire the LWLock
> already taken by _bt_start_vacuum, meaning that the process hangs up.
> And then so does anything else that needs to take that LWLock...
> 
> Point (3) is already fixed in CVS, but point (2) is a lot nastier.
> What it essentially says is that trying to clean up shared-memory
> state in a PG_TRY block is unsafe: you can't be certain you'll
> get to do it.  Now this is not a big deal during normal SIGTERM or
> SIGQUIT database shutdown, because we're going to abandon the shared
> memory segment anyway.  However, if we ever want to support individual
> session kill via SIGTERM, it's a problem.  Even if we were not
> interested in someday considering that a supported feature, it seems
> that dealing with random SIGTERMs is needed for robustness in at least
> some environments.
> 
> AFAICS, there are basically two ways we might try to approach this:
> 
> Plan A: establish the rule that you mustn't try to clean up shared
> memory state in a PG_CATCH block.  Anything you need to do like that
> has to be handled by an on_shmem_exit hook function, so it will be
> called during a FATAL exit.  (Or maybe you can do it in PG_CATCH for
> normal ERROR cases, but you need a backing on_shmem_exit hook to
> clean up for FATAL.)
> 
> Plan B: change the handling of FATAL errors so that they are thrown
> like normal errors, and the proc_exit call happens only when we get
> out to the outermost control level in postgres.c.  This would mean
> that PG_CATCH blocks get a chance to clean up before the FATAL exit
> happens.  The problem with that is that a non-cooperative PG_CATCH
> block might think it could "recover" from the error, and then the exit
> does not happen at all.  We'd need a coding rule that PG_CATCH blocks
> *must* re-throw FATAL errors, which seems at least as ugly as Plan A.
> In particular, all three of the external-interpreter PLs are willing
> to return errors into the external interpreter, and AFAICS we'd be
> entirely at the mercy of the user-written Perl or Python or Tcl code
> whether it re-throws the error or not.
> 
> So Plan B seems unacceptably fragile.  Does anyone see a way to fix it,
> or perhaps a Plan C with a totally different idea?  Plan A seems pretty
> ugly but it's the best I can come up with.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] psql default options

2007-04-26 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> I would like to suggest that we make psql default when in interactive mode to
> using AUTOCOMMIT=false and ON_ERROR_ROLLBACK=true.

That is *way* too big a behavioral change to make depend on something as
fragile as whether psql thinks it's interactive or not.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Implicit casts to text

2007-04-26 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Where are we on this?
> 
> Since there weren't any objections, I guess we can do it ;-)
> 
> I'll try to do something with Peter's patch plus removing the deadwood.
> Would you add his patch to the queue so I don't forget?

Added.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Implicit casts to text

2007-04-26 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Where are we on this?

Since there weren't any objections, I guess we can do it ;-)

I'll try to do something with Peter's patch plus removing the deadwood.
Would you add his patch to the queue so I don't forget?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Implicit casts to text

2007-04-26 Thread Bruce Momjian

Where are we on this?

---

Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > FWIW, is the attached patch about what you had in mind?  (It probably only 
> > covers "normal" types at the moment.)
> 
> Hm, I hadn't realized that it would take as little work as that ...
> I have an itchy feeling that you missed something but I'm not sure
> what.
> 
> One thing I had wanted to do is take out the existing functions and
> pg_cast entries that are effectively just providing hard-wired
> equivalents to this, but that's merely housekeeping.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Interaction of PITR backups and Bulkoperationsavoiding WAL

2007-04-26 Thread Bruce Momjian
Simon Riggs wrote:
> On Fri, 2007-03-09 at 11:47 -0500, Tom Lane wrote:
> > "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > > On Fri, 2007-03-09 at 11:15 -0500, Tom Lane wrote:
> > >> It strikes me that allowing archive_command to be changed on the fly
> > >> might not be such a good idea though, or at least it shouldn't be
> > >> possible to flip it from empty to nonempty during live operation.
> > 
> > > I'd rather fix it the proposed way than force a restart. ISTM wrong to
> > > have an availability feature cause downtime.
> > 
> > I don't think that people are very likely to need to turn archiving on
> > and off on-the-fly.  Your proposed solution introduces a great deal of
> > complexity (and risk of future bugs-of-omission, to say nothing of race
> > conditions) to solve a non-problem.  We have better things to be doing
> > with our development time.
> 
> It's certainly a quicker fix. Unless others object, I'll set
> archive_command to only be changeable at server startup.

Simon, has this patch been submitted?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [DOCS] row-level stats and last analyze time

2007-04-26 Thread Alvaro Herrera
Neil Conway wrote:

> (3) I don't like the fact that the current coding is so willing to throw
> away VACUUM and ANALYZE pgstat messages. I think it is quite plausible
> that the DBA might be interested in the last-VACUUM and last-ANALYZE
> information for a table which hasn't had live operations applied to it
> recently. The rest of the pgstat code has a similarly disappointing
> willingness to silently discard messages it doesn't think are worth
> keeping (e.g. pgstat_recv_autovac() is ignored for databases with no
> other activity, and pgstat_count_xact_commit/rollback() is a no-op
> unless *either* row-level or block-level stats are enabled.)

One thing to keep in mind is that autovac drives some decision from
whether the database has a pgstat entry or not.  In particular it means
it doesn't bother processing non-connectable databases, unless they are
close to Xid wraparound.

I think this behavior is a useful one, since usually vacuuming those
databases is a waste of time anyway.  Whether to drive it from pgstat or
from somewhere else is another matter, but if you want to drive it from
another mechanism, keep in mind that the autovacuum launcher (which is
the process that makes this decision) is not connected to any database
so it cannot examine any catalog's content.  There are of course ways
around that: for example you could put the information in the
pg_database flatfile.  But it's something to keep in mind if you want to
change it.

> If we're so concerned about saving space in the stats hash tables for
> tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange
> to record the timestamps for database-wide VACUUMs and ANALYZEs
> separately from table-local VACUUMs and ANALYZEs? That is, a table's
> last_vacuum time could effectively be the max of the last database-wide
> vacuum time and the last VACUUM on that particular table. (Recording the
> time of the last database-wide VACUUM might be worth doing anyway, e.g.
> for avoiding wraparound failure).

Another thing to keep in mind is that autovacuum does not do
database-wide vacuums anymore -- they are not needed.  Xid wraparound
decisions are handled on a table-by-table basis, so information about
when the last database-wide vacuum was is not needed.

Note that Xid wraparound decisions are driven by information in
pg_class.  So it's not a problem that pgstat may lose the info from this
POV.

The bottom line is that the current pgstat behavior and autovacuum are
closely related.  So if you want to change pgstats you should also keep
an eye on how it's going to affect autovac.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [DOCS] row-level stats and last analyze time

2007-04-26 Thread Neil Conway
On Tue, 2007-04-24 at 17:38 -0400, Neil Conway wrote:
> which included other modifications to reduce the pgstat I/O volume in
> 8.1. I don't think this particular change was wise

I looked into this a bit further:

(1) I believe the reasoning for Tom's earlier change was not to reduce
the I/O between the backend and the pgstat process: it was to keep the
in-memory stats hash tables small, and to reduce the amount of data that
needs to be written to disk. When the only stats messages we get for a
table are VACUUM or ANALYZE messages, we discard the message in the
pgstat daemon.

(2) If stats_row_level is false, there won't be a stats hash entry for
any tables, so we can skip sending the VACUUM or ANALYZE message in the
first place, by the same logic. (This is more debatable if the user just
disabled stats_row_level for the current session, although since only a
super-user can do that, perhaps that's OK.)

(3) I don't like the fact that the current coding is so willing to throw
away VACUUM and ANALYZE pgstat messages. I think it is quite plausible
that the DBA might be interested in the last-VACUUM and last-ANALYZE
information for a table which hasn't had live operations applied to it
recently. The rest of the pgstat code has a similarly disappointing
willingness to silently discard messages it doesn't think are worth
keeping (e.g. pgstat_recv_autovac() is ignored for databases with no
other activity, and pgstat_count_xact_commit/rollback() is a no-op
unless *either* row-level or block-level stats are enabled.)

If we're so concerned about saving space in the stats hash tables for
tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange
to record the timestamps for database-wide VACUUMs and ANALYZEs
separately from table-local VACUUMs and ANALYZEs? That is, a table's
last_vacuum time could effectively be the max of the last database-wide
vacuum time and the last VACUUM on that particular table. (Recording the
time of the last database-wide VACUUM might be worth doing anyway, e.g.
for avoiding wraparound failure).

Comments?

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] About the simple_heap_update function

2007-04-26 Thread Tom Lane
"rupesh bajaj" <[EMAIL PROTECTED]> writes:
> I try to update a tuple in pg_attribute table by using the function
> simple_heap_update while somequery processing is going on. But when I use to
> see the value of that tuple(just updated) once again by SearchSysCache
> function in the same query processing, I am not able to see the updated
> values. Can you please tell me what is the reason. Or what should i do to
> make the updation visible.

You'd need a CommandCounterIncrement() ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Buildfarm: Stage logs not available for MSVC builds

2007-04-26 Thread Andrew Dunstan

-hackers probably isn't the place for such complaints.

The problem not beacuse of MSVC, but because of member misconfiguration,
by the look of it. The tar command string will need to be set in the
config file and tar installed. I found that I needed bsdtar for Windows
for this to work. See

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=weevil&dt=2007-04-13%2014:54:06


cheers

andrew


Dave Page wrote:
> I just noticed that the stage logs aren't displayed against MSVC build
> hosts as they are for regular hosts, eg:
>
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mastodon&dt=2007-04-25%2001:00:02
>
> vs.
>
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhal&dt=2007-04-25%2002:00:03
>
> Is this WIP, or a bug to be fixed?
>
> Regards Dave.
>
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings
>



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Hi, I wanto joinin the developer group of postgresql

2007-04-26 Thread shieldy

Hi, I wanto joinin the developer group of postgresql。
But, I just donot know how to put the first step, as I installed the
postgresql, and also get the postgresql code. after that, I also installed
the cygwin on my computer( as my os is windows xp). but now I wonder what's
my next step. as I have extends some aspects in the postgresql spatial data.
can you give me some suggestions on how should I go on? thankyou!


[HACKERS] About the simple_heap_update function

2007-04-26 Thread rupesh bajaj

Hi,
I try to update a tuple in pg_attribute table by using the function
simple_heap_update while somequery processing is going on. But when I use to
see the value of that tuple(just updated) once again by SearchSysCache
function in the same query processing, I am not able to see the updated
values. Can you please tell me what is the reason. Or what should i do to
make the updation visible.

Thanks,
Rupesh Bajaj


[HACKERS] psql default options

2007-04-26 Thread Gregory Stark

For a long time one of the big gripes we get is that when using psql
interactively if you turn autocommit off and you typo on the nth command
you've suddenly lost all your work. The response was always that we needed a
generic subtransaction facility to handle it.

We've had such a facility for a while but we're still biting people with the
same problem.

I would like to suggest that we make psql default when in interactive mode to
using AUTOCOMMIT=false and ON_ERROR_ROLLBACK=true.

Defaulting to autocommit=false is somewhat surprising and a big change. I
recall the first time I used Oracle I was completely flummoxed by having
autocommit off. I know that'll be a hard sell.

But as I used it I grew quite fond of having autocommit off. It's saved my
bacon more than once. I strongly recommend it to anyone working with data they
care about even a little bit. Being able to double check your work results
before committing them is extremely valuable.

So basically what I'm saying is that I think what I recommend users use should
just be the default when using psql interactively.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Tom Lane
Michael Meskes <[EMAIL PROTECTED]> writes:
> Having spend countless hours debugging this stuff I fully agree with
> you. It's not just ECPGget_variable though. I also had to inline
> create_statement.

AFAICS you do not need to inline create_statement.  The risk factor
is where you call a routine that does something with a va_list, and
then you want to do something else (other than va_end) with that va_list
after it returns.  The one use of create_statement doesn't do that,
hence no problem.  (We know this works, because stringinfo.c does it.)

BTW, I think there is a small bug there: you fail to call va_end() in the
failure path.  I'm not sure if there are any modern machines where
va_end() isn't a no-op, but nonetheless the code isn't meeting the spec.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Mark Wong

On 4/26/07, Michael Meskes <[EMAIL PROTECTED]> wrote:

On Wed, Apr 25, 2007 at 04:38:30PM -0400, Tom Lane wrote:
> My recommendation is to get rid of the APREF hack, deal only in
> va_list not &va_list, and inline ECPGget_variable into the two
> places it's used to avoid the question of passing va_lists around
> after they've been modified.  The routine's not that big (especially
> seeing that only half of it is actually shared by the two callers)
> and it's just not worth the notational effort, let alone any portability
> risks, to factor it out.

Having spend countless hours debugging this stuff I fully agree with
you. It's not just ECPGget_variable though. I also had to inline
create_statement. This is only called once, so no big deal, but the
calling function gets bigger by quite a margin that way. It was
definitely easier to read with these functions, but again all this
hassle isn't worth it.

Attached you'll find a patch that should inline both functions and
remove the APREF stuff. This successfully runs the regression suite on
my Linux box. Please test it on those archs that needed special
treatment before I commit.


I applied it to REL8_2_STABLE/pgsql and ran 'run_build.pl --test
--keepall REL8_2_STABLE'.  Looks like it passed everything for me. :)

Regards,
Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] RESET command seems pretty disjointed now

2007-04-26 Thread Bruce Momjian

Patch applied from Neil.

---

Marko Kreen wrote:
> On 4/23/07, Neil Conway <[EMAIL PROTECTED]> wrote:
> > On Tue, 2007-04-17 at 16:34 +0300, Marko Kreen wrote:
> > > Attached patch does following conversions:
> >
> > ISTM it would be cleaner to use an enum to identify the different
> > variants of the DISCARD command, rather than a character string.
> >
> > Is guc.c still the logical place for the implementation of DISCARD?
> > Something under backend/commands might be better, although I don't see a
> > real obvious place for it.
> >
> > The psql tab completion code requires updating for the new DISCARD
> > command.
> 
> Attached patch addresses all 3 comments.  As it will be
> top-level command, I put code into commands/discard.c
> 
> -- 
> marko

[ Attachment, skipping... ]

> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] RESET command seems pretty disjointed now

2007-04-26 Thread Neil Conway
On Tue, 2007-04-24 at 18:04 +0300, Marko Kreen wrote:
> Attached patch addresses all 3 comments.  As it will be
> top-level command, I put code into commands/discard.c

Applied with some minor tweaks -- thanks for the patch. I didn't bother
moving the regression tests out of guc.sql, although they don't really
belong there any longer.

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-26 Thread Tom Lane
Jim Nasby <[EMAIL PROTECTED]> writes:
> So what happens if a backend is running with full_page_writes = off,  
> someone edits postgresql.conf to turns it on and forgets to reload/ 
> restart, and then we crash? You'll come up in recovery mode thinking  
> that f_p_w was turned on, when in fact it wasn't.

One of the advantages of the proposed patch is that it avoids having to
make any assumptions like that.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-26 Thread Zeugswetter Andreas ADI SD

> So what happens if a backend is running with full_page_writes 
> = off, someone edits postgresql.conf to turns it on and 
> forgets to reload/ restart, and then we crash? You'll come up 
> in recovery mode thinking that f_p_w was turned on, when in 
> fact it wasn't.
> 
> ISTM that we need to somehow log what the status of 
> full_page_writes is, if it's going to affect how recovery works.

Optimally recovery should do this when confronted with a full page image
only. The full page is in the same WAL record that first touches a page,
so this should not need to depend on a setting.

Andreas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
On Thu, Apr 26, 2007 at 06:28:29AM -0500, Andrew Dunstan wrote:
> If you commit to HEAD it will be automatically tested on the buildfarm.

True. But it might also break a lot of other archs without helping on
those troubled ones. I thought this way would be better.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Andrew Dunstan
Michael Meskes wrote:
>
> Attached you'll find a patch that should inline both functions and
> remove the APREF stuff. This successfully runs the regression suite on
> my Linux box. Please test it on those archs that needed special
> treatment before I commit.
>

If you commit to HEAD it will be automatically tested on the buildfarm.

cheers

andrew


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Re: [Pgbuildfarm-members] [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

2007-04-26 Thread Dave Page
Andrew Dunstan wrote:
> Dave Page wrote:
>> This was another occurance of the strange create index failure on
>> Narwhal - unfortunately, despite having 'keep_error_builds' => 1 in my
>> BF config it seems to have removed the tree so I can't get the dump that
>> Tom wanted.
>>
>> Does anyone know why the keep_error_builds option didn't work in this
>> case? Or have I misinterpreted it's meaning?
>>
> 
> 
> Well, "anyone" will usually be me ;-)

Yeah, I know but I didn't want to presume :-)

> To the best of my knowledge and belief it does work. This is the code it
> should cause to be executed:
> 
>   my $timestr = strftime "%Y-%m-%d-%H:%M:%S", localtime($now);
>   move("$pgsql", "pgsqlkeep.$timestr");
>   move("inst", "instkeep.$timestr")   if (-d "inst") ;
> 
> Please see if you can trap the script stdout/stderr somewhere so that next
> time it fails you can send it to me.

I already log the run output - there's nothing unusual in there.

> How exactly do you have the buildfarm runs automated? Maybe that does a
> little cleaning of its own?

A simple batch script, run from the task scheduler which fires off the
script. It doesn't do any cleanup at all.

Regards, Dave.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
ARGH!

This time with patch.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
diff -ruN --exclude CVS /home/postgres/pgsql-ecpg/ecpglib/descriptor.c ecpglib/descriptor.c
--- /home/postgres/pgsql-ecpg/ecpglib/descriptor.c	2006-10-04 09:38:18.0 +0200
+++ ecpglib/descriptor.c	2007-04-26 12:02:20.0 +0200
@@ -507,7 +507,6 @@
 	do
 	{
 		enum ECPGdtype itemtype;
-		enum ECPGttype type;
 		const char *tobeinserted = NULL;
 		bool		malloced;
 
@@ -516,13 +515,29 @@
 		if (itemtype == ECPGd_EODT)
 			break;
 
-		type = va_arg(args, enum ECPGttype);
-#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__))
-		ECPGget_variable(args, type, var, false);
-#else
-		ECPGget_variable(&args, type, var, false);
-#endif
+		var->type = va_arg(args, enum ECPGttype);
+		var->pointer = va_arg(args, char *);
 
+		var->varcharsize = va_arg(args, long);
+		var->arrsize = va_arg(args, long);
+		var->offset = va_arg(args, long);
+
+		if (var->arrsize == 0 || var->varcharsize == 0)
+			var->value = *((char **) (var->pointer));
+		else
+			var->value = var->pointer;
+
+		/*
+		 * negative values are used to indicate an array without given bounds
+		 */
+		/* reset to zero for us */
+		if (var->arrsize < 0)
+			var->arrsize = 0;
+		if (var->varcharsize < 0)
+			var->varcharsize = 0;
+
+		var->next = NULL;
+		
 		switch (itemtype)
 		{
 			case ECPGd_data:
diff -ruN --exclude CVS /home/postgres/pgsql-ecpg/ecpglib/execute.c ecpglib/execute.c
--- /home/postgres/pgsql-ecpg/ecpglib/execute.c	2007-03-29 14:04:34.0 +0200
+++ ecpglib/execute.c	2007-04-26 12:01:33.0 +0200
@@ -79,136 +79,6 @@
 	}
 }
 
-#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__))
-#define APREF ap
-#else
-#define APREF *ap
-#endif
-
-void
-ECPGget_variable(va_list APREF, enum ECPGttype type, struct variable * var, bool indicator)
-{
-	var->type = type;
-	var->pointer = va_arg(APREF, char *);
-
-	var->varcharsize = va_arg(APREF, long);
-	var->arrsize = va_arg(APREF, long);
-	var->offset = va_arg(APREF, long);
-
-	if (var->arrsize == 0 || var->varcharsize == 0)
-		var->value = *((char **) (var->pointer));
-	else
-		var->value = var->pointer;
-
-	/*
-	 * negative values are used to indicate an array without given bounds
-	 */
-	/* reset to zero for us */
-	if (var->arrsize < 0)
-		var->arrsize = 0;
-	if (var->varcharsize < 0)
-		var->varcharsize = 0;
-
-	var->next = NULL;
-
-	if (indicator)
-	{
-		var->ind_type = va_arg(APREF, enum ECPGttype);
-		var->ind_pointer = va_arg(APREF, char *);
-		var->ind_varcharsize = va_arg(APREF, long);
-		var->ind_arrsize = va_arg(APREF, long);
-		var->ind_offset = va_arg(APREF, long);
-
-		if (var->ind_type != ECPGt_NO_INDICATOR
-			&& (var->ind_arrsize == 0 || var->ind_varcharsize == 0))
-			var->ind_value = *((char **) (var->ind_pointer));
-		else
-			var->ind_value = var->ind_pointer;
-
-		/*
-		 * negative values are used to indicate an array without given bounds
-		 */
-		/* reset to zero for us */
-		if (var->ind_arrsize < 0)
-			var->ind_arrsize = 0;
-		if (var->ind_varcharsize < 0)
-			var->ind_varcharsize = 0;
-	}
-}
-
-/*
- * create a list of variables
- * The variables are listed with input variables preceding outputvariables
- * The end of each group is marked by an end marker.
- * per variable we list:
- * type - as defined in ecpgtype.h
- * value - where to store the data
- * varcharsize - length of string in case we have a stringvariable, else 0
- * arraysize - 0 for pointer (we don't know the size of the array),
- * 1 for simple variable, size for arrays
- * offset - offset between ith and (i+1)th entry in an array,
- * normally that means sizeof(type)
- * ind_type - type of indicator variable
- * ind_value - pointer to indicator variable
- * ind_varcharsize - empty
- * ind_arraysize -	arraysize of indicator array
- * ind_offset - indicator offset
- */
-static bool
-create_statement(int lineno, int compat, int force_indicator, struct connection * connection, struct statement ** stmt, const char *query, va_list APREF)
-{
-	struct variable **list = &((*stmt)->inlist);
-	enum ECPGttype type;
-
-	if (!(*stmt = (struct statement *) ECPGalloc(sizeof(struct statement), lineno)))
-		return false;
-
-	(*stmt)->command = ECPGstrdup(query, lineno);
-	(*stmt)->connection = connection;
-	(*stmt)->lineno = lineno;
-	(*stmt)->compat = compat;
-	(*stmt)->force_indicator = force_indicator;
-
-	list = &((*stmt)->inlist);
-
-	type = va_arg(APREF, enum ECPGttype);
-
-	while (type != ECPGt_EORT)
-	{
-		if (type == ECPGt_EOIT)
-			list = &((*stmt)->outlist);
-		else
-		{
-			struct variable *var,
-	   *ptr;
-
-			if (!(var = (struct variable *) ECPGalloc(sizeof(struct variable), lineno)))
-return false;
-
-			ECPGget_variable(ap, type, var, true);

[HACKERS] Re: [Pgbuildfarm-members] [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

2007-04-26 Thread Andrew Dunstan
Dave Page wrote:
> This was another occurance of the strange create index failure on
> Narwhal - unfortunately, despite having 'keep_error_builds' => 1 in my
> BF config it seems to have removed the tree so I can't get the dump that
> Tom wanted.
>
> Does anyone know why the keep_error_builds option didn't work in this
> case? Or have I misinterpreted it's meaning?
>


Well, "anyone" will usually be me ;-)

To the best of my knowledge and belief it does work. This is the code it
should cause to be executed:

my $timestr = strftime "%Y-%m-%d-%H:%M:%S", localtime($now);
move("$pgsql", "pgsqlkeep.$timestr");
move("inst", "instkeep.$timestr")   if (-d "inst") ;

Please see if you can trap the script stdout/stderr somewhere so that next
time it fails you can send it to me.

How exactly do you have the buildfarm runs automated? Maybe that does a
little cleaning of its own?

cheers

andrew





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] too much WAL volume

2007-04-26 Thread Zeugswetter Andreas ADI SD

> > Writing to a different area was considered in pg, but there were
more 
> > negative issues than positive.
> > So imho pg_compresslog is the correct path forward. The current
> > discussion is only about whether we want a more complex
pg_compresslog 
> > and no change to current WAL, or an increased WAL size for a less 
> > complex implementation.
> > Both would be able to compress the WAL to the same "archive log"
size.
> 
> Huh?  As conceived, pg_compresslog does nothing to lower log 
> volume for general purposes, just on-disk storage size for 
> archiving.  It doesn't help us at all with the tremendous 
> amount of log we put out for an OLTP server, for example.

Ok, that is not related to the original discussion though.
I have thus changed the subject, and removed [PATCHES].

You cannot directly compare the pg WAL size with other db's since they
write parts to other areas (e.g. physical log in Informix). You would
need to include those writes in a fair comparison.
It is definitely not true, that writing to a different area has only
advantages. The consensus was, that writing the page images to the WAL
has more pro's. We could revisit the pros and cons though. 

Other options involve special OS and hardware (we already have that), or
accepting a high risc of needing a
restore after power outage (we don't have that, because we use no
mechanism to detect such a failure).

I am not sure that shrinking per WAL record size (other than the full
page images), e.g. by only logging changed bytes and not whole tuples,
would have a huge impact on OLTP tx/sec, since the limiting factor is
IO's per second and not Mb per second. Recent developments like HOT seem
a lot more promising in this regard since they avoid IO.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
On Wed, Apr 25, 2007 at 04:38:30PM -0400, Tom Lane wrote:
> My recommendation is to get rid of the APREF hack, deal only in
> va_list not &va_list, and inline ECPGget_variable into the two
> places it's used to avoid the question of passing va_lists around
> after they've been modified.  The routine's not that big (especially
> seeing that only half of it is actually shared by the two callers)
> and it's just not worth the notational effort, let alone any portability
> risks, to factor it out.

Having spend countless hours debugging this stuff I fully agree with
you. It's not just ECPGget_variable though. I also had to inline
create_statement. This is only called once, so no big deal, but the
calling function gets bigger by quite a margin that way. It was
definitely easier to read with these functions, but again all this
hassle isn't worth it.

Attached you'll find a patch that should inline both functions and
remove the APREF stuff. This successfully runs the regression suite on
my Linux box. Please test it on those archs that needed special
treatment before I commit.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
On Wed, Apr 25, 2007 at 03:17:19PM -0400, Tom Lane wrote:
> Why in the world is that like that?  We don't have such a kluge
> anyplace else we use va_list.  stringinfo.c for instance has
> never needed any such thing.

I don't remember the exact details but this was added a long time ago
before 8.0 because we had some problems with one of the archs. I was
suprised about the differences back then too, but haven't checked
since.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

2007-04-26 Thread Dave Page
This was another occurance of the strange create index failure on
Narwhal - unfortunately, despite having 'keep_error_builds' => 1 in my
BF config it seems to have removed the tree so I can't get the dump that
Tom wanted.

Does anyone know why the keep_error_builds option didn't work in this
case? Or have I misinterpreted it's meaning?

Regards, Dave.

 Original Message 
Subject: PGBuildfarm member narwhal Branch HEAD Status changed from OK
to InstallCheck failure
Date: Wed, 25 Apr 2007 19:34:34 -0700 (PDT)
From: PG Build Farm <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED],
[EMAIL PROTECTED]


The PGBuildfarm member narwhal had the following event on branch HEAD:

Status changed from OK to InstallCheck failure

The snapshot timestamp for the build that triggered this notification
is: 2007-04-26 02:00:01

The specs of this machine are:
OS:  Windows Server 2003 R2 / 5.2.3790
Arch: i686
Comp: GCC / 3.4.2 (mingw-special)

For more information, see
http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=narwhal&br=HEAD

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:

Tom Lane wrote:

but it'd make it safe to use in non-WAL contexts (I think there are
other places where we know we are going to init the page and so a
physical read is a waste of time).



Is there? I can't think of any. Extending a relation doesn't count.


No, but re-using a free page in an index does.  I'm not sure which index
AMs know for sure the page is free, and which have to read it and check,
but I think there's at least some scope for that.


B-tree, GIN ans GiST read and check. I'm not sure how hash works. I 
think the latest bitmap index patch doesn't support reusing empty pages 
at all.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-26 Thread Jim Nasby

On Apr 25, 2007, at 2:48 PM, Heikki Linnakangas wrote:
In recovery, with full_pages_writes=on, we read in each page only  
to overwrite the contents with a full page image. That's a waste of  
time, and can have a surprisingly large effect on recovery time.


As a quick test on my laptop, I initialized a DBT-2 test with 5  
warehouses, and let it run for 2 minutes without think-times to  
generate some WAL. Then I did a "kill -9 postmaster", and took a  
copy of the data directory to use for testing recovery.


With CVS HEAD, the recovery took ~ 2 minutes. With the attached  
patch, it took 5 seconds. (yes, I used the same not-yet-recovered  
data directory in both tests, and cleared the os cache with "echo 1  
> /proc/sys/vm/drop_caches").


I was surprised how big a difference it makes, but when you think  
about it it's logical. Without the patch, it's doing roughly the  
same I/O as the test itself, reading in pages, modifying them, and  
writing them back. With the patch, all the reads are done  
sequentially from the WAL, and then written back in a batch at the  
end of the WAL replay which is a lot more efficient.


It's interesting that (with the patch) full_page_writes can  
*shorten* your recovery time. I've always thought it to have a  
purely negative effect on performance.


I'll leave it up to the jury if this tiny little change is  
appropriate after feature freeze...


While working on this, this comment in ReadBuffer caught my eye:


/*
 * During WAL recovery, the first access to any data page should
 * overwrite the whole page from the WAL; so a clobbered page
 * header is not reason to fail.  Hence, when InRecovery we may
 * always act as though zero_damaged_pages is ON.
 */
if (zero_damaged_pages || InRecovery)
{


But that assumption only holds if full_page_writes is enabled,  
right? I changed that in the attached patch as well, but if it  
isn't accepted that part of it should still be applied, I think.


So what happens if a backend is running with full_page_writes = off,  
someone edits postgresql.conf to turns it on and forgets to reload/ 
restart, and then we crash? You'll come up in recovery mode thinking  
that f_p_w was turned on, when in fact it wasn't.


ISTM that we need to somehow log what the status of full_page_writes  
is, if it's going to affect how recovery works.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Improving deadlock error messages

2007-04-26 Thread Jim Nasby

On Apr 23, 2007, at 11:38 PM, Tom Lane wrote:

Neil Conway <[EMAIL PROTECTED]> writes:

On Sat, 2007-04-21 at 19:43 -0400, Neil Conway wrote:

Attached is a very quick hack of a patch to do this.


Does anyone have any feedback on this approach? If people are  
satisfied
with this solution, I can get a cleaned up patch ready to apply  
shortly.


I'm really still opposed to the entire concept.  You're proposing  
to put

a lot of fragile-looking code into a seldom-exercised error path.
I fear bugs will survive a long time in there, and the net effect  
will be
that we get no information when we need it most.  The numeric  
printouts

may be ugly, but they are reliable.


If we're that worried about test coverage for deadlocks, what about  
adding a test to the regression tests? IIRC the framework can  
coordinate between multiple connections now...

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Vacuum-full very slow

2007-04-26 Thread Simon Riggs
On Thu, 2007-04-26 at 00:13 +0200, Listmail wrote:
> By the way, about indexes :
> 
> When you have a small table (say, for a website, maybe a few
> tens of  
> megabytes max...) reindexing it takes just a few seconds, maybe
> 10-20  
> seconds.
> It could be interesting, performance-wise, to tell postgres
> not to bother  
> about crash-survivability of indexes on this table. Like temporary
> tables.  
> Write nothing to WAL. If it crashes, on recovery, postgres would
> reindex  
> the table.
> btree indexing is so fast on postgres that I'd definitely use
> this  
> feature.
> I'd rather trade a minute of recovery versus less disk IO for
> index  
> update.
> 
> You could even do that for whole tables (like, web sessions
> table) which  
> hold "perishable" data... 

That optimisation on mine/Heikki's todo for the next release.

In some cases it can speed up recovery, as well as mainline performance.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org