Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Richard Huxton
Tom Lane wrote:
The real point here is that omitting the per-command subtransaction
ought to be a hidden optimization, not something that intrudes to the
point of having unclean semantics when we can't do it.
Sorry to be stupid here, but I didn't understand this when it was 
discussed originally either. Why a subtransaction per command rather than 
one per function? If I've got this right, this is so the PL can tidy up 
behind itself and report/log an appropriate error?

--
  Richard Huxton
  Archonet Ltd


Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Thomas Hallgren
Richard Huxton wrote:
Tom Lane wrote:
The real point here is that omitting the per-command subtransaction
ought to be a hidden optimization, not something that intrudes to the
point of having unclean semantics when we can't do it.

Sorry to be stupid here, but I didn't understand this when it was 
discussed originally either. Why a subtransaction per command rather 
than one per function? If I've got this right, this is so the PL can 
tidy up behind itself and report/log an appropriate error?
I don't understand this either. Why a subtransaction at all?
Don't get me wrong. I fully understand that a subtransaction would make 
error recovery possible. What I'm trying to say is that the kind of error 
recovery that needs a subtransaction is fairly, or perhaps even very, rare.

We all agree that further calls to SPI must be prohibited if an error 
occurs when no subtransaction is active. Such an error can only result 
in one thing. The function must terminate and the error must be propagated.

The way most functions I've seen are written, this is the most common 
behavior anyway. It's very uncommon that you want to do further database 
accesses after something has gone wrong. I admit that some special cases 
do exist, but I cannot for the life of me understand why those cases must 
incur a 25% overhead on everything else, especially if there is an 
alternate way of handling them without making any sacrifice whatsoever 
on safety.

A function in PL/Java that calls into the backend and encounters an error 
falls into one of two cases:
1. If no subtransaction is active, the function is completely and utterly 
blocked from doing further calls to the backend. When it returns, the 
error is re-thrown.
2. If a subtransaction is active, the function is blocked in the same way 
as in #1, with one exception: a subtransaction rollback will go through, 
and it will remove the block.

So, in Java I have the choice of writing:

    try
    {
        // do something
    }
    catch (SQLException e)
    {
        // Clean up (but no backend calls) and terminate
    }

or I can write:

    Savepoint sp = myConn.setSavepoint("foo");
    try
    {
        // do something
        sp.commit();
    }
    catch (SQLException e)
    {
        sp.rollback();
        // Handle error and continue execution.
    }

All cases are covered, there's no subtransaction overhead (unless you 
really want it), the semantics are clean, and it's 100% safe. What's 
wrong with this approach?

Regards,
Thomas Hallgren



Re: [HACKERS] Stopgap solution for table-size-estimate updating problem

2004-11-30 Thread Zeugswetter Andreas DAZ SD

 I think I recall that lseek may have a negative effect on some OS's
 readahead calculations (probably only systems that cannot handle an
 lseek to the next page either)?  Do you think we should cache the
 last value to avoid the syscall ?
 
 We really can't, since the point of doing it is to find out whether any
 other backends have extended the file since we last looked.  Also, IIRC

Well yes. The value would need to be kept up to date by the backends that 
extend the file, with an interlocked increment. We would thus need to keep 
the count in shmem.
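
(To illustrate the idea only, here is a minimal sketch in Java rather than
backend C and shared memory; the class and method names are hypothetical:)

    import java.util.concurrent.atomic.AtomicLong;

    // Conceptual sketch: backends that extend the relation bump a shared
    // counter with an interlocked increment, and readers consult the counter
    // for size estimates instead of issuing an lseek() every time.
    class SharedRelationSize
    {
        private final AtomicLong blocks = new AtomicLong();

        // called by a backend that has just extended the relation by nblocks pages
        void noteExtension(int nblocks)
        {
            blocks.addAndGet(nblocks);
        }

        // cheap read used for size estimates; no syscall required
        long currentBlocks()
        {
            return blocks.get();
        }
    }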

Andreas



Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Kris Jurka


On Tue, 30 Nov 2004, Greg Stark wrote:

 
 Andrew Dunstan [EMAIL PROTECTED] writes:
 
  The advantage of having it in COPY is that it can be done serverside
  direct from the file system. For massive bulk loads that might be a
  plus, although I don't know what the protocol+socket overhead is.
 
 Actually even if you use client-side COPY it's *still* more efficient than any
 more general client-side alternative.

The idea would be to still use COPY just from a program that did 
additional processing, not as direct SQL.

 As Tom pointed out to me a while back, neither the protocol nor libpq allow
 for having multiple queries in flight simultaneously. That makes it impossible
 to stream large quantities of data to the server efficiently. Each record
 requires a round-trip and context switch overhead.

Multiplexing queries is different from having multiple queries in flight.  
You can have multiple queries on the wire now; they are just processed 
sequentially.

 In an ideal world the client should be able to queue up enough records to fill
 the socket buffers and allow the kernel to switch to a more batch oriented
 context switch mode where the server can process large numbers of records
 before switching back to the client. Ideally this would apply to any kind of
 query execution.

This is possible now with the V3 protocol (and used by the JDBC driver).  
For an executeBatch() on a PreparedStatement, the driver sends a parse
message, then any number of bind/execute pairs, but with a buffered stream
nothing happens until a Sync message is sent and the stream is flushed.  
Then it collects the results of all of the executes.
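
(For reference, a hedged client-side sketch of that pattern using standard
JDBC batching; the table and column names are hypothetical:)

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchLoad
    {
        // Queues many bind/execute pairs client-side; the driver ships them
        // (followed by a Sync) when executeBatch() flushes the stream, then
        // collects all of the per-statement results at once.
        public static int[] load(Connection conn, String[] names) throws SQLException
        {
            PreparedStatement ps =
                conn.prepareStatement("INSERT INTO items (name) VALUES (?)");
            try
            {
                for (String name : names)
                {
                    ps.setString(1, name);
                    ps.addBatch();           // nothing is executed yet
                }
                return ps.executeBatch();    // the batch goes to the server here
            }
            finally
            {
                ps.close();
            }
        }
    }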

Kris Jurka



Re: [HACKERS] Opinions on Usenet ...

2004-11-30 Thread Vince Vielhaber
On Mon, 29 Nov 2004, Marc G. Fournier wrote:

 If there were a comp.databases.postgresql.hackers newsgroup created and
 carried by all the news servers ... would you move to using it vs using
 the mailing lists?

No.  (yes, I'm still here :)

Vince.
-- 
 Fast, inexpensive internet service 56k and beyond!  http://www.pop4.net/
   http://www.meanstreamradio.com   http://www.unknown-artists.com
  Online radio: It's not file sharing, it's just radio.



Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Andrew Dunstan

Greg Stark wrote:
Personally I find the current CSV support inadequate. It seems pointless to
support CSV if it can't load data exported from Excel, which seems like the
main use case. 
 

OK, I'm starting to get mildly annoyed now. We have identified one 
failure case connected with multiline fields. Even in the multiline 
case, I expect that the embedded newlines will usually match those of 
the CSV file, in which case there will be no failure. It's a very big 
step from there to the far more general "can't load data exported from 
Excel". Or did you have some other limitation in mind?

FWIW, I don't make a habit of using multiline fields in my spreadsheets 
- and some users I have spoken to aren't even aware that you can have 
them at all.

cheers
andrew


[HACKERS] Large objects through ODBC

2004-11-30 Thread Bojidar Mihajlov
Hi all!
I need to operate with large objects through ODBC in
a C/C++ program. How can I do that?





Re: [HACKERS] Large objects through ODBC

2004-11-30 Thread Joshua D. Drake
Bojidar Mihajlov wrote:
Hi all!
I need to operate with large objects through ODBC in
a C/C++ program. How can I do that?
 

Look at the contrib lo data type.
Sincerely,
Joshua D. Drake

		


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL




Re: [HACKERS] Stopgap solution for table-size-estimate updating problem

2004-11-30 Thread Rupa Schomaker


On 11/29/2004 10:49 AM, Greg Stark wrote:
 I'll point out other databases end up treading the same ground. Oracle started
 with a well defined rules-based system that was too inflexible to handle
 complex queries. So they went to a cost-based optimizer much like Postgres's
 current optimizer. But DBAs resisted for a long time precisely because they
 couldn't control it or predict its behaviour as well. Now they have a plan
 stability system where you can plan queries using the cost based optimizer but
 then store the plans for future use. You can even take the plans and store
 them and load them on development systems for testing.

I can attest to this.  I work (middleware, not DBA stuff) with fairly
large Oracle databases (40T, billions of rows).  The data is added in
chunks (tablespaces) and in general does not materially affect the
distribution of data.  However, Oracle would many times suddenly shove a
plan onto a new sub-optimal query path after the data was added.

The solution was to

 1) fix the stats and/or stored outline in a staging area manually (DBA)

or

 2) hint the query in the middleware (uggh -- my group MW)

Once good, move the stored outlines to the production hardware -- all is
fixed.

For the most part we fix things using option 2 because it is generally
easier to hint the query than to fix the stored outline (though our DBAs
say they can).

Using stored outlines has gone a long way to ensure stability on our
systems.


 
 Their system is awfully kludgy though. Postgres can probably do much better.
 

-- 
 -Rupa




Re: [HACKERS] Opinions on Usenet ...

2004-11-30 Thread Rupa Schomaker


On 11/29/2004 11:03 AM, Marc G. Fournier wrote:
 The USENET community seems to think that there would be a mass exodus
 from the lists to usenet ... based on past discussions concerning moving
 some stuff out of email to stuff like bug trackers, I don't believe this
 to be the case, but am curious what the opinion of other developers
 happens to be ... would a USENET group actually be preferable?

I simply gate the mailing list traffic into sn (a small news server) and
use that.  Not sure I want to rely on my warez/dvd repository for real
information...

But that is just me :)

-- 
 -Rupa




Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Ben . Young
 
 
 Greg Stark wrote:
 
 Personally I find the current CSV support inadequate. It seems pointless
 to support CSV if it can't load data exported from Excel, which seems
 like the main use case.
 
 OK, I'm starting to get mildly annoyed now. We have identified one
 failure case connected with multiline fields. Even in the multiline
 case, I expect that the embedded newlines will usually match those of
 the CSV file, in which case there will be no failure. It's a very big
 step from there to the far more general "can't load data exported from
 Excel". Or did you have some other limitation in mind?
 
 FWIW, I don't make a habit of using multiline fields in my spreadsheets
 - and some users I have spoken to aren't even aware that you can have
 them at all.
 

I am normally more of a lurker on these lists, but I thought you had better
know that when we developed CSV import/export for an application at my last
company, we discovered that Excel can't always even read the CSV that _it_
has output! (With embedded newlines a particular problem.)

It is far more reliable if you output your data as an HTML table, in which
case it practically always gets it right. Perhaps Postgres could support
this as an import/export mechanism, as I have found it to be far less error
prone than CSV?
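
(As an illustration of that suggestion, a minimal client-side sketch in
Java/JDBC that writes a result set as an HTML table; this is not an existing
or proposed COPY feature, and the class name and query are placeholders:)

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class HtmlExport
    {
        // Renders a query result as a simple HTML table, which spreadsheet
        // programs tend to ingest more reliably than CSV with embedded newlines.
        public static String toHtmlTable(Connection conn, String query) throws SQLException
        {
            StringBuilder html = new StringBuilder("<table>\n");
            Statement st = conn.createStatement();
            try
            {
                ResultSet rs = st.executeQuery(query);
                ResultSetMetaData md = rs.getMetaData();
                int cols = md.getColumnCount();
                html.append("<tr>");
                for (int i = 1; i <= cols; i++)
                    html.append("<th>").append(escape(md.getColumnLabel(i))).append("</th>");
                html.append("</tr>\n");
                while (rs.next())
                {
                    html.append("<tr>");
                    for (int i = 1; i <= cols; i++)
                    {
                        String v = rs.getString(i);
                        html.append("<td>").append(v == null ? "" : escape(v)).append("</td>");
                    }
                    html.append("</tr>\n");
                }
            }
            finally
            {
                st.close();
            }
            return html.append("</table>\n").toString();
        }

        private static String escape(String s)
        {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }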

Cheers,
Ben Young


 cheers
 
 andrew
 




Re: [HACKERS] Opinions on Usenet ...

2004-11-30 Thread Vince Vielhaber
On Mon, 29 Nov 2004, Marc G. Fournier wrote:


 If there were a comp.databases.postgresql.hackers newsgroup created and
 carried by all the news servers ... would you move to using it vs using
 the mailing lists?

Trying this again with the right From address...

No.  (and yes, I'm still here :)


Vince.
-- 
 Fast, inexpensive internet service 56k and beyond!  http://www.pop4.net/
   http://www.meanstreamradio.com   http://www.unknown-artists.com
  Online radio: It's not file sharing, it's just radio.



Re: [HACKERS] [GENERAL] Column n.nsptablespace does not exist error

2004-11-30 Thread Johan Wehtje
Thanks Neil,
I will just have to hassle EMS to upgrade :)
Cheers
Johan Wehtje
Neil Conway wrote:
On Tue, 2004-11-30 at 17:54 +1100, Johan Wehtje wrote:
 

I am getting the error "Column n.nsptablespace does not exist" in my 
application when I connect using my administrative tool. This only 
happens with version 8, and it crashes my application. Does anyone 
have any ideas?
   

You need to upgrade your admin tool -- that column was removed from the
system catalogs in beta5. See:
http://archives.postgresql.org/pgsql-hackers/2004-11/msg00987.php
-Neil

 



Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Tom Lane
Richard Huxton [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The real point here is that omitting the per-command subtransaction
 ought to be a hidden optimization, not something that intrudes to the
 point of having unclean semantics when we can't do it.

 Sorry to be stupid here, but I didn't understand this when it was 
 discussed originally either. Why a subtransaction per command rather than 
 one per function?

So that when the Tcl programmer writes "catch" around a SPI command,
or the Perl programmer writes "eval" around a SPI command, they see
sensible behavior.  A reasonable person would expect to be able to use
the language's standard error-trapping constructs to trap any error
thrown by a SPI call and then continue processing (a la plpgsql
exception blocks).  Before 8.0 it was impossible to support this
behavior, and what we actually did was, in effect, to seal off the Tcl
or Perl function so that it couldn't touch the database state --- after
the first SPI error all subsequent SPI operations would fail immediately
until control exited the Tcl or Perl function, whereupon the error would
be re-thrown.  So you could try to trap an error but you couldn't do
anything useful after having done so, and you couldn't prevent it from
aborting the surrounding transaction.

I feel that behavior was obviously bogus and cannot be justified simply
on grounds of efficiency.  A wise man once said "I can make this program
arbitrarily fast ... if it doesn't have to give the right answer";
I think that applies here.  The semantics I want to see are that
catch/eval can trap errors and continue processing, and given the tools
we have at the moment that requires a subtransaction per SPI call.  We
can think about ways to optimize this later, but I'm not putting up with
the broken semantics any longer than I have to.
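
(For concreteness, a hedged sketch of those semantics written JDBC-style,
along the lines of Thomas's Java example upthread; the table names are
hypothetical, and this illustrates the desired behavior rather than any
existing plperl/pltcl code:)

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TrapAndContinue
    {
        // With a per-command subtransaction, the failed statement is rolled
        // back by itself and the session stays usable, so the handler can go
        // on to issue further database commands.
        public static void run(Connection conn) throws SQLException
        {
            Statement st = conn.createStatement();
            try
            {
                try
                {
                    st.executeUpdate("INSERT INTO audit_log VALUES (1, 'first attempt')");
                }
                catch (SQLException e)
                {
                    // trap the error and continue processing
                    st.executeUpdate("INSERT INTO audit_fallback VALUES (1, 'recovered')");
                }
            }
            finally
            {
                st.close();
            }
        }
    }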

In the case of Perl I suspect it is reasonably possible to determine
whether there is an eval surrounding the call or not, although we
might have to get more friendly with Perl's internal data structures
than a purist would like.  In the case of Tcl I'm not sure this is
really going to be feasible :-(, because AFAICS the interpreter state
is encoded as a series of return addresses buried on the stack; and
even if you could detect the standard catch function you couldn't be
sure what other custom-built Tcl statements might have catch-like
functionality.  But perhaps for Tcl we could think in terms of
optimizations like continuing one subtransaction across multiple SPI
commands as long as there's no failure.  Jan also suggested the
possibility of replacing the standard catch command, which might be
good enough (though the prospect of nonstandard catch-like commands
worries me).

regards, tom lane



Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Mark Wong [EMAIL PROTECTED] writes:
 I have some initial results using 8.0beta5 with our OLTP workload.
 http://www.osdl.org/projects/dbt2dev/results/dev4-010/199/
 throughput: 4076.97

 Do people really only look at the throughput numbers? Looking at those
 graphs it seems that while most of the OLTP transactions are fulfilled in
 subpar response times, there are still significant numbers that take as much
 as 30s to fulfil.

 Is this just a consequence of the type of queries being tested and the data
 distribution? Or is Postgres handling queries that could run consistently fast
 but for some reason generating large latencies sometimes?

Given the regular shape of the first graph (transactions/minute),
there's every reason to think that the slowdowns are caused by
checkpoint I/O storms.  It would be worth the trouble to experiment
with adjusting the bgwriter parameters to even out the flow of write
operations.  (I think I've already pointed out that the current defaults
for the bgwriter seem mighty conservative.)

regards, tom lane



Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Andrew Dunstan

[EMAIL PROTECTED] wrote:

I am normally more of a lurker on these lists, but I thought you had better
know that when we developed CSV import/export for an application at my last
company we discovered that Excel can't always even read the CSV that _it_
has output! (With embedded newlines a particular problem.)
 


Quite so.  This is not about being perfect, and there will be failures. 
But, absent this problem the feature should work reasonably well.

Note that Excel is not the only kid on the block - if it were, there 
would be a good case for avoiding CSV anyway and instead reading/writing 
the Excel files directly. We included support for CSV because, 
notwithstanding how braindead the format is, it is still widely used in 
data exchange.

It is far more reliable if you output your data as an HTML table, in which
case it practically always gets it right. Perhaps Postgres could support this
as an import/export mechanism as I have found it to be far less error prone
than CSV?
 

That's probably possible but rather ugly. A well defined XML format 
would be far better (don't we have something like that in contrib?). 
Something for a bulk-export facility, I think.

cheers
andrew



Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Rod Taylor
On Mon, 2004-11-29 at 16:01 -0800, Mark Wong wrote:
 I have some initial results using 8.0beta5 with our OLTP workload.
 Off the bat I see about a 23% improvement in overall throughput.  The
 most significant thing I've noticed was in the oprofile report where
 FunctionCall2 and hash_seq_search have moved down the profile a bit.
 
 Also, I have libc with symbols now so we can see what it's doing with
 in the oprofile output.
 
 8.0beta5 results:
   http://www.osdl.org/projects/dbt2dev/results/dev4-010/199/
   throughput: 4076.97
 
 8.0beta4 results:
   http://www.osdl.org/projects/dbt2dev/results/dev4-010/191/
   throughput: 3323.07

It appears that WAL archiving (archive_command) is configured in 191 and
disabled (unset) in 199. Perhaps this accounts for some of the
difference?

-- 




Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Greg Stark

Andrew Dunstan [EMAIL PROTECTED] writes:

 FWIW, I don't make a habit of using multiline fields in my spreadsheets - and
 some users I have spoken to aren't even aware that you can have them at all.

Unfortunately I don't get a choice. I offer a field on the web site where
users can upload an Excel sheet. Some of the fields of my database are
expected to have multi-line text in them. So I expect virtually all the
uploads to have multi-line fields in them.

So as far as I'm concerned, this import system is simply unusable. I have to
write a program to do the import. Since I was always planning to write such a
program I'm not too disappointed though.

-- 
greg




Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Tom Lane
Thomas Hallgren [EMAIL PROTECTED] writes:
 I don't understand this either. Why a subtransaction at all?

 Don't get me wrong. I fully understand that a subtransaction would make 
 error recovery possible. What I try to say is that the kind of error 
 recovery that needs a subtransaction is fairly, or perhaps even very, rare.

On what evidence do you base that claim?  It's true there are no
existing Tcl or Perl functions that do error recovery from SPI
operations, because it doesn't work in existing releases.  That does
not mean the demand is not there.  We certainly got beat up on often
enough about the lack of error trapping in plpgsql.

 or I can write:

     Savepoint sp = myConn.setSavepoint("foo");
     try
     {
         // do something
         sp.commit();
     }
     catch (SQLException e)
     {
         sp.rollback();
         // Handle error and continue execution.
     }

[ shrug... ]  If you intend to design pljava that way I can't stop you.
But I think it's a bogus design, because (a) it puts extra burden on the
function author who's already got enough things to worry about, and
(b) since you can't support arbitrary rollback patterns, you have to
contort the semantics of Savepoint objects with restrictions that are
both hard to design correctly and complicated to enforce.

I don't believe you should do language design on the basis of avoiding
a 25% overhead, especially not when there's every reason to think that
number can be reduced in future releases.  I got it down from 50% to 25%
in one afternoon, doing nothing that seemed too risky for late beta.
I think there's plenty more that can be done there when we have more
time to work on it.

regards, tom lane



Re: [Testperf-general] Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Mark Wong
On Tue, Nov 30, 2004 at 07:12:10AM +, Simon Riggs wrote:
 If you look at the graph of New Order response time distribution, the
 higher result gives much more frequent sub-second response for 8.0beta5
 and the hump at around 23secs has moved down to 14secs. Notably, the
 payment transaction and stock level transaction have almost identical
 response time peaks in both cases. Perhaps some interaction between them
 has been slowing us down? Now it's gone...
 
 The results seem to be significantly different, so I believe the
 results. Well done Mark - great new graphs. Any chance we could see the
 graphs showing 0.5 sec bins on the x axis, with all data < 0.5 sec
 removed from the graph so we can show the tail? Or point me at the data?

The data files used to generate the charts are here:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/199/driver/

For each transaction:
delivery.data
new_order.data
order_status.data
payment.data
stock_level.data

Mark



Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Mark Wong
On Tue, Nov 30, 2004 at 08:34:20AM +0100, Michael Paesold wrote:
 Mark Wong wrote:
 
 
 I have some initial results using 8.0beta5 with our OLTP workload.
  Off the bat I see about a 23% improvement in overall throughput.  The
  most significant thing I've noticed was in the oprofile report where
  FunctionCall2 and hash_seq_search have moved down the profile a bit.
 
  Also, I have libc with symbols now so we can see what it's doing with
  in the oprofile output.
 
  8.0beta5 results:
  http://www.osdl.org/projects/dbt2dev/results/dev4-010/199/
  throughput: 4076.97
 
  8.0beta4 results:
  http://www.osdl.org/projects/dbt2dev/results/dev4-010/191/
  throughput: 3323.07
 
 
 Is it possible that there are some other differences affecting the result?
 At "Table Blocks Read" there is a history table in #191, but it does not 
 show up in #199. Just a thought... but can you explain, Mark?
 
 Best Regards,
 Michael 

Ah, the chart differences are due to a faulty script.  That should be
cleared up and updated shortly.  As for other differences from the beta4
test, I ran the beta5 test for 30 minutes as opposed to 60, with
random_page_cost=2 instead of 4, and I have archiving disabled.  I've
generally found the latter two settings to have minimal effect but I
can always reverify.

Mark



Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Mark Wong
On Tue, Nov 30, 2004 at 10:57:02AM -0500, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Mark Wong [EMAIL PROTECTED] writes:
  I have some initial results using 8.0beta5 with our OLTP workload.
  http://www.osdl.org/projects/dbt2dev/results/dev4-010/199/
  throughput: 4076.97
 
  Do people really only look at the throughput numbers? Looking at those
  graphs it seems that while most of the OLTP transactions are fulfilled in
  subpar response times, there are still significant numbers that take as much
  as 30s to fulfil.
 
  Is this just a consequence of the type of queries being tested and the data
  distribution? Or is Postgres handling queries that could run consistently 
  fast
  but for some reason generating large latencies sometimes?
 
 Given the regular shape of the first graph (transactions/minute),
 there's every reason to think that the slowdowns are caused by
 checkpoint I/O storms.  It would be worth the trouble to experiment
 with adjusting the bgwriter parameters to even out the flow of write
 operations.  (I think I've already pointed out that the current defaults
 for the bgwriter seem mighty conservative.)
 
   regards, tom lane

I do have bgwriter_delay increased to 10, per previous
recommendation, which did smooth out the throughput graph
considerably.  I can continue to adjust those settings.

Mark



Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Andrew Dunstan

Tom Lane wrote:
In the case of Perl I suspect it is reasonably possible to determine
whether there is an eval surrounding the call or not, although we
might have to get more friendly with Perl's internal data structures
than a purist would like.  
 

Not really very hard. (caller(0))[3] should have the value "(eval)" if 
you are in an eval. There might also be some way of getting this via 
the perlguts API, although I'm not aware of one. Of course, if you're in a 
subroutine which is in turn called from an eval, things get trickier, so 
we might have to walk the stack frames a bit.

cheers
andrew


Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Mark Wong
On Tue, Nov 30, 2004 at 11:03:03AM -0500, Rod Taylor wrote:
 On Mon, 2004-11-29 at 16:01 -0800, Mark Wong wrote:
  I have some initial results using 8.0beta5 with our OLTP workload.
  Off the bat I see about a 23% improvement in overall throughput.  The
  most significant thing I've noticed was in the oprofile report where
  FunctionCall2 and hash_seq_search have moved down the profile a bit.
  
  Also, I have libc with symbols now so we can see what it's doing with
  in the oprofile output.
  
  8.0beta5 results:
  http://www.osdl.org/projects/dbt2dev/results/dev4-010/199/
  throughput: 4076.97
  
  8.0beta4 results:
  http://www.osdl.org/projects/dbt2dev/results/dev4-010/191/
  throughput: 3323.07
 
 It appears that WAL archiving (archive_command) is configured in 191 and
 disabled (unset) in 199. Perhaps this accounts for some of the
 difference?
 

I've found the archiving to be about a 1% overhead, and in all my
random testing since then I haven't seen evidence otherwise.  Here's
a link to that message:
http://archives.postgresql.org/pgsql-hackers/2004-09/msg00430.php

But I'll try again too as things may have changed.

Mark



Re: [HACKERS] VACUUM FULL FREEZE is unsafe

2004-11-30 Thread Jan Wieck
On 11/27/2004 7:40 PM, Tom Lane wrote:
Thomas F.O'Connell [EMAIL PROTECTED] writes:
So why not have VACUUM FULL FREEZE just do what you propose: VACUUM 
FULL then VACUUM FREEZE.
The objective is to make it more safe, not less so.  Doing that would
require rewriting a whole bunch of code, which I am not up for at this
stage of the release cycle.
If the proper fix is too invasive for 8.0, then making FULL and FREEZE 
mutually exclusive is IMHO the right thing to do for 8.0.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Opinions on Usenet ...

2004-11-30 Thread Jan Wieck
On 11/29/2004 2:03 PM, Marc G. Fournier wrote:
If there were a comp.databases.postgresql.hackers newsgroup created and 
carried by all the news servers ... would you move to using it vs using 
the mailing lists?
Certainly not.
Jan
The USENET community seems to think that there would be a mass exodus from 
the lists to usenet ... based on past discussions concerning moving some 
stuff out of email to stuff like bug trackers, I don't believe this to be 
the case, but am curious what the opinion of other developers happens to 
be ... would a USENET group actually be preferable?


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Tom Lane
Mark Wong [EMAIL PROTECTED] writes:
 I do have bgwriter_delay increased to 10, per previous
 recommendation, which did smooth out the throughput graph
 considerably.  I can continue to adjust those settings.

Please try a variety of settings and post your results.  It would give
us some hard data to help in deciding what the defaults ought to be.
(I really doubt that the current defaults are reasonable...)
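
(For what it's worth, a sketch of the kind of postgresql.conf sweep being
asked for; the parameter names are the 8.0 background-writer settings, but
the values below are purely illustrative, not recommendations:)

    # postgresql.conf fragment -- values for experimentation only
    bgwriter_delay = 10        # milliseconds between bgwriter rounds
    bgwriter_percent = 2       # percentage of dirty buffers written per round
    bgwriter_maxpages = 200    # upper bound on pages written per round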

regards, tom lane



Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Thomas Hallgren
Tom Lane wrote:
On what evidence do you base that claim?  It's true there are no
existing Tcl or Perl functions that do error recovery from SPI
operations, because it doesn't work in existing releases.  That does
not mean the demand is not there.  We certainly got beat up on often
enough about the lack of error trapping in plpgsql.
 

Lack of error trapping is one thing. To state that all error trapping 
will do further accesses to the database is another thing altogether. I 
don't have evidence for my claim, since subtransactions haven't been 
available for that long, but it's a pretty strong hunch. And the fact that 
all current PostgreSQL functions out there work this way today should 
count for something. Your suggestion will make the current code base 
significantly slower, IMO for no reason.

[ shrug... ]  If you intend to design pljava that way I can't stop you.
But I think it's a bogus design, because (a) it puts extra burden on the
function author who's already got enough things to worry about
So it's an extra burden to create a savepoint and commit/roll back 
depending on the outcome? I'm sorry, but I have to disagree with that. I 
think it's a powerful concept that developers will want to exploit. 
Confusing try/catch with subtransactions is bogus and not an option for 
me, as I don't have the liberty of changing the language.

A strong argument for my design is that if I were to write similar code 
in the client using the JDBC driver, this is exactly what I'd have to 
do. Why should code look any different just because I move it to the 
backend?

So, I can't see the extra burden at all. This approach brings clarity, 
no magic, and it enables ports of languages where SQL access has been 
standardized to actually conform to that standard. That's most certainly 
not bogus!

(b) since you can't support arbitrary rollback patterns, you have to
contort the semantics of Savepoint objects with restrictions that are
both hard to design correctly and complicated to enforce.
 

On the contrary. It's very easy to enforce, and PL/Java already does 
this. The design is simple and clean. Savepoints are prohibited from 
living beyond the invocation where they were created. If a savepoint is 
still active when an invocation exits, the savepoint is released or rolled 
back (depending on a GUC setting) and a warning is printed.
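
(As an illustration only, a minimal Java sketch of that rule; the class, the
flag standing in for the GUC setting, and the names are hypothetical, not
PL/Java's actual implementation:)

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Savepoint;
    import java.util.ArrayDeque;
    import java.util.Deque;

    class InvocationSavepoints
    {
        private final Deque<Savepoint> active = new ArrayDeque<Savepoint>();
        private final Connection conn;
        private final boolean releaseOnExit;   // stands in for the GUC setting

        InvocationSavepoints(Connection conn, boolean releaseOnExit)
        {
            this.conn = conn;
            this.releaseOnExit = releaseOnExit;
        }

        Savepoint set(String name) throws SQLException
        {
            Savepoint sp = conn.setSavepoint(name);
            active.push(sp);
            return sp;
        }

        // Called when the function invocation returns: any savepoint still
        // live is released or rolled back, and a warning would be emitted.
        void invocationExit() throws SQLException
        {
            while (!active.isEmpty())
            {
                Savepoint sp = active.pop();
                if (releaseOnExit)
                    conn.releaseSavepoint(sp);
                else
                    conn.rollback(sp);
                // warn: savepoint outlived the invocation that created it
            }
        }
    }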

Here I have a couple of questions for you:
From your statement it sounds like you want to use subtransactions 
solely as a hidden mechanism and completely remove the ability to use 
them from the function developer. Is that a correct interpretation?

Another question relating to a statement you made earlier. You claim 
that an SPI call should check to see if it is in a subtransaction and 
only enter a new one if that's not the case. How do you in that case 
intend to keep track of where the subtransaction started? I.e., how far 
up in nesting levels do you need to jump before you reach the right place?

My argument is that whenever possible, you must let the creator of a 
subtransaction have the responsibility to commit or roll it back.

I don't believe you should do language design on the basis of avoiding
a 25% overhead
I don't do language design. I'm adhering to the JDBC standard and I have 
no way of enforcing magic code to be executed during try/catch. 
Meanwhile, I really want PL/Java developers to have the ability to make 
full use of savepoints.

I got it down from 50% to 25%
in one afternoon, doing nothing that seemed too risky for late beta.
I think there's plenty more that can be done there when we have more
time to work on it.
 

That's great. But even if you come down to 10% overhead it doesn't 
really change anything.

Regards,
Thomas Hallgren



Re: [Testperf-general] Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Josh Berkus
Tom,

  I do have bgwriter_delay increased to 10, per previous
  recommendation, which did smooth out the throughput graph
  considerably. I can continue to adjust those settings.

 Please try a variety of settings and post your results. It would give
 us some hard data to help in deciding what the defaults ought to be.
 (I really doubt that the current defaults are reasonable...)

Doing a systematic progression test with bgwriter has been on my TODO list for 
a while.  Unfortunately, the Scalable Test Platform, which is designed for 
short runs with no vacuum, isn't the best place for it.  Mark and OSDL 
allocated us a dedicated machine, but I've been tied up with paid work for the 
last 4 weeks and unable to finish the setup.

The idea is to have a machine on which we can do 4-hour runs, with vacuum, of 
different bgwriter, checkpoint, etc. settings.  I've also built a 200GB DSS 
database for doing some significant testing with DSS workloads after some of 
the discussions around shared_buffers in November.

If anyone here has a great desire to write database-backed GUC test scripts 
(pref in Perl) then please give me a buzz on the testperf project 
(www.pgfoundry.org/testperf).

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 Greg Stark wrote:
 
 Personally I find the current CSV support inadequate. It seems pointless to
 support CSV if it can't load data exported from Excel, which seems like the
 main use case. 
   
 
 
 OK, I'm starting to get mildly annoyed now. We have identified one 
 failure case connected with multiline fields. Even in the multiline 
 case, I expect that the embedded newlines will usually match those of 
 the CSV file, in which case there will be no failure. It's a very big 
 step from there to the far more general can't load data exported from 
 Excel. Or did you have some other limitation in mind?
 
 FWIW, I don't make a habit of using multiline fields in my spreadsheets 
 - and some users I have spoken to aren't even aware that you can have 
 them at all.

I am wondering if one good solution would be to pre-process the input
stream in copy.c to convert newline to \n and carriage return to \r and
double data backslashes and tell copy.c to interpret those like it does
for normal text COPY files.  That way, the changes to copy.c might be
minimal; basically, place a filter in front of the CSV file that cleans
up the input so it can be more easily processed.
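
(To make the idea concrete, a hedged sketch of such a filter in Java; the
quoting rules are simplified and this is not the proposed copy.c change
itself:)

    import java.io.IOException;
    import java.io.Reader;
    import java.io.Writer;

    public class CsvNewlineFilter
    {
        // Escapes newlines and carriage returns that occur inside quoted CSV
        // fields, and doubles backslashes in the data, so the result can be
        // handed to text-mode style parsing; record-terminating newlines are
        // left untouched.
        public static void filter(Reader in, Writer out) throws IOException
        {
            boolean inQuotes = false;
            int c;
            while ((c = in.read()) != -1)
            {
                if (c == '"')
                {
                    inQuotes = !inQuotes;   // a doubled "" simply toggles twice
                    out.write(c);
                }
                else if (inQuotes && c == '\n')
                    out.write("\\n");       // embedded newline -> literal \n
                else if (inQuotes && c == '\r')
                    out.write("\\r");       // embedded CR -> literal \r
                else if (c == '\\')
                    out.write("\\\\");      // double backslashes in the data
                else
                    out.write(c);
            }
        }
    }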

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Increasing the length of

2004-11-30 Thread Bruce Momjian

Great idea.  Added to TODO:

* Make log_min_duration_statement output when the duration is reached rather
  than when the statement completes

  This prints long queries while they are running, making troubleshooting
  easier.  Also, it eliminates the need for log_statement because it
  would now be the same as a log_min_duration_statement of zero.

---

Simon Riggs wrote:
 On Wed, 2004-11-10 at 22:51, Andrew Sullivan wrote: 
  On Wed, Nov 10, 2004 at 09:52:17PM +, Simon Riggs wrote:
   On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:

Isn't that:
log_min_duration_statement (integer)
   
   That gets written when a statement completes, not during execution.
  
  I've been following this thread, and I was thinking the same thing. 
  I wonder how much work it'd be to have another log setting -- say
  log_statement_after_min_duration (integer) -- which did what Simon
  wants.  That'd more than satisfy my need, for sure.  Might the cost
  of that be too high, though?
 
 I think this is a great idea.
 
 ...Rather than invent a new GUC, I think this is the solution:
 
 log_min_duration_statement writes to log at end of execution, if
 execution time is greater than that threshold. Let's move that piece of
 code so it is executed as the query progresses. That way, you get
 notified that a problem query is occurring NOW, rather than it has
 occurred.
 
 The code already has such a timer check, for statement_timeout, in
 backend/storage/lmgr/proc.c. We could reuse this timer to go off at
 log_min_duration_statement and then log query if still executing. (If
 log_min_duration_statement = statement_timeout, we would skip that
 step.) We would then reset the timer so that it then goes off at where
 it does now, at statement_timeout. So, same piece of code, used twice...
 
 That way you can set up 2 limits, with three bands of actions:
 
 Between 0 and log_min_duration_statement 
 - logs nothing
 
 Between log_min_duration_statement and statement_timeout
 - statement written to log, though execution continues...
 
 At statement_timeout
 - statement cancelled
 
 We'd just need a small piece of code to set timer correctly first, then
 another piece to record state change and reset timer again. Lift and
 drop the existing code from end-of-execution.
 
 This then:
 - solves the *problem query* diagnosis issue, as originally raised by
 Sean and seconded by myself and Greg
 - makes the answer exactly what Tom proposed - look in the logs
 - doesn't make any changes to the technical innards of UDP and pgstats.c
 - no administrative interface changes, just slightly changed behaviour -
 existing users mostly wouldn't even notice we'd done it...
 
 Thoughts?
 
 Easy enough change to be included as a hot fix for 8.0: no new system
 code, no new interface code, just same behaviour at different time.
 
 
 -- 
 Best Regards, Simon Riggs
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 8.0beta5 results w/ dbt2

2004-11-30 Thread Mark Wong
On Tue, Nov 30, 2004 at 02:00:29AM -0500, Greg Stark wrote:
 Mark Wong [EMAIL PROTECTED] writes:
 
  I have some initial results using 8.0beta5 with our OLTP workload.
  http://www.osdl.org/projects/dbt2dev/results/dev4-010/199/
  throughput: 4076.97
 
 Do people really only look at the throughput numbers? Looking at those
 graphs it seems that while most of the OLTP transactions are fulfilled in
 subpar response times, there are still significant numbers that take as much
 as 30s to fulfil.
 
 Is this just a consequence of the type of queries being tested and the data
 distribution? Or is Postgres handling queries that could run consistently fast
 but for some reason generating large latencies sometimes?
 
 I'm concerned because in my experience with web sites, once the database
 responds slowly for even a small fraction of the requests, the web server
 falls behind in handling http requests and a catastrophic failure builds.
 
 It seems to me that reporting maximum, or at least the 95% confidence interval
 (95% of queries executed between 50ms-20s) would be more useful than an
 overall average. 
 
 Personally I would be happier with an average of 200ms but an interval of
 100-300ms than an average of 100ms but an interval of 50ms-20s. Consistency
 can be more important than sheer speed.
 

Looking at just the throughput number is oversimplifying it a bit.  The
scale factor (size of the database) limits what your maximum
throughput can be, given the constraints on think times (delays between
transaction requests) and the number of terminals simulated, which is
also dictated by the size of the database.  So given the throughput
at a scale factor (600 in these tests) you can infer whether or not
the response times are reasonable.  At the 600 warehouse scale
factor, we could theoretically hit about 7200 new-order transactions
per minute.  The math is roughly 12 * warehouses.

I do agree that reporting max response times and a confidence
interval (I have been meaning to report a 90th percentile number)
would be informative in addition to a mean.  Instead I included the
distribution charts in the meantime...

Mark



Re: [HACKERS] Increasing the length of

2004-11-30 Thread David Parker
I've been using log_min_duration_statement = 0 to get durations on all
SQL statements for the purposes of performance tuning, because this logs
the duration on the same line as the statement. My reading of this TODO
is that now log_min_duration_statement = 0 would give me the statements
but no total duration?

- DAP



Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Andrew Dunstan

Bruce Momjian wrote:
I am wondering if one good solution would be to pre-process the input
stream in copy.c to convert newline to \n and carriage return to \r and
double data backslashes and tell copy.c to interpret those like it does
for normal text COPY files.  That way, the changes to copy.c might be
minimal; basically, place a filter in front of the CSV file that cleans
up the input so it can be more easily processed.
 

This would have to parse the input stream, because you would need to 
know which CRs and LFs were part of the data stream and so should be 
escaped, and which really ended data lines and so should be left alone. 
However, while the idea is basically sound, parsing the stream twice 
seems crazy. My argument has been that at this stage in the dev cycle we 
should document the limitation, maybe issue a warning as you want, and 
make the more invasive code changes to fix it properly in 8.1. If you 
don't want to wait, then following your train of thought a bit, ISTM 
that the correct solution is a routine for CSV mode that combines the 
functions of CopyReadAttributeCSV() and CopyReadLine(). Then we'd have a 
genuine and fast fix for Greg's and Darcy's problem.

cheers
andrew


Re: [HACKERS] Increasing the length of

2004-11-30 Thread Bruce Momjian
David Parker wrote:
 I've been using log_min_duration_statement = 0 to get durations on all
 SQL statements for the purposes of performance tuning, because this logs
 the duration on the same line as the statement. My reading of this TODO
 is that now log_min_duration_statement = 0 would give me the statements
 but no total duration?

Oh, sorry, you are right.  I forgot about the duration part!  I got so
excited I forgot.

TODO item removed.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 Bruce Momjian wrote:
 
 I am wondering if one good solution would be to pre-process the input
 stream in copy.c to convert newline to \n and carriage return to \r and
 double data backslashes and tell copy.c to interpret those like it does
 for normal text COPY files.  That way, the changes to copy.c might be
 minimal; basically, place a filter in front of the CSV file that cleans
 up the input so it can be more easily processed.
   
 
 
 This would have to parse the input stream, because you would need to 
 know which CRs and LFs were part of the data stream and so should be 
 escaped, and which really ended data lines and so should be left alone. 
 However, while the idea is basically sound, parsing the stream twice 
 seems crazy. My argument has been that at this stage in the dev cycle we 
 should document the limitation, maybe issue a warning as you want, and 
 make the more invasive code changes to fix it properly in 8.1. If you 

OK, right.

 don't want to wait, then following your train of thought a bit, ISTM 
 that the correct solution is a routine for CSV mode that combines the 
 functions of CopyReadAttributeCSV() and CopyReadLine(). Then we'd have a 
 genuine and fast fix for Greg's and Darcy's problem.

We are fine for 8.0, except for the warning, and you think we can fix it
perfectly in 8.1, good.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



[HACKERS] createdb failed

2004-11-30 Thread alex yap
Hi, 

Recently I needed to use pg in my project. Everything was going OK until I
ran createdb, when this appeared:

Warning: could not remove database directory "/var/postgresql/data/base/17147"
Detail: Failing system command was: rm -rf '/var/postgresql/data/base/17147'
Error: could not initialize database directory; delete failed as well
Detail: Failing system command was: cp -r '/var/postgresql/data/base/1' '/var/postgresql/data/base/17147'

Please, can anyone here help? My deadline is near... please.

alex


		Do you Yahoo!? 
Yahoo! Mail - Helps protect you from nasty viruses.

Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Tom Lane
Thomas Hallgren [EMAIL PROTECTED] writes:
  From your statement it sounds like you want to use the subtransactions 
 solely in a hidden mechanism and completely remove the ability to use 
 them from the function developer. Is that a correct interpretation?

No; I would like to develop the ability to specify savepoints in pltcl
and plperl, so that already-executed SPI commands can be rolled back at
need.  But that is a feature for later --- it's way too late to think
about it for 8.0.  Moreover, having that will not remove the requirement
for the state after catching a SPI error to be sane.

The fundamental point you are missing, IMHO, is that a savepoint is a
mechanism for rolling back *already executed* SPI commands when the
function author wishes that to happen.  A failure in an individual
command should not leave the function in a broken state.

regards, tom lane



Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread James William Pye
While your message was directed at Thomas, I think I share Thomas'
position; well, for the most part.

On Tue, 2004-11-30 at 11:21 -0500, Tom Lane wrote:
 But I think it's a bogus design, because (a) it puts extra burden on the
 function author who's already got enough things to worry about, and

Simply put, IMO, a subtransaction != an exception, and shouldn't be
treated as one.

If the author wishes to worry about transaction management, that is his
worry. I don't feel the extra burden is significant enough to justify
hacking around in the Python interpreter (assuming that it's possible in
the first place).

Personally, I think the decision is fine for plpgsql, but not for
Python, or just about any other language. plpgsql is a special case,
IMO.

 (b) since you can't support arbitrary rollback patterns, you have to
 contort the semantics of Savepoint objects with restrictions that are
 both hard to design correctly and complicated to enforce.

Hrm, isn't this what savepoint levels are supposed to do? Impose those
restrictions?
I'm guessing Postgres doesn't have savepoint levels yet, per the lack of
response to my message inquiring about them (well, a savepoint scoping
facility), and poking around xact.h not revealing anything either.


I think I may hold more of a hold-my-nose stance here than Thomas. I am
not sure if I want to implement savepoint/rollback restrictions as I
can't help but feel this is something Postgres should handle; not me or
any other PL or C Function author.

plpy being an untrusted language, I *ultimately* do not have control
over this. I can only specify things within my code. I *cannot* stop a
user from making an extension module that draws interfaces to those
routines that may roll back to a savepoint defined by the caller. (Not a
great point, as a user could just as well try to dereference a NULL
pointer from an extension module. ;)

I feel if I were to implement such restrictions/regulations it would be
analogous to a security guard trying to enforce the law, whereas a real
police officer is needed.. ;-)

-- 
Regards,
James William Pye




Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Thomas Hallgren
James William Pye wrote:
I think I may hold more of a hold-my-nose stance here than Thomas. I am
not sure if I want to implement savepoint/rollback restrictions as I
can't help but feel this is something Postgres should handle; not me or
any other PL or C Function author.
 

I agree with this, but it was simple enough to implement. I'll of course 
remove my own implementation should PostgreSQL handle this in the future.

Regards,
Thomas Hallgren

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Tom Lane
James William Pye [EMAIL PROTECTED] writes:
 plpy being an untrusted language, I *ultimately* do not have control
 over this. I can only specify things within my code. I *cannot* stop a
 user from making an extension module that draws interfaces to those
 routines that may roll back to a savepoint defined by the caller.

In which case, whether it works or not is his problem not yours ;-)
This is a straw-man argument, as is the entire discussion IMHO.
Wrapping each individual SPI command in a subtransaction IN NO WAY
prevents us from adding programmer-controllable savepoint features
to the PL languages later.  It simply ensures that we have somewhat
sane error recovery behavior in the meantime.  The only valid argument
against doing it is the one of added overhead, and I already gave my
responses to that one.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Increasing the length of

2004-11-30 Thread Jim C. Nasby
Could we come up with a compromise then? I guess maybe another setting
that says log any query when it hits more than x amount of time. (I'd
also argue you should get a NOTICE or WARNING when this exceeds the
query timeout time).

A perhaps more friendly alternative would be a way to query for this
information in real-time, but that probably goes back into the
discussion about the length of data made available in pg_stat_activity.

On Tue, Nov 30, 2004 at 02:32:05PM -0500, Bruce Momjian wrote:
 David Parker wrote:
  I've been using log_min_duration_statement = 0 to get durations on all
  SQL statements for the purposes of performance tuning, because this logs
  the duration on the same line as the statement. My reading of this TODO
  is that now log_min_duration_statement = 0 would give me the statements
  but no total duration?
 
 Oh, sorry, you are right.  I forgot about the duration part!  I got so
 excited I forgot.
 
 TODO item removed.
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org
 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] nodeAgg perf tweak

2004-11-30 Thread Neil Conway
I noticed that we have a bottleneck in aggregate performance in
advance_transition_function(): according to callgrind, doing datumCopy()
and pfree() for every tuple produced by the transition function is
pretty expensive. Some queries bear this out:

dvl=# SELECT W.element_id, count(W.element_ID) FROM watch_list_element W
GROUP by W.element_id ORDER by count(W.element_ID) LIMIT 5;
 element_id | count 
------------+-------
      65525 |     1
     163816 |     1
      16341 |     1
     131023 |     1
      65469 |     1
(5 rows)

Time: 176.723 ms

dvl=# select count(*) from watch_list_element;
 count  
--------
 138044
(1 row)

Time: 94.246 ms

I've attached a quick and dirty hack that avoids the need to palloc()
and pfree() for every tuple produced by the aggregate's transition
function. This results in:

dvl=# SELECT W.element_id, count(W.element_ID) FROM watch_list_element W
GROUP by W.element_id ORDER by count(W.element_ID) LIMIT 5;
 element_id | count 
------------+-------
      65525 |     1
     163816 |     1
      16341 |     1
     131023 |     1
      65469 |     1
(5 rows)

Time: 154.378 ms

dvl=# select count(*) from watch_list_element;
 count  
--------
 138044
(1 row)

Time: 73.975 ms

I can reproduce this performance difference consistently. I thought this
might have been attributable to memory checking overhead because
assertions were enabled, but that doesn't appear to be the case (the
above results are without --enable-cassert).

The attached patch invokes the transition function in the current memory
context. I don't think that's right: a memory leak in an aggregate's
transition function would be problematic when we're invoked from a
per-query memory context, as is the case with advance_aggregates().
Perhaps we need an additional short-lived memory context in
AggStatePerAggData: we could invoke the transition function in that
context, and reset it once per, say, 1000 tuples.
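
To make that concrete, such a scheme might take roughly the shape below.
The struct and names are invented for illustration and don't match the
real AggStatePerAggData; the short-lived context would be created once
per aggregate, and copying the live transValue out before each reset is
the part a real patch has to get right.

#include "postgres.h"
#include "fmgr.h"
#include "utils/datum.h"
#include "utils/memutils.h"

/* hypothetical per-aggregate state */
typedef struct TransCallState
{
    MemoryContext transcontext;     /* short-lived context for transfn calls */
    int           calls_since_reset;
} TransCallState;

#define TRANSFN_RESET_INTERVAL 1000

/*
 * Sketch: run the transition function in the short-lived context and
 * reclaim its garbage only every TRANSFN_RESET_INTERVAL calls, instead
 * of doing a datumCopy()/pfree() per input tuple.
 */
static Datum
call_transfn(TransCallState *state, FunctionCallInfoData *fcinfo,
             bool transtypeByVal, int transtypeLen)
{
    MemoryContext oldcxt;
    Datum         newVal;

    oldcxt = MemoryContextSwitchTo(state->transcontext);
    newVal = FunctionCallInvoke(fcinfo);
    MemoryContextSwitchTo(oldcxt);

    if (++state->calls_since_reset >= TRANSFN_RESET_INTERVAL)
    {
        /*
         * Copy the value we are about to return into the caller's
         * current (longer-lived) context, then wipe everything the
         * transition function has allocated since the last reset.
         */
        if (!transtypeByVal && !fcinfo->isnull)
            newVal = datumCopy(newVal, transtypeByVal, transtypeLen);
        MemoryContextReset(state->transcontext);
        state->calls_since_reset = 0;
    }

    return newVal;
}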

Alternatively we could just mandate that aggregate transition functions
not leak memory and then invoke the transition function in, say, the
aggregate's memory context, but that seems a little fragile.

Comments?

-Neil

# 
# patch src/backend/executor/nodeAgg.c
#  from [851fd2d59a89ee2e3c23a1ca8fdbf4d466f98d26]
#to [532ce100a0de1288fb23acc8de3a6bcac0982ec3]
# 
--- src/backend/executor/nodeAgg.c
+++ src/backend/executor/nodeAgg.c
@@ -350,10 +350,10 @@
 			return;
 		}
 	}
-
+#if 0
 	/* We run the transition functions in per-input-tuple memory context */
 	oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
-
+#endif
 	/*
 	 * OK to call the transition function
 	 *
@@ -375,6 +375,7 @@
 
 	newVal = FunctionCallInvoke(&fcinfo);
 
+#if 0
 	/*
 	 * If pass-by-ref datatype, must copy the new value into aggcontext
 	 * and pfree the prior transValue.	But if transfn returned a pointer
@@ -393,11 +394,13 @@
 		if (!pergroupstate->transValueIsNull)
 			pfree(DatumGetPointer(pergroupstate->transValue));
 	}
+#endif
 
 	pergroupstate->transValue = newVal;
 	pergroupstate->transValueIsNull = fcinfo.isnull;
-
+#if 0
 	MemoryContextSwitchTo(oldContext);
+#endif
 }
 
 /*

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Increasing the length of

2004-11-30 Thread Bruce Momjian
Jim C. Nasby wrote:
 Could we come up with a compromise then? I guess maybe another setting
 that says log any query when it hits more than x amount of time. (I'd
 also argue you should get a NOTICE or WARNING when this exceeds the
 query timeout time).
 
 A perhaps more friendly alternative would be a way to query for this
 information in real-time, but that probably goes back into the
 discussion about the length of data made available in pg_stat_activity.

Yes.  I don't see a huge win for adding another GUC variable.

---


 
 On Tue, Nov 30, 2004 at 02:32:05PM -0500, Bruce Momjian wrote:
  David Parker wrote:
   I've been using log_min_duration_statement = 0 to get durations on all
   SQL statements for the purposes of performance tuning, because this logs
   the duration on the same line as the statement. My reading of this TODO
   is that now log_min_duration_statement = 0 would give me the statements
   but no total duration?
  
  Oh, sorry, you are right.  I forgot about the duration part!  I got so
  excited I forgot.
  
  TODO item removed.
  
  -- 
Bruce Momjian|  http://candle.pha.pa.us
[EMAIL PROTECTED]   |  (610) 359-1001
+  If your life is a hard drive, |  13 Roberts Road
+  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
  
  ---(end of broadcast)---
  TIP 6: Have you searched our list archives?
  
 http://archives.postgresql.org
  
 
 -- 
 Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
 Give your computer some brain candy! www.distributed.net Team #1828
 
 Windows: Where do you want to go today?
 Linux: Where do you want to go tomorrow?
 FreeBSD: Are you guys coming, or what?
 
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] nodeAgg perf tweak

2004-11-30 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 I've attached a quick and dirty hack that avoids the need to palloc()
 and pfree() for every tuple produced by the aggregate's transition
 function.

And how badly does it leak memory?  I do not believe this patch is
tenable.

Something that occurred to me the other morning in the shower is that we
could trivially inline MemoryContextSwitchTo() when using gcc, much as
you did for list_length().  I haven't gotten around to doing it but it
seems like a free percent-or-two improvement.
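
Presumably something along these lines in the header (just a sketch of
the idea, not a tested patch; the real thing would have to keep an
out-of-line copy for non-gcc compilers and for anything that takes the
function's address):

#if defined(__GNUC__)
static __inline__ MemoryContext
MemoryContextSwitchTo(MemoryContext context)
{
    MemoryContext old = CurrentMemoryContext;

    CurrentMemoryContext = context;
    return old;
}
#else
extern MemoryContext MemoryContextSwitchTo(MemoryContext context);
#endif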

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] libpq and psql not on same page about SIGPIPE

2004-11-30 Thread Bruce Momjian
Tom Lane wrote:
 libpq compiled with --enable-thread-safety thinks it can set the SIGPIPE
 signal handler.  It thinks once is enough.
 
 psql thinks it can arbitrarily flip the signal handler between SIG_IGN
 and SIG_DFL.  Ergo, after the first use of the pager for output, libpq's
 SIGPIPE handling will be broken.
 
 I submit that psql is unlikely to be the only program that does this,
 and therefore that libpq must be considered broken, not psql.

I have researched possible fixes for our threading SIGPIPE handling in
libpq.  Basically, we need to ignore SIGPIPE during socket send() (and
SSL_write) because if the backend dies unexpectedly, the resulting write
to a dead connection would otherwise kill the process; libpq would
rather trap the failure.

In 7.4.X we set SIGPIPE to ignore before each write and reset it after
the write, but that doesn't work with threading because the signal
disposition affects all threads, not just the thread using libpq.

Our current setup is wrong because an application could change the
SIGPIPE handler for its own purposes (like psql does) and thereby remove
the custom handler libpq installed.

The best solution seems to be one suggested by Manfred in November of
2003:

 signal handlers are a process property, not a thread property - that 
 code is broken for multi-threaded apps.
 At least that's how I understand the opengroup man page, and a quick 
 google confirmed that:
 http://groups.google.de/groups?selm=353662BF.9D70F63A%40brighttiger.com
 
 I haven't found a reliable thread-safe approach yet:
 My first idea was block with pthread_sigmask, after send check if 
 pending with sigpending, and then delete with sigwait, and restore 
 blocked state. But that breaks if SIGPIPE is blocked and a signal is 
 already pending: there is no way to remove our additional SIGPIPE. I 
 don't see how we can avoid destroying the realtime signal info.

His idea is pthread_sigmask/send/sigpending/sigwait/restore-mask.  It
seems we could also check errno for EPIPE rather than calling sigpending().

He has a concern about an application that has already blocked SIGPIPE
and already has a SIGPIPE signal pending.  One idea would be to
check for sigpending() before the send() and clear the signal only if
SIGPIPE wasn't pending before the call.  I realize that if our send()
also generates a SIGPIPE it would remove the previous realtime signal
info but that seems a minor problem.
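
In rough C the sequence would look something like this (a sketch of the
idea only, not proposed libpq code; error checking is omitted and the
already-pending-SIGPIPE case is handled only approximately):

#include <pthread.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>

static ssize_t
send_no_sigpipe(int sock, const void *buf, size_t len)
{
    sigset_t    sigpipe_set,
                saved_mask,
                pending;
    int         sigpipe_was_pending;
    ssize_t     rc;

    sigemptyset(&sigpipe_set);
    sigaddset(&sigpipe_set, SIGPIPE);

    /* block SIGPIPE for this thread only; other threads are unaffected */
    pthread_sigmask(SIG_BLOCK, &sigpipe_set, &saved_mask);

    /* remember whether a SIGPIPE was already queued before our send() */
    sigpending(&pending);
    sigpipe_was_pending = sigismember(&pending, SIGPIPE);

    rc = send(sock, buf, len, 0);

    /* if our send() raised a SIGPIPE, swallow it before unblocking */
    sigpending(&pending);
    if (!sigpipe_was_pending && sigismember(&pending, SIGPIPE))
    {
        int     sig;

        sigwait(&sigpipe_set, &sig);
    }

    /* restore the caller's signal mask */
    pthread_sigmask(SIG_SETMASK, &saved_mask, NULL);

    return rc;
}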

Comments?  This seems like our only solution.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] nodeAgg perf tweak

2004-11-30 Thread Neil Conway
On Tue, 2004-11-30 at 23:15 -0500, Tom Lane wrote:
 And how badly does it leak memory?  I do not believe this patch is
 tenable.

Did you read the rest of my mail?

 Something that occurred to me the other morning in the shower is that we
 could trivially inline MemoryContextSwitchTo() when using gcc, much as
 you did for list_length().  I haven't gotten around to doing it but it
 seems like a free percent-or-two improvement.

Yeah, it actually occurred to me as well that this would be worth doing.
It's not relevant to this particular example, though.

-Neil



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Error handling in plperl and pltcl

2004-11-30 Thread Thomas Hallgren
Tom Lane wrote:
The fundamental point you are missing, IMHO, is that a savepoint is a
mechanism for rolling back *already executed* SPI commands when the
function author wishes that to happen.
Of course. That's why it's imperative that it is the developer who 
defines the boundaries. I foresee that it will be very common that the 
author wishes this to happen due to a failure of some kind. But sure, 
there might be other reasons too.

A failure in an individual
command should not leave the function in a broken state.
 

Well, if the function doesn't continue, there's not much point in doing 
repair work, is there? And that's the essence of the whole discussion.

You say:
Let's always take the overhead of adding a subtransaction so that the 
caller will be able to return to a known state, regardless of whether he 
wants to do so.

I say:
Let the caller decide when to add this overhead since he is the one who 
knows a) when it's indeed needed at all and b) where to best define the 
boundaries.

Regards,
Thomas Hallgren


---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] [GENERAL] USENET vs Mailing Lists Poll ...

2004-11-30 Thread Joel
As long as the web page maintainers are going to the trouble of taking a
survey, might I (at the risk of being tarred and feathered :-p) suggest
a more thorough survey?

Suggested questions:

(1) If there were a USENET newsfeed, under comp.databases.postgresql.*,
of one or more of the current postgresql mailing lists, I would

(a) use USENET primarily,
(b) use both USENET and the mailing lists,
(c) use the mailing lists primarily,
(d) unsubscribe from the mailing lists and use neither, or 
(e) not sure at this time.

(2) If there were a separate USENET comp.databases.postgresql newsgroup
created, I would

(a) use the separate USENET newsgroup primarily,
(b) use both the separate USENET newsgroup and the mailing lists,
(c) use the mailing lists primarily,
(d) unsubscribe from the mailing lists and use neither, or 
(e) not sure at this time.

(3) Concerning USENET, I would prefer

(a) that the mailing lists be gated to USENET,
(b) that the mailing lists and USENET be kept separate,
(c) that USENET go take a leap ;-/, or
(d) not sure at this time.

(4) If the mailing lists are gated to USENET, I would prefer

(a) that the current SPAM moderation policy apply to both,
(b) that no moderation occur on either USENET or the lists,
(c) that kooks who post to USENET be tarred and feathered 8-*, or
(d) not sure at this time.

Please note that this is not an attempt at a survey; see 3c and 4c. It is
only a suggestion.

--
Joel Rees   [EMAIL PROTECTED]
digitcom, inc.   株式会社デジコム
Kobe, Japan   +81-78-672-8800
** http://www.ddcom.co.jp **


---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] Adding Reply-To: listname to Lists configuration ...

2004-11-30 Thread Bruno Wolff III
On Mon, Nov 29, 2004 at 12:49:46 +,
  Chris Green [EMAIL PROTECTED] wrote:
 
 This is a perpetual problem, if people all used the same MUA and
 (assuming it has the capability) all used the 'reply to list' command
 to reply to the list everything would be wonderful!  :-)

I think using mail-followup-to is better than having people do reply to list.

I think the main benefit to having reply-to point to the list is for supporting
clueless users on lists (who don't seem to understand the difference between
reply to sender and reply to all) and I don't think we have too many of those
here.

When I am subscribed to lists that force reply-to to point to the list,
I have my mail filter remove those headers so that things will work
normally (other than not allowing a sender to use reply-to of their own).

Reply-to would be especially bad for the postgres lists, as nonsubscribers
can post and the list servers are often slow.

People who don't want separate copies of messages should set the
mail-followup-to header to indicate that preference. This isn't perfect
since not all mail clients support this and some setup is required to
make your client aware of the list. It is also possible for mailing list
software to handle this preference for you (by not sending copies to addresses
on the list that appear in the recipient headers), but I don't know if the
software in use has that capability.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster