Re: [HACKERS] Ranges for well-ordered types
Thanks to everyone for the feedback that I've received so far. It's clear that there's interest in this.

On Jun 12, 2006, at 3:22, Josh Berkus wrote:

> I do think Jim is right, though, in that we may want to look for portions
> of the functionality which are achievable in the context of one PostgreSQL
> version, unless you're going to be working full-time on this patch.

I definitely agree with implementing it in parts. I doubt it's possible, but perhaps a first bit might make it into 8.2 :)

> In real-world calendaring applications, I *certainly* see the need for a
> successor function. However, that would require being able to define
> timestamps with a variable precision, e.g. TIMESTAMP('5 minutes'). This,
> by itself would be a significant effort, yet useful ... maybe that's
> where to start?

As mentioned in an earlier email, I think calendaring applications in particular would benefit from timestamp precisions coarser than 1 second, e.g., TIMESTAMP('5 minutes') or TIMESTAMP('1 hour'). However, I think this is a thorny problem.

To elaborate, I believe the precision has to be relative to some "baseline". From 12:00, 30-minute precision would presumably allow 12:00, 12:30, 13:00, 13:30, and so on. Precision of '1 hour' would allow 12:00, 13:00, 14:00, and so on. But these are relative to the time zone they're in. While 12:00 in Tokyo (+9) is a timestamp value with 1-hour precision, that same instant is 6:30 in Tehran (+3:30). Is 6:30 a timestamp value with 1-hour precision? Because of this, I think timestamp precisions coarser than 1 second (timestamp(0)) require storing the time zone as part of the timestamp value.

Pushing this even further, would we allow arbitrary precision? For example, would 45-minute precision be allowed? In that case, I believe we'd need to go further than storing just the time zone with the timestamp value.
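Both points above — the time-zone alignment problem and the 45-minute baseline question — can be checked concretely. This is an illustrative sketch only (fixed standard offsets, DST ignored; `aligned` is a hypothetical helper, not anything in PostgreSQL):

```python
from datetime import datetime, timedelta, timezone

# Offset check: 12:00 at UTC+9 (Tokyo) viewed at UTC+3:30 (Iran's
# standard offset). The same instant lands off the hour grid.
tokyo = timezone(timedelta(hours=9))
tehran = timezone(timedelta(hours=3, minutes=30))
noon = datetime(2006, 6, 14, 12, 0, tzinfo=tokyo)
print(noon.astimezone(tehran).strftime("%H:%M"))  # 06:30 - not on the hour

# Baseline alignment: a value is valid at a given granularity only if
# its distance from an agreed baseline is a whole number of steps.
def aligned(ts, baseline, step):
    return (ts - baseline) % step == timedelta(0)

base = datetime(2006, 1, 1, tzinfo=timezone.utc)
step = timedelta(minutes=45)
print(aligned(datetime(2006, 1, 1, 1, 30, tzinfo=timezone.utc), base, step))
# True: 90 minutes is 2 x 45
print(aligned(datetime(2006, 1, 1, 1, 0, tzinfo=timezone.utc), base, step))
# False: 60 minutes is not a multiple of 45
```

Note that for divisors of the hour (5, 15, 30 minutes) the baseline can stay implicit, but for 45 minutes the grid only makes sense once both the baseline and the step are stored.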
The timestamp value would have to be relative to some baseline timestamp, to be able to calculate whether the difference between any particular timestamp and the baseline timestamp is an integral number of precision units. Perhaps this could be accomplished using domains and an additional checking function? I'm not sure. It's enough to make me want to simply disallow any precision that does not divide evenly into the next larger "time part": any precision between 0 seconds and 1 minute would have to be a number of seconds that divides evenly into 60; between 1 hour and 1 day, the precision would have to be one of 1, 2, 3, 4, 6, 8, or 12 hours.

I've been able to discuss the issue of timestamp precision without bringing up successor functions or ranges at all, and indeed I think it's orthogonal to the range implementation. I think they're both concepts that should be included in PostgreSQL, but as for myself, I'm more interested in the range implementation than the timestamp precision issue.

By the way, anyone care to weigh in on what term we should use when discussing this? Josh has used PERIOD. Should we go with that for now? A somewhat related issue: would we want any implementation to follow (at least part of) the not-yet-standard SQL/Temporal draft? Or would it be more desirable to steer clear of any terms/syntax included there, to prevent any possible conflict with a future SQL spec?

> You're probably going to have to give up on B-Tree indexes for PERIODs,
> and look towards GiST. For one thing, I would see UNIQUE in the context
> of a PERIOD defined as non-overlapping. e.g.:

I think that a non-overlapping constraint goes above and beyond what UNIQUE requires. In my opinion, UNIQUE should test for equality, rather than non-overlapping, as that keeps the meaning of UNIQUE consistent across all types and may actually be useful in some instances.
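The equality-versus-overlap distinction is easy to state with half-open intervals. A hypothetical sketch (date pairs standing in for a PERIOD type, not an actual implementation):

```python
from datetime import date

def overlaps(a, b):
    # Half-open [start, end) intervals overlap iff each one starts
    # before the other one ends.
    return a[0] < b[1] and b[0] < a[1]

p1 = (date(2006, 1, 1), date(2006, 7, 1))
p2 = (date(2006, 6, 1), date(2007, 1, 1))

print(p1 == p2)          # False: a UNIQUE (equality) constraint admits both rows
print(overlaps(p1, p2))  # True:  a non-overlapping constraint rejects the pair
```

So the two constraints really are different predicates, which is the argument for keeping UNIQUE as equality and treating non-overlap as a separate (GiST-backed) constraint.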
I do think it would be convenient to have some sort of syntax that provides a non-overlapping constraint, rather than having to code up a constraint trigger every time you want one. As Martijn pointed out, once GiST can be used for a UNIQUE constraint, we should be able to define the non-overlapping constraint quite easily. So this could be thought of as a third orthogonal issue for ranges, the first two being the range type constructor and timestamp precision coarser than 1 second. Any one of these three could be done independently and would improve PostgreSQL. In combination they are definitely a very nice package.

On Jun 13, 2006, at 13:25, Bruno Wolff III wrote:

> Date ranges are really closed-open as well (as finite sets of isolated
> points are both open and closed). The only oddity would be that the date
> used to indicate the open end of the range might not be what the user
> expects.

I think it's definitely a matter of interpretation. [2006-01-01, 2006-12-31] and [2006-01-01, 2007-01-01) both include the same days. Who's to say which is the "r
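The closed/closed-open equivalence claimed above is quick to verify for a discrete type like date — a small sketch:

```python
from datetime import date, timedelta

def days_closed(lo, hi):
    # [lo, hi]: closed on both ends.
    out, d = [], lo
    while d <= hi:
        out.append(d)
        d += timedelta(days=1)
    return out

def days_closed_open(lo, hi):
    # [lo, hi): closed-open; the end bound itself is excluded.
    out, d = [], lo
    while d < hi:
        out.append(d)
        d += timedelta(days=1)
    return out

a = days_closed(date(2006, 1, 1), date(2006, 12, 31))
b = days_closed_open(date(2006, 1, 1), date(2007, 1, 1))
print(a == b, len(a))  # True 365
```

For any discrete, well-ordered type the two notations denote the same set; the choice is purely one of presentation, which is the point being made in the thread.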
Re: [HACKERS] CSV mode option for pg_dump
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> This is just nonsense. There is not the slightest reason that CSV data
> cannot be embedded in a text dump nor exist as the data members of a
> custom or tar dump with the corresponding COPY statements modified
> accordingly.

Well, the really *core* question here is whether we trust the stability of the CSV format definition (and code) enough to want to rely on it for data dump/restore purposes. I'm still a few years away from that, myself. AFAICT the raison d'être of the CSV code is "emit whatever it takes to satisfy this, that, and the other broken Microsoft application". That's fine as an export tool, but as a dump/reload tool, nyet. If you put it in pg_dump you're just handing neophytes another foot-gun.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] postgresql and process titles
On Tue, Jun 13, 2006 at 11:13:55PM -0300, Marc G. Fournier wrote:
> > BTW, there's another FBSD performance oddity I've run across. Running
> >
> > pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
> >
> > which dumps the email_contrib table to bzip2 then to disk, the OS won't
> > use more than 1 CPU on an SMP system... unless the data is cached.
> > According to both gstat and systat -v, the system isn't I/O bound; both
> > are reporting the RAID10 with that table on it as only about 10% busy. If
> > I let that command run for a bit then cancel it and re-start it so that
> > the beginning of that table is in cache, it will use one entire CPU for
> > bzip2, which is what I'd expect to happen.
>
> What version of FreeBSD are you dealing with here? I'm guessing at least
> 6.x, but just figured I'd clarify ...

FreeBSD 6.0-STABLE #6: Fri Dec 9 19:14:19 UTC 2005

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] postgresql and process titles
Bruce Momjian writes:
> Right. What if the postmaster signals the backend once a second to do
> their reporting. I am not sure what the final solution will be, but we
> _need_ one based on the performance numbers I and others have seen.
> Could we have PGPROC have a reporting boolean that is set every second
> and somehow checked by each backend?

I don't see any point in involving the postmaster in it. What might be interesting is to replace the current backend timer-interrupt handling by a free-running cyclic interrupt every N milliseconds (the resolution of the statement_timeout and deadlock_check delays would then be no better than N milliseconds, since those actions would occur at the next cyclic interrupt after the desired time expires). We could possibly drive stats reports and ps_status updates from this, as well as sampling EXPLAIN ANALYZE if anyone cares to pursue that.

Complaints I can foresee:

* lots of cycles wasted in idle backends. Possibly a backend that's not received any command for a second or two could shut down its interrupt until it next gets a command.

* not clear whether the interrupt happens when waiting for I/O. I already mentioned that this would be a problem for EXPLAIN ANALYZE, but it might be no big deal for the other uses.

regards, tom lane
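The free-running cyclic interrupt can be sketched in miniature: a POSIX interval timer sets a flag from the signal handler, and the work loop piggybacks its reporting on that flag. This is an illustrative, POSIX-only sketch (the real backend would do this in C on top of its existing signal machinery):

```python
import signal
import time

reports = []
tick = False

def on_tick(signum, frame):
    # The handler only sets a flag; the actual reporting happens in the
    # main loop, never inside the signal handler (Tom's safety point).
    global tick
    tick = True

signal.signal(signal.SIGALRM, on_tick)
signal.setitimer(signal.ITIMER_REAL, 0.05, 0.05)  # fire every 50 ms

deadline = time.monotonic() + 0.30
while time.monotonic() < deadline:
    # ... per-tuple query work would go here ...
    if tick:
        tick = False
        reports.append(time.monotonic())  # stats report / ps title update

# An idle backend would shut its interrupt down until the next command.
signal.setitimer(signal.ITIMER_REAL, 0, 0)
print(len(reports) >= 3)  # a handful of periodic reports, not one per operation
```

The point of the design is visible in the output: reporting cost is bounded by the timer frequency, independent of how many operations the command performs.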
Re: [HACKERS] postgresql and process titles
On Tue, 13 Jun 2006, Jim Nasby wrote:

> On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:
> > FYI, the biggest source of contention is via semop() - it might be
> > possible to optimize that some more in FreeBSD, I don't know.
>
> Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
> of procs in either semwait or semlock. :(
>
> > Part of that is Giant contention again; for example on 6.x semop() and
> > setproctitle() both want to acquire it, so they'll fight with each
> > other and with anything else on the system that wants Giant (e.g.
> > IPv6, or the USB stack, etc). As I mentioned, Giant is not an issue
> > here going forward, but there is still as much lock contention just
> > between semop() calls running on different CPUs. It may be possible
> > for someone to implement more fine-grained locking here, but I don't
> > know if there is available interest.
>
> BTW, there's another FBSD performance oddity I've run across. Running
>
> pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
>
> which dumps the email_contrib table to bzip2 then to disk, the OS won't
> use more than 1 CPU on an SMP system... unless the data is cached.
> According to both gstat and systat -v, the system isn't I/O bound; both
> are reporting the RAID10 with that table on it as only about 10% busy.
> If I let that command run for a bit then cancel it and re-start it so
> that the beginning of that table is in cache, it will use one entire CPU
> for bzip2, which is what I'd expect to happen.

What version of FreeBSD are you dealing with here? I'm guessing at least 6.x, but just figured I'd clarify ...

--
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED] MSN: [EMAIL PROTECTED] Yahoo: yscrappy Skype: hub.org ICQ: 7615664
Re: [HACKERS] postgresql and process titles
On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:

> FYI, the biggest source of contention is via semop() - it might be
> possible to optimize that some more in FreeBSD, I don't know.

Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot of procs in either semwait or semlock. :(

> Part of that is Giant contention again; for example on 6.x semop() and
> setproctitle() both want to acquire it, so they'll fight with each other
> and with anything else on the system that wants Giant (e.g. IPv6, or the
> USB stack, etc). As I mentioned, Giant is not an issue here going
> forward, but there is still as much lock contention just between semop()
> calls running on different CPUs. It may be possible for someone to
> implement more fine-grained locking here, but I don't know if there is
> available interest.

BTW, there's another FBSD performance oddity I've run across. Running

pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &

which dumps the email_contrib table to bzip2 then to disk, the OS won't use more than 1 CPU on an SMP system... unless the data is cached. According to both gstat and systat -v, the system isn't I/O bound; both are reporting the RAID10 with that table on it as only about 10% busy. If I let that command run for a bit then cancel it and re-start it so that the beginning of that table is in cache, it will use one entire CPU for bzip2, which is what I'd expect to happen.

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] CSV mode option for pg_dump
Bruce Momjian wrote:
> pg_dump CSV TODO item removed until we come up with something everyone
> can agree on.

That's a pity. Just to show you how little is involved in what I was suggesting, a prototype patch is attached - it's 182 lines of context diff, which is pretty small for a new feature. It took me about an hour to write and I have tested it against the regression db in both text and binary dump modes, where it works without a hitch.

cheers

andrew

Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.438
diff -c -r1.438 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	9 Jun 2006 19:46:09 -	1.438
--- src/bin/pg_dump/pg_dump.c	14 Jun 2006 00:32:03 -
***************
*** 113,118 ****
--- 113,123 ----
  /* flag to turn on/off dollar quoting */
  static int	disable_dollar_quoting = 0;
  
+ /* flag to control if using CSV */
+ static bool use_csv = false;
+ 
+ /* holder for CSV options */
+ static PQExpBuffer csv_opts;
+ 
  static void help(const char *progname);
  static NamespaceInfo *findNamespace(Oid nsoid, Oid objoid);
***************
*** 251,256 ****
--- 256,265 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
+ 		/* long options with no short version */
+ 		{"csv", no_argument, NULL, 2},
+ 		{"csv-option", required_argument, NULL, 3},
+ 
  		{NULL, 0, NULL, 0}
  	};
  	int optindex;
***************
*** 285,290 ****
--- 294,301 ----
  		}
  	}
  
+ 	csv_opts = createPQExpBuffer();
+ 
  	while ((c = getopt_long(argc, argv, "abcCdDE:f:F:h:in:oOp:RsS:t:uU:vWxX:Z:",
  							long_options, &optindex)) != -1)
  	{
***************
*** 419,424 ****
--- 430,462 ----
  				break;
  				/* This covers the long options equivalent to -X xxx. */
+ 			case 2:				/* csv */
+ 				use_csv = true;
+ 				break;
+ 
+ 			case 3:				/* csv-option */
+ 				if (strcmp(optarg, "singlequote") == 0)
+ 				{
+ 					appendPQExpBuffer(csv_opts, "QUOTE AS ");
+ 				}
+ 				else if (strcmp(optarg, "tabdelimiter") == 0)
+ 				{
+ 					appendPQExpBuffer(csv_opts, "DELIMITER AS E'\\t' ");
+ 				}
+ 				else if (strcmp(optarg, "header") == 0)
+ 				{
+ 					appendPQExpBuffer(csv_opts, "HEADER ");
+ 				}
+ 				else
+ 				{
+ 					fprintf(stderr,
+ 							_("%s: invalid csv option -- %s\n"),
+ 							progname, optarg);
+ 					fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+ 					exit(1);
+ 				}
+ 
  			case 0:
  				break;
***************
*** 463,468 ****
--- 501,518 ----
  		exit(1);
  	}
  
+ 	if (use_csv == true && dumpInserts == true)
+ 	{
+ 		write_msg(NULL, "INSERT (-d, -D) and CSV (--csv) options cannot be used together\n");
+ 		exit(1);
+ 	}
+ 
+ 	if (use_csv == false && strlen(csv_opts->data) > 0)
+ 	{
+ 		write_msg(NULL, "You must specify --csv to use --csv-option\n");
+ 		exit(1);
+ 	}
+ 
  	/* open the output file */
  	switch (format[0])
  	{
***************
*** 714,719 ****
--- 764,771 ----
  			 "      use SESSION AUTHORIZATION commands instead of\n"
  			 "      OWNER TO commands\n"));
+ 	printf(_("  --csv                    use CSV mode\n"));
+ 	printf(_("  --csv-option=opt         one of header, tabdelimiter or singlequote\n"));
  	printf(_("\nConnection options:\n"));
  	printf(_("  -h, --host=HOSTNAME      database server host or socket directory\n"));
  	printf(_("  -p, --port=PORT          database server port number\n"));
***************
*** 881,898 ****
  	if (oids && hasoids)
  	{
! 		appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout;",
  						  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
  										 classname),
  						  column_list);
  	}
  	else
  	{
! 		appendPQExpBuffer(q, "COPY %s %s TO stdout;",
  						  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
  										 classname),
  						  column_list);
  	}
  	res = PQexec(g_conn, q->data);
  	check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
  	PQclear(res);
--- 933,955 ----
  	if (oids && hasoids)
  	{
! 		appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout",
  						  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
  										 classname),
  						  column_list);
  	}
  	else
  	{
! 		appendPQExpBuffer(q, "COPY %s %s TO stdout",
  						  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
  										 classname),
  						  column_list);
  	}
+ 	if (use_csv)
+ 	{
+ 		appendPQExpBuffer(q, " CSV %s", csv_opts->data);
+ 	}
+ 	appendPQExpBuffer(q, ";");
  	res = PQexec(g_conn, q->data);
  	check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
  	PQclear(res);
***************
*** 1139,1147 ****
  	/* must use 2 steps here 'cause fmtId is nonreentrant */
  	appendPQExpBuffer(copyBuf, "COPY %s ",
  					  fmtId(tbinfo->dobj.name));
! 	appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
  					  fmtCopyColumnList(tbinfo)
Re: [HACKERS] timezones to own config file
Martijn van Oosterhout writes:
> What I was thinking is why we couldn't just store the information in a
> global shared system table that is only read on config reload. You
> could have a few columns, maybe the first being a list name, which is
> referenced from a GUC.

Hmmm ... if we keep the notion of a GUC that identifies a set of compatible timezone names, then a table with a primary key of (tz_set_name, tz_name) doesn't seem quite so awful. The main remaining objection I can see is that the postmaster couldn't use it, only backends. Now this doesn't matter much as far as timestamp operations go, because I don't think the postmaster does any operations that need TZ data --- but what of verifying that the GUC variable has a valid value in postgresql.conf at startup? If you're willing to abandon sanity checking on that string, it might work.

One interesting thought about a system table is that it could be referenced through a syscache, which'd have the nice property that only the (probably few) values actually referenced in a given session need to get loaded.

regards, tom lane
Re: [HACKERS] timezones to own config file
On Tue, Jun 13, 2006 at 11:11:26PM +0200, Martijn van Oosterhout wrote:
> On Tue, Jun 13, 2006 at 11:51:25AM -0400, Tom Lane wrote:
> > Martijn van Oosterhout writes:
> > > Any particular reason this can't be a normal table in pg_catalog which
> > > you can select/update.
> >
> > That doesn't do anything to help with one of the main problems: that
> > we have at least two (maybe more) alternative sets of names that people
> > might want as default.
>
> I think my actual point was something else. We currently get calls from
> people trying to administer machines that it's annoying that various
> configuration information is stored in files, beyond the easy reach of
> SQL.
>
> What I was thinking is why we couldn't just store the information in a
> global shared system table that is only read on config reload. You
> could have a few columns, maybe the first being a list name, which is
> referenced from a GUC.
>
> If you issue a config reload during a VACUUM FULL, I guess that might
> be an issue, yes. I was just thinking people might appreciate being
> able to configure the timezones without opening a shell... Similarly,
> it would also give a way for user-interfaces to get a list of available
> valid timezones and their actual meanings, which is currently
> impossible.

ISTM that's an issue that affects all configuration stuff, not just the timezones; if we're going to come up with a way to manage settings without touching a file, it should work for everything.

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] postgresql and process titles
On Tue, Jun 13, 2006 at 05:05:31PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian writes:
> > > Jim C. Nasby wrote:
> > >> Excellent. Did I miss discussion of that or have you not mentioned it?
> > >> I'm curious as to how you're fixing it...
> >
> > > The patches are in the hold queue:
> > > http://momjian.postgresql.org/cgi-bin/pgpatches_hold
> >
> > That's talking about the stats code, which has approximately zip to do
> > with setproctitle (ps_status.c). But IIRC that patch is on hold because
>
> I thought the bug reporter was asking about the stats code as well.

It did get brought up...

> > As far as Kris' complaint goes, one thing that might be interesting is
> > to delay both the setproctitle call and the stats command message send
> > until the current command has been running a little while (say 100ms
> > or so). The main objection I see to this is that it replaces a kernel
> > call that actually does some work with a kernel call to start a timer,
> > plus possibly a later kernel call to actually do the work. Not clear
> > that there's a win there. (If you're using statement_timeout it might
> > not matter, but if you aren't...)
> >
> > Also not clear how you make the necessary actions happen when the timer
> > expires --- I seriously doubt it'd be safe to try to do either action
> > directly in a signal handler.
>
> Right. What if the postmaster signals the backend once a second to do
> their reporting. I am not sure what the final solution will be, but we
> _need_ one based on the performance numbers I and others have seen.
> Could we have PGPROC have a reporting boolean that is set every second
> and somehow checked by each backend?

One second might be a bit more delay than some folks want... it would be nice if this was tunable. Also, what would the overhead on this look like if there's a large number of idle backends? It does sound more appealing than setting a timer every time you start a transaction, though...

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] timezones to own config file
On Tue, Jun 13, 2006 at 11:51:25AM -0400, Tom Lane wrote:
> Martijn van Oosterhout writes:
> > Any particular reason this can't be a normal table in pg_catalog which
> > you can select/update.
>
> That doesn't do anything to help with one of the main problems: that
> we have at least two (maybe more) alternative sets of names that people
> might want as default.

I think my actual point was something else. We currently get calls from people trying to administer machines, complaining that it's annoying that various configuration information is stored in files, beyond the easy reach of SQL.

What I was thinking is why we couldn't just store the information in a global shared system table that is only read on config reload. You could have a few columns, maybe the first being a list name, which is referenced from a GUC.

If you issue a config reload during a VACUUM FULL, I guess that might be an issue, yes. I was just thinking people might appreciate being able to configure the timezones without opening a shell... Similarly, it would also give a way for user interfaces to get a list of available valid timezones and their actual meanings, which is currently impossible.

Just a thought really...

--
Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] CSV mode option for pg_dump
pg_dump CSV TODO item removed until we come up with something everyone can agree on.

---------------------------------------------------------------------------

Joshua D. Drake wrote:
> > Would that be adequate, or do we really want to reimplement and maintain
> > all the output format complexity in our own code, in C?
>
> I think the point is that we should provide a native implementation
> because not everyone is crazy enough to use perl (blatant jab ;)). I
> would never expect a customer to write a perl or python script just to
> get their data in what is widely considered a standard business format
> that can be imported by their userland application.
>
> The people on the hackers list are NOT the target for this feature. The
> people on general, admin and novice are.
>
> Sincerely,
>
> Joshua D. Drake
>
> > Cheers,
> > Steve
>
> --
> === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive PostgreSQL solutions since 1997
> http://www.commandprompt.com/

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] postgresql and process titles
Tom Lane wrote:
> Bruce Momjian writes:
> > Jim C. Nasby wrote:
> >> Excellent. Did I miss discussion of that or have you not mentioned it?
> >> I'm curious as to how you're fixing it...
>
> > The patches are in the hold queue:
> > http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> That's talking about the stats code, which has approximately zip to do
> with setproctitle (ps_status.c). But IIRC that patch is on hold because

I thought the bug reporter was asking about the stats code as well.

> nobody particularly liked the approach it's taking. I think we should
> investigate rewriting the stats communication architecture entirely ---
> in particular, do we really need the stats buffer process at all? It'd
> be interesting to see what happens if we just make the collector process
> read the UDP socket directly. Or alternatively drop the UDP socket in

Agreed, that's what I would prefer, and I tested something like that, but even pulling the packets into the buffer and throwing them away had significant overhead, so I think the timeout trick has to be employed as well as going to a single process.

> favor of having the backends write directly to the collector process'
> input pipe (not sure if this would port to Windows though).
>
> As far as Kris' complaint goes, one thing that might be interesting is
> to delay both the setproctitle call and the stats command message send
> until the current command has been running a little while (say 100ms
> or so). The main objection I see to this is that it replaces a kernel
> call that actually does some work with a kernel call to start a timer,
> plus possibly a later kernel call to actually do the work. Not clear
> that there's a win there. (If you're using statement_timeout it might
> not matter, but if you aren't...)
>
> Also not clear how you make the necessary actions happen when the timer
> expires --- I seriously doubt it'd be safe to try to do either action
> directly in a signal handler.

Right. What if the postmaster signals the backend once a second to do their reporting? I am not sure what the final solution will be, but we _need_ one based on the performance numbers I and others have seen. Could we have PGPROC have a reporting boolean that is set every second and somehow checked by each backend?

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] postgresql and process titles
> That's talking about the stats code, which has approximately
> zip to do with setproctitle (ps_status.c). But IIRC that
> patch is on hold because nobody particularly liked the
> approach it's taking. I think we should investigate
> rewriting the stats communication architecture entirely ---
> in particular, do we really need the stats buffer process at
> all? It'd be interesting to see what happens if we just make
> the collector process read the UDP socket directly. Or
> alternatively drop the UDP socket in favor of having the
> backends write directly to the collector process'
> input pipe (not sure if this would port to Windows though).

(Yes, I remember saying I was planning to look at this. As is probably obvious by now, I haven't had the time to do that (yet).)

As for your question, it will be a bit painful to port to Windows. We did have a lot of problems with the pgstat pipe in the initial porting work, and I'm not convinced that there aren't some small issues still lurking there under heavy load. The point is that the whole concept of sharing socket descriptors doesn't really play well between processes on Windows. Using UDP would make that a whole lot better. Without knowing anything, I would assume the overhead of a localhost UDP packet isn't very large on a reasonably modern platform.

//Magnus
Re: [HACKERS] postgresql and process titles
On Tue, Jun 13, 2006 at 04:35:24PM -0400, Tom Lane wrote:
> ... The main objection I see to this is that it replaces a kernel
> call that actually does some work with a kernel call to start a timer,
> plus possibly a later kernel call to actually do the work. Not clear
> that there's a win there.

And of course it's an almost guaranteed loss on systems that don't require a syscall to set the proc title.

Have a nice day,

--
Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] postgresql and process titles
Bruce Momjian writes:
> Jim C. Nasby wrote:
>> Excellent. Did I miss discussion of that or have you not mentioned it?
>> I'm curious as to how you're fixing it...

> The patches are in the hold queue:
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold

That's talking about the stats code, which has approximately zip to do with setproctitle (ps_status.c). But IIRC that patch is on hold because nobody particularly liked the approach it's taking. I think we should investigate rewriting the stats communication architecture entirely --- in particular, do we really need the stats buffer process at all? It'd be interesting to see what happens if we just make the collector process read the UDP socket directly. Or alternatively drop the UDP socket in favor of having the backends write directly to the collector process' input pipe (not sure if this would port to Windows though).

As far as Kris' complaint goes, one thing that might be interesting is to delay both the setproctitle call and the stats command message send until the current command has been running a little while (say 100ms or so). The main objection I see to this is that it replaces a kernel call that actually does some work with a kernel call to start a timer, plus possibly a later kernel call to actually do the work. Not clear that there's a win there. (If you're using statement_timeout it might not matter, but if you aren't...)

Also not clear how you make the necessary actions happen when the timer expires --- I seriously doubt it'd be safe to try to do either action directly in a signal handler.

regards, tom lane
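The delay-until-100ms idea amounts to: arm a cancellable timer when the command starts, and report only if it fires before the command completes. A hypothetical sketch of just that control flow (`run_command` and its reporting are illustrative, not anything in the backend):

```python
import threading
import time

def run_command(work_seconds, delay=0.1):
    """Report status only if the command is still running after `delay`."""
    reported = []
    timer = threading.Timer(delay, lambda: reported.append("status sent"))
    timer.start()            # the extra "start a timer" cost Tom mentions
    time.sleep(work_seconds)  # stand-in for executing the command
    timer.cancel()            # short commands never pay for the report
    timer.join()
    return reported

print(run_command(0.01))  # []              - finished well before 100 ms
print(run_command(0.25))  # ['status sent'] - still running at 100 ms
```

This makes the trade-off concrete: short commands skip the reporting work entirely, at the price of one timer setup/teardown per command, which is exactly the cost question raised in the message above.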
Re: [HACKERS] Proposal for debugging of server-side stored procedures
Hi,

I was just talking to Derick, the author of DBGp, and I realized this might be a topic for a joint effort among all open source RDBMSs. I think it would be awesome if we could get a common protocol set up for stored procedure debugging.

regards, Lukas
Re: [HACKERS] postgresql and process titles
Jim C. Nasby wrote:
> On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > FWIW, turning off stats_command_string substantially reduced
> > > this contention, so it appears the issue is somewhere in the stats code.
> > > This code sends stats messages to a different process via a socket (or
> > > is it UDP?), with the intention that if the system gets heavily loaded
> > > we'll lose some stats in the interest of not bogging down all the
> > > backends. It seems that doesn't work so hot on FreeBSD. :(
> >
> > I am working on a patch for 8.2 to fix that for all platforms.
>
> Excellent. Did I miss discussion of that or have you not mentioned it?
> I'm curious as to how you're fixing it...

The patches are in the hold queue:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

The title is "Stats collector performance improvement". I need someone to test my patches on a non-BSD platform to move forward.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] postgresql and process titles
On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote: > Jim C. Nasby wrote: > > FWIW, getting turning off stats_command_string substantially reduced > > this contention, so it appears the issue is somewhere in the stats code. > > This code sends stats messages to a different process via a socket (or > > is it UDP?), with the intention that if the system gets heavily loaded > > we'll lose some stats in the interest of not bogging down all the > > backends. It seems that doesn't work so hot on FreeBSD. :( > > I am working on a patch for 8.2 to fix that for all platforms. Excellent. Did I miss discussion of that or have you not mentioned it? I'm curious as to how you're fixing it... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] postgresql and process titles
Jim C. Nasby wrote: > FWIW, getting turning off stats_command_string substantially reduced > this contention, so it appears the issue is somewhere in the stats code. > This code sends stats messages to a different process via a socket (or > is it UDP?), with the intention that if the system gets heavily loaded > we'll lose some stats in the interest of not bogging down all the > backends. It seems that doesn't work so hot on FreeBSD. :( I am working on a patch for 8.2 to fix that for all platforms. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Running a query twice to ensure cached results.
On Tue, 2006-06-13 at 15:00 +0200, Martijn van Oosterhout wrote: > What you might be able to do is to reduce its effect. The thing that > occurs to me is to make hint bit changes only mark a page half-dirty. > If the page is evicted because the space is needed in the buffer cache, > it can be just dropped. However, the bgwriter will write it as normal. > Hence, setting hint bits will become a sort of background operation, > done when there's time. Yes, I think that's a very good idea. What that introduces is the concept of "dirty priority" - i.e. some pages are more important to write out quickly than others. If the bgwriter cleans the higher-priority ones first it should do a better job of keeping the bufferpool clean. That needs some work before it's a fully-fledged proposal. > It seems to me that if a large table is loaded in a single transaction, > the check for if the transaction is committed should be cheap because > it's checking the same transaction id over and over. I was considering this the other day. My original idea was to set the xmin to be FrozenTransaction when loading a table with COPY in the same transaction that created it, but that didn't work because of MVCC violation. If we just set the hint bit to show XMIN_COMMITTED then we need not worry about MVCC violations, since the xmin will still be higher than any pre-existing snapshots. I think Tom had a problem with that because it relied on file truncation on recovery, but I'm not clear on why exactly? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
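Martijn's half-dirty idea quoted above can be sketched as a toy model (names like HALF_DIRTY are assumptions for illustration, not PostgreSQL's actual buffer-manager API):

```python
# Toy model of the "half-dirty" proposal: hint-bit-only changes mark a
# page HALF_DIRTY; eviction under memory pressure drops such pages
# without a write, while the background writer still flushes them when
# it gets to them.
CLEAN, HALF_DIRTY, DIRTY = 0, 1, 2

class Page:
    def __init__(self, name):
        self.name = name
        self.state = CLEAN

writes = []   # record which pages actually hit disk

def evict(page):
    # Only fully dirty pages force a write at eviction time.
    if page.state == DIRTY:
        writes.append(page.name)
    page.state = CLEAN

def bgwriter_pass(pages):
    # The bgwriter flushes both dirty and half-dirty pages.
    for p in pages:
        if p.state in (HALF_DIRTY, DIRTY):
            writes.append(p.name)
            p.state = CLEAN

a, b = Page("A"), Page("B")
a.state = HALF_DIRTY      # hint bits only
b.state = DIRTY           # real data change
evict(a)                  # dropped, no write
evict(b)                  # must be written

c = Page("C")
c.state = HALF_DIRTY
bgwriter_pass([c])        # written later, "when there's time"
print(writes)             # ['B', 'C']
```

Simon's "dirty priority" refinement would amount to the bgwriter visiting DIRTY pages before HALF_DIRTY ones.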
Re: [HACKERS] postgresql and process titles
On Mon, Jun 12, 2006 at 11:38:01AM -0400, Kris Kennaway wrote: > On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote: > > On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote: > > > On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote: > > > > On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote: > > > > > Kris Kennaway <[EMAIL PROTECTED]> writes: > > > > > > On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote: > > > > > >> Let's see the evidence. > > > > > > > > > > > The calls to setproctitle() (it looks like 4 setproctitle syscalls > > > > > > per > > > > > > DB query) are causing contention on the Giant lock 25% of the time > > > > > > on > > > > > > a dual p4 + HTT. Disabling process title setting completely gives > > > > > > an > > > > > > 8% peak performance boost to the super-smack select benchmark. > > > > > > > > > > I think you misunderstood me: I asked for evidence, not > > > > > interpretation. > > > > > What are you measuring, and with what tool, and what are the numbers? > > > > > On what benchmark case? And what did you do to "disable process title > > > > > setting completely"? > > > > > > > > > > The reason I'm being doubting Thomas here is that I've never seen any > > > > > > > > Ba-da-bum! > > > > > > > > > indication on any other platform that ps_status is a major bottleneck. > > > > > Now maybe FreeBSD really sucks, or maybe you're onto something of > > > > > interest, but let's see the proof in a form that someone else can > > > > > check and reproduce. > > > > > > > > It's also important to find out what version of FreeBSD this is. A lot > > > > of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely > > > > possible this isn't an issue in newer versions. > > > > Can you provide the actual commands you used to setup and run the test? 
> > I actually forget all the steps I needed to do to get super-smack > working with postgresql since it required a lot of trial and error for > a database newbie like me (compiling it from the > benchmarks/super-smack port was trivial, but unlike mysql it required > configuring the database by hand - this should hopefully be more > obvious to someone familiar with pgsql though). > > It would be great if someone on your end could make this easier, BTW - > e.g. at least document the steps. Also super-smack should be changed > to allow use via a local socket with pgsql (this is the default with > mysql) - this avoids measuring network stack overhead. Unless super-smack has improved substantially, you're unlikely to find much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3 (http://sourceforge.net/projects/osdldbt) is much more realistic (based on TPC-C and TPC-H). > > > FYI, the biggest source of contention is via semop() - it might be > > > possible to optimize that some more in FreeBSD, I don't know. > > > > Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot > > of procs in either semwait or semlock. :( > > Part of that is Giant contention again; for example on 6.x semop() and > setproctitle() both want to acquire it, so they'll fight with each > other and with anything else on the system that wants Giant > (e.g. IPv6, or the USB stack, etc). As I mentioned Giant is not an > issue here going forward, but there is still as much lock contention > just between semop() calls running on different CPUs. It may be > possible for someone to implement more fine-grained locking here, but > I don't know if there is available interest. FWIW, turning off stats_command_string substantially reduced this contention, so it appears the issue is somewhere in the stats code. 
This code sends stats messages to a different process via a socket (or is it UDP?), with the intention that if the system gets heavily loaded we'll lose some stats in the interest of not bogging down all the backends. It seems that doesn't work so hot on FreeBSD. :( -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
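The fire-and-forget stats channel Jim describes can be sketched as follows (a hypothetical illustration using Python sockets, not the actual pgstat code): the backend uses a non-blocking UDP socket, so when the collector can't keep up the message is simply dropped instead of stalling query execution.

```python
# Sketch of lossy stats reporting over UDP: the backend never blocks,
# even if that means some stats messages are lost under load.
import socket

def make_stats_pair():
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))       # kernel picks a free port
    collector.settimeout(2)
    backend = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    backend.setblocking(False)             # never block the backend
    backend.connect(collector.getsockname())
    return backend, collector

def send_stats(backend, msg: bytes) -> bool:
    """Return True if sent, False if dropped (send buffer full)."""
    try:
        backend.send(msg)
        return True
    except BlockingIOError:
        return False                       # stats lost, backend moves on

backend, collector = make_stats_pair()
assert send_stats(backend, b"tabstat: 42 rows read")
data, _ = collector.recvfrom(1024)
print(data)
```

Whether the drop actually happens cheaply depends on the platform's socket implementation, which is consistent with the FreeBSD-specific contention observed in the thread.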
Re: [HACKERS] CSV mode option for pg_dump
Steve Atkins wrote: Would that be adequate, or do we really want to reimplement and maintain all the output format complexity in our own code, in C? Code to produce CSVs is there, now, today, and has been since 8.0. That is *not* what is at issue here. If you want to debate whether or not it should be in the backend you are 2 years too late. The ONLY question here is about whether or not to have it enabled as an option in pg_dump. try these and see the interesting results: COPY pg_class TO '/tmp/pg_class.csv' CSV; or in psql \copy pg_class to '/tmp/pg_class.csv' csv As for XML which was also mentioned, you should be aware that there is a Google Summer of Code project to implement SQL/XML, so watch this space. (As for why that needs to be in the backend, see previous discussions) cheers andrew ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CSV mode option for pg_dump
On Jun 13, 2006, at 9:47 AM, Joshua D. Drake wrote: Would that be adequate, or do we really want to reimplement and maintain all the output format complexity in our own code, in C? I think the point is that we should provide a native implementation because not everyone is crazy enough to use perl (blatant jab ;)). I would never expect a customer to write a perl or python script just to get their data in what is widely considered a standard business format that can be imported by their userland application. That wasn't what I was discussing, on two levels. Firstly, I wasn't suggesting that the end user write anything, secondly I was talking about the other output formats discussed (Excel, HTML...) rather than just CSV. The people on the hackers list, are NOT the target for this feature. The people on general, admin and novice are. I was referring to the other export formats mentioned (Excel, HTML...). We already support CSV export of single tables with the bundled software, don't we? Cheers, Steve ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CSV mode option for pg_dump
> From: Rod Taylor [mailto:[EMAIL PROTECTED] > Sent: Tuesday, June 13, 2006 11:31 AM > > > On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote: > > Can't -- the main production database is over at a CoLo site with > > access only available via SSH, and tightly-restricted SSH at that. > > Generally one of the developers will SSH over to the > server, pull out > > whatever data is needed into a text file via psql or > pg_dump, scp the > > file(s) back here and send them to the user. > > I don't get it. If you can use psql then you already have csv support. > > psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv Ah - RTF-UPDATED-M on my part. Most of my systems are still on PG 7.4.x databases so I tend to stick with the 7.x docs, and I didn't notice the "WITH CSV" option added in 8.0. That, plus temp tables, will be very useful. However, I also agree with the need for a new "pg_query / pg_export" program. A program geared solely towards exporting the results of a query would allow many of the options that are needed for the ever-growing variety of output formats (XML, CSV, HTML, XHTML, etc.) and details for each format without needing to clutter up pg_dump with things that really have nothing to do with backing up and restoring data. It could also allow a large range of options related to getting data out (where, order by), many of which have also been discussed for pg_dump recently. - Bill > > > > -Original Message- > > > From: Joshua D. Drake [mailto:[EMAIL PROTECTED] > > > Sent: Monday, June 12, 2006 4:15 PM > > > To: Bill Bartlett > > > Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers' > > > Subject: Re: [HACKERS] CSV mode option for pg_dump > > > > > > > > > Bill Bartlett wrote: > > > > Here's me speaking up -- I'd definitely use it! As a > > > quick way to pull > > > > data into Excel to do basic reports or analysis, a CSV > > > format would be > > > > great. > > > > > > Why not just use ODBC? > > > > > > Joshua D. 
Drake > > > -- > > > > > > === The PostgreSQL Company: Command Prompt, Inc. === > > >Sales/Support: +1.503.667.4564 || 24x7/Emergency: > > > +1.800.492.2240 > > >Providing the most comprehensive PostgreSQL solutions > > > since 1997 > > > http://www.commandprompt.com/ > > > > > > > -- > ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] CSV mode option for pg_dump
Would that be adequate, or do we really want to reimplement and maintain all the output format complexity in our own code, in C? I think the point is that we should provide a native implementation because not everyone is crazy enough to use perl (blatant jab ;)). I would never expect a customer to write a perl or python script just to get their data in what is widely considered a standard business format that can be imported by their userland application. The people on the hackers list, are NOT the target for this feature. The people on general, admin and novice are. Sincerely, Joshua D. Drake Cheers, Steve ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CSV mode option for pg_dump
PFC wrote: From what I gather, the CSV format dump would only contain data. I think pg_dump is the friend of pg_restore. It dumps everything including user defined functions, types, schemas etc. CSV does not fit with this. This is just nonsense. There is not the slightest reason that CSV data cannot be embedded in a text dump nor exist as the data members of a custom or tar dump with the corresponding COPY statements modified accordingly. Really, let's get the facts straight, please. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] CSV mode option for pg_dump
From what I gather, the CSV format dump would only contain data. I think pg_dump is the friend of pg_restore. It dumps everything including user-defined functions, types, schemas etc. CSV does not fit with this. Besides, people will probably want to dump into CSV the result of any query, to load it into Excel, not just the full contents of a table. So, why not create a separate tool; someone suggested pg_query for that, I second it. This tool would take a query and format options, and would output a file in whatever format chosen by the user (CSV, COPY format, XML, whatever). A script language (Python) can be used, which will significantly shorten development times and allow easy modularity, as it is easier to add a module to a Python program than a C program. I would vote for Python because I love it and it has a very good postgres adapter (psycopg2) which knows how to convert every postgres type to a native language type (yes, even multidimensional arrays of BOX get converted). And it's really fast at retrieving large volumes of data. So you have a stable, fast tool for backup and restore (pg_dump) and a rapidly evolving, user-friendly and extendable tool for exporting data, and everyone is happy. Mr. Momjian talks about adding modular functionality to pg_dump. Is it really necessary? What is the objective? Is it to reuse code in pg_dump? I guess not; if a user wants to dump, for instance, all the tables in a schema, implementing this logic in Python is only a few lines of code (select from information_schema...) To be realistic, output format modules should be written in script languages. No one sane is eager to do string manipulation in C. Thus these modules would have to somehow fit with pg_dump, maybe with a pipe or something. This means designing another protocol. Reimplementing in a scripting language the parts of pg_dump which will be reused by this project (mainly, enumerating tables and stuff) will be far easier. Just look. 
Python 2.4.2 (#1, Mar 30 2006, 14:34:35) [GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2 Type "help", "copyright", "credits" or "license" for more information. ...opens a db connection... c.execute( "SELECT * FROM test.csv" ) data = c.fetchall() data [[1, datetime.date(2006, 6, 13), 'this\tcontains\ttabulations'], [2, datetime.date(2006, 6, 13), "this'contains'quotes"], [3, datetime.date(2006, 6, 13), 'this"contains"double quotes']] import csv, sys c = csv.writer( sys.stdout, dialect = csv.excel ) c.writerows( data ) 1,2006-06-13,this contains tabulations 2,2006-06-13,this'contains'quotes 3,2006-06-13,"this""contains""double quotes" ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
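The interactive session above elides the database connection. A self-contained version of the csv part behaves the same way (the rows here are hardcoded stand-ins for what psycopg2's fetchall() would return):

```python
# Self-contained version of the csv.writer demo: write rows to a string
# buffer instead of stdout so the result can be inspected.
import csv
import datetime
import io

data = [
    [1, datetime.date(2006, 6, 13), "this\tcontains\ttabulations"],
    [2, datetime.date(2006, 6, 13), "this'contains'quotes"],
    [3, datetime.date(2006, 6, 13), 'this"contains"double quotes'],
]

buf = io.StringIO()
csv.writer(buf, dialect=csv.excel).writerows(data)
print(buf.getvalue())
# Only the row with embedded double quotes gets quoted (and the quotes
# doubled): neither tabs nor single quotes are special to csv.excel,
# since its delimiter is the comma and its quote character is ".
```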
Re: [HACKERS] CSV mode option for pg_dump
Rod Taylor wrote: On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote: Can't -- the main production database is over at a CoLo site with access only available via SSH, and tightly-restricted SSH at that. Generally one of the developers will SSH over to the server, pull out whatever data is needed into a text file via psql or pg_dump, scp the file(s) back here and send them to the user. I don't get it. If you can use psql then you already have csv support. psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv If your data looks like this: foo barbaz bing You are o.k. You have three columns, tab delimited. However if your data looks like this: foo bar baz bing You have a problem. foo is one column bar and baz are a single column bing is a single column How does Excel know that bar baz is a single column? It doesn't because you told it to delimit on tabs and thus you have four columns as far as Excel is concerned. An alternative, although I don't know what kind of headaches it would cause, is to have a text delimiter as well as a field delimiter, e.g.: "foo" "bar baz" "bing" Sincerely, Joshua D. Drake -Original Message- From: Joshua D. Drake [mailto:[EMAIL PROTECTED] Sent: Monday, June 12, 2006 4:15 PM To: Bill Bartlett Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers' Subject: Re: [HACKERS] CSV mode option for pg_dump Bill Bartlett wrote: Here's me speaking up -- I'd definitely use it! As a quick way to pull data into Excel to do basic reports or analysis, a CSV format would be great. Why not just use ODBC? Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. 
=== Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
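Joshua's point about embedded tabs can be demonstrated with the Python csv module (this illustrates the general quoting idea, not any pg_dump behavior):

```python
# With plain tab delimiting, an embedded tab splits one field into two;
# adding a text delimiter (quoting) keeps the field whole.
import csv
import io

row = ["foo", "bar\tbaz", "bing"]          # middle field contains a tab

# Naive tab-delimited output: a consumer now sees four columns.
naive = "\t".join(row)
assert len(naive.split("\t")) == 4

# Quoted output (QUOTE_ALL mirrors the "text delimiter" idea above):
buf = io.StringIO()
csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_ALL).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue()), delimiter="\t"))
assert parsed == row                        # three columns round-trip intact
print(parsed)
```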
Re: [HACKERS] CSV mode option for pg_dump
On Jun 13, 2006, at 7:34 AM, Martijn van Oosterhout wrote: What this tells me is that we need a tool somewhere between psql and pg_dump, say, pgquery. Its sole purpose in life is to generate output from various queries. Because it's a separate tool there's no question of psql or pg_dump being able to parse them. While you're at it, you could add modules to support many different output styles, like CSV, XML, Excel format, HTML, etc. This I believe would take the load off psql to provide many different output styles, as well as the load off pg_dump to produce parsable-by-third-party output. Thoughts? Perl+DBD::Pg+CPAN does almost all of this already. Lots of support for countless different output formats, and mostly fairly well battle-tested. I suspect that a perl script to do all that would be dominated by command-line option parsing, as all the hard work is in existing modules. Would that be adequate, or do we really want to reimplement and maintain all the output format complexity in our own code, in C? Cheers, Steve ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] timezones to own config file
Martijn van Oosterhout writes: > Any particular reason this can't be a normal table in pg_catalog which > you can select/update. That doesn't do anything to help with one of the main problems: that we have at least two (maybe more) alternative sets of names that people might want as default. Getting rid of "australian_timezones" is fine, but we can't do it by saying "all you aussies have to hack the standard list according to your own ideas". I don't expect that very many people will actually need to make custom timezone name lists --- if we find they do, we'll need to work harder on the default lists. So the design center should be "select one of a few predefined lists", not "hack away on system catalog until you like it". Especially not if they have to do it in template0, template1, postgres, etc. Basically, a GUC variable is just about the right paradigm for this, a system catalog isn't. I'd also be a bit worried about performance issues, eg, whether VACUUM FULL on such a table would bring datetime operations to a halt. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CSV mode option for pg_dump
On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote: > Can't -- the main production database is over at a CoLo site with access > only available via SSH, and tightly-restricted SSH at that. Generally > one of the developers will SSH over to the server, pull out whatever > data is needed into a text file via psql or pg_dump, scp the file(s) > back here and send them to the user. I don't get it. If you can use psql then you already have csv support. psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv > > -Original Message- > > From: Joshua D. Drake [mailto:[EMAIL PROTECTED] > > Sent: Monday, June 12, 2006 4:15 PM > > To: Bill Bartlett > > Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers' > > Subject: Re: [HACKERS] CSV mode option for pg_dump > > > > > > Bill Bartlett wrote: > > > Here's me speaking up -- I'd definitely use it! As a > > quick way to pull > > > data into Excel to do basic reports or analysis, a CSV > > format would be > > > great. > > > > Why not just use ODBC? > > > > Joshua D. Drake > > -- > > > > === The PostgreSQL Company: Command Prompt, Inc. === > >Sales/Support: +1.503.667.4564 || 24x7/Emergency: > > +1.800.492.2240 > >Providing the most comprehensive PostgreSQL solutions > > since 1997 > > http://www.commandprompt.com/ > > > > > > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly > -- ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CSV mode option for pg_dump
Martijn van Oosterhout wrote: Side note: In my experience Excel happily slurps up tab delimited output, so I'm not sure why all of this is an issue in the first place. I guess your experience doesn't run to data that has embedded tabs, for example. There really is a reason we did this in the first place, and it wasn't for fun. cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2
On Tue, 2006-06-13 at 10:02 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > Elsewhere, it has been discussed that we might hold the number of blocks > > in a relation in shared memory. Does that idea now fall down, or is it > > complementary to this? > > It's been the case for some time that the planner uses > RelationGetNumberOfBlocks() to determine true rel size. The only reason > relpages is still stored at all is that it's used to approximate true > number of tuples via > true_ntuples = (reltuples/relpages) * true_npages > ie, assuming that the tuple density is still what it was at the last > VACUUM or ANALYZE. So you can't fool the system with a totally made-up > relation size anyway. (This too is moderately annoying for planner > testing, but it seems the only way to get the planner to react when a > table's been filled without an immediate vacuum/analyze.) > > The only point of tracking rel size in shared memory would be to avoid > the costs of lseek() kernel calls in RelationGetNumberOfBlocks. Yes, understood. With the second point to allow them to be separately set for PGSQL developer testing of optimizer, and application dev testing of SQL and/or what/if scenarios. > >> The main thing we are trying to accomplish here is to decouple > >> transactional and nontransactional updates to a pg_class row. > > > With the goal being avoiding table bloat?? > > No, with the goal being correctness. If you have a freeze/unfreeze > mechanism then unfreezing a relation is an action that must NOT be > rolled back if your transaction (or any other one for that matter) later > aborts. The tuples you put into it meanwhile need to be vacuumed anyway. > So you can't mark it unfrozen in an uncommitted pg_class entry that > might never become committed. > > > For me, freezing is last step before writing to WORM media, so there is > > never an unfreeze step. > > That is not what Alvaro is after. Nor anyone else here. So what is unfreeze for again? 
> I have not > heard anyone mention WORM media for Postgres in *years*. Oh? Big requirements for archive these days, much more so than before. This will allow years of data in a seamless on-line/near-line partitioned table set. Lots of people want that: .gov, .mil, .com More modern equivalent: a MAID archive system for WORO data > It strikes me though that automatic UNFREEZE isn't necessarily the > requirement. What if VACUUM FREEZE causes the table to become > effectively read-only, and you need an explicit UNFREEZE command to > put it back into a read-write state? Then UNFREEZE could be a > transactional operation, and most of these issues go away. That works for me. Very much preferred. > The case > where this doesn't work conveniently is copying a frozen database > (viz template0), but maybe biting the bullet and finding a way to do > prep work in a freshly made database is the answer for that. We've > certainly seen plenty of other possible uses for post-CREATE processing > in a new database. > > Another reason for not doing unfreeze automatically is that as the patch > stands, any database user can force unfreezing of any table, whether he > has any access rights on it or not (because the LockTable will happen > before we check access rights, I believe). This is probably Not Good. > Ideally I think FREEZE/UNFREEZE would be owner-permission-required. Seems like a plan. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CSV mode option for pg_dump
Volkan YAZICI wrote: On Jun 13 10:20, Bruce Momjian wrote: Good point. The number of CSV options would be hard to support for pg_dump. Any thoughts from anyone on how to do that cleanly? Could we just support the default behavior? IMHO, it might be better if we'd support a syntax like pg_dump --csv=opt0,para0:opt2,opt3 This can save us from the pg_dump parameter pollution a little bit. Furthermore, I think CSV format for the dump files can be maintained better under an external project. (pgFoundry?) By this way, main developers will be able to cope with their own core problems while other users/developers can contribute on the CSV code easily. And if any user will ever want to get CSV functionality in the pg_dump, he/she will just issue a --csv parameter (with the above syntax) and pg_dump will make a suitable dlopen() call for the related (CSV) module. Anyway, this is just an idea for modularity; but the main thing I try to underline is to give pg_dump a module functionality for similar problems. There are some problems with this, though: . FORCE QUOTE is table specific, and COPY will barf if you name a column that isn't on the table. Providing for this option would involve lots more code in pg_dump, as we'd have to filter the list according to the column names in each table. . specifying arbitrary chars for quote, escape and delimiter could be tricky from the command line, especially if you want to specify a tab delimiter or backslash escape. cheers andrew ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
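One possible reading of the proposed --csv=opt0,para0:opt2,opt3 syntax (the exact grammar isn't spelled out in the thread, so this parser is purely speculative) is a comma-separated list of options, each optionally carrying a value after a colon:

```python
# Speculative parser for a --csv=name,name:value,... option string.
def parse_csv_opts(spec: str) -> dict:
    opts = {}
    for item in spec.split(","):
        name, sep, value = item.partition(":")
        # A bare name becomes a boolean flag; name:value carries a value.
        opts[name] = value if sep else True
    return opts

print(parse_csv_opts("header,quote:Q"))
# -> {'header': True, 'quote': 'Q'}
```

Andrew's objections still apply regardless of the syntax: table-specific options like FORCE QUOTE don't fit a single global switch, and characters like tab or backslash are awkward to pass on a command line.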
Re: [HACKERS] timezones to own config file
Joachim Wieland <[EMAIL PROTECTED]> writes: > The timezone definition files should be read at server start but should they > also be read at SIGHUP? If so, should they be read only by the postmaster or > by all backends? Presumably the name of the definition file to use will be a GUC variable. I would expect the code to re-read the file any time the variable's value is changed. In the case of a change via postgresql.conf this would automatically happen in all backends as well as the postmaster. You'll need to make it follow the semantics already in use for errors in postgresql.conf, viz: * error detected during postmaster startup -> report error and quit * error detected during postmaster reload -> log message, ignore new setting * error detected during backend reload -> debug message, ignore new setting As far as the appropriate contents of the files go, I'd suggest *not* trying to account for every abbreviation mentioned in the zic database; lots of them are surely uninteresting, and anyone who does want Nome Time will now be able to add it for himself. The more abbreviations you try to understand, the less chance you have of detecting plain old errors. IIRC, the conflicts we've actually heard about in practice are IST (Israel vs India) and Aussie vs. USA zone names. So it might work to have two base definition files, one for Europe/Americas (with USA and Israel names) and one for Far East (with Aussie and Indian names). I am not sure where Tom Lockhart got the list of timezone names that's currently hardwired in datetime.c, but for sure you needn't treat it as being graven on stone tablets. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
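The "select one of a few predefined lists" design can be sketched like so (the abbreviation sets, offsets, and names here are illustrative only, not the contents of any actual configuration file): the same abbreviation resolves differently depending on which base list the GUC variable selects.

```python
# Illustrative abbreviation lists; offsets are seconds east of UTC.
EUROPE_AMERICAS = {"EST": -5 * 3600, "IST": 2 * 3600}   # IST = Israel (+2)
FAR_EAST = {"JST": 9 * 3600, "IST": 5 * 3600 + 1800}    # IST = India (+5:30)

def resolve(abbrev: str, active_list: dict) -> int:
    """Return the UTC offset in seconds for abbrev under the active list."""
    try:
        return active_list[abbrev]
    except KeyError:
        # Unknown abbreviations are errors, per the "fewer names, better
        # error detection" argument above.
        raise ValueError("unrecognized timezone abbreviation: " + abbrev)

print(resolve("IST", EUROPE_AMERICAS))   # 7200  (Israel Standard Time)
print(resolve("IST", FAR_EAST))          # 19800 (India Standard Time)
```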
Re: [HACKERS] CSV mode option for pg_dump
Bruce Momjian wrote: Good point. The number of CSV options would be hard to support for pg_dump. Any thoughts from anyone on how to do that cleanly? Could we just support the default behavior? Although I don't see a real need for the feature, I do think that if we were to support "1" (well two if you include the already tab delimited) csv output it would be a large amount of bloat. Perhaps we could pick "1" output, say comma delimited with quoted fields? "foo","bar ","baz" Joshua D. Drake --- Tom Lane wrote: "Andrew Dunstan" <[EMAIL PROTECTED]> writes: I wish I could understand why people are so keen to make other people turn handsprings in order to avoid a feature which, as Bruce points out, is already on the TODO list, and which, by my 10 minute analysis, would involve almost trivial code impact and risk. If this involved major impact I might understand, but it really doesn't. Supporting all of the CSV options in pg_dump would involve major bloat in its option set, and it already has far too many options. If it were just a matter of adding a "--csv" switch I wouldn't be complaining, but there are half a dozen more sub-options, and it seems like every time we turn around someone is finding a reason for another one. Propagating all that cruft through pg_dump would be a PITA, and updating it to track future additions would be too. Furthermore, the entire rationale for the feature is predicated on the claim that programs other than pg_restore might find it useful. But this conveniently ignores the fact that if there are any such programs in existence, what this will really do is BREAK them, because they won't be able to cope with all the variants that pass for CSV. My opinions would be less negative if I thought that CSV were a well-defined format that would never change. I don't believe that it has either property, however, and so I'm against letting it get into our dump file format. I think we'll just live to regret it if we do. 
regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Running a query twice to ensure cached results.
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > What you are seeing is the now infamous "Postgres writes a table one more > time after loading" behavior. > Tom - can you explain what's going on with this? Setting XMIN_COMMITTED hint bits. regards, tom lane
Re: [HACKERS] CSV mode option for pg_dump
Tom Lane wrote: there are half a dozen more sub-options, and it seems like every time we turn around someone is finding a reason for another one. This is a bit unfair. The feature was introduced in 8.0, and slightly enhanced in 8.1. There have not been any additional CSV features this release cycle unless my memory is worse than I thought, and I at least have said previously that I will be resistant to the addition of further CSV options. My thoughts regarding options for pg_dump were actually to provide a much smaller set than the full set available with COPY, specifically to provide for using a single rather than a double quote char, and optional header lines - no alternate escape or delimiter, and no FORCE QUOTE (FORCE NOT NULL isn't relevant as it is only useful for non-postgres derived data). At least that would be a reasonable starting point, and would I believe cater for the vast majority of uses. cheers andrew
Re: [HACKERS] CSV mode option for pg_dump
On Jun 13 10:20, Bruce Momjian wrote: > > Good point. The number of CSV options would be hard to support for > pg_dump. Any thoughts from anyone on how to do that cleanly? Could we > just support the default behavior? IMHO, it might be better if we'd support a syntax like pg_dump --csv=opt0,para0:opt2,opt3 This can save us from pg_dump parameter pollution a little bit. Furthermore, I think CSV format for the dump files can be maintained better under an external project. (pgFoundry?) This way, main developers would be able to focus on their own core problems while other users/developers could contribute to the CSV code easily. And if any user ever wants CSV functionality in pg_dump, he/she would just issue a --csv parameter (with the above syntax) and pg_dump would make a suitable dlopen() call for the related (CSV) module. Anyway, this is just an idea for modularity; the main thing I am trying to underline is to give pg_dump a modular facility for similar problems. Regards.
Re: [HACKERS] pg_get_INDEXdef - opclass
> -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: 13 June 2006 14:42 > To: Dave Page > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] pg_get_INDEXdef - opclass > > "Dave Page" writes: > > Right, but how can I conditionally join with pg_opclass > based on whether > > or not the opclass specified for the column is the default for that > > type? > > Check pg_opclass.opcdefault and compare pg_opclass.opcintype to the > index column's datatype (which you'd get from its pg_attribute row > ... whether it's an expression is irrelevant). Ahh right - thanks. I got it into my head that there was no pg_attribute row for an expression. Must be the sun... Regards, Dave.
Re: [HACKERS] CSV mode option for pg_dump
On Tue, Jun 13, 2006 at 10:20:53AM -0400, Bruce Momjian wrote: > > Good point. The number of CSV options would be hard to support for > pg_dump. Any thoughts from anyone on how to do that cleanly? Could we > just support the default behavior? What this tells me is that we need a tool somewhere between psql and pg_dump, say, pgquery. Its sole purpose in life is to generate output from various queries. Because it's a separate tool there's no question of psql or pg_dump having to parse its output. While you're at it, you could add modules to support many different output styles, like CSV, XML, Excel format, HTML, etc. This I believe would take the load off psql to provide many different output styles, as well as the load off pg_dump to produce parsable-by-third-party output. Thoughts? Side note: In my experience Excel happily slurps up tab delimited output, so I'm not sure why all of this is an issue in the first place. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate.
Re: [HACKERS] CSV mode option for pg_dump
Good point. The number of CSV options would be hard to support for pg_dump. Any thoughts from anyone on how to do that cleanly? Could we just support the default behavior? --- Tom Lane wrote: > "Andrew Dunstan" <[EMAIL PROTECTED]> writes: > > I wish I could understand why people are so keen to make other people turn > > handsprings in order to avoid a feature which, as Bruce points out, is > > already on the TODO list, and which, by my 10 minute analysis, would involve > > almost trivial code impact and risk. If this involved major impact I might > > understand, but it really doesn't. > > Supporting all of the CSV options in pg_dump would involve major bloat > in its option set, and it already has far too many options. If it were > just a matter of adding a "--csv" switch I wouldn't be complaining, but > there are half a dozen more sub-options, and it seems like every time we > turn around someone is finding a reason for another one. Propagating > all that cruft through pg_dump would be a PITA, and updating it to track > future additions would be too. > > Furthermore, the entire rationale for the feature is predicated on the > claim that programs other than pg_restore might find it useful. But > this conveniently ignores the fact that if there are any such programs > in existence, what this will really do is BREAK them, because they won't > be able to cope with all the variants that pass for CSV. > > My opinions would be less negative if I thought that CSV were a > well-defined format that would never change. I don't believe that it > has either property, however, and so I'm against letting it get into our > dump file format. I think we'll just live to regret it if we do. 
> > regards, tom lane -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] CSV mode option for pg_dump
"Andrew Dunstan" <[EMAIL PROTECTED]> writes: > I wish I could understand why people are so keen to make other people turn > handsprings in order to avoid a feature which, as Bruce points out, is > already on the TODO list, and which, by my 10 minute analysis, would involve > almost trivial code impact and risk. If this involved major impact I might > understand, but it really doesn't. Supporting all of the CSV options in pg_dump would involve major bloat in its option set, and it already has far too many options. If it were just a matter of adding a "--csv" switch I wouldn't be complaining, but there are half a dozen more sub-options, and it seems like every time we turn around someone is finding a reason for another one. Propagating all that cruft through pg_dump would be a PITA, and updating it to track future additions would be too. Furthermore, the entire rationale for the feature is predicated on the claim that programs other than pg_restore might find it useful. But this conveniently ignores the fact that if there are any such programs in existence, what this will really do is BREAK them, because they won't be able to cope with all the variants that pass for CSV. My opinions would be less negative if I thought that CSV were a well-defined format that would never change. I don't believe that it has either property, however, and so I'm against letting it get into our dump file format. I think we'll just live to regret it if we do. regards, tom lane
Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2
Simon Riggs <[EMAIL PROTECTED]> writes: > Elsewhere, it has been discussed that we might hold the number of blocks > in a relation in shared memory. Does that idea now fall down, or is it > complementary to this? It's been the case for some time that the planner uses RelationGetNumberOfBlocks() to determine true rel size. The only reason relpages is still stored at all is that it's used to approximate true number of tuples via true_ntuples = (reltuples/relpages) * true_npages ie, assuming that the tuple density is still what it was at the last VACUUM or ANALYZE. So you can't fool the system with a totally made-up relation size anyway. (This too is moderately annoying for planner testing, but it seems the only way to get the planner to react when a table's been filled without an immediate vacuum/analyze.) The only point of tracking rel size in shared memory would be to avoid the costs of lseek() kernel calls in RelationGetNumberOfBlocks. >> The main thing we are trying to accomplish here is to decouple >> transactional and nontransactional updates to a pg_class row. > With the goal being avoiding table bloat?? No, with the goal being correctness. If you have a freeze/unfreeze mechanism then unfreezing a relation is an action that must NOT be rolled back if your transaction (or any other one for that matter) later aborts. The tuples you put into it meanwhile need to be vacuumed anyway. So you can't mark it unfrozen in an uncommitted pg_class entry that might never become committed. > For me, freezing is last step before writing to WORM media, so there is > never an unfreeze step. That is not what Alvaro is after. Nor anyone else here. I have not heard anyone mention WORM media for Postgres in *years*. It strikes me though that automatic UNFREEZE isn't necessarily the requirement. What if VACUUM FREEZE causes the table to become effectively read-only, and you need an explicit UNFREEZE command to put it back into a read-write state? 
Then UNFREEZE could be a transactional operation, and most of these issues go away. The case where this doesn't work conveniently is copying a frozen database (viz template0), but maybe biting the bullet and finding a way to do prep work in a freshly made database is the answer for that. We've certainly seen plenty of other possible uses for post-CREATE processing in a new database. Another reason for not doing unfreeze automatically is that as the patch stands, any database user can force unfreezing of any table, whether he has any access rights on it or not (because the LockTable will happen before we check access rights, I believe). This is probably Not Good. Ideally I think FREEZE/UNFREEZE would be owner-permission-required. regards, tom lane
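The extrapolation Tom describes earlier in this message (the reltuples/relpages density at the last VACUUM or ANALYZE, scaled by the current physical size) can be sketched as a standalone function; the name is illustrative, and the real logic lives in the planner:

```c
/* Sketch of the planner's tuple-count extrapolation: assume the tuple
 * density observed at the last VACUUM/ANALYZE still holds, and scale
 * it by the relation's true current size in pages. */
static double
estimate_ntuples(double reltuples, double relpages, double true_npages)
{
    if (relpages <= 0)          /* never vacuumed/analyzed: no density */
        return reltuples;
    return (reltuples / relpages) * true_npages;
}
```

So if ANALYZE saw 1000 tuples in 10 pages and the table has since grown to 15 pages, the planner assumes about 1500 tuples, which is why a made-up relpages alone cannot fool it.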
Re: [HACKERS] pg_get_INDEXdef - opclass
"Dave Page" writes: > Right, but how can I conditionally join with pg_opclass based on whether > or not the opclass specified for the column is the default for that > type? Check pg_opclass.opcdefault and compare pg_opclass.opcintype to the index column's datatype (which you'd get from its pg_attribute row ... whether it's an expression is irrelevant). regards, tom lane
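Tom's recipe can be turned into a catalog query along these lines; this is an untested sketch (the column list and join conditions are illustrative, and catalog details have varied across versions):

```sql
-- Sketch: for each index column, flag whether its operator class is
-- the default one for that column's datatype.
SELECT i.indexrelid::regclass AS index_name,
       a.attnum,
       opc.opcname,
       (opc.opcdefault AND opc.opcintype = a.atttypid) AS is_default_opclass
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indexrelid AND a.attnum > 0
JOIN pg_opclass opc ON opc.oid = i.indclass[a.attnum - 1];
```

As Tom notes, the pg_attribute row of the index itself supplies the datatype even for expression columns, so no special-casing is needed.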
Re: [HACKERS] Running a query twice to ensure cached results.
Martijn, On 6/13/06 6:00 AM, "Martijn van Oosterhout" wrote: > What you might be able to do is to reduce its effect. The thing that > occurs to me is to make hint bit changes only mark a page half-dirty. > If the page is evicted because the space is needed in the buffer cache, > it can be just dropped. However, the bgwriter will write it as normal. > Hence, setting hint bits will become a sort of background operation, > done when there's time. > > It seems to me that if a large table is loaded in a single transaction, > the check for if the transaction is committed should be cheap because > it's checking the same transaction id over and over. Thanks for the tip - it's now in the mill for mulling over and finding someone with the time to work something in... - Luke
Re: [HACKERS] CSV mode option for pg_dump
Andrew Dunstan wrote: > Matthew T. OConnor said: > > Bill Bartlett wrote: > >> Can't -- the main production database is over at a CoLo site with > >> access only available via SSH, and tightly-restricted SSH at that. > >> Generally one of the developers will SSH over to the server, pull out > >> whatever data is needed into a text file via psql or pg_dump, scp the > >> file(s) back here and send them to the user. > > > > ODBC over an SSH tunnel? > > > I wish I could understand why people are so keen to make other people turn > handsprings in order to avoid a feature which, as Bruce points out, is > already on the TODO list, and which, by my 10 minute analysis, would involve > almost trivial code impact and risk. If this involved major impact I might > understand, but it really doesn't. > > I know many people work in a Postgres only world. I wish everybody did, and > then we could just forget about things like CSV. They don't, so we can't. > > I think I have said this before, but I'll say it again. From time to time > people thank me for things I have done for Postgres. The two things that > stand out BY FAR on the list of these are CSV import/export and dollar > quoting. This is a widely used feature. I think the bottom line is that "ease of use" isn't as high on the project's priority list as you (and others) think it should be. I personally feel as you do that we should value "ease of use" more.
Re: [HACKERS] CSV mode option for pg_dump
Matthew T. OConnor said: > Bill Bartlett wrote: >> Can't -- the main production database is over at a CoLo site with >> access only available via SSH, and tightly-restricted SSH at that. >> Generally one of the developers will SSH over to the server, pull out >> whatever data is needed into a text file via psql or pg_dump, scp the >> file(s) back here and send them to the user. > > ODBC over an SSH tunnel? I wish I could understand why people are so keen to make other people turn handsprings in order to avoid a feature which, as Bruce points out, is already on the TODO list, and which, by my 10 minute analysis, would involve almost trivial code impact and risk. If this involved major impact I might understand, but it really doesn't. I know many people work in a Postgres only world. I wish everybody did, and then we could just forget about things like CSV. They don't, so we can't. I think I have said this before, but I'll say it again. From time to time people thank me for things I have done for Postgres. The two things that stand out BY FAR on the list of these are CSV import/export and dollar quoting. This is a widely used feature. cheers andrew
Re: [HACKERS] Running a query twice to ensure cached results.
On Tue, Jun 13, 2006 at 05:46:23AM -0700, Luke Lonergan wrote: > Martijn, > > On 6/13/06 5:42 AM, "Martijn van Oosterhout" wrote: > > > Is it possible it may have something to do with the hint bits? There > > are a bunch of bits in the header to deal with speeding up of MVCC > > tests. Maybe changing those bits marks the page dirty and forces a > > write? > > Yes, that's it. Writing data twice (three times including WAL) is a Very > Bad Thing for large data work - how should we fix it? Well, I don't think you want to get rid of it entirely because otherwise forevermore, every lookup in that table will require a check to see if the transaction is committed. So at some point the hint bit needs to be set and/or the xmin frozen (vacuum freeze does that for example). What you might be able to do is to reduce its effect. The thing that occurs to me is to make hint bit changes only mark a page half-dirty. If the page is evicted because the space is needed in the buffer cache, it can be just dropped. However, the bgwriter will write it as normal. Hence, setting hint bits will become a sort of background operation, done when there's time. It seems to me that if a large table is loaded in a single transaction, the check for if the transaction is committed should be cheap because it's checking the same transaction id over and over. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/
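The last observation, that a bulk-loaded table repeats the same transaction id on every tuple, is exactly what makes a one-entry cache in front of the commit-status lookup effective. A minimal sketch (names are hypothetical, not PostgreSQL's actual clog interface, and the slow path is stubbed out):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical slow path standing in for a real commit-log lookup;
 * the counter lets us observe how often it is actually hit. */
static int clog_probes = 0;
static bool
clog_fetch_committed(TransactionId xid)
{
    clog_probes++;
    return true;            /* stub: pretend every xid committed */
}

/* One-entry cache: repeated checks of the same xid (the bulk-load
 * case) hit the cache instead of the commit log. */
static TransactionId cached_xid = 0;
static bool cached_committed = false;

static bool
xid_did_commit(TransactionId xid)
{
    if (xid != cached_xid)
    {
        cached_committed = clog_fetch_committed(xid);
        cached_xid = xid;
    }
    return cached_committed;
}
```

With this shape, scanning a million tuples loaded in one transaction costs one slow lookup rather than a million.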
Re: [HACKERS] timezones to own config file
On Tue, Jun 13, 2006 at 02:20:09PM +0200, Joachim Wieland wrote: > I looked into the timezone specifications and basically extracted a list of > existing offsets from the zic database. > > My proposed format for the timezone files is something like this: Any particular reason this can't be a normal table in pg_catalog which you can select/update? > Another problem is that lots of the timezone names that are hardcoded into > the backend seem to be way outdated or just doubtable, many of them do not > show up in the zic database. I've been trying to convince people for a while now that the appropriate tz string for australia is AEST/ACST/AWST but no-one seems convinced yet. Hence, I never actually specify timezones and all my timestamps are inserted as GMT. IMHO, you should simply set up the table so that it is backward compatible and let people edit it themselves. You're never going to be able to convince anyone that people aren't relying on it exactly the way it is now. The most important thing is to get rid of the "australian_timezones" hack, everything else is bonus. > The timezone definition files should be read at server start but should they > also be read at SIGHUP? If so, should they be read only by the postmaster or > by all backends? Good question... Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/
Re: [HACKERS] Running a query twice to ensure cached results.
Martijn, On 6/13/06 5:42 AM, "Martijn van Oosterhout" wrote: > Is it possible it may have something to do with the hint bits? There > are a bunch of bits in the header to deal with speeding up of MVCC > tests. Maybe changing those bits marks the page dirty and forces a > write? Yes, that's it. Writing data twice (three times including WAL) is a Very Bad Thing for large data work - how should we fix it? - Luke
Re: [HACKERS] Running a query twice to ensure cached results.
On Tue, Jun 13, 2006 at 04:54:05AM -0700, Luke Lonergan wrote: > > Experimental results here suggest that for larger tables Linux seems > > to detect a seq-scan and not bother caching. It's very reproducible > > for me here to do a reboot and not see the full speedup on a seq_scan > > until the third time I run a query. > > What you are seeing is the now infamous "Postgres writes a table one more > time after loading" behavior. > > Simon Riggs once dug into it to find the root cause, and I no longer recall > exactly why, but after you've loaded data, the first seq scan will re-write > some large portion of the data while doing the initial scan. This wreaks > havoc on normal benchmarking practices. Is it possible it may have something to do with the hint bits? There are a bunch of bits in the header to deal with speeding up of MVCC tests. Maybe changing those bits marks the page dirty and forces a write? Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/
[HACKERS] timezones to own config file
I looked into the timezone specifications and basically extracted a list of existing offsets from the zic database. My proposed format for the timezone files is something like this:

HADT -32400 D  # Hawaii-Aleutian Daylight Time
               # (America/Adak)
HAST -36000    # Hawaii-Aleutian Standard Time
               # (America/Adak)

That is, the abbreviation, the offset in seconds, optionally a D to mark daylight saving times (goes into tm->is_dst), the name of the timezone and the full zic names that use this timezone. I also made the extracting script find all conflicts and commented them manually as shown here. Most of the conflicts are between America and Asia.

# CONFLICT! ADT is not unique
# Other timezones:
#  - ADT: Arabic Daylight Time (Asia)
ADT -10800 D   # Atlantic Daylight Time
               # (America/Glace_Bay)
               # (America/Goose_Bay)
               # (America/Halifax)
               # (America/Thule)
               # (Atlantic/Bermuda)

However, even within all "America/..." names, there are conflicts. For example CST is used as US Central Time and as Cuba Central Standard Time. While US Central time is UTC-6h, Cuba Central Standard Time is UTC-5h. Another problem is that lots of the timezone names that are hardcoded into the backend seem to be way outdated or just doubtable; many of them do not show up in the zic database. For example NT (Nome Time) seems to have existed only until 1967; America/Nome is listed in the zic database as AKDT/AKST, which is Alaska Daylight/Standard Time. Other examples: JAYT, Jayapura Time: Asia/Jayapura is listed as EIT (East Indonesia Time) in the zic database. JAVT, Java Time (07:00? see JT): the zic database says that it is outdated and was used until 1932. JT, Java Time (07:30? see JAVT): I did not find proof that this is really +7.5 hours; some sources say it's just 7 hours. HMT is the strangest of the bunch: I have found the name "Heard and Mc.Donald Time" but with a different offset, and I could not find a reference to some "Hellas"-Time as indicated in the comment. 
So could we remove some of those on the grounds that they do not seem to be used any more (please correct me here if someone knows more) and that you can easily add offsets for those if you need them? With the same argument we could even remove timezones like BDST (British Double Summer Time), DNT (Dansk Normal Tid), FST (French Summer Time), NOR (Norway Standard Time), SWT (Swedish Winter Time). Could anybody from those countries comment on whether or not those are still used or just outdated? I figure that most of those countries have long since moved to the more common timezone names... Ok, after all this has been sorted out I propose to make different files for the different continents and let the user specify with a guc which ones he wants to use. I could think of three possible ways: 1) (See Tom's idea in http://archives.postgresql.org/pgsql-hackers/2006-05/msg01048.php ) Conflicts within one set can just be commented - we would try to include whatever will probably be used by the majority of users and comment the other one(s). Conflicts between two sets would show up when postmaster gets started, it would complain about different definitions for the same timezone. An American who wants to use some Asian timezones would have to work through both files and comment conflicting timezones on one side or the other to make postmaster start up without errors. 2) Find out which timezones do not conflict, put them in a set and load this by default. Create other sets that are conflicting but that have some "override" capability with regard to previous timezone definitions. Decide on the default value for the guc (could point to American timezones for example). An Australian could either select only the Australian file or could specify "America, Australia" and the Australian set overrides the American timezones in case of conflicts. 
This way, most people do not have to make changes and those who have to can specify their "override"-file and keep all the rest, including non-conflicting timezones from a conflicting timezone set. 3) Combine both: let the user specify the guc variable as "A, B, C" and look into C first, then into B and then into A. *thinking* Right now I actually think that the "overriding" idea is not that intuitive; most people would probably expect that this is a list of priorities, so A overrides B which overrides C. What do you think? Having a larger token table in datetime.c does not seem to affect performance all that much. I did parsing tests with 2 million timestamps equally distributed over all timezone abbreviations that I had loaded previously, and the difference of 154 timezones in comparison to other runs with just 35 was at about ~120ms (on my quite slow laptop computer). The timezone definition files should be read at server start but should they also be read at SIGHUP? If so, should they be read only by the postmaster or by all backends?
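The priority-list reading (A overrides B which overrides C) amounts to searching the sets in the order given and taking the first hit. A sketch with a hypothetical in-memory layout (the real code would of course parse the files discussed above):

```c
#include <stddef.h>
#include <string.h>

/* One abbreviation -> offset entry; a NULL name terminates a set. */
typedef struct
{
    const char *abbr;
    int         offset;     /* seconds east of UTC */
} tzentry;

/* Look up an abbreviation across several sets; earlier sets take
 * priority over later ones, so "A, B, C" means A overrides B which
 * overrides C.  Layout and names are illustrative only. */
static const tzentry *
tz_lookup(const tzentry *const sets[], size_t nsets, const char *abbr)
{
    for (size_t i = 0; i < nsets; i++)
        for (const tzentry *e = sets[i]; e->abbr != NULL; e++)
            if (strcmp(e->abbr, abbr) == 0)
                return e;
    return NULL;            /* unknown abbreviation */
}
```

With an American set listed before an Asian one, a conflicting abbreviation like CST would resolve to the US offset while non-conflicting Asian entries remain available.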
Re: [HACKERS] CSV mode option for pg_dump
Bill Bartlett wrote: Can't -- the main production database is over at a CoLo site with access only available via SSH, and tightly-restricted SSH at that. Generally one of the developers will SSH over to the server, pull out whatever data is needed into a text file via psql or pg_dump, scp the file(s) back here and send them to the user. ODBC over an SSH tunnel?
Re: [HACKERS] Running a query twice to ensure cached results.
Ron, On 6/8/06 11:49 AM, "Ron Mayer" <[EMAIL PROTECTED]> wrote: > Experimental results here suggest that for larger tables Linux seems > to detect a seq-scan and not bother caching. It's very reproducible > for me here to do a reboot and not see the full speedup on a seq_scan > until the third time I run a query. What you are seeing is the now infamous "Postgres writes a table one more time after loading" behavior. Simon Riggs once dug into it to find the root cause, and I no longer recall exactly why, but after you've loaded data, the first seq scan will re-write some large portion of the data while doing the initial scan. This wreaks havoc on normal benchmarking practices. Tom - can you explain what's going on with this? It seems to write more than just the contents of the WAL, so it's not a flush of the WAL writes AFAICT. - Luke
Re: [HACKERS] postgresql and process titles
On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote: > On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote: > > On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote: > > > On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote: > > > > Kris Kennaway <[EMAIL PROTECTED]> writes: > > > > > On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote: > > > > >> Let's see the evidence. > > > > > > > > > The calls to setproctitle() (it looks like 4 setproctitle syscalls per > > > > > DB query) are causing contention on the Giant lock 25% of the time on > > > > > a dual p4 + HTT. Disabling process title setting completely gives an > > > > > 8% peak performance boost to the super-smack select benchmark. > > > > > > > > I think you misunderstood me: I asked for evidence, not interpretation. > > > > What are you measuring, and with what tool, and what are the numbers? > > > > On what benchmark case? And what did you do to "disable process title > > > > setting completely"? > > > > > > > > The reason I'm being doubting Thomas here is that I've never seen any > > > > > > Ba-da-bum! > > > > > > > indication on any other platform that ps_status is a major bottleneck. > > > > Now maybe FreeBSD really sucks, or maybe you're onto something of > > > > interest, but let's see the proof in a form that someone else can > > > > check and reproduce. > > > > > > It's also important to find out what version of FreeBSD this is. A lot > > > of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely > > > possible this isn't an issue in newer versions. > > Can you provide the actual commands you used to setup and run the test? 
I actually forget all the steps I needed to do to get super-smack working with postgresql since it required a lot of trial and error for a database newbie like me (compiling it from the benchmarks/super-smack port was trivial, but unlike mysql it required configuring the database by hand - this should hopefully be more obvious to someone familiar with pgsql though). It would be great if someone on your end could make this easier, BTW - e.g. at least document the steps. Also super-smack should be changed to allow use via a local socket with pgsql (this is the default with mysql) - this avoids measuring network stack overhead. The only thing I had to change on FreeBSD was to edit the select-key.smack and change "localhost" to "127.0.0.1" in two locations to avoid possibly using IPv6 transport. > This would allow others to duplicate your results and debug the > situation on their own. This is also important because we've generally > found HTT to be a loss (except on Windows), so it'd be good to see what > impact this has on AMD hardware. It would also be very useful to have > the raw test result numbers you obtained. They were posted previously. This is Intel hardware (AMD doesn't do HTT), and I didn't retest without HTT. I'll try to do so if I have the time (however previously when profiling mysql, HTT did give a small positive change). > > It's still true in 6.x and 7.x. I have a patch that removes Giant > > from the sysctl in question, and I have also removed it from another > > relevant part of the kernel (semop() is bogusly acquiring Giant when > > it is supposed to be mpsafe). > > What affect did that patch have on the numbers? And where is it, in case > anyone here wants to try it? I didn't yet retest with the patch. It's in my perforce branch: http://perforce.freebsd.org/changeList.cgi?FSPC=//depot/user/kris/contention/... although you probably need a combination of the changes in order for it to be usable. 
> > When it's possible to commit that patch (should be in time for 7.0, > > but not sure if it will make it into 6.2) it will eliminate the worst > > part of the problem, but it still leaves postgresql making thousands > > of syscalls per second to flip its process titles back and forth, > > which needs to be looked at carefully for a performance impact. For > > now, users of FreeBSD who want that 8% should turn it off though (or > > maybe one of the alternative methods is usable). > > We have a similar issue internally with stats_command_string. The issue > is that it's very helpful to be able to see what a 'long running' (more > than a second or so) statement is doing, but there's very little value > in doing all that work for a statement that's only going to run for a > few ms. If there's a very cheap way to set some kind of a timer that > would update this information once a statement's been around long enough > that might be a way to handle this (I don't know if we're already using > ALRM in the backend, or if that's a cheap enough solution). I don't know what the best way to implement it would be, but limiting the frequency of these updates does seem to be the way to go. > > FYI, the biggest source of contention is via semop() - it might be > > possible to optimize that some more in FreeBSD, I don't know. > > Yeah, I've see
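The "update once a statement's been around long enough" idea reduces, at its simplest, to a per-interval rate limiter in front of the expensive status call. A sketch with hypothetical names (the real fix would sit wherever the backend currently calls setproctitle() or updates stats_command_string):

```c
#include <stdbool.h>
#include <time.h>

/* Allow at most one process-title/status update per min_interval
 * seconds, so short-lived statements skip the syscall entirely.
 * Names and layout are illustrative, not actual backend code. */
typedef struct
{
    time_t  last_update;
    int     min_interval;   /* seconds between permitted updates */
} title_limiter;

static bool
title_update_allowed(title_limiter *tl, time_t now)
{
    if (now - tl->last_update < tl->min_interval)
        return false;       /* too soon: skip the expensive update */
    tl->last_update = now;
    return true;
}
```

A statement finishing within the interval never pays for the title update, while anything long-running still becomes visible after the threshold, which matches the "only show statements running more than a second or so" goal.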
Re: [HACKERS] longjmp in psql considered harmful
On Mon, Jun 12, 2006 at 08:14:01PM -0400, Tom Lane wrote:
> I had interpreted the readline documentation to mean that readline would
> discard a partially typed line upon catching SIGINT. Experimentation
> shows that this is not so, at least not with the version of readline I
> use here. It does catch the signal and reset some internal state, but
> the partially typed line is NOT discarded. Grumble.

Yeah, the documentation in readline there was pretty obtuse, but since it didn't explicitly include typed characters as state, I figured it didn't clear the line.

> I'll work on reviewing and applying the patch. I don't much like the
> side-effects on the /scripts directory though ... there must be a better
> way than that. Is it sane to declare the flag variable in print.c?

The problem is basically that some of those files are symlinked across the tree and included from various different places. Some places include print.c but don't include the signal handler stuff, which left me with linker errors about undefined symbols. In psql the symbol comes from common.c, so I also added it to scripts/common.c, which cleared up all the errors. This allows psql to work, and all the other programs that include print.c only ever see zero in that variable.

Maybe some other arrangement is possible. Maybe, as you suggest, declare the symbol in print.c and make the declaration in common.c an extern. The end result is the same, though.

Have a nice day,
-- 
Martijn van Oosterhout   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2
On Mon, 2006-06-12 at 19:15 -0400, Tom Lane wrote:
> [ moving to -hackers to get some more eyeballs on the question ]
>
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Sun, 2006-06-11 at 17:53 -0400, Alvaro Herrera wrote:
> >> Here I repost the patch to implement non-transactional catalogs, the
> >> first of which is pg_ntclass, intended to hold the non-transactional
> >> info about pg_class (reltuples, relpages).
>
> > Will a user be able to update reltuples and relpages manually?
>
> No, which is a tad annoying now that you mention it. I'm not sure that
> there's any very good reason for users to want to do that, though. Once
> or twice I've hacked those fields manually to set up test cases for the
> planner, which is why I'd be annoyed to lose the ability --- but does it
> really matter to users? (Especially in view of the fact that the
> planner no longer trusts relpages anyway.)

No need to have an SQL route. A special function call would suffice.

I'd like to be able to set up a test database that has the statistics copied from the live system. A schema-only pg_dump with mods is all I need, but it sounds like we're moving away from that. We could then perform various what-ifs on the design.

Elsewhere, it has been discussed that we might hold the number of blocks in a relation in shared memory. Does that idea now fall down, or is it complementary to this? i.e. would we replace ANALYZE's relpages with an accurate relpages for the planner?

> It does seem like rather a lot of mechanism and overhead though,
> especially in view of Alvaro's worries about the non-cacheability of
> pg_class_nt rows. I wonder whether we shouldn't take two steps back
> and rethink.

Review, yes. Could still be the best way.

> The main thing we are trying to accomplish here is to decouple
> transactional and nontransactional updates to a pg_class row.

With the goal being avoiding table bloat?

> Is there another way to do that?

Do we need complete decoupling?
> It strikes me that the only case where we absolutely must not lose a
> nontransactional update is where we are un-freezing a frozen rel.

Not sure why you'd want to do that, assuming I've understood you. For me, freezing is the last step before writing to WORM media, so there is never an unfreeze step.

> If we could guarantee that un-freezing happens before any transactional
> update within a particular transaction, then maybe we could have that.
> Manual updates to pg_class seem like they'd risk breaking such a
> guarantee, but maybe there's a way around that. Personally I'd be
> willing to live with commands that try to modify a frozen rel erroring
> out if they see the current pg_class row is uncommitted.

Sounds OK. It's a major state change, after all.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] pg_get_INDEXdef - opclass
> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> Sent: 12 June 2006 18:32
> To: Dave Page
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] pg_get_INDEXdef - opclass
>
> "Dave Page" writes:
> >> Following a pgAdmin bug report, I noticed that pg_get_viewdef doesn't
> >> return the opclass when called for a specific column (in 8.1
> >> at least) -
>
> > Bah, I mean pg_get_indexdef of course :-)
>
> This is intentional --- whoever asked for the per-column variant of
> the indexdef function wanted it that way. It seems reasonable to me:
> you can extract the opclass name with a simple join against
> pg_index.indclass[N], when you need it, whereas if the function
> sometimes included an opclass name that would tend to break apps that
> weren't expecting it. OTOH, getting the expression for an expression
> column would be seriously painful if there were no function to do it.

Right, but how can I conditionally join with pg_opclass based on whether or not the opclass specified for the column is the default for that type? For a base-type index column I can probably do that with some SQL, but what about cases where the index column is an expression?

Regards, Dave.