Re: [HACKERS] QSoC proposal: date_trunc supporting intervals

2014-03-20 Thread Steve Atkins

On Mar 20, 2014, at 1:24 PM, Thom Brown t...@linux.com wrote:

 On 20 March 2014 20:07, Josh Berkus j...@agliodbs.com wrote:
 On 03/20/2014 09:56 AM, Alexandr wrote:
 Here is the text of my proposal which I've applied to GSoC.
 (and link
 https://docs.google.com/document/d/1vBjQzhFT_fgoIkoEP5TVeyFA6ggsYlLq76tghGVUD6A/edit?usp=sharing)
 
 Any suggestions and comments are welcome.
 Because I don't know the PostgreSQL code well, I decided not to
 participate in GSoC with my previous proposal (rewriting pg_dump and
 pg_restore as libraries). But I'm very interested in participating in
 GSoC 2014 as part of PostgreSQL, so this is my new proposal.
 
 Per my comments on the GSoC app, it looks good, but I'd like to see some
 stretch goals in case you are able to implement the new function before
 GSoC is over.  One thing which has been frequently requested is a
 function to display intervals in the unit of your choice, e.g. to
 convert '1 day' to 86400 seconds.
 
 +1
 
 This is definitely something I've wanted in the past, like getting the
 number of minutes between 2 timestamps without converting to seconds
 since epoch then doing a subtraction.

It'd be nice, but isn't it impossible with anything similar to the
existing interval type (as you lose data when you convert to an
interval that you can't get back)?

Subtracting to get an interval, then converting that interval to
seconds or minutes could give you a value that's wildly different from
the right answer.
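
A minimal sketch of the two cases (my illustration, not from the
thread; the timestamps are arbitrary). Subtracting two timestamps
yields a days-plus-time interval, so going through the epoch is exact;
the lossy case is an interval that contains months, which have no
fixed length:

    -- the workaround Thom describes: minutes between two timestamps
    SELECT extract(epoch FROM (timestamp '2014-03-01 00:00'
                             - timestamp '2014-01-31 00:00')) / 60;

    -- the lossiness: extract() has to assume 30-day months here
    SELECT extract(epoch FROM interval '1 mon 3 days');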

Cheers,
  Steve





Re: [HACKERS] host name support in pg_hba.conf

2010-08-10 Thread Steve Atkins

On Aug 10, 2010, at 8:23 AM, Stephen Frost wrote:

 * Peter Eisentraut (pete...@gmx.net) wrote:
 On mån, 2010-08-09 at 13:56 -0500, Kevin Grittner wrote:
 Some IP addresses have several host names, including in reverse
 lookup; how is that handled?
 
 This is not possible, or at least the C library APIs don't expose it.
 Compare the getnameinfo() and getaddrinfo() man pages, for example.
 
 Don't know how it happens at a technical level, but I've definitely seen
 it happen before..  Particularly with Windows domains where they don't
 have clean-up reverse DNS enabled.  Manifests itself by having
 different host names show up on successive requests...  Evil in any
 case.


Multiple hostnames for a given IP address are fully supported by the
DNS. Some C library APIs support this too; others (such as
getnameinfo()) have been simplified to make them more pleasant to use
for the common case of displaying a text representation of an IP
address with simple code, at the expense of actually returning
correct data.

So getnameinfo() is not suitable for this particular usage. If an
IP address has multiple hostnames then what getnameinfo() will
return isn't well-defined (and I believe there's been a trickle of
bugs in implementations such that sometimes they won't return
any hostname if there are multiple ones configured in the DNS).

Any approach to restricting based on hostnames will either need to
work purely with forward DNS resolution of the hostnames configured
in pg_hba.conf, creating a list of IP addresses to compare against
an incoming connection, or it'll need to use a more general
interface to get the reverse DNS of an incoming connection (e.g.
gethostbyaddr(), inelegant as that is) before checking forward DNS.

The former approach won't work if we want to support wildcard
hostnames (accept connections from *.example.com) - and
that's the only useful functionality that adding hostname based
ACLs provides, I think. If we want to do that, we need to use
gethostbyaddr() to get all the claimed hostnames via reverse
DNS, and for each of those that matches our ACL do a
getaddrinfo() to check it resolves to the connecting IP.

This is something that's pretty common to do in the email world,
so stealing some robust code from there might be an idea.

Cheers,
  Steve




Re: [HACKERS] SHOW TABLES

2010-07-16 Thread Steve Atkins

On Jul 16, 2010, at 8:11 AM, Simon Riggs wrote:

 On Fri, 2010-07-16 at 14:07 +0100, Thom Brown wrote:
 
 The problem is people are stating different requirements.
 
 - to make it easy for new users of psql
 - to simplify fetching basic database information from any client application
 - to ease transition between MySQL and PostgreSQL
 
 Close, but I didn't state any of those as you have them.
 
 I want to make it easy for newbies to get access to obvious things like
 a list of tables, from *any* interactive application, wherever they
 exist. There are many and various apps and not all of them work the
 same. (The Windows installer ships two, for example). It would be nice
 to tell people "just type SHOW TABLES" and have it be true 100% of the
 time. They can remember that, or at least will try it if they can't
 remember anything at all about our RDBMS.

In pretty much any GUI application the expected way to see a list
of tables is not going to involve typing anything anywhere. Either
the list of tables is going to be shown all the time (common) or
there'll be a menu or toolbar option to show them. 

There may not be anywhere obvious to type in a command, and if there
is, the output of a server-side implementation of SHOW TABLES would
likely be displayed like the contents of a table, rather than as names
of tables - so all the metadata is going to be off. The context menu
for each row of the result would offer operations for modifying the
contents of a table, rather than operations for modifying the table
itself. In other words, it'll offer DML operations where you'd expect,
and want, DDL.
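
For reference, a server-side SHOW TABLES would essentially be wrapping
a catalog query like this one (a hedged sketch; the schema filter is
my illustration):

    SELECT n.nspname AS schema, c.relname AS table_name
    FROM pg_catalog.pg_class c
    JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
      AND n.nspname NOT IN ('pg_catalog', 'information_schema');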

Cheers,
  Steve




Re: [HACKERS] psql: Add setting to make '+' on \d implicit

2010-04-23 Thread Steve Atkins

On Apr 23, 2010, at 11:28 AM, Tom Lane wrote:

 Ross J. Reedstrom reeds...@rice.edu writes:
 On Fri, Apr 23, 2010 at 10:58:40AM -0500, Terry Brown wrote:
 So the proposal would be:
 
 \d+ does as it has always done, no change
 \d- (new) always behaves like 'old' \d
 \d  acts as 'old' \d or as \d+, depending on the setting of 
 'verbose_describe', set via \pset.
 
 Hmm, what about all the other + variants? Would this setting affect
 them? I'd suggest perhaps it should.
 
 If we were to do something like that, it would certainly have to affect
 every \d variant that has a + option.  Which is probably not a very good
 idea --- in many cases that's a very expensive/verbose option.  I can't
 get excited about this proposal, personally.
 
 What the OP actually seemed to care about was database object comments.
 I could see somebody who relied heavily on comments wanting his comments
 to be included in all display commands, even without the + option.
 Maybe a configuration variable along the lines of 'always_show_comments'
 would be a better design.

Or more generally an ability to set aliases via .psqlrc similar to \set, maybe?

\alias \d- = \d
\alias \d = \d+
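
(There's no \alias today; the closest existing approximation I know of
is stashing a query in a psql variable in ~/.psqlrc - the variable
name here is illustrative:

    \set tables 'SELECT c.relname FROM pg_catalog.pg_class c WHERE c.relkind = ''r'';'

after which typing :tables at the prompt runs the stored query. That
works for queries, but can't wrap meta-commands like \d.)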

Cheers,
  Steve


Re: [HACKERS] proposal: more practical view on function's source code

2010-03-22 Thread Steve Atkins

On Mar 22, 2010, at 2:23 PM, Peter Eisentraut wrote:

 On sön, 2010-03-21 at 20:40 -0400, Robert Haas wrote:
   \ef function-name line-number
 with suitable magic to get the editor to place the cursor at that line.
 I suspect this wouldn't be too hard to do with emacs --- what do you
 think about vi?
 
 Well, in vi you can just do vi +linenum filename.
 
 I think that's a pretty widely spread convention.  A quick test shows
 that all of emacs, vi, joe, and nano support this.  Of course there are
 editors that don't support it, so we'll have to distinguish that
 somehow, but it won't be too complicated to support a few of the common
 editors.

Would an environment variable be the general-purpose answer, much the
same as TEXEDIT or LESSEDIT?

export PSQLEDIT='mate -w -l %d %s'
or
export PSQLEDIT='emacs +%d %s'

Cheers,
  Steve




Re: [HACKERS] Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores

2010-03-12 Thread Steve Atkins

On Mar 12, 2010, at 5:18 PM, Tom Lane wrote:

 Bruce Momjian br...@momjian.us writes:
 Well, I think the big question is whether we need to honor RFC 5322
 (http://www.rfc-editor.org/rfc/rfc5322.txt). Wikipedia says these are
 all valid characters:
 
http://en.wikipedia.org/wiki/E-mail_address
 
* Uppercase and lowercase English letters (a-z, A-Z)
* Digits 0 to 9
* Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
* Character . (dot, period, full stop) provided that it is not the
  first or last character, and provided also that it does not appear two
  or more times consecutively.
 
 That's an awful lot of special characters.  For the RFC's purposes,
 it's not hard to be flexible because in an email message there is
 external context telling where to expect an address.  I think if we
 tried to allow all of those in email addresses in tsearch, we'd have
 email addresses gobbling up a whole lot of adjacent text, to nobody's
 benefit.
 
 I can see the case for adding + because that's fairly common as Alvaro
 notes, but I think we should be very circumspect about going farther.

I've been working with recognizing email addresses in text for years,
with many millions of documents processed. Recognizing them in text is
a very different problem from validating or sanitizing them. Using the
RFC spec to match things that might be an email address isn't a great
idea in the wild, so +1 on the circumspection.

I've found that /([a-z0-9_][^\@\\s]{0,80})@/ is good at finding local
parts of real email addresses in free text in the wild, without being
too prone to grabbing things that just look vaguely like email
addresses. Obviously there are some things it'll match that aren't
email addresses, and some email addresses it won't match, but for
indexing it's been really pretty good when combined with a good regex
for domain parts[1].

Cheers,
  Steve

[1]
([a-z0-9_][^\@\\s]{0,80})@([a-z0-9._-]{0,252}\\.(?:[a-z]{2}|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro|travel|jobs|mobi|tel|cat))

(Before you point out all the ways that differs from the RFC specs for
an email address, yes, I know, but that's the point. Real world usage
is not the same as RFC spec.) [2]

[2] This is the simplified version - the full version is marginally more
selective, at the expense of being much more complex.
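
(As a quick illustration - my own sketch, with an abridged variant of
the pattern from [1] and an arbitrary sample string - the match can be
exercised directly in SQL:

    SELECT 'mail j_doe+lists@example.com about this'
           ~* '([a-z0-9_][^@\s]{0,80})@([a-z0-9._-]{0,252}\.[a-z]{2,})';

which returns true.)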





Re: [HACKERS] [patch] build issues on Win32

2010-03-11 Thread Steve Atkins

On Mar 11, 2010, at 1:06 PM, Tom Lane wrote:
 
 
 BTW, I'm not sure I buy the argument that commercial software requires
 static linking.  Red Hat would be as interested in that market as
 anybody, and as I said, they don't think it's necessary to ship static
 libraries (with a *very* short list of exceptions).

Trying to support a binary distributed[1] application on a dozen or more
different distributions is fairly painful (I'm building on four Debians,
eight Ubuntus, several RHELs, three Solarises and a few others
right now), and the biggest problem is the different distributions
don't package all the libraries you need, and when they do they
include a wide range of versions, often woefully obsolete.

It's a serious pain. The woefully obsolete bit also means that you
have to code to the lowest common denominator - you cannot use any
feature more recent than the obsolete library in the oldest version
of RHEL you support, nor any feature that's been deprecated in the
latest bleeding-edge Ubuntu release.
(There are some APIs where an old feature has been replaced
by a new feature, and the old one deprecated and removed
over a timeframe short enough that conservative distributions
only have the old one and more bleeding edge distributions
only have the new one. libpq is stable enough and backwards
compatible enough that it's never been an issue there).

Building applications mostly statically linked avoids the vast
majority of those problems. It's not the only way to do it, and I
don't think it's ever the best way to do it, as it can lead to a
bunch of other problems, but it's often the easiest fix for getting
cross-platform support working right now, and I can understand why
people want to be able to do it.

Cheers,
  Steve

[1] Binary-distributed is not the same as commercial, though a lot of
commercial code is distributed as binaries, so it's one place where
the problem comes up. Commercial software is also more limited in its
ability to just blame the OS vendor and ignore the issue.



Re: [HACKERS] tie user processes to postmaster was:(Re: [HACKERS] scheduler in core)

2010-02-23 Thread Steve Atkins

On Feb 22, 2010, at 9:02 PM, Tom Lane wrote:

 Alvaro Herrera alvhe...@commandprompt.com writes:
 Regarding hooks or events, I think postmaster should be kept simple:
 launch at start, reset at crash recovery, kill at stop.  Salt and pepper
 allowed but that's about it -- more complex ingredients are out of the
 question due to added code to postmaster, which we want to be as robust
 as possible and thus not able to cook much of anything else.
 
 This is exactly why I think the whole proposal is a nonstarter.  It is
 necessarily pushing more complexity into the postmaster, which means
 an overall reduction in system reliability.  There are some things
 I'm willing to accept extra postmaster complexity for, but I say again
 that not one single one of the arguments made in this thread is a
 convincing reason to take that risk.

Would having a higher-level process manager be adequate - one that
spawns the postmaster and a list of associated processes (queue
manager, job scheduler, random user daemons used for database
application maintenance)? It sounds like something like that would be
able to start up and shut down an entire family of daemons, of which
the postmaster is the major one, gracefully.

It could also be developed almost independently of the core code; at
most it might benefit from a way for the postmaster to tell it when
it has started up successfully.

Cheers,
  Steve




Re: [HACKERS] Listen / Notify rewrite

2009-11-12 Thread Steve Atkins

On Nov 12, 2009, at 5:57 PM, Robert Haas wrote:

 On Thu, Nov 12, 2009 at 8:44 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/12/09 8:30 AM, Tom Lane wrote:
 So while a payload string for NOTIFY has been on the to-do list since
 forever, I have to think that Greg's got a good point questioning
 whether it is actually a good idea.
 
 Sure, people will abuse it as a queue.  But people abuse arrays when
 they should be using child tables, use composite types to make data
 non-atomic, and use dblink when they really should be using schema.
 Does the potential for misuse mean that we should drop the features?  No.
 
 I agree.  We frequently reject features on the basis that someone
 might do something stupid with them.  It's lame and counterproductive,
 and we should stop.  The world contains infinite amounts of lameness,
 but that's the world's problem, not ours.  There is zero evidence that
 this feature is only useful for stupid purposes, and some evidence
 (namely, the opinions of esteemed community members) that it is useful
 for at least some non-stupid purposes.

Speaking as a consumer of this feature, my (mild) concern is not that
added functionality will make it possible to do new things badly; it's
that it might make it harder (or impossible) to do currently supported
things as well as we can now.

I like the current notify. It's a good match for handling table based
queues or for app-level-cache invalidation. Any changes that make
it less good at that would be a step backwards. The more complex
and flexible and heavier the changes, the more that concerns me.

(Though a small payload - I'd settle for at least an integer - would be
convenient, to allow invalidation of 20 different caches without needing
20 different LISTENs).
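
A sketch of the sort of usage I mean (the payload syntax here is my
illustration of what a small payload might look like, not anything
that exists today):

    LISTEN cache_invalidation;
    -- one channel, with the payload saying which cache to flush:
    NOTIFY cache_invalidation, '17';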

Cheers,
  Steve




Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-04 Thread Steve Atkins


On Sep 4, 2008, at 6:29 AM, Andrew Sullivan wrote:


On Thu, Sep 04, 2008 at 01:26:44AM +0300, Hannu Krosing wrote:

So Andrew's opinion was that Mb (meaning megabit) is different from MB
(meaning megabyte), and that someone who thinks we define shared
buffers in megabits could get confused and order the wrong kind of
network card?


I know it's fun to point and laugh instead of giving an argument, but
the above is not what I said.  What I said is that there is a
technical difference between at least some of these units, and one
that is relevant in some contexts where we have good reason to believe
Postgres is used.  So it seems to me that there is at least a _prima
facie_ reason in favour of making case-based decisions.  Your argument
against that appears to be, "Well, people can be sloppy."


Settings in postgresql.conf are currently case-insensitive. Except
for the units.


Alvaro's suggestion seems to me to be a better one.  It is customary,
in servers with large complicated configuration systems, for the
server to come with a tool that validates the configuration file
before you try to load it.  Postfix does this; apache does it; so does
BIND.  Heck, even NSD (which is way less configurable than BIND) does
this.  Offering such a tool provides considerably more benefit than
the questionable one of allowing people to type whatever they want
into the configuration file and suppose that the server will by magic
know what they meant.


How would such a tool cope with, for example, shared_buffers being set
to one eighth the size the DBA intended, due to their use of Mb rather
than MB? Both are perfectly valid units to use to set shared buffers,
even though we only support one right now. If the answer to that is
something along the lines of "we don't support megabits for
shared_buffers, and never will, because nobody in their right mind
would ever intend to use megabits to set their shared buffer size"...
that's a useful datapoint when it comes to designing for usability.

Cheers,
  Steve




Re: [HACKERS] Proposal: new border setting in psql

2008-08-25 Thread Steve Atkins


On Aug 24, 2008, at 6:16 AM, Merlin Moncure wrote:

On Sun, Aug 24, 2008 at 2:00 AM, D'Arcy J.M. Cain [EMAIL PROTECTED]  
wrote:

On Sat, 23 Aug 2008 14:57:50 -0400
Tom Lane [EMAIL PROTECTED] wrote:
Also, having now looked at the proposed patch, it seems clear that it
isn't addressing the issue of quoting/escaping at all; so I wonder how
this can be considered to be a safely machine-readable format.


It's not a machine-readable format.  It is a simple display with more
border lines.  Just like "border 2" is like "border 1" with more
border lines.  I'm just following the progression.



Personally I think it's rather nice to be able to have some extra
flexibility in how psql prints out data.  Maybe, instead of the dry
and uninformative 'border 2', there could be a set of output control
options.  Maybe I want the text aligned but with no border, for
example.


Or maybe you want an external filter that eats csv or xml and excretes
what you want to see on the screen.

Cheers,
  Steve




Re: [HACKERS] Overhauling GUCS

2008-08-18 Thread Steve Atkins


On Aug 18, 2008, at 1:05 AM, Magnus Hagander wrote:


Josh Berkus wrote:

Steve,


First pass is done. Needs a little cleanup before sharing. I spent a
fair while down the OS-specific-hardware-queries rathole, but I'm
better now.


Gods, I hope you gave up on that.  You want to use SIGAR or  
something.


If it's going to be C++, and reasonably cross platform, and a pg tool,
why not try to build something as a module in pgAdmin? Certainly going
to get you a larger exposure... And I'm sure the pgAdmin team would be
happy to have it!


I'm attempting to build it as something that can be used in several
places. Where there's most need for it is as an install time option
in installers, particularly on Windows.

There's no reason the same underlying code couldn't also go into
pgAdmin, of course. At the moment the code is a bit Qt-specific;
reducing that is part of the cleanup.

Cheers,
  Steve




Re: [HACKERS] Overhauling GUCS

2008-08-17 Thread Steve Atkins


On Aug 17, 2008, at 1:48 PM, Greg Smith wrote:


On Wed, 13 Aug 2008, Michael Nacos wrote:

Hi there... Configuration autotuning is something I am really
interested in. I have seen this page:
http://wiki.postgresql.org/wiki/GUCS_Overhaul and a couple of emails
mentioning this, so I wanted to ask: is someone already on it? If
yes, I'd like to contribute.


Good time to give a status report on what's been going on with all
this.


With some help I just finished off an answer to problem #1 there
recently, "Most people have no idea how to set these."  There was
some concern here that work was being done on config tools without a
clear vision of what was going to be tuned.  See
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for an
intro on how to set the 18 most important parameters (+7 logging
parameters) based on the best information I'm aware of.


Circa June, Steve Atkins was looking into writing a C++/Qt GUI tuning
interface application, with the idea that somebody else would figure
out the actual smarts of the tuning effort.  Don't know where that's
at.


First pass is done. Needs a little cleanup before sharing. I spent a
fair while down the OS-specific-hardware-queries rathole, but I'm
better now.


Cheers,
  Steve




Re: [HACKERS] Plugin system like Firefox

2008-08-12 Thread Steve Atkins


On Aug 12, 2008, at 12:42 PM, Bruce Momjian wrote:


Markus Wanner wrote:

Hi,

Robert Haas wrote:

I can't speak for anyone else, but I much prefer packages that make
use of my operating system's package management system rather than
rolling their own.  If I need a perl package that I can't get through
yum, I build my own RPMs rather than installing through CPAN.


I very much agree to that (well, s/RPM/DEB/).

But AFAIK we also need to provide packages for OSes without a package
management system, Windows being the most popular such OS.

Which probably means we should provide something that can work on its
own *or* through another package management system (very much like
CPAN and others, again).


We don't get economies of scale without an OS-agnostic way of
installing packages.  I realize many prefer their OS-native packaging
system, but that isn't the target audience of a packaging system that
will increase adoption.


If the OS-agnostic version is designed appropriately then turning
it (mechanically) into an OS-specific one should be possible.

Cheers,
  Steve




Re: [HACKERS] Overhauling GUCS

2008-06-12 Thread Steve Atkins


On Jun 12, 2008, at 11:21 AM, Bruce Momjian wrote:


Josh Berkus wrote:

Bruce,

I am concerned that each wizard is going to have to duplicate the
same logic each time, and adjust to release-based changes.


I think that's a feature, not a bug.  Right now, I'm not at all
convinced that my algorithms for setting the various major dials are
great (I just think that nobody yet has better).  So I think we
should *encourage* people to write their own wizards until we find
one that works reasonably well.


I am thinking a web-based wizard would make the most sense.


There's a definite need for an interactive GUI wizard (bundle with the  
Windows and OS X installers, at least).


And a commandline wizard would certainly be nice, both interactive and  
non-interactive. Mostly for including in install scripts on unix  
platforms.


And a web-based wizard would be useful too.

And all of them would benefit from being able to both modify an
existing configuration file and generate one from scratch.


It looks like it's going to be reasonably easy to abstract away the  
interface to the user such that the first two (and likely the third)  
can be built from the same codebase, meaning that the smarts about how  
to set the various GUC settings (based on RAM available, estimates of  
database size and usage) can be maintained in one place.


Cheers,
  Steve




Re: [HACKERS] Overhauling GUCS

2008-06-06 Thread Steve Atkins


On Jun 6, 2008, at 12:22 PM, Greg Smith wrote:


On Fri, 6 Jun 2008, Peter Eisentraut wrote:

- What settings do newbies (or anyone else) typically need to change?
  Please post a list.
- What values would you set those settings to?  Please provide a
  description for arriving at a value, which can later be transformed
  into code.  Note that in some cases, not even the documentation
  provides more than handwaving help.


Josh's spreadsheet at
http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc
provides five different models for setting the most critical
parameters based on different types of workloads.  Everyone can
quibble over the fine tuning, but having a good starter set of
reasonable settings for these parameters is a solved problem.



 It's just painful to build a tool to apply the available expert
knowledge that is already around.


I hope to have something in a week or so that's a first cut at that
tool. I'm aiming at a GUI tool at first, as I think that's the main
need, though once the basics are done an interactive or
non-interactive CLI version shouldn't be a big deal.


Cheers,
  Steve




Re: [HACKERS] Overhauling GUCS

2008-06-04 Thread Steve Atkins


On Jun 4, 2008, at 1:57 PM, Tom Lane wrote:


* Can we build a configuration wizard to tell newbies what settings
they need to tweak?


Probably. Given that the demographics of a lot of the newbies is
Windows, this likely needs to be a pointy-clicky sort of thing if
it's going to be widely useful.

Doing it in a smart way is likely hard, but even a very simple, crude
one would likely be helpful. Doing it as a standalone GUI editor /
wizard wouldn't be too hard, but it sounds more like something that
should be integrated into pgAdmin, perhaps?

Cheers,
  Steve




Re: [HACKERS] Overhauling GUCS

2008-06-04 Thread Steve Atkins


On Jun 4, 2008, at 5:23 PM, Tom Lane wrote:


Andrew Dunstan [EMAIL PROTECTED] writes:

Tom Lane wrote:
* Can we build a configuration wizard to tell newbies what settings
  they need to tweak?


That would trump all the other suggestions conclusively. Anyone good
at expert systems?


How far could we get with the answers to just three questions:

* How many concurrent queries do you expect to have?

* How much RAM space are you willing to let Postgres use?

* How much overhead disk space are you willing to let Postgres use?

concurrent queries drives max_connections, obviously, and RAM space
would drive shared_buffers and effective_cache_size, and both of them
would be needed to size work_mem.  The third one is a bit weird but
I don't see any other good way to set the checkpoint parameters.

If those aren't enough questions, what else must we ask?  Or maybe
they aren't the right questions at all --- maybe we should ask "is
this a dedicated machine or not" and try to extrapolate everything
else from what we (hopefully) can find out about the hardware.


I'd be interested in putting together a framework+GUI client to do this
cleanly in a cross-platform (Windows, Linux, Solaris, OS X as a bare
minimum) sort of way, if no-one else already has such a thing.

A framework doesn't get you all the way there, but it makes it a
whole lot easier to work on what base data and information you
need, and how easy it is to map pgsql-performance and #postgresql
gut feel onto something more algorithmic.
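
(To make Tom's mapping concrete - a hedged worked example, using
common rules of thumb that are my illustration rather than anything
from this thread: on a dedicated box where the answers are "100
concurrent queries" and "8 GB of RAM", a wizard might emit something
like

    max_connections = 100
    shared_buffers = 2GB          # ~25% of the RAM granted
    effective_cache_size = 6GB    # ~75% of the RAM granted
    work_mem = 60MB               # roughly (8GB - 2GB) / 100

with the third answer then driving the checkpoint settings.)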

Cheers,
  Steve



Re: [HACKERS] Overhauling GUCS

2008-06-04 Thread Steve Atkins


On Jun 4, 2008, at 6:28 PM, Greg Smith wrote:



Josh Berkus pointed out that he already had the expert system part of
this problem solved pretty well with a spreadsheet:

http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc
(that's in the OpenOffice Calc format if you don't know the extension)


On Jun 4, 2008, at 6:20 PM, Steve Atkins wrote:
I'd be interested in putting together a framework+GUI client to do
this cleanly in a cross-platform (Windows, Linux, Solaris, OS X as a
bare minimum) sort of way, if no-one else already has such a thing.


/me makes "go together" motions, if nobody has any objection

Cheers,
  Steve




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Steve Atkins


On May 29, 2008, at 9:12 AM, David Fetter wrote:


On Thu, May 29, 2008 at 11:58:31AM -0400, Bruce Momjian wrote:

Josh Berkus wrote:

Publishing the XIDs back to the master is one possibility.  We
also looked at using spillover segments for vacuumed rows, but
that seemed even less viable.

 I'm also thinking, for *async replication*, that we could simply
 halt replication on the slave whenever a transaction passes minxid
 on the master.  However, the main focus will be on synchronous
 hot standby.


Another idea I discussed with Tom is having the slave _delay_
applying WAL files until all slave snapshots are ready.


Either one of these would be great, but something that involves
machines that stay useless most of the time is just not going to work.


I have customers who are thinking about warm standby functionality,
and the only thing stopping them deploying it is complexity and
maintenance, not the cost of the HA hardware. If trivial-to-deploy
replication that didn't offer read-only access to the slaves were
available today, I'd bet that most of them would be using it.

Read-only slaves would certainly be nice, but (for me) it's making it
trivial to deploy and maintain that's more interesting.

Cheers,
  Steve




Re: [HACKERS] triggers on prepare, commit, rollback... ?

2008-05-20 Thread Steve Atkins


On May 19, 2008, at 6:53 PM, Tom Lane wrote:


Another response I've heard is "but I don't want to make
inside-the-database changes, I want to propagate the state to
someplace external".  Of course that's completely broken too, because
there is *absolutely no way* you will ever make such changes atomic
with the inside-the-database transaction commit.  We discourage
people from making triggers cause outside-the-database side effects
already --- it's not going to be better to do it in an on-commit
trigger.


Isn't this close to what NOTIFY is? An on-commit trigger that
causes only outside-the-database effects.

Cheers,
  Steve




Re: [HACKERS] SSL and USER_CERT_FILE

2008-05-15 Thread Steve Atkins


On May 15, 2008, at 6:31 AM, [EMAIL PROTECTED] wrote:


Mark Woodward wrote:
I am using PostgreSQL's SSL support and the conventions for the key
and certificates don't make sense from the client perspective,
especially under Windows.

I am proposing a few simple changes:

Adding two APIs:

   void PQsetSSLUserCertFileName(char *filename)
   {
       user_crt_filename = strdup(filename);
   }

   void PQsetSSLUserKeyFileName(char *filename)
   {
       user_key_filename = strdup(filename);
   }




[snip]

Any comments?





I think it would probably be much better to allow for some environment
variables to specify the locations of the client certificate and key
(and the CA cert and CRL) - cf. PGPASSFILE.

That way not only could these be set by C programs but by any libpq
user (I'm sure driver writers who use libpq don't want to have to
bother with this stuff.) And we wouldn't need to change the API at
all.



The problem I have with environment variables is that they tend not
to be application-specific and almost always lead to configuration
issues. As a methodology for default configuration, it adds
flexibility. Also, the current configuration does not easily take
into consideration the idea that different databases with different
keys can be used from the same system by the same user.


Environment variables don't have to be set in your shell.

This would seem to give the same functionality you suggest above,
given support for environment variables:

   void PQsetSSLUserCertFileName(char *filename)
   {
       setenv("PGCERTFILE", filename, 1);  /* 1 = overwrite any old value */
   }

   void PQsetSSLUserKeyFileName(char *filename)
   {
       setenv("PGKEYFILE", filename, 1);
   }

Or, in perl, $ENV{PGKEYFILE} = $file and so on. It seems
less intrusive than adding new API calls.

Cheers,
  Steve




[HACKERS] Standard metadata queries

2008-04-24 Thread Steve Atkins
I've been chatting with the Trolltech folks about the implementation
of the Qt wrapper around libpq, and the issue of how to properly do
metadata queries came up. That is, things like "What are the column
names and types of the primary key of this table, and what index
enforces it?" or "What are the names and types of each field of this
table?".


These seem like queries that'll be used by a lot of people, hidden  
down in ORMs and access libraries, and which are hard to get right,  
let alone efficient, and which will generally be written by one person  
(developing the ORM or library) and likely not touched again.


Is there a standard set of well-crafted implementations of these  
anywhere that could be used by all the interface and ORM developers?  
If not, would it make sense to put some together and document or wiki  
them? Both as example code and as a set of good, solid queries that  
library developers can cut and paste.


(The implementation I'm looking at right now has, amongst other  
things, hardwired OID-to-type mappings, and there's got to be a  
cleaner way than that).
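
(For a flavour of the kind of query I mean - a sketch I'd want sanity
checked rather than a definitive version, with 'mytable' standing in
for a real table name:

    SELECT a.attname,
           pg_catalog.format_type(a.atttypid, a.atttypmod) AS type,
           ci.relname AS enforcing_index
    FROM pg_catalog.pg_index i
    JOIN pg_catalog.pg_class ct ON ct.oid = i.indrelid
    JOIN pg_catalog.pg_class ci ON ci.oid = i.indexrelid
    JOIN pg_catalog.pg_attribute a ON a.attrelid = ct.oid
                                  AND a.attnum = ANY (i.indkey)
    WHERE ct.relname = 'mytable' AND i.indisprimary;

note the format_type() call standing in for the hardwired
OID-to-type mapping.)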


Cheers,
  Steve




Re: [HACKERS] Standard metadata queries

2008-04-24 Thread Steve Atkins


On Apr 24, 2008, at 11:12 AM, Joshua D. Drake wrote:


On Thu, 24 Apr 2008 11:01:13 -0700
Steve Atkins [EMAIL PROTECTED] wrote:


I've been chatting with the Trolltech folks about the implementation
of the Qt wrapper around libpq, and the issue of how to properly do
metadata queries came up. That is, things like "What are the column
names and types of the primary key of this table, and what index
enforces it?" or "What are the names and types of each field of this
table?".

These seem like queries that'll be used by a lot of people, hidden
down in ORMs and access libraries, and which are hard to get right,
let alone efficient, and which will generally be written by one
person (developing the ORM or library) and likely not touched again.

Is there a standard set of well-crafted implementations of these
anywhere that could be used by all the interface and ORM developers?
If not, would it make sense to put some together and document or
wiki them? Both as example code and as a set of good, solid queries
that library developers can cut and paste.

(The implementation I'm looking at right now has, amongst other
things, hardwired OID-to-type mappings, and there's got to be a
cleaner way than that).


I believe the information_schema is standard.


Standard, but woefully incomplete (by design).

Also, AIUI, it's fairly slow in use, compared to touching the underlying
postgresql-specific tables, which would be something that you might
not care about in design tools but which might be a problem for use in
an ORM or similar.

Something like newsysviews might be an appropriate answer, but if
it's not included in a core distribution then none of the APIs or ORMs
can rely on it.

Given that, I think that using common queries for DBD::Pg, JDBC, Qt,
etc, etc would probably benefit an awful lot of users and reduce the
amount of duplicated effort across the various APIs. Ripping the
existing queries out of one or more of those and just having a few
people who understand pg_* sanity check them seems like it might
be a decent place to start, if nobody has already done something
similar.
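
(For contrast, the standards-based route for one of these questions
looks like this - a sketch, with 'mytable' illustrative; it covers
column names and types but says nothing about indexes or OIDs:

    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'mytable'
    ORDER BY ordinal_position;
)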

Cheers,
  Steve




Re: [HACKERS] Standard metadata queries

2008-04-24 Thread Steve Atkins


On Apr 24, 2008, at 2:22 PM, Tom Lane wrote:


Steve Atkins [EMAIL PROTECTED] writes:

On Apr 24, 2008, at 11:12 AM, Joshua D. Drake wrote:

I believe the information_schema is standard.



Standard, but woefully incomplete (by design).


Sure, because it's restricted to standardized concepts.  Do the
adapters in question need to obtain info about nonstandard things?
One would hope that they're trying to confine themselves to
SQL-standard stuff.


Types and indexes are two things that are commonly needed that
aren't covered well by information_schema.

Also, AIUI, it's fairly slow in use, compared to touching the
underlying postgresql-specific tables, which would be something that
you might not care about in design tools but which might be a problem
for use in an ORM or similar.


This is a fair point, and it's unlikely ever to be fixed completely,
though perhaps we could put a bit more effort into whichever views are
considered performance-critical.


If it turns out that the sort of information that's needed by APIs
can be answered solely by information_schema queries then it'd
be worth a look (though I suspect that some of the requirements
that the standards put on information_schema rule out some
performance improvements).


AFAIR, the only times we've heard from adapter authors were when they
couldn't make something work at all :-(.  A review project like you
propose would be worthwhile.  Aside from possibly helping the adapter
authors, it would give us a better sense of which changes to the
system catalogs to avoid because they'd be likely to break clients.


Good enough reason for me to put some time into it, I think. I'll go see
what current APIs are using and put something up on the wiki.

Cheers,
  Steve




Re: [HACKERS] modules

2008-04-03 Thread Steve Atkins


On Apr 3, 2008, at 7:01 AM, Aidan Van Dyk wrote:

* Greg Sabino Mullane [EMAIL PROTECTED] [080403 09:54]:

Right now contrib is a real catch-all of various things; it would be
nice to categorize them somehow. And by "categorize", I emphatically
do NOT mean "move to pgfoundry", which is pretty much a kiss of
death.


But that begs the question of *why* it's a kiss of death?

For instance, in perl land, having something in CPAN and not in
perl core is most certainly *not* a kiss of death?  Why is it so
different for PostgreSQL?

Is it because the infrastructure behind CPAN is much better than that
behind pgfoundry?


Yes. I can install a package from a CPAN mirror with a one-line
incantation and be sufficiently sure it works that on the very rare
occasions it doesn't I'm really surprised.

On the Windows end of things I can usually get pre-built binaries
of those same packages installed, in the cases where a compiler
is needed to build them. The exact process is a bit different, but it's
consistent across most packages and uses the same namespace.

Or is it because CPAN is better vetted and organized than  
pgfoundry?


Partly. "Vetted" is partly self-vetting - you're expected to pass
your self-tests and install cleanly before you publish to CPAN. The
naming hierarchy helps with the CPAN organization, and makes it
easier to use than the trove approach, once you're familiar with the
perl namespace habits.

Some of that is applicable to a postgresql package distribution
method, but the neat organization is a perl thing, not a CPAN thing,
so that idea doesn't really transfer.
idea

doesn't really transfer.




Or is it because the projects that go into CPAN are better quality
than projects in pgfoundry?


Partly. There are some dubious packages on CPAN, but they're
finished, and with extremely few exceptions they download, pass their
self-tests and do what it says on the box (the main flaws are
packages going stale and occasional dependency problems).

Pgfoundry is a development site with a search engine and has projects
in various stages of completion, from vaporware to production-tested,
usable code.


Or is it something else?


"Projects vs. packages" sums up the differences.

Cheers,
  Steve



Re: [HACKERS] CLUSTER and synchronized scans and pg_dump et al

2008-01-28 Thread Steve Atkins


On Jan 28, 2008, at 8:36 AM, Tom Lane wrote:


Andrew Dunstan [EMAIL PROTECTED] writes:

Kevin Grittner wrote:

It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.



What will be the performance hit from doing that?


That worries me too.  Also, in general pg_dump's charter is to
reproduce the state of the database as best it can, not to improve
it.


One common use of CLUSTER around here is to act as a faster version
of VACUUM FULL when there are a lot of dead rows in a table. There's
no intent to keep the table clustered on that index, and the cluster
flag isn't removed with ALTER TABLE (why bother, when the only thing
it affects is the CLUSTER command).

I'm guessing that's not unusual, and it'd lead to sorting such tables
as part of pg_dump.
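
(The flag in question is pg_index.indisclustered; a quick sketch of
finding the tables that would be affected:

    SELECT ct.relname AS table_name, ci.relname AS clustered_index
    FROM pg_catalog.pg_index i
    JOIN pg_catalog.pg_class ct ON ct.oid = i.indrelid
    JOIN pg_catalog.pg_class ci ON ci.oid = i.indexrelid
    WHERE i.indisclustered;

any table showing up there would get the proposed ORDER BY treatment,
whether or not the DBA still cares about its ordering.)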

Cheers,
  Steve




Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Steve Atkins


On Nov 14, 2007, at 6:57 PM, Josh Berkus wrote:


Tom,

I've got one upstairs (HPPA), and I believe that it's actually a
pretty common situation in scientifically-oriented workstations from
a few years back.


Last I checked, scientific workstations aren't exactly a common
platform for PostgreSQL servers.

The question is, for our most common platforms (like AMD and Intel),
is the FPU notably slower/more contended than integer division?  I
had the impression that it was, but my knowledge of chip
architectures is liable to be out of date.


Can we have a hardware geek speak up?


Somewhat. The last version of K7 I looked at had three integer  
execution units versus one floating point unit.


They're also scheduled fairly independently, meaning that casts from
double to integer or back again will hurt the pipeline and the
scheduler more than the use of floating point itself does.


In the grand scheme of things, though, I don't believe it's a big  
deal for typical code on most modern desktop CPUs, certainly not  
compared to memory starvation, use of less than optimal compilers and  
all the other reasons the pipeline might stall. I might care in the  
innermost of inner loops, but possibly not even then unless a  
profiler told me differently.


Cheers,
  Steve




Re: [HACKERS] Best implementation of PATRICIA

2007-08-25 Thread Steve Atkins


On Aug 25, 2007, at 11:37 AM, Alex Povolotsky wrote:


Hello!

I'm working on a project requiring a fast query like "does ADDRESS
belong to SET OF NETWORKS?". Naturally, such a query is better
implemented using PATRICIA, but building a PATRICIA tree is a
relatively long task and is better done once, for instance at server
startup.


I'm thinking of implementing such a tree using stored procedures, and
am looking for advice from postgresql-hackers: how can I hook server
startup?


The idea of having something like a blob to store and restore the
PATRICIA tree may be better suited to standard SQL, but I'm looking
for a more elegant solution. Or am I totally wrong?




PATRICIA is a nice algorithm, but I suspect you'd be much happier
with GiST and http://pgfoundry.org/projects/ip4r/
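
A hedged sketch of that route (it assumes the ip4r module is
installed; the table and column names are illustrative):

    CREATE TABLE networks (net ip4r);
    CREATE INDEX networks_net_idx ON networks USING gist (net);

    -- "does ADDRESS belong to SET OF NETWORKS?", answered via the index:
    SELECT count(*) > 0 FROM networks WHERE net >>= '10.1.2.3';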

Cheers,
  Steve




Re: [HACKERS] crypting prosrc in pg_proc

2007-08-10 Thread Steve Atkins


On Aug 10, 2007, at 12:00 PM, Gregory Stark wrote:


Jonah H. Harris [EMAIL PROTECTED] writes:


Obfuscation doesn't really work, it just makes big wigs in companies
*think* it's not easily reversible.

There is no real security.  With enough time and experience, anything
can be broken.


But that said, I wonder if having something may be useful legally for
some users.

If someone just went and did "select * from pg_proc" they could claim
they weren't violating their EULA or any protection you had put in
place. If they went to the trouble of having to de-obfuscate it then
you would have a strong DMCA claim in the US.


If doing so would violate their contract with you then it'll violate
their contract (and make them liable for large amounts of liquidated
damages) whether they violate the DMCA or not.

If the code in question isn't important enough to your business that
you have a signed contract with those using it, then it's not really
that important.



But Jonah's entirely right that there's no way to make it technically
impossible to de-obfuscate. All you can do is make any casual
observer pause and decide to break your license agreement.


Code obfuscation is a bad non-solution to a problem that's far
better solved contractually (assuming it's a real problem, which
it often isn't).

Cheers,
  Steve




Re: [HACKERS] Bugtraq: Having Fun With PostgreSQL

2007-06-23 Thread Steve Atkins


On Jun 23, 2007, at 11:03 AM, Magnus Hagander wrote:




I would also argue that trust auth is not such an evil option that we
mustn't allow it to be the default.  On a single-user machine it's
actually perfectly sane, seeing that we don't allow TCP connections
by default.


Is there really such a thing as a single-user machine running
PostgreSQL? Maybe single human user, but if you're not running some
other services on it (webserver, mailserver, whatever) it's not very
likely to be running pg, I think.


My laptop and dev boxes are all single user (and all do trust
on unix sockets).

I have several webserver boxes that use local postgresql
installations. Again, trust works just fine for that. There's
no security issue unless someone has compromised the
box (probably via the webapp), and if they've done that
they already have the keys to the castle. Mailserver, ditto.

We deploy CRM systems, running on dedicated boxes,
to customers. While they tend to get configured with
md5 access, just because it's tidy and easy to explain
to IT security folks, they'd actually be just as secure with
trust auth.


Out of curiosity, how do other databases deal with this? The only one
I've been installing recently is MSSQL, which basically lets you
choose between SSPI
(ident-sameuser-over-sockets-and-kerberos-over-network-equivalent) or
password (md5 equivalent, AFAIK). It does *not* let you use empty
passwords, which is what would be equivalent to our trust.


But that's not an installation from source. That's a packaged
installation, provided by the distribution owner. It's much more
comparable to, say, a Debian package.

MySQL installs with an empty root password for access from localhost
or the machine's own IP address. It also installs an account with
network access to any database beginning with "test", and possibly
some more ill-defined accounts with local access.

Part of the suggested post-install process for mysql involves
deleting some of those accounts and setting a password for root.

Cheers,
  Steve



Re: [HACKERS] PG-MQ?

2007-06-19 Thread Steve Atkins


On Jun 19, 2007, at 2:45 PM, Chris Browne wrote:


I'm seeing some applications where it appears that there would be
value in introducing asynchronous messaging, ala message queueing.
http://en.wikipedia.org/wiki/Message_queue


Me too.


My bias would be to have something that can basically run as a thin
set of stored procedures atop PostgreSQL :-).  It would be trivial to
extend that to support SOAP/XML-RPC, if desired.

It would be nice to achieve 'higher availability' by having queues
where you might replicate the contents (probably using the MQ system
itself ;-)) to other servers.

There tend to be varying semantics out there:

- Some queues may represent subscriptions where a whole bunch of
  listeners want to get all the messages;

- Sometimes you have the semantics where:
  - messages need to be delivered at least once
  - messages need to be delivered no more than once
  - messages need to be delivered exactly once

Is there any existing work out there on this?  Or should I maybe be
looking at prototyping something?


The Skype tools have a decent-looking publish/subscribe system, PgQ,
that they layer their replication on top of. It's multi-consumer and
multi-producer, with delivered-at-least-once semantics.

Looks nice.
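
(For the thin-stored-procedures flavour, a deliberately minimal
sketch of a table-based queue with at-least-once delivery - the names
are illustrative, and it ignores the locking needed for concurrent
consumers:

    CREATE TABLE mq (
        id       serial PRIMARY KEY,
        payload  text,
        consumed boolean NOT NULL DEFAULT false
    );

    -- consume the oldest unconsumed message
    UPDATE mq SET consumed = true
    WHERE id = (SELECT min(id) FROM mq WHERE NOT consumed)
    RETURNING id, payload;

A real implementation needs row locking and batching, which is
roughly what PgQ provides.)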

Cheers,
  Steve




Re: [HACKERS] storage of sensor data with Fourier transforms

2007-05-04 Thread Steve Atkins


On May 4, 2007, at 10:13 PM, Nathan Buchanan wrote:


Hello!

I have a potential situation where I will have a lot of sensor data
coming in very often (every second or so). The sensor data is from
physics-type measurements, and will normally follow a slowly changing
pattern with sinusoidal disturbances. The overall shape of the data
is more important than catching high-frequency disturbances.


I had the idea of taking the Fourier transform of the waveform and
storing the waveform internally that way to reduce storage
requirements. When I get some free cycles, I may try doing this. I
would want it to be done in the database in such a way that the user
can still join to a table using this internal storage scheme.


Why am I mailing this list? I'd like to ask if anyone has heard of
someone doing anything like this. I did a small search of the lists
and the internet but didn't come up with anything. I just don't want
to re-invent the wheel.


It's an interesting idea in theory (sounds like a simple low-pass  
filter would do most of what you want) but it doesn't sound like that  
great an idea in practice.


Sampling at 1Hz isn't going to generate very much data at all, unless
you're capturing ridiculous amounts of data at each sample point (I
have boxes averaging hundreds of pieces of data a second, totaling
tens of kilobytes, along with a bunch of index updates, without
breaking much of an I/O sweat).


Unless you have some very tight storage constraints (embedded
controllers, flash storage, ...) you're not going to see a problem
with storing the actual data, and you're likely to regret both
discarding the original data and expending much effort developing
low-pass pre-processing software for the input signal.


(Processing it for analysis is a whole other thing, but even that  
will be much easier to do if you can apply variants on your algorithm  
to the untouched original data).


Cheers,
  Steve




Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Steve Atkins


On Jan 22, 2007, at 11:16 AM, Richard Huxton wrote:


Bruce Momjian wrote:
Yep, agreed on the random I/O issue.  The larger question is: if you
have a huge table, do you care to reclaim 3% of the table size,
rather than just vacuuming it when it gets to 10% dirty?  I realize
the vacuum is going to take a lot of time, but vacuuming to reclaim
3% three times seems like it is going to be more expensive than just
vacuuming the 10% once.  And vacuuming to reclaim 1% ten times seems
even more expensive.  The partial vacuum idea is starting to look
like a loser to me again.


Buying a house with a 25-year mortgage is much more expensive than  
just paying cash too, but you don't always have a choice.


Surely the key benefit of the partial vacuuming thing is that you  
can at least do something useful with a large table if a full  
vacuum takes 24 hours and you only have 4 hours of idle I/O.


It's also occurred to me that all the discussion of scheduling way
back when isn't directly addressing the issue. What most people want
(I'm guessing) is to vacuum *when the user workload allows*, and the
time-tabling is just a sysadmin first approximation at that.


Yup. I'd really like for my app to be able to say "Hmm. No
interactive users at the moment, no critical background tasks. Now
would be a really good time for the DB to do some maintenance." but
also to be able to interrupt the maintenance process if some new
users or other system load show up.


With partial vacuuming possible, we can arrange things with just
three thresholds and two measurements:

  Measurement 1 = system workload
  Measurement 2 = a per-table "requires vacuuming" value
  Threshold 1 = workload at which we do more vacuuming
  Threshold 2 = workload at which we do less vacuuming
  Threshold 3 = point at which a table is considered worth vacuuming

Once every 10 seconds, the manager compares the current workload to
the thresholds and starts a new vacuum, kills one, or does nothing.
New vacuum processes keep getting started as long as there is
workload spare and tables that need vacuuming.


Now the trick of course is how you measure system workload in a  
meaningful manner.


I'd settle for a "start maintenance", "stop maintenance" API.
Anything else (for instance the heuristics you suggest above) would
definitely be gravy.


It's not going to be simple to do, though, I don't think.

Cheers,
  Steve


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] pg_hba.conf hostname todo

2006-12-27 Thread Steve Atkins


On Dec 27, 2006, at 1:47 PM, Joshua D. Drake wrote:


On Wed, 2006-12-27 at 16:41 -0500, Stephen Frost wrote:

* Joshua D. Drake ([EMAIL PROTECTED]) wrote:

Allow pg_hba.conf to specify host names along with IP addresses


Excellent.

Host name lookup could occur when the postmaster reads the
pg_hba.conf file, or when the backend starts.  Another solution would
be to reverse lookup the connection IP and check that hostname
against the host names in pg_hba.conf.  We could also then check that
the host name maps to the IP address.




Doing the DNS at connect time, not startup time, is the right thing
to do.  The main reason to use hostnames rather than IP addresses (or
domain wildcards vs CIDR blocks) is because you're expecting the
mapping to change.  You really don't want to add "restart all our
postgresql instances" to the DNS manager's "I changed a hostname"
checklist.

I'm inclined towards doing the reverse-DNS of the connecting IP and
then checking that the forward of that matches.


Hmm what if it doesn't? Which is the case in many scenarios. My
thoughts are:

If www.commandprompt.com is allowed, then the ip address
207.173.200.129 is allowed to connect.


DNS is cheap. For the simple case it wouldn't be too hard to expand
all the given hostnames at connect time, but there's a problem ...


If we go the reverse way:

129.200.173.207.in-addr.arpa name = 129.commandprompt.com.

Which really isn't that useful imo.


... unfortunately, you need to do it that way if you want to support
wildcards, as there's no way to expand *.example.com to a list of
IP addresses.
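
For the record, a sketch of the reverse-then-forward check (IPv4
only, no wildcard matching, minimal error handling - the names here
are mine, not anything proposed for the patch):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Reverse-resolve the peer, then forward-resolve the name we got
 * back and check the peer's address is among the results. */
static int peer_name_checks_out(const struct sockaddr_in *peer)
{
    char host[NI_MAXHOST];
    struct addrinfo hints, *res, *ai;
    int ok = 0;

    if (getnameinfo((const struct sockaddr *) peer, sizeof(*peer),
                    host, sizeof(host), NULL, 0, NI_NAMEREQD) != 0)
        return 0;                       /* no usable PTR record */

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL; ai = ai->ai_next)
    {
        struct sockaddr_in *sa = (struct sockaddr_in *) ai->ai_addr;
        if (sa->sin_addr.s_addr == peer->sin_addr.s_addr)
            ok = 1;                     /* forward matches reverse */
    }
    freeaddrinfo(res);
    return ok;
}

int main(void)
{
    struct sockaddr_in peer;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    inet_pton(AF_INET, "127.0.0.1", &peer.sin_addr);
    printf("127.0.0.1: %s\n",
           peer_name_checks_out(&peer) ? "ok" : "no match");
    return 0;
}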






Allow one to specify a FQDN or a simple wild card DN. E.g;
*.commandprompt.com.

A valid entry would look like this:

host    all    all    *.commandprompt.com    trust
host    all    all    www1.postgresql.org    md5

Thoughts?


While a wildcard does make sense (ie: www*.postgresql.org), I would
generally expect 'commandprompt.com' to mean '*.commandprompt.com'
implicitly.


Hmm interesting. I wouldn't expect that. I might
expect .commandprompt.com to mean *.commandprompt.com. But
commandprompt.com I would expect only whatever the A record returns as
commandprompt.com.



One thing I don't want to do is create a bunch of different style
syntaxes that are available :)


tcp_wrappers (/etc/hosts.allow and friends) are one fairly widely
used standard for this, and one that's mostly compatible with the
existing usage for dotted quads, I think. It uses .example.com
to match anything that ends in example.com.
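
The matching rule itself is trivial - a sketch, assuming
case-insensitive comparison is what's wanted:

#include <stdio.h>
#include <string.h>
#include <strings.h>

/* tcp_wrappers-style rule: a pattern with a leading dot matches any
 * hostname that ends with it; anything else must match exactly. */
static int host_matches(const char *host, const char *pattern)
{
    if (pattern[0] == '.')
    {
        size_t hl = strlen(host), pl = strlen(pattern);
        return hl > pl && strcasecmp(host + hl - pl, pattern) == 0;
    }
    return strcasecmp(host, pattern) == 0;
}

int main(void)
{
    printf("%d\n", host_matches("www.example.com", ".example.com")); /* 1 */
    printf("%d\n", host_matches("example.com", ".example.com"));     /* 0 */
    printf("%d\n", host_matches("example.com", "example.com"));      /* 1 */
    return 0;
}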

(If you're going down that road you can also have a host that
begins with a / to refer to an external list of filenames, but that
way may lie madness.)

Cheers,
  Steve

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] effective_cache_size vs units

2006-12-19 Thread Steve Atkins


On Dec 19, 2006, at 9:50 PM, Jonah H. Harris wrote:


On 12/19/06, Tom Lane [EMAIL PROTECTED] wrote:

I think we should just accept the strings case-insensitively, too.


While acknowledging Peter's pedantically-correct points, I say +1 for
ease of use.


+1. I spend some time walking people through tuning issues
by phone or IM. Anything that complicates supporting users or
frustrates users for no actual benefit is a bad thing.

(And this is unrelated to any theoretical units-aware data type -
we might well be interested in milliwatts and megawatts in a
datatype,  but in the configuration file we're unlikely to ever
need to configure things in units of millibits).

Cheers,
  Steve


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Steve Atkins


On Oct 24, 2006, at 8:48 PM, Bruce Momjian wrote:


Joshua D. Drake wrote:

Josh Berkus wrote:

Bruce,

I have updated the text.  Please let me know what else I should
change.  I am unsure if I should be mentioning commercial PostgreSQL
products in our documentation.


I think you should mention the postgresql-only ones, but just
briefly with a link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth
Replicator.


And to further this I would expect that it would be a subsection,
e.g. a sect2 or sect3. I think the open source version should
absolutely get top billing though.


I am not inclined to add commercial offerings.  If people wanted
commercial database offerings, they can get them from companies that
advertise.  People are coming to PostgreSQL for open source solutions,
and I think mentioning commercial ones doesn't make sense.

If we are to add them, I need to hear that from people who haven't
worked in PostgreSQL commercial replication companies.


I'm not coming to PostgreSQL for open source solutions. I'm coming
to PostgreSQL for _good_ solutions.

I want to see what solutions might be available for a problem I have.
I certainly want to know whether they're freely available, commercial
or some flavour of open source, but I'd like to know about all of them.

A big part of the value of Postgresql is the applications and extensions
that support it. Hiding the existence of some subset of those just
because of the way they're licensed is both underselling postgresql
and doing something of a disservice to the user of the document.

Cheers,
  Steve

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Steve Atkins


On Oct 24, 2006, at 9:20 PM, Bruce Momjian wrote:


Steve Atkins wrote:

If we are to add them, I need to hear that from people who haven't
worked in PostgreSQL commercial replication companies.


I'm not coming to PostgreSQL for open source solutions. I'm coming
to PostgreSQL for _good_ solutions.

I want to see what solutions might be available for a problem I
have.  I certainly want to know whether they're freely available,
commercial or some flavour of open source, but I'd like to know about
all of them.

A big part of the value of Postgresql is the applications and
extensions that support it. Hiding the existence of some subset of
those just because of the way they're licensed is both underselling
postgresql and doing something of a disservice to the user of the
document.


OK, does that mean we mention EnterpriseDB in the section about
Oracle functions?  Why not mention MS SQL if they have a better
solution?  I just don't see where that line can clearly be drawn on
what to include.

Do we mention Netezza, which is loosely based on PostgreSQL?  It just
seems very arbitrary to include commercial software.  If someone
wants to put it on a wiki, I think that would be fine because that
doesn't seem as official.


Good question. The line needs to be drawn somewhere. It's basically
your judgement, tempered by other people's feedback, though. If it
were me, I'd ask myself: Would I mention this product if it were
open source? Would mentioning it help people using the document?

Cheers,
  Steve


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Roadmaps 'n all that

2006-08-31 Thread Steve Atkins


On Aug 31, 2006, at 8:47 PM, Tom Lane wrote:


[ hijacking this thread over to where the developers hang out ]

Alvaro Herrera [EMAIL PROTECTED] writes:

Tom Lane wrote:

It's pointless to suppose that individual developers would really be
answerable to any project-wide management, since that's not who
they're paid by.  So I tend to think that a project roadmap would be
more of an exercise in wishful thinking than a useful management
tool.  OTOH it *could* be useful, if there are any developers out
there wondering what they should work on next.  Are there any ... and
would they listen to a roadmap if they had one, rather than
scratching their own itches?



I would certainly listen to a roadmap if it talked to me ...


Well, this question keeps coming up, and we keep arguing about it, and
we still have no data to say whether it would work well for *this*
project.  Maybe it's time to take the bull by the horns.

I propose a modest experiment: for the 8.3 development cycle, let's
try to agree (in the next month or so) on a roadmap of what major
features should be in 8.3 and who will make each one happen.  A year
from now, we will know whether this is a great thing we should
continue, or we should stick to our traditional laissez-faire style
of project management.  I figure that even if it really sucks, it
wouldn't kill us to try it for one release cycle --- at the very
worst, we'd make up lost time in future by no longer needing to waste
bandwidth arguing about it.


Would this be a core postgresql code roadmap or something a bit
broader (contrib, custom types, GUI-ish stuff, utilities and what
have you)?

Cheers,
  Steve

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: pgsql-patches reply-to (was Re: [HACKERS] [PATCHES] selecting

2006-08-17 Thread Steve Atkins


On Aug 17, 2006, at 9:30 AM, Magnus Hagander wrote:


Ever since pgsql-patches replies started going to -hackers,
threading doesn't work anymore, so I for one can't tell what this
refers to at all.


Yeah, that experiment hasn't seemed to work all that well for me
either.  Do you have another idea to try, or do you just want to
revert to the old way?


I'd vote for reverting to the old way. Anyone serious about hacking
should be on both lists.


Then why bother with two different lists?

If developers need to be on both lists (which I believe they do),
and the focus of both lists is developers, then why not just remove
one of them and get rid of the problem?


One reason might be that a lot of application developers who develop
applications or modules associated with PG, but not the core PG code
itself, also lurk on -hackers, as it's by far the best way to keep up
with the status of various PG enhancements (and also an excellent
place to pick up a lot of undocumented good practices).

Cheers,
  Steve

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] automatic system info tool?

2006-07-17 Thread Steve Atkins


On Jul 17, 2006, at 12:57 PM, Martijn van Oosterhout wrote:


On Mon, Jul 17, 2006 at 11:06:50AM -0400, Bort, Paul wrote:

If you can open a command shell you can get the OS version with the
'ver' command under Windows:

C:\>ver

Microsoft Windows XP [Version 5.1.2600]


How do you do this from a program, though? Under UNIX uname() is a
function call as well as a program. It returns the os name, version,
hostname and system type.



GetVersionEx() will get you the windows version, service pack, etc IIRC.
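
Something along these lines, as a sketch (untested, but the API is
documented in the Platform SDK):

#include <windows.h>
#include <stdio.h>

/* Sketch: report the Windows version much as 'ver' does. */
int main(void)
{
    OSVERSIONINFOEX vi;

    ZeroMemory(&vi, sizeof(vi));
    vi.dwOSVersionInfoSize = sizeof(vi);
    if (!GetVersionEx((OSVERSIONINFO *) &vi))
        return 1;

    printf("Windows %lu.%lu (build %lu), service pack %u.%u\n",
           vi.dwMajorVersion, vi.dwMinorVersion, vi.dwBuildNumber,
           vi.wServicePackMajor, vi.wServicePackMinor);
    return 0;
}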

Cheers,
  Steve


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-22 Thread Steve Atkins


On Jun 22, 2006, at 6:56 PM, Agent M wrote:



On Jun 22, 2006, at 9:56 PM, Christopher Kings-Lynne wrote:


The example is a very active web site, the flow is this:
query for session information
process HTTP request
update session information
This happens for EVERY http request. Chances are that you won't have
concurrent requests for the same row, but you may have well over 100
HTTP server processes/threads answering queries in your web server
farm.


You're crazy :)  Use memcache, not the DB :)


Still, the database is the one central location that the apaches can
connect to - postgres already has a lot of application platform
features - locking synchronization, asynchronous notifications,
arbitrary pl code.


Personally, I think that a special non-MVCC table type could be  
created- the catalogs are similarly flat. What I envision is a  
table type that can only be accessed outside transactions (like  
AutoCommit mode)- this is already possible to implement in plperl  
for a single session. It would be more efficient to have something  
like a global temp table hanging around...




Have you seen pgmemcache? It allows you to access a memcached
instance from within postgresql - which makes many of the problems
with using a separate caching / storage layer go away, or at least
get far easier to deal with.

Cheers,
  Steve



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] CSV mode option for pg_dump

2006-06-13 Thread Steve Atkins


On Jun 13, 2006, at 7:34 AM, Martijn van Oosterhout wrote:


What this tells me is that we need a tool somewhere between psql and
pg_dump, say, pgquery. Its sole purpose in life is to generate output
from various queries. Because it's a separate tool there's no question
of psql or pg_dump being able to parse them.

While you're at it, you could add modules to support many different
output styles, like CSV, XML, Excel format, HTML, etc.

This I believe would take the load off psql to provide many different
output styles, as well as the load off pg_dump to produce
parsable-by-third-party output.

Thoughts?


Perl+DBD::Pg+CPAN does almost all of this already. Lots of support
for countless different output formats, and mostly fairly well
battle-tested.

I suspect that a perl script to do all that would be dominated by
commandline option parsing, as all the hard work is in existing
modules.

Would that be adequate, or do we really want to reimplement and
maintain all the output format complexity in our own code, in C?

Cheers,
  Steve

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] CSV mode option for pg_dump

2006-06-13 Thread Steve Atkins


On Jun 13, 2006, at 9:47 AM, Joshua D. Drake wrote:

Would that be adequate, or do we really want to reimplement and
maintain all the output format complexity in our own code, in C?


I think the point is that we should provide a native implementation  
because not everyone is crazy enough to use perl (blatant jab ;)).  
I would never expect a customer to write a perl or python script  
just to get their data in what is widely considered a standard  
business format that can be imported by their userland application.


That wasn't what I was discussing, on two levels. Firstly, I wasn't  
suggesting that the end user write anything, secondly I was talking  
about the other output formats discussed (Excel, HTML...) rather than  
just CSV.




The people on the hackers list are NOT the target for this
feature. The people on general, admin and novice are.


I was referring to the other export formats mentioned (Excel,  
HTML...). We already support CSV export of single tables with the  
bundled software, don't we?


Cheers,
  Steve

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Possible TODO item: copy to/from pipe

2006-05-31 Thread Steve Atkins


On May 31, 2006, at 12:58 PM, Dave Page wrote:





On 31/5/06 19:13, Andreas Pflug [EMAIL PROTECTED] wrote:


I wonder if we'd be able to ship gzip with the windows installer, to
ensure proper integration.


'Fraid not. It's GPL'd.


Well, one implementation of it is. zlib is new-bsd-ish, though, and
includes minigzip, which should be just fine for use in a pipe on
windows.

(Not that that's an argument one way or the other as to whether this
is something we should do).

Cheers,
  Steve




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] OSX intel

2006-03-18 Thread Steve Atkins


On Mar 18, 2006, at 1:39 PM, Neil Conway wrote:


On Sat, 2006-03-18 at 22:36 +0900, Michael Glaesemann wrote:

Yes, there have been reports that it builds. You can check the
archives for details.


Are we prepared to declare that OS/X on Intel is an officially
supported platform for the 8.1 release series? If so, we should add
that information to the list of supported platforms for the
REL8_1_STABLE branch. Given how (relatively) popular OS/X Intel
machines will be before the 8.1 branch is retired, we should at least
clarify the status of the OS/X Intel port.

BTW, I notice that there isn't an OS/X Intel machine in the buildfarm.
Does anyone have access to one?


I'm running regressions on one with the buildfarm code. I just need
to get around to plugging it back into the buildfarm.

Cheers,
  Steve


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] random observations while testing with a 1,8B row table

2006-03-10 Thread Steve Atkins


On Mar 10, 2006, at 11:54 AM, Tom Lane wrote:


Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
3. vacuuming this table - it turned out that VACUUM FULL is
completely unusable on a table of this size (which I actually
expected before), not only due to the locking involved but rather due
to a gigantic memory requirement and unbelievable slowness.


sure, that was mostly meant as an experiment, if I had to do this on
a production database I would most likely use CLUSTER to get the
desired effect (which in my case was purely getting back the
diskspace wasted by dead tuples)


Yeah, the VACUUM FULL algorithm is really designed for situations
where just a fraction of the rows have to be moved to re-compact the
table.  It might be interesting to teach it to abandon that plan and
go to a CLUSTER-like table rewrite once the percentage of dead space
is seen to reach some suitable level.  CLUSTER has its own
disadvantages though (2X peak disk space usage, doesn't work on core
catalogs, etc).


I get bitten by this quite often (customer machines, one giant table,
purge out a lot of old data).

CLUSTER is great for that, given the headroom, though I've often
resorted to a dump and restore because I've not had the headroom
for cluster, and it's a lot less downtime than a full vacuum.

While the right fix there is to redo the application engine side to
use table partitioning, I keep wondering whether it would be possible
to move rows near the end of the table to the beginning in one,
non-locking phase (vacuum to populate the FSM with free space near
the beginning of the table, touch rows starting at the end of the
table, repeat) and then finish off with a vacuum full to tidy up the
remainder and truncate the files (or a simpler "lock the table and
truncate anything unused at the end").
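
The "touch the rows at the end" step might look something like this
in psql - a sketch only, since it assumes tid comparisons work on
your version (they may not), and bigtable, some_col and the block
number are made up:

  -- fill the free space map with the space near the start of the table
  VACUUM bigtable;

  -- no-op update of rows stored beyond block 100000; the new row
  -- versions should land in the free space earlier in the table
  UPDATE bigtable SET some_col = some_col
   WHERE ctid >= '(100000,0)'::tid;

  -- vacuum again, then repeat with a lower block number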

Cheers,
  Steve




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_service.conf

2006-02-19 Thread Steve Atkins


On Feb 19, 2006, at 10:59 AM, Mark Woodward wrote:


Mark Woodward [EMAIL PROTECTED] writes:

DNS isn't always a better solution than /etc/hosts; both have their
pros and cons. The /etc/hosts file is very useful for instantaneous,
reliable, and redundant name lookups. DNS services, especially in a
large service environment, can get bogged down. 20,000 hosts doing a
lot of lookups can require a dedicated single point of failure. OK,
so you add two DNS machines and load balance across them with a
fault tolerant load balancer, how many thousands of dollars? For how
much information? A simple clustercpy -f targets pg_service.conf
/etc would save thousands of dollars, increase efficiency, increase
reliability, decrease electrical costs, etc.


Um, is there something wrong with having multiple DNS servers in
resolv.conf?  Other than having to time out on #1 before you try #2?
I'm genuinely curious.


What is the timeout of that DNS lookup, before it goes to the second
DNS server?


Depends on the resolver you use. Often the timeout is zero. Other
times it's adaptive, depending on history of response time from the
servers.

Except in the case of horrible misconfiguration, it's rarely a problem.
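
For reference, the knobs involved live in /etc/resolv.conf on most
platforms; something like this (option names as in resolv.conf(5);
exactly which are honoured varies by resolver) keeps the failover
cost low:

  nameserver 10.0.0.53
  nameserver 10.0.1.53
  options timeout:2 attempts:2 rotate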

Cheers,
  Steve


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] [HACKERS] New project launched : PostgreSQL GUI Installer for

2006-01-30 Thread Steve Atkins


On Jan 30, 2006, at 8:48 PM, Tony Caduto wrote:


Devrim GUNDUZ wrote:

Have you looked at AutoPackage?

http://autopackage.org

screen shots.

http://autopackage.org/gallery.html

Has a GUI wizard if X windows is available and a command line  
wizard if no X is available.



Using autopackage is similar to using MSI,Wise,Inno etc on Windows.


If that's the one that uses aptools it looks _excellent_. Until you try
and use it. It looked as though it would solve many of my packaging
problems, not least deploying on older platforms than the build box,
but simply didn't work on anything more complex than toy code.

I suspect that if you were just using it as a general installer, rather
than any of the portability magic, it might be worth a look.

Cheers,
  Steve

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Cleaning up the INET/CIDR mess

2006-01-26 Thread Steve Atkins


On Jan 25, 2006, at 9:29 AM, Bruce Momjian wrote:


Tom Lane wrote:

Greg Stark [EMAIL PROTECTED] writes:
I wonder if this would be an opportunity to fix Postgres's  
handling of

addresses like '10.1'.


You've mistaken this for a proposal to change the I/O behavior, which
it is specifically not.


The standard interpretation of this is the same as '10.0.0.1'.


Standard according to whom?  Paul Vixie evidently doesn't think that
that's a standard abbreviation, else the code we borrowed from
libbind would do it already.


Agreed.  10.1 as 10.0.0.1 is an old behavior which has been removed
from most modern versions of networking tools.


Whether PG should support it or not is another question (personally
I think that anything other than a dotted quad should fail with an
error) but it certainly hasn't been removed from most modern versions
of networking tools.

gethostbyname() is used by most networking tools, and on most unix
OSes it believes 10.1 'resolves to' 10.0.0.1. That includes current
versions of linux, OS X, Solaris, Windows XP and I believe the BSDs.
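
Easy to check with the standard resolver routines - a minimal sketch:

#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* inet_aton() implements the classic BSD numbers-and-dots forms:
 * a.b.c.d, a.b.c (c is 16 bits), a.b (b is 24 bits), plain a. */
int main(void)
{
    struct in_addr addr;

    if (inet_aton("10.1", &addr))
        printf("10.1 -> %s\n", inet_ntoa(addr));   /* prints 10.0.0.1 */
    return 0;
}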

So the vast majority of applications on the vast majority of
deployed platforms believe that 10.1 is the address 10.0.0.1. (As is
often the case bind's behaviour is inconsistent and can't really be
used as proof of standard behaviour.)

Cheers,
  Steve

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] State of support for back PG branches

2005-09-27 Thread Steve Atkins
On Mon, Sep 26, 2005 at 09:54:46PM -0700, Joshua D. Drake wrote:
 [ raised eyebrow... ]  Has bizgres obtained a crystal ball from
 somewhere?  There is *no* way anyone could provide you anything that
 has any legitimate claim on the name PG 8.2 three months from now.
  
 
 
 I **think** what he meant was that depending on what Bizgres has done in 
 the next three
 months they may not run 8.1 but run Bizgres instead.

Partly that, yes. Once 8.1 has stabilized we'll look at 8.1, bizgres
as current at that point and the 8.2 crystal ball and decide where the
sweet spot is for our needs.

The only crystal ball involved is the assumption that if bizgres has
Neat Stuff(tm) that would be of widespread use in its development
tree at that point then the odds are good that it, or something
functionally equivalent to it, will appear in the 8.2 development
tree. If the neat stuff is neat enough we may push off an update from
7.4 and develop using some of that neat stuff, with the plan of
deploying either to bizgres or to 8.2.

(My best guess right now is that 8.1 will be the right thing to move to,
 but I'm delaying that decision until next year.)

Cheers,
  Steve


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] State of support for back PG branches

2005-09-26 Thread Steve Atkins
On Mon, Sep 26, 2005 at 09:27:28PM -0400, Andrew Dunstan wrote:
 Tom Lane wrote:
 
 Marc G. Fournier [EMAIL PROTECTED] writes:
  
 
 On Mon, 26 Sep 2005, Andrew Dunstan wrote:

 
 Maybe something like this would do: We will attempt to maintain support 
 of each major version for 3 years after its release, although this will 
 not always be possible. After that time any major support requirement is 
 likely to result in support being ended.
  
 
 
  
 
 This sounds reasonable to me ... I think it is more than most software
 projects do, isn't it?

 
 
 To translate that into reality: 7.2 (2002-02-04) would be dead already,
 and 7.3 (2002-11-27) will be dead around the time we are likely to
 release 8.1.  
 
 
 It doesn't say we must drop it, it says we can after 3 years (or 1 
 release + 2 years if you like) without any troubles of conscience. ISTM 
 that we could and possibly should keep supporting it until it appeared 
 some major patch was required that was too much work or too dangerous.
 
 Remember, many people don't want to jump onto a release right away - I 
 know of large enterprises that have a policy not to use the .0 version 
 of anything. So a 3 year cycle is more likely to be a 2 1/2 year cycle 
 in reality. Then factor in testing and migration time and the production 
 life in the field between deployment and end of life might be only about 
 2 years. That's plenty short enough, especially as we still don't have a 
 nice pg_upgrade utility.

We started our upgrade from 7.2 to 7.4 about 20 months ago and finished it
about 10 months ago, skipping 7.3 entirely.

We've only just today hit our first problem in 7.4, and it's fixed by
upgrading to 7.4.current, rather than the 7.4.something we originally
upgraded to from 7.2.something.

We'll be skipping 8.0 completely and the next step will probably be to
8.1.something (or possibly 8.2.something, depending on how bizgres
looks in 3 months time). We'd probably consider upgrading our
customers more often, but a dump and restore is extremely painful.

Just a view from the pg-based-enterprise-application world.

A nice pg_upgrade utility would make a big difference. Clearly an
in-place upgrade is possible, but maintaining it is hard. There are two
broad ways of running a pg_upgrade project - one that is entirely
independent of the main codebase and one that puts requirements on the
main codebase developers (if you change $foo you provide code to
translate old $foo to new $foo). Any feel for the relative difficulty
of the two approaches? And how much push-back there'd be on the latter?

Cheers,
  Steve


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] initdb profiles

2005-09-09 Thread Steve Atkins
On Thu, Sep 08, 2005 at 08:29:38PM -, Andrew - Supernews wrote:
 On 2005-09-08, Peter Eisentraut [EMAIL PROTECTED] wrote:
  Andrew - Supernews wrote:
  Running initdb behind the scenes is a proven dangerous practice
 
  Please elaborate.
 
 Example instance:
 http://archives.postgresql.org/pgsql-hackers/2004-12/msg00851.php
 
 More generally, you risk running initdb and doing a normal database
 startup despite missing filesystems (assuming your db is substantial
 and important enough that you don't keep it in /var or /usr). There are
 a number of ways that this can bite you, whether due to thinking that
 the database is up when it really isn't usable, or subsequently mounting
 over the new data dir, or any number of other potential issues.
 
 A missing data directory on startup should mean something is wrong, not
 oh look, I'll run initdb and start up anyway.

I think we read entirely different things into behind the scenes.

I have an installer script that's run to install a software
package. It runs an initdb to create the database it uses behind the
scenes.

Running initdb as part of an installation process is a very different
scenario to randomly running it whenever you think something may not
be quite right (although that is the pattern used by many other
daemon startup scripts on the OS in question, so it's at least
consistent).

Cheers,
  Steve

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] initdb profiles

2005-09-08 Thread Steve Atkins
On Thu, Sep 08, 2005 at 09:54:59AM +0800, Christopher Kings-Lynne wrote:

 I think we should just do what MySQL does and include:
 
 postgresql.conf
 postgresql-large.conf
 postgresql-huge.conf

I do that, in the package of PG I distribute with my application. I
tell the user that they should use it in the installation
documentation, as part of the installation script, in the performance
tuning documentation, in the maintenance documentation, in the
runbook, on the website and in the application's online help.

I also mention it to customers by phone, email and on occasion IRC or IM
when they're about to install it or have just installed it. I also mention
it to them any time they call about performance problems.

These are technically literate customers working for large ISPs, with
significant local sysadmin and DBA support, so the concept is not beyond them.

Yet when I ssh in to one of their servers only about 1 in 3 is running
with anything other than the default postgresql.conf.

Just a data point. If it works, most people won't think to fix it.

Cheers,
  Steve

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Call for 7.5 feature completion

2005-09-01 Thread Steve Atkins
On Thu, Sep 01, 2005 at 09:17:38AM +0800, William ZHANG wrote:
  Dave Page wrote:
  
  * Compile with MSVC on Win32 platforms. MySQL supports it.
  
  So what? It would take a major amount of work, with no useful benefits.
  
  ... and you can compile all the client and library stuff with MSVC - 
  just not the server nor extensions. But the audience for compiling those 
  is far smaller.
 
 I think the most popular method to build a project on Win32 is using 
 MSVC or Intel C. Intel C can be integrated with MSVC's IDE to help
 developers increase their productivity. Actually I have tried to make 
 the backend of pgsql-8.0.3 build with MSVC 6.0, and it works well.
 Should I polish it and send it as a patch?
 
 Having been a Win32 developer for several years, I think it is more 
 convenient to use MSVC's IDE than CL.exe with NMAKE.exe.
 Although I do not like Microsoft very much, and like to use MinGW
 or Cygwin to do some small tests, MSVC is more suitable for 
 native Win32 development. If pgsql wants to be a first class citizen
 on Windows, and wants to compete with MySQL, I think supporting
 MSVC is important. I believe there will be many contributions from
 the Win32 world.

I think supporting MSVC is important, certainly (though I think that
supporting the Intel compiler is even better, as the only compelling
reason, IMO, to switch for the server end is generated code
quality). But that's very different from supporting visual studio.

I've been doing cross-platform development on a big codebase for
years, and the idea of trying to use the proprietary build
environments on each platform, and expecting to keep them sufficiently
in-sync that the end result is actually comparable on each platform is
laughable. And that's on a much smaller, simpler codebase than PG with
a much smaller, more integrated development team.

I use gmake or cons everywhere. On Windows I run them under cygwin and
have them call the MSVC commandline compiler. It all works fine. And
it doesn't stop me from using Visual Studio to edit the code, run the
debugger or anything like that. On OS X I can use XCode. On Solaris I
use the Forte environment. On Linux I use emacs and gcc. And that's
all on the same codebase with the same makefile checked out from the
same CVS repository.

Cheers,
  Steve


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Remote administration functionality

2005-07-30 Thread Steve Atkins
On Sat, Jul 30, 2005 at 11:39:20PM -0400, Bruce Momjian wrote:
 Let me try to outline where I think our goals are for remote
 administration.  I will not comment on Dave's analysis of the patch
 review process, but I think he has some valid points that this patch was
 not treated properly.
 
 Basically, I think everyone wants remote administration.  Remote
 administration requires several things:
 
   o  edit postgresql.conf
   o  edit pg_hba.conf
   o  reload the config files
   o  restart the server (for config variables requiring restart)
   o  view log files
   o  recycle log files
   o  rename/remove log files
 
 All these items are on the TODO list already.

My security spider-sense tingles when I see the ability for a remote
attacker to not only completely override password, certificate and
IP-based authentication but also to easily remove logfiles.

So, while I can see the attraction of being able to futz with the
database security configuration through a PHP web interface running on
an unpatched Apache build somewhere out on the open internet (and
would like to be able to do so myself, sometimes) I'd really, really
like to see the ability to disable as much of this at compile time as
is convenient.

Cheers,
  Steve

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pl/pgsql: END verbosity

2005-06-22 Thread Steve Atkins
On Thu, Jun 23, 2005 at 01:41:49AM +1000, Neil Conway wrote:
 Andrew Dunstan wrote:
 But this doesn't make it easier to use - users don't just include those who
 write it. The antecedent language of these, Ada, from which this syntax
 comes, was explicitly designed to be reader-friendly as opposed to
 writer-friendly, and this is a part of that.
 
 IMHO it is just needless verbiage that makes programs both harder to 
 read *and* harder to write, albeit marginally so. I think there is a 
 reason why Ada-style block terminators are in the minority among 
 block-structured languages :)
 
 But obviously this is a matter of taste -- does anyone else like or 
 dislike the current syntax?

Like is a bit strong. But it does make functions written in it
easier to read. And given that the primary debugging methodology for
pl/pgsql is "look at it hard and see what might be incorrect", I
can't see that as a bad thing.

I'd trade a whole lot of "harder to write" for even some "likely to
work".

Cheers,
  Steve

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Autovacuum in the backend

2005-06-16 Thread Steve Atkins
On Wed, Jun 15, 2005 at 09:07:30PM -0700, Josh Berkus wrote:
 Josh,
 
  Just my own two cents. First I am not knocking the work that has been on
  autovacuum. I am sure that it was a leap on its own to get it to work.
  However I will say that I just don't see the reason for it.
 
 I've personally seen at least a dozen user requests for autovacuum in the 
 backend, and had this conversation about 1,100 times:
 
 NB: After a week, my database got really slow.
 Me: How often are you running VACUUM ANALYZE?
 NB: Running what?

Yes, me too.

I always understood autovacuum to be a way to avoid having newbies get
burned by not vacuuming, and for simplifying the maintenance of lower
traffic databases.

I don't see people with high-traffic databases (relative to the hardware
they're running on) ever using autovacuum with the current state of
vacuum and autovacuum.

If improvements to vacuum (unrelated to autovacuum) reduce the IO load
that would be a great thing, especially for those of us dealing with
24x7 databases. (I really like the dirty bitmap suggestion - it sounds
like a clean way to reduce the amount of work needed.) If autovacuum were
extended to allow more flexible scheduling (or even to be aware of the
other IO going on) then it would be of wider use - but I think the real
value of autovacuum is to make sure that new users (Windows...) don't
have a bad experience when they first try PG.

Cheers,
  Steve

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] NOLOGGING option, or ?

2005-06-02 Thread Steve Atkins
On Thu, Jun 02, 2005 at 07:33:13AM -0700, Luke Lonergan wrote:
 Oliver,
 
  Haven't you just replaced one preprocessing step with another, then?
 
 Generally not.  The most common problem with the current choice of escape
 character is that there are *lots* of data load scenarios with backslash in
 the text strings. 

I can only think of one where it's common. Windows filenames. But if
you're going to support arbitrary data in a load then whatever escape
character you choose will appear sometimes.

 The extra preprocessing to escape them is unnecessary on
 other databases and, in effect, causes the load to be even slower because
 you have to prepare the data ahead of time.

 Also, note that this patch can also do escape processing and the net result
 will still be 5+ times faster than what is there.

I strongly suspect that a patch to improve performance without changing
behaviour would be accepted with no questions asked.

One that allowed specifying the field and record delimiters and the
escape character and null symbol might require more discussion about
an appropriate syntax at the very least. So you may want to separate
the two.

 In the data warehousing industry, data conversion and manipulation is
 normally kept distinct from data loading.  Conversion is done by tools
 called ETL (Extract Transform Load) and the database will have a very fast
 path for direct loading of the resulting data.  PostgreSQL is definitely a
 strange database right now in that there is a default filter applied to the
 data on load.

 It's even more strange because the load path is so slow, and now that we've
 found that the slowness is there mostly because of non-optimized parsing and
 attribute conversion routines.  The question of how to do escape processing
 is a separate one, but is wrapped up in the question of whether to introduce
 a new loading routine or whether to optimize the old one.

There are already two loader routines. One of them is text-based and is
designed for easy generation of data load format using simple text
manipulation tools by using delimiters. It also allows (unlike your
suggestion) for loading of arbitrary data from a text file.

Because it allows for arbitrary data and uses delimiters to separate
fields it has to use an escaping mechanism.
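
For example, a single three-column row in the existing text format -
second field containing an embedded newline and tab, third field
null - travels as one line looking like this (with <TAB> standing in
for a literal tab delimiter):

  1<TAB>line one\nline\ttwo<TAB>\N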

If you want to be able to load arbitrary data and not have to handle
escape characters there are two obvious ways to do it.

The first is that used by MIME and suggested by you. That is to use a
separator that you believe will not appear in the data. That can be
done by using a long multicharacter separator containing random
characters and assuming that sequence won't appear; by parsing the
input data twice, looking for strings that don't appear, for use as
delimiters; or by taking advantage of knowledge about what characters
can and cannot appear in the input data. (I can't imagine any case
involving data-mining of web logs where the last is likely to be
relevant.)

The other is to use length+data format for each tuple, avoiding all
issues of escapes in the data and allowing arbitrary data to be
represented. That's how the binary load format PG supports works, I
believe. If you're really concerned about speed of load that may be a
better format for your front-end to generate, perhaps?
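
As I recall the per-tuple framing in the binary format is roughly
this (see the COPY BINARY documentation for the authoritative
layout):

  int16   number of fields in the tuple
  ...then per field:
  int32   length of the field data in bytes, or -1 for a NULL
  bytes   the field data itself, no escaping or delimiters needed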

Cheers,
  Steve

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] NOLOGGING option, or ?

2005-06-01 Thread Steve Atkins
On Wed, Jun 01, 2005 at 07:35:33PM -0700, Luke Lonergan wrote:
  I propose an extended syntax to COPY with a change in semantics to remove
  the default of WITH ESCAPE '\'.
  
  Er, doesn't this break existing database dumps?
 
 Yes, one of the previously stated reasons to create another command for
 loading data.
 
 Another possible approach is to keep the default, but allow the escape
 processing to be turned off.

I've been following this thread, and I'm a little confused. Could
you possibly clarify what you mean by providing a couple of lines of
input as it would be formatted with escape processing turned off,
containing a text field with an embedded newline and tab, and a null
field?

Cheers,
  Steve

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: Learning curves and such (was Re: [HACKERS] pgFoundry)

2005-05-20 Thread Steve Atkins
On Fri, May 20, 2005 at 11:59:00PM +0900, Michael Glaesemann wrote:

 Right, if you classify the information coming in, you can set controls
 over who sees it.  What we don't do now is any kind of classification.
 
 This may be a bit off-the-wall, but I recall Joel Spolsky recently  
 writing about using Bayesian filtering to classify mail into groups  
 other than spam/ham. I wonder if there's any use for something like  
 that in this case.
 
 http://www.joelonsoftware.com/articles/FogBugzII.html

No, definitely not. Pseudo-bayesian classification as used by the
more optimistic spam-filtering folks is pretty crappy at the best of
times, and it's really unusable for more than 3-4 categories.

There are natural language analysis techniques that'll do this sort of
thing, but they're in the realms of research projects, not canned
tools.

Cheers,
  Steve


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: Learning curves and such (was Re: [HACKERS] pgFoundry)

2005-05-17 Thread Steve Atkins
On Tue, May 17, 2005 at 06:25:16PM -0400, Alvaro Herrera wrote:
 On Wed, May 18, 2005 at 08:04:51AM +1000, Brendan Jurd wrote:
  On 5/18/05, Stephen Frost [EMAIL PROTECTED] wrote:
   * Brendan Jurd ([EMAIL PROTECTED]) wrote:
In the interests of putting my money where my mouth is, I would be
willing to enlist in the housekeeping effort for this hypothetical new
system.
   
   If you're willing to create it, host it, update it and keep it current,
   and feel it'd be so worthwhile to people that you'd be willing to
   continue to maintain it...  Then go for it.  You don't need anyone's
   approval or even agreement about it.  *That* would be putting your money
   where your mouth is.
  
  I'm detecting sarcasm here, but just in case you're being serious ...
 
 I don't think Stephen was being sarcastic.  Such a system would need an
 enormous bootstrap effort.  Once it's in place, and having shown its
 usefulness, maybe the community would start using it.
 (Of course no one can make promises on that other than for himself.)

The useful bug tracking systems I've used have also included QA.  Any
bug submitted doesn't get accepted without a standalone test case.
That test case is used both to confirm that the bug has been fixed
once it's fixed, and is added into a set of regression tests.

If someone were to create test cases for (some or all of the)
submitted bugs (or handle the management of their creation) and track
which test cases passed (via a tinderbox, or even the existing
build-farm) that'd be a useful thing in itself. I suspect it'd be a
useful thing for someone who has energy to expend on bug-tracking to
do, and probably more immediately useful than anything that requires
all the primary developers to change how they're currently handling
bugs and to-do lists.

Just a thought.

Cheers,
  Steve


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: Learning curves and such (was Re: [HACKERS] pgFoundry)

2005-05-17 Thread Steve Atkins
On Wed, May 18, 2005 at 12:07:14AM -0400, Tom Lane wrote:

 Steve Atkins [EMAIL PROTECTED] writes:
  The useful bug tracking systems I've used have also included QA.  Any
  bug submitted doesn't get accepted without a standalone test case.
 
 Side note: while test cases are certainly Good Things that make life
 easier for developers, so we should encourage people to provide 'em,
 I can't say that I like the idea of a tracking system designed around
 the concept that a bug for which you don't have a test case isn't real.
 It's not all that easy to make a test case for bugs involving concurrent
 behavior.  I'd go so far as to say that most of the seriously
 interesting bugs that I've dealt with in this project were ones that the
 original reporter didn't have a reproducible test case for.

Depends on the context. For a code base like PG's (with, as you say,
many possibilities for weird concurrency issues or data driven bugs),
or for a development style like PG's (i.e. mostly volunteer),
_requiring_ a test case before a bug is worked on makes no sense.

Some environments I've worked in, though, have had a stage between
bug submitted and bug passed to developer where someone in QA makes
an attempt to create a test case where one was not submitted with the
bug.  That was more the idea I was suggesting as a possibility - if
someone has a QA itch to scratch then trolling postgresql-bugs for
bugs without test cases and creating reproducible test cases to
attach to those bugs might be a useful thing to do.

Cheers,
  Steve


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [BUGS] Bug in create operator and/or initdb

2005-01-31 Thread Steve Atkins
On Sun, Jan 30, 2005 at 09:49:43PM -0600, Larry Rosenman wrote:
 On Sun, 30 Jan 2005, Tom Lane wrote:
 
 Steve Atkins [EMAIL PROTECTED] writes:
 For a replacement type, how important is it that it be completely
 compatible with the existing inet/cidr types? Is anyone actually using
 inet types with a non-cidr mask?
 
 If you check the archives you'll discover that our current inet/cidr
 types were largely designed and implemented by Paul Vixie (yes, that
 Vixie).  I'm disinclined to second-guess Paul about the external
 definition of these types; I just want to rationalize the internal
 representation a bit.  In particular we've got some issues about
 conversions between the two types ...

 Please do **NOT** break the external representations.  We had enough fights
 about that 2-3 releases ago, and I personally don't want to revisit them.

 Yes, we do flakey things with inet on the masking stuff.

Well, if you want the ability to store both a host address and a
netmask in the same datatype the inet masking stuff makes sense.
That's not a datatype with much practical use, but it's fairly
well-defined. The problem is that when someone looks at the docs
they'll see inet as the obvious datatype to use to store IP
addresses, and it isn't very good for that.

But that's not all that's flakey, unfortunately.

The CIDR input format is documented to be classful, which in itself
is horribly obsolete and completely useless in this decade's internet
(and was when the current code was written in '98).

But the implementation isn't either classful or classless, and in
many cases the behaviour disagrees with the documented behaviour, and
with the behaviour you'd reasonably expect.

-- Class A - documented to be 10.0.0.0/8
  steve=# select '10.0.0.0'::cidr;
      cidr
  -------------
   10.0.0.0/32

-- Class B - documented to be 128.0.0.0/16
  steve=# select '128.0.0.0'::cidr;
      cidr
  --------------
   128.0.0.0/32

-- Class C - documented to be 223.10.0.0/24
  steve=# select '223.10.0.0'::cidr;
      cidr
  ---------------
   223.10.0.0/32

-- Class D
  steve=# select '224.10.0.0'::cidr;
  ERROR:  invalid cidr value: "224.10.0.0"
  DETAIL:  Value has bits set to right of mask.

  steve=# select '224.0.0.0'::cidr;
      cidr
  -------------
   224.0.0.0/4

-- Class E
  steve=# select '240.10.0.0'::cidr;
      cidr
  ---------------
   240.10.0.0/32

I use postgresql for network-related applications and for IP address
related data mining, so I'm dealing with IP addresses in postgresql
on a daily basis.

The cidr type, including its external interface, is simply broken.
There is no way to fix it that doesn't change that external interface.

I know of at least two independent implementations of functional IP
address types that have been put together for specific projects to
implement a working IP datatype. The ability to use gist indexes on
them to accelerate range-based lookups is a bonus.

If it's not possible (for backwards compatibility reasons) to fix
inet+cidr, would migrating them out to contrib be a possibility?
Data types in the core tend to be widely used, even if they're
broken and there are better datatypes implemented as external
modules.

Cheers,
  Steve

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [BUGS] Bug in create operator and/or initdb

2005-01-31 Thread Steve Atkins
On Mon, Jan 31, 2005 at 12:16:26PM -0500, Tom Lane wrote:
 Steve Atkins [EMAIL PROTECTED] writes:
  The cidr type, including its external interface, is simply broken.
 
 That is a large claim that I don't think you have demonstrated.
 The only one of your examples that seems to me to contradict the
 documentation is this one:
 
  steve=# select '224.0.0.0'::cidr;
      cidr
  -------------
   224.0.0.0/4
 
 which should be /32 according to what the docs say:

OK. If this sort of thing is considered a bug, rather than part
of the external interface that shouldn't be changed, then I'd
agree that cidr isn't entirely broken and it may well be possible
to improve it without changing the interface.

/me goes grovelling through the IPv6 inet code...

Cheers,
  Steve


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [BUGS] Bug in create operator and/or initdb

2005-01-29 Thread Steve Atkins
On Sat, Jan 29, 2005 at 10:07:30PM -0500, Tom Lane wrote:
 John Hansen [EMAIL PROTECTED] writes:
  In that case may I suggest fixing the catalog so network_* functions exist
  for both datatypes!
 
 Redesigning the inet/cidr distinction is on the to-do list (though I'm
 afraid not very high on the list).  ISTM it should either be one type
 with a distinguishing bit in the runtime representation, or two types
 with no such bit needed.  Having both is a schizophrenic design.  It's
 led directly to bugs in the past, and I think there are still some
 corner cases that act oddly (see the archives).

From a network engineering point of view the inet type is utterly
bogus. I'm not aware of data of that type being needed or used in
any real application. Given that, the complexity that it causes
simply by existing seems too high a cost.

I suspect that the right thing to do is to kill the inet type
entirely, and replace it with a special case of cidr. (And possibly
then to kill cidr and replace it with something that can be indexed
more effectively.)

For a replacement type, how important is it that it be completely
compatible with the existing inet/cidr types? Is anyone actually using
inet types with a non-cidr mask?

Cheers,
  Steve

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] apparent problem on linux/s390

2004-12-10 Thread Steve Atkins
On Tue, Dec 07, 2004 at 10:37:36AM +0200, Nicolai Tufar wrote:
 On Mon, 06 Dec 2004 19:15:23 -0500, Andrew Dunstan [EMAIL PROTECTED] wrote:
  
  problems apparently with NaNs, infinities and negative zeros.

Sure smells more like an IEEE issue than a postgresql issue (built for
IBM FP rather than IEEE? Dodgy bugs in the emulator?).

 I am the owner of this baby. It is not an actual mainframe,
 it is the Hercules emulator running on an AthlonXP 2500+ producing
 20-25 MIPS of pure mainframe horsepower! On bootup Linux
 reports 130 BogoMIPS and OS/390 reports 40 MIPS. We all
 know that mainframe MIPS != mini MIPS. Anyway, a full
 compile of PostgreSQL source takes about 300 minutes.
 I run it overnight.
 
 Hercules is a tested and proven emulator but I have a vague doubt
 that its IEEE FPU emulation may be slightly different from the
 real iron's. Are there test suites to check whether a Linux
 system's math functions conform to the standards? I may need to run
 one of them first.

IIRC older S390s don't do IEEE FP at all, and it's done via traps to
the OS. Are you emulating the S390 IEEE FP hardware option or are you
emulating non-IEEE hardware, with IEEE fixup in OS390?

Paranoia by Dr Kahan is the classic IEEE compliance tester
(http://www.netlib.org/paranoia/), but I've also had a lot of success
with the SoftFloat-based TestFloat suite at
http://www.jhauser.us/arithmetic/TestFloat.html.

Cheers,
  Steve

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] embedded postgresql

2004-10-19 Thread Steve Atkins
On Thu, Oct 14, 2004 at 09:49:47AM +0200, [EMAIL PROTECTED] wrote:
 Dear Sirs,
 
 I would like to know if there are any discussions about
 creating an embedded version of postgresql. My thoughts
 go towards building/porting a sqlite equivalent of pg.

The discussion comes up occasionally. After some well-reasoned
analysis the conclusion is generally that an embedded postgresql would
keep either none of the advantages of postgresql or would lose most of
the (single-user specific) advantages of an embedded database. Or
both.

In other words, postgresql is a really bad place to start if you want
to make an embedded database. If you search the list archives you
should find the last time this was discussed.

Cheers,
  Steve

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] fsync and hardware write cache

2004-08-23 Thread Steve Atkins
On Mon, Aug 23, 2004 at 10:19:20PM +0200, Manfred Spraul wrote:

 Does it make sense to add a platform-specific call that will flush a write
 cache when fsync is enabled?
 
 Pete Zaitsev from mysql wrote that there is a special call on Mac OS:
 Quoting him:
 
 Mac OS X also has this optimization, but at least it provides an
 alternative flush method for Database Servers:
 
 fcntl(fd, F_FULLFSYNC, NULL)
 
 can be used instead of fsync() to get true fsync() behavior. 
 
 
 I couldn't confirm this with a quick google search - perhaps someone 
 with MacOS docs (or mysql sources) should check it.

I can confirm it exists.

#define F_FULLFSYNC  51 /* fsync + ask the drive to flush to the media */

 What might be useful is a test tool that benchmarks fsync: if it
 completes faster than a single rotation of a 15k rpm disk (about
 4ms) then probably something is caching the write calls.

I played with doing that - and can't find any system where a naive
looped write, fsync, write, fsync took more than about 600us, so I
guess I'm missing something somewhere.
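
For reference, the naive loop in question, more or less (a sketch; on
OS X you'd substitute fcntl(fd, F_FULLFSYNC, NULL) for the fsync() to
compare the two):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

/* Naive fsync latency test: if the average write+fsync round trip is
 * much faster than one disk rotation (~4ms at 15k rpm), something in
 * the stack is almost certainly caching the writes. */
int main(void)
{
    int fd = open("fsync-test", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    char buf[512];
    struct timeval t0, t1;
    int i, loops = 1000;

    if (fd < 0)
        return 1;
    memset(buf, 'x', sizeof(buf));

    gettimeofday(&t0, NULL);
    for (i = 0; i < loops; i++)
    {
        if (pwrite(fd, buf, sizeof(buf), 0) != sizeof(buf) ||
            fsync(fd) != 0)
            return 1;
    }
    gettimeofday(&t1, NULL);

    printf("%.1f us per write+fsync\n",
           ((t1.tv_sec - t0.tv_sec) * 1e6 +
            (t1.tv_usec - t0.tv_usec)) / loops);
    close(fd);
    unlink("fsync-test");
    return 0;
}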

Cheers,
  Steve


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Version Numbering -- The great debate

2004-07-31 Thread Steve Atkins
On Sun, Aug 01, 2004 at 12:20:59PM +0800, Christopher Kings-Lynne wrote:
 This is more features worth mentioning than we've ever had in a single 
 release before -- and if you consider several add-ons which have been 
 implemented/improved at the same time (Slony, PL/Java, etc.) it's even 
 more momentous.   If this isn't 8.0, then what will be?   
 
 
 I tend to agree, and was about to bring up the point myself.
 
 I'm in favour of 8.0.  There's a time to be humble and a time for hard 
 work to be properly recognised.

We deploy postgresql as part of an application that goes out to big,
IT-savvy corporations. So far we've shipped 7.2.* and 7.4.* (the
upgrade pain to 7.3 outweighed the benefits, so we put that off
and put it off until 7.4 was available).

8.0.0 suggests, to my customers at least, a brand new release with
either massive re-architecting, many new features or both and that's
likely to be riddled with bugs. While it would be unlikely that we'd
ship 7.5.0 to customers (I suspect there'd be a .1 release before we
were comfortable with the .0 release, given the massive changes)
there's not a chance we'd ship 8.0.0 - even though it's the identical
codebase - because of that perception. Probably not 8.0.1 either.

From the discussions I've seen and the number and size of changes I've
seen go into the codebase recently I suspect 8.0.0 might be quite an
appropriate version number on several levels. There have been a lot of
major changes in this release, some significant enough, I think, to
justify a bump in the major version number.

Those major changes touch the code everywhere (especially nested
transactions - where the breadth of the changes scares me) and are
likely to lead to a number of obscure bugs that will be problematic
and will probably survive through an extended beta period. People are
likely to expect more bugs in a .0.0 release - but that also means
they're likely to be much more tolerant of them (nice functionality,
some problematic bugs - but it's a .0.0 release, so some bugs are
expected, and the .0.2 or .1.0 release should be _really_ good).

So, from a managing customer expectations POV, 8.0.0 looks an
appropriate version number for at least two major reasons. I'd rather
end-users treat this release as a development/preview release, forgive
the bugs and take a minor release or two before expecting the level of
stability _we_ expect from postgresql - and I suspect that may be
doubly appropriate for the large base of potential win32 users.

Just a perspective from a company that uses and redistributes
PostgreSQL to end-users.

Cheers,
  Steve




Re: [HACKERS] localhost redux

2004-07-20 Thread Steve Atkins
On Tue, Jul 20, 2004 at 08:32:00AM -0400, Tom Lane wrote:
 Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
  I think you should first trace down what the problem really is --- is
  your system just misconfigured or is there some fundamental issue that
  we really ought to answer to?
 
  The trouble on AIX is, that getaddrinfo only does a nameservice lookup,
  no /etc/hosts.
 
 Wouldn't that break many other programs besides Postgres?  But in any
 case if your nameserver doesn't resolve localhost, I'd say you'd better
 fix your nameserver.  See eg, http://www.faqs.org/rfcs/rfc2606.html
 
   The .localhost TLD has traditionally been statically defined in
   host DNS implementations as having an A record pointing to the
   loop back IP address and is reserved for such use.  Any other use
   would conflict with widely deployed code which assumes this use.

There is no requirement for a recursive resolver (local nameserver) to
have localhost hardwired to return 127.0.0.1. It's not even clearly a
good idea in all cases. If, though, the resolver returns anything for
localhost it must be 127.0.0.1 (or, presumably, ::1 for IPv6).

While many recursive resolvers do have localhost hardwired (to avoid
this sort of issue) relying on it may be overly brave.

It would be nice if the OP could check their recursive resolver to see
what it is returning for a query for localhost ('dig @nameserver localhost', 
where nameserver is the nameserver listed in /etc/resolv.conf), just to
check whether the DNS vs non-DNS localhost thing is an issue at all. Also
whether they have /etc/netsvc.conf (IIRC) configured to use /etc/hosts
or not.

getaddrinfo() has had strange and unusual bugs on some older versions of
AIX, anyway, so it may not be a DNS issue at all.
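
If it's useful, something like this will show exactly what getaddrinfo()
is handing back for localhost, to compare against the dig output - a
quick, untested sketch, assuming a POSIX-ish system:

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res, *p;
    char text[INET6_ADDRSTRLEN];
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo("localhost", NULL, &hints, &res);
    if (rc != 0)
    {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return 1;
    }
    for (p = res; p != NULL; p = p->ai_next)
    {
        const void *a = (p->ai_family == AF_INET)
          ? (const void *) &((struct sockaddr_in *) p->ai_addr)->sin_addr
          : (const void *) &((struct sockaddr_in6 *) p->ai_addr)->sin6_addr;

        printf("%s\n", inet_ntop(p->ai_family, a, text, sizeof(text)));
    }
    freeaddrinfo(res);
    return 0;
}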

Cheers,
  Steve



[HACKERS] Accelerating aggregates

2004-06-11 Thread Steve Atkins
Stop me if you've heard this before.


I'm looking at fast calculation of aggregates (sum(), max(), count())
across large tables, or across fairly simply defined subsets of those
tables.

Lets say that, for a given aggregate function on a given table (with a
given where clause, perhaps), each postgres process maintains a state
variable (stype, in aggregate terms) and there's a also a single state
variable available to all backends via shared memory.

Each time a transaction starts the process initialises its local state
variable to the initcond of the aggregate. Each time a row is inserted
into the table the local state variable is updated, using the
aggregate update function. Each time a row is removed then the local
state variable is either updated, or invalidated, using a
reverse-update function, again specific to the aggregate.

If the transaction is rolled back, the local state variable is
thrown away. If the transaction is committed and the local state
variable has been invalidated then the global state variable is
invalidated, otherwise the global state variable is updated using
a state merge function, specific to the aggregate.

Then, at any point, the correct value of the aggregate function,
taking into account transactions that committed before the current
transaction started, can be found by merging the global state variable
and the local one.

If the global state variable isn't valid (because it's been explicitly
invalidated, or the system has just started up or the caching of the
value was just enabled) then it can be calculated by running the
real aggregate function against the table.

By defining the right functions this approach could significantly
accelerate many aggregate queries, including count(), min(), max(),
avg() and also simple 'count() where something' or 'sum() where
something' calculations.

count()
  update():          local_stype = local_stype + 1
  reverse_update():  local_stype = local_stype - 1
  merge():           global_stype = global_stype + local_stype

max()
  update(X):         local_stype = max(local_stype, X)
  reverse_update(X): if X >= local_stype then invalidate
                     else nothing
  merge():           global_stype = max(global_stype, local_stype)

This would be fairly cheap on updates, no worse than a fairly cheap
trigger, and either very cheap to read or at worst no more expensive
than calculating the aggregate fully.
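
To make the count() case concrete, the bookkeeping might look something
like this - a hand-waving sketch in C, with a pthread mutex standing in
for whatever shared-memory locking would really be needed, and all the
names invented:

#include <pthread.h>
#include <stdbool.h>

typedef struct
{
    long value;     /* running count */
    bool valid;     /* false once invalidated (count() never needs
                       this, but max() does) */
} agg_state;

/* notionally in shared memory, one per cached aggregate */
static agg_state global_state = { 0, true };
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* per-backend */
static agg_state snapshot;      /* copy of global, taken at xact start */
static agg_state local_state;

void xact_start(void)
{
    pthread_mutex_lock(&global_lock);
    snapshot = global_state;
    pthread_mutex_unlock(&global_lock);
    local_state.value = 0;
    local_state.valid = true;
}

void row_inserted(void) { local_state.value++; }    /* update() */
void row_deleted(void)  { local_state.value--; }    /* reverse_update() */

void xact_commit(void)                              /* merge() */
{
    pthread_mutex_lock(&global_lock);
    if (local_state.valid)
        global_state.value += local_state.value;
    else
        global_state.valid = false;
    pthread_mutex_unlock(&global_lock);
}

/* the aggregate's value as seen by this transaction; if not valid,
   fall back to running the real aggregate against the table */
bool cached_count(long *result)
{
    if (!snapshot.valid || !local_state.valid)
        return false;
    *result = snapshot.value + local_state.value;
    return true;
}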

It's a fairly nasty kludge, and probably a bad idea, but do you think
it's feasible (ignoring the whole nested transaction issue, for now)?

Cheers,
  Steve



Re: [HACKERS] Accelerating aggregates

2004-06-11 Thread Steve Atkins
On Fri, Jun 11, 2004 at 09:27:07AM +0100, Richard Huxton wrote:

 If the transaction is rolled back, the local state variable is
 thrown away. If the transaction is commited and the local state
 variable has been invalidated then the global state variable is
 invalidated, otherwise the global state variable is updated using
 a a state merge function, specific to the aggregate.
 
 Isn't this going to have visibility issues wrt other backends? How do I 
 know what transactions have updated the global and what haven't and 
 which I should currently be seeing?

The global is only updated at transaction commit.

So, if you take a local snapshot of the global at the beginning of
your transaction then the visible changes at any point are those from
transactions that committed before your transaction started. That's
well-defined, at least, and appears to be pretty much the same as the
standard read committed isolation level.

 I'm not sure that there is a solution simpler than the insert +1/-1 
 into summary table that gets discussed.

That's fairly slow and painful.

Cheers,
  Steve



Re: [HACKERS] Accelerating aggregates

2004-06-11 Thread Steve Atkins
On Fri, Jun 11, 2004 at 12:17:57PM -0400, Greg Stark wrote:
 Steve Atkins [EMAIL PROTECTED] writes:
 
  So, if you take a local snapshot of the global at the beginning of
  your transaction then the visible changes at any point are those from
  transactions that committed before your transaction started. That's
  well-defined, at least, and appears to be pretty much the same as the
  standard read committed isolation level.
 
 no, read committed would see any other updates that have been committed since
 the start of your transaction. 

Uhm... only updates within the current transaction. So if you merge the
global state and the local state that's exactly what you'll see.
 
 For some linear aggregates you could start with the initcond, apply all the
 local updates and whenever you have to read the actual value then use the
 global variable at that time. But not all aggregates can be handled that way.
 I think all the standard ones could be though, sum(), count(), stddev(), etc.

I think all the standard ones can (anything with an associative update
function, if I remember my math correctly). And my thought was not
that this would be a neato transparent optimization that the parser
would use directly in all cases, rather that it would be a hack
explicitly set up by the DBA for those specific cases where they need
it.

Cheers,
  Steve



Re: [HACKERS] Accelerating aggregates

2004-06-11 Thread Steve Atkins
On Fri, Jun 11, 2004 at 01:49:18PM -0400, Tom Lane wrote:
 Steve Atkins [EMAIL PROTECTED] writes:
  Uhm... only updates within the current transaction. So if you merge the
  global state and the local state that's exactly what you'll see.
 
 The only way this would work is if at every SetQuerySnapshot() you copy
 *all* of the global variables as part of the snapshot.  You'd have to
 copy them all since you don't know which ones you'll need for the next
 query.  To avoid race conditions, you'd need to lock out transaction
 commits while you are doing this copying.

Yup, though that's going to be acquire lock, memcpy, release lock and
there's unlikely to be more than a few hundred bytes of state.
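
Something on this order, hypothetically - a pthread mutex standing in
for the real shared-memory lock, and the size plucked out of the air:

#include <pthread.h>
#include <string.h>

#define AGG_AREA_BYTES 512     /* "a few hundred bytes" of state */

/* notionally the shared area holding all the cached aggregate state */
static char global_agg_area[AGG_AREA_BYTES];
static pthread_mutex_t agg_area_lock = PTHREAD_MUTEX_INITIALIZER;

/* at each SetQuerySnapshot(): copy everything, since we don't know
   which aggregates the next query will want */
void snapshot_agg_state(char *local_copy)
{
    pthread_mutex_lock(&agg_area_lock);
    memcpy(local_copy, global_agg_area, AGG_AREA_BYTES);
    pthread_mutex_unlock(&agg_area_lock);
}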

 I think there are also race conditions involved in transaction commit,
 since there's no way to make the update of the global state be atomic
 with the actual transaction commit ... unless perhaps you want to hold
 a lock on the global state area while committing.

Yeah, that's the implementation detail that's going to really kill the
idea in most cases.

 All in all, I think the overhead of this scheme would be enormous.  It
 implies significant costs during every transaction start and commit,
 whether or not that transaction is getting any benefit.

I think you're right, but it was interesting to consider briefly.

Cheers,
  Steve




Re: [HACKERS] Email data type

2004-05-17 Thread Steve Atkins
On Mon, May 17, 2004 at 05:01:36PM +0200, Gaetano Mendola wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Bruno Wolff III wrote:
 
 | On Sun, May 16, 2004 at 04:36:55 +0200,
 |   Gaetano Mendola [EMAIL PROTECTED] wrote:
 |
 |The type is indexable and provide also conversion methods:
 |
 |text -- email
 |and the operator , is possible use it in select like:
 |
 |
 | When you are converting between text and email data, what format are the
 | text version of the address going to be? For example you might be using
 | rfc2821 encoding, rfc2822 encoding or concatenating the local part,
 | an @ sign and the domain name. Don't forget about domain literals.
 
 Actually I use local_part@domain_name
 
 Also the validator will validate emails in this form, if you are thinking to
 validate emails as:
 
 Gaetano M. Public(junior)gmendola@(new account)bigfoot.com
 
 that are perfectly valid I think that it's a valid option to consider.
 
 About the domain literals, I think to validate it in the near future,
 rejecting private subnet according to this list:
 
 10.0.0.0 - 10.255.255.255
 172.16.0.0 - 172.31.255.255
 192.168.0.0 - 192.168.255.255
 169.254.0.0 -169.254.255.255

Bad idea. If I'm testing I'm likely to be testing with addresses on my
local network. My local network is in 10/8. The middleware or the
application should be making those decisions. But if you are going to
filter on IP space, also consider class D & E space.

Also, [EMAIL PROTECTED] is a syntactically valid email address, in the .13
TLD. It does not deliver to 10.11.12.13, or anywhere else, as of
today, unless the MTA or local recursive resolver is broken (a common
case). [EMAIL PROTECTED] is a whole other thing. As is [EMAIL PROTECTED]::10.11.12.13]
and various other IPv6 variants.

[EMAIL PROTECTED] is syntactically valid. [EMAIL PROTECTED] is syntactically valid, but
should be immediately rejected. [EMAIL PROTECTED] is valid, even if known to
be non-deliverable.

 I think I have to discard also the addresses with last octet equal to 256.

Or equal to 4872014, come to that. Any reason you're looking at 256
in particular? If you mean .255 (or .0) then don't fall into that trap -
there are perfectly valid, routable addresses ending in both .0 and
.255, despite what some folks in Redmond would have you believe.

 Any comments ?

Parsing email addresses is a significant part of my day job, and email
address validation is a lot harder than it looks at first sight.

Don't forget quoting, whitespace, escaping and nesting parenthetical
comments. Also, remember that [EMAIL PROTECTED] and [EMAIL PROTECTED] are different
email addresses, while [EMAIL PROTECTED] and [EMAIL PROTECTED] are the same email
address. [EMAIL PROTECTED] and [EMAIL PROTECTED] are the same email
address. [EMAIL PROTECTED] and [EMAIL PROTECTED] may be the same address or
different email addresses, depending on which religious faction you
belong to.

In some contexts the empty string is a valid email address. In some
contexts Postmaster is a valid email address.

Also, one person's definition of a valid email address will be very
different from another person's. Many of those
definitions require some DNS resolution to make the decision.

I'm not entirely convinced that an email address is a simple and
well-defined enough datatype to handle comprehensively within the
DB. The validation decisions are complex and vary from application to
application.

(I use two text columns - localpart and domainpart, with an index
 on reverse(lower(domainpart)) and leave validation to the application,
 myself).

Cheers,
  Steve




Re: [HACKERS] Email data type

2004-05-17 Thread Steve Atkins
On Mon, May 17, 2004 at 05:01:05PM +0100, Dave Page wrote:

  [EMAIL PROTECTED] is syntactically valid. [EMAIL PROTECTED] is 
  syntactically valid, but should be immediately rejected. 
 
 I disagree - just because the database server cannot verify the the
 existence of a domain does not mean that it isn't valid. I think only
 addresses that are syntactically incorrect should be rejected.

I think you're missing the point. Go and take a glance at RFC 2606 -
it's a BCP which defines the .test, .invalid, .example and .localhost
TLDs.

Of course, some users may want to use .invalid email addresses in
their database, despite their being, by definition, invalid. It's
something that will vary depending on their needs.

Cheers,
  Steve




Re: [HACKERS] Email data type

2004-05-17 Thread Steve Atkins
On Mon, May 17, 2004 at 09:21:54AM -0700, Steve Crawford wrote:

 Along those lines [EMAIL PROTECTED] and [EMAIL PROTECTED] are valid but they 
 don't necessarily refer to the same mailbox (depends on the mx for 
 foo.bar.com).

I don't believe the latter is actually valid, as it has to be an
address literal inside the square brackets. Until that's extended
by RFC that means IPv4 or IPv6 literals.

It's another example of this is neither trivial nor well-defined,
though.

  I'm not entirely convinced that an email address is a simple and
  well-defined enough datatype to handle comprehensively within the
  DB. The validation decisions are complex and vary from application
  to application.
 
  (I use two text columns - localpart and domainpart, with an index
   on reverse(lower(domainpart)) and leave validation to the
  application, myself).
 
 Indeed. A problem with the email address datatype is that it hinders 
 normalization:

Yup. That's a bigger concern than the overall vagueness of the problem.

 Joe User [EMAIL PROTECTED] is valid but a database designer would 
 probably prefer columns for name and email, or if the addresses were 
 all people, firstname, middlename, lastname, email.
 
 As you mentioned, the email can be broken into localpart and 
 domainpart but if the app requires it, the domainpart could be 
 further rendered into toplevel (so you could find all the .gov or 
 .edu), secondlevel (at least corresponds to a registrant) and 
 subdomain(s) as necessary.

That's why I index the domainpart on reverse(lower()) - that way I
can say WHERE reverse(lower(domainpart)) LIKE reverse('%.com')

Cheers,
  Steve



Re: [HACKERS] PostgreSQL pre-fork speedup

2004-05-05 Thread Steve Atkins
On Wed, May 05, 2004 at 02:12:58PM -0400, Rod Taylor wrote:
 
 Most of it has been. It's the duty cycle. As stated in another email,
 only about 20% of the work a script does is database related -- which
 occurs all at one time. Even when all Apache backends are active, a
 large number of connections will be idle but were used or will be used
 at some point during the generation of that page.
 
 It really is an Apache fault -- but I don't think it can be fixed within Apache 
 itself.

http://apache.webthing.com/

  mod_pg_pool or mod_valet_sql - Apache modules to handle postgresql
  connection pools

http://sqlrelay.sourceforge.net/
http://dbbalancer.sourceforge.net/

  Database connection pooling software

And, of course, most development environments (perl, php, java etc)
have their own language specific connection pooling solutions.

Cheers,
  Steve
  



Re: [HACKERS] Socket communication for contrib

2004-04-16 Thread Steve Atkins
On Fri, Apr 16, 2004 at 08:10:20AM -0400, Andrew Sullivan wrote:
 On Mon, Apr 05, 2004 at 06:33:51PM +0200, Hans-Jürgen Schönig wrote:
  
  I have learned (please correct me if I am wrong) that people tend to 
  look in contrib before they look at gborg.
 
 This may be true, but if so, perhaps it's a reason to add a
 contrib/gborg directory with just a README that says For lots of
 additional software which is designed to be part of your PostgreSQL
 installation, go to http://gborg.postgresql.org.

Or even a text dump of http://gborg.postgresql.org/project/projdisplaylist.php
(which would be more useful if all gborg projects had useful descriptions -
 but that would directly benefit users of gborg too)

Cheers,
  Steve



Re: [HACKERS] PostgreSQL configuration

2004-04-11 Thread Steve Atkins
On Sat, Apr 10, 2004 at 03:53:49PM -0400, [EMAIL PROTECTED] wrote:
  The whole idea of having multiple command-line switches to pick config
  and data separately bothers me.  ISTM this would mostly create great new
  opportunities to shoot yourself in the foot (by accidentally picking the
  wrong combination), without nearly enough benefit to outweigh the risk.
 
 This is where I think we disagree. Very much so, in fact. I think having
 something like:
 
 /etc/postgres/webdb.conf
 In which there is a line:
 datadir=/RAID0/postgres
 
 and
 
 /etc/postgres/testdb.conf
 In which there is this line
 datadir=/RAID1/postgres
 
 Allows for a very standardized, and IMHO, very self documenting installation.

But not as flexible as the existing alternative.

For instance, what if webdb is PostgreSQL 7.3 and testdb is PostgreSQL
7.4?  There is no way you can put that difference in a configuration
file, so the user will still need to know which binary of postgresql
to fire up.

So, yes, let's have a standard directory for storing the configuration
for all the PostgreSQL installations on the machine.

/etc/postgres sounds fine.

In /etc/postgres/webdb:

#!/bin/sh
datadir=/RAID0/postgres
/usr/local/pgsql73/bin/postmaster -D "$datadir"

and in /etc/postgres/testdb

#!/bin/sh
datadir=/RAID1/postgres
/usr/local/pgsql742/bin/postmaster -D "$datadir"

Much more flexible and explicitly self-documenting.

For more flexibility still, do what I do and make the scripts standard
rc.d style startup scripts.

To walk a user through listing the supported installations is easy -
'ls /etc/postgres'. Starting and stopping one - '/etc/postgres/webdb start'
or '/etc/postgres/webdb stop'. Checking system status and displaying the
data directory - '/etc/postgres/webdb status'.

It seems to me to be far more intuitive to the end user, and to the
typical admin than your -C suggestion, it's certainly safer, and it
works fine now.

Cheers,
  Steve



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-26 Thread Steve Atkins
On Fri, Mar 26, 2004 at 07:25:53AM +0100, Manfred Spraul wrote:

 Compare file sync methods with one 8k write:
   (o_dsync unavailable)
   open o_sync, write    6.270724
   write, fdatasync     13.275225
   write, fsync         13.359847
 
 Odd. Which filesystem, which kernel? It seems fdatasync is broken and 
 syncs the inode, too.

This may be relevant.

From the man page for fdatasync on a moderately recent RedHat installation:

  BUGS
   Currently (Linux 2.2) fdatasync is equivalent to fsync.

Cheers,
  Steve



Re: [HACKERS] Mixing threaded and non-threaded

2004-01-30 Thread Steve Atkins
On Fri, Jan 30, 2004 at 11:10:49AM -0600, Scott Lamb wrote:
 On Jan 30, 2004, at 3:18 AM, Bruce Momjian wrote:
 Manfred Spraul wrote:
 Bruce Momjian wrote:
 
 Woh, as far as I know, any application should run fine with 
 -lpthread,
 threaded or not.  What OS are you on?  This is the first I have 
 heard of
 this problem.
 
 
 Perhaps we should try to figure out how other packages handle
 multithreaded/singlethreaded libraries? I'm looking at openssl right
 now, and openssl never links against libpthread: The caller is
 responsible for registering the locking primitives.

I don't think changing the linking approach is a good thing. But a
mention in the documentation might be.
 
 We perhaps don't need -lpthread for creating libpq, but only for ecpg.
 However, now that we have used thread locking for SIGPIPE, we are now
 calling pthread from libpq, but only 7.5.
 
 However, I still don't understand why the user is seeing a problem and
 what rewrite he thinks is necessary for his application because pthread
 is linked in.

I suspect the rewrite needed is to avoid doing Bad Things in the signal
handler.

 I'm 99% certain that any application will work with -lpthread on RedHat 
 Linux. And 95% certain that's true on _any_ platform. There's no 
 pthread_init() or anything; the distinction he was describing between a 
 non-threaded application and a threaded application with only one 
 thread doesn't exist as far as I know.

That may be true for any correctly written application, but it's
certainly not true for every application. The distinction is, at the
very least, that some system calls are wrapped with mutexes.

 And he mentioned that the deadlocks are occurring in a SIGCHLD handler. 
 Since so few functions are async signal-safe (I doubt anything in libpq 
 is), the code in question was broken before; the extra locking is just 
 making it more obvious.

I tend to agree. However, while it may have been broken before, it
worked flawlessly in multiple production environments on several
different operating systems for several years when not linked with
pthread.
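
For what it's worth, the likely fix on our side is the usual pattern of
doing nothing in the handler beyond setting a flag - a rough sketch of
the safe shape, not the actual application code:

#include <signal.h>
#include <sys/wait.h>

static volatile sig_atomic_t child_exited = 0;

/* async-signal-safe: touches nothing that can take a lock */
static void sigchld_handler(int signo)
{
    (void) signo;
    child_exited = 1;
}

/* called from the main loop, outside signal context */
void reap_children(void)
{
    int status;

    if (!child_exited)
        return;
    child_exited = 0;
    while (waitpid(-1, &status, WNOHANG) > 0)
        ;   /* per-child cleanup (including any libpq work) goes here */
}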

Cheers,
  Steve



Re: [HACKERS] Mixing threaded and non-threaded

2004-01-27 Thread Steve Atkins
On Fri, Jan 23, 2004 at 10:03:30PM -0500, Bruce Momjian wrote:
 Steve Atkins wrote:
  When I rebuilt libpq to use threads, I started seeing a bunch of weird
  failures in many of the older applications. The change in libpq meant
  that libpthread was being dynamically linked into the non-thread-aware
  applications, leading to some mutex deadlocks in their signal
  handlers, hanging those applications.
  
  There doesn't seem to be any tidy way to build and use both threaded
  and non-threaded libpq on the same system (LD_LIBRARY_PATH hacks
  aren't really viable for distributed code). Is there something I'm
  missing?
 
 No, there is not.  We could compile two versions, and have you specify
 the threaded version only when you want it, but only some operating
 systems have that distinction, so then we would have two identical
 libraries on some platforms, and different ones on others, and that
 seemed pretty confusing.  Of course, we can always revisit this.
 
  (If it's relevant, the OS in question is RedHat Linux, but I'm
   maintaining the same suite of apps on several other architectures.)
 
 This is interesting.  I had not considered that libpq's calls to
 libpthread would cause problems.  In fact, libpq shouldn't be doing
 anything special with pthread except for a few calls used in
 port/thread.c. 

Yes, libpq's actual use of pthread seems pretty harmless.

 However, the issue we always were worried about was that
 linking against libpthread would cause some unexpected thread calls in
 the application, and it looks like that is exactly what you are seeing. 
 In fact, it sounds like it is the calls to allow synchronous signals to
 be delivered to the thread that generated them that might be the new
 change you are seeing.

Exactly that, yes.

 My guess is that creating applications against the non-thread libpq and
 then replacing it with a threaded libpq is your problem. 

Yes. It seems to make no difference whether the application is rebuilt
or not. It's pulling libpthread into a non-thread-aware application
that's the problem.

The only fix that would allow the non-threaded application to work
with a thread-safe libpq would be to rewrite it to be a threaded
application with a single active thread.

 I guess the
 question is whether you would like to have two libpq's and have to
 decide at link time if you wanted threading, or just have one libpq and
 make sure you recompile if you change the threading behavior of the
 library.  We considered the latter to be clearer.

Recompiling doesn't necessarily help unless the application is also
rewritten. Also, if there are dozens of non-threaded applications
using libpq on a system (possibly installed via rpms or equivalent)
then replacing the system libpq could break something else.

For now I'm just building and distributing two different libpqs and
choosing between them with rpath hacks (yes, renaming one of them
might be easier, but I'm specifying rpath explicitly anyway for other
reasons). That seems to be working just fine for me.

If there are multiple applications on the system using PostgreSQL we
really don't want to break some of them if libpq is rebuilt to support
a new one. Probably worth a mention in the documentation at least.

Cheers,
  Steve




Re: [HACKERS] Mixing threaded and non-threaded

2004-01-27 Thread Steve Atkins
On Tue, Jan 27, 2004 at 02:07:44PM -0500, Bruce Momjian wrote:
 Steve Atkins wrote:
   My guess is that creating applications against the non-thread libpq and
   then replacing it with a threaded libpq is your problem. 
  
  Yes. It seems to make no difference whether the application is rebuilt
  or not. It's pulling libpthread into a non-thread-aware application
  that's the problem.
  
  The only fix that would allow the non-threaded application to work
  with a thread-safe libpq would be to rewrite it to be a threaded
  application with a single active thread.
 
 
 Woh, as far as I know, any application should run fine with -lpthread,
 threaded or not.  What OS are you on?  This is the first I have heard of
 this problem.

Linux/i386, RedHat 7.something, gcc 2.96. Not my favorite
configuration, but nothing particularly odd.

   I guess the
   question is whether you would like to have two libpq's and have to
   decide at link time if you wanted threading, or just have one libpq and
   make sure you recompile if you change the threading behavior of the
   library.  We considered the latter to be clearer.
  
  Recompiling doesn't neccesarily help unless the application is also
  rewritten. Also, if there are dozens of non-threaded applications
  using libpq on a system (possibly installed via rpms or equivalent)
  then replacing the system libpq could break something else.
 
 Why?  How would you rewrite it?

No idea. I've not looked at exactly what's going on, yet.

It's perfectly possible that the problem I'm seeing is actually a bug
in the underlying code - but it's been in heavy production use
for two years without pthread, and deadlocked immediately when built
with pthread, so it's the sort of bug that could be elsewhere.

It's a very complex application, so I'd really need to reduce it to
a test case to narrow it down.

A hint, though, might be that it's a multiprocess application with a
single master process that controls dozens of child processes. When the
master shuts down it asks all the children to shut down, and then it
deadlocks in the SIGCHLD handler.

I'll burrow a bit deeper when I get some time.

Cheers,
  Steve
