Re: [GENERAL] advocacy: drupal and PostgreSQL
In response to Tom Hart [EMAIL PROTECTED]: Joshua D. Drake wrote: Robert Treat wrote: There's been a big move in the php community to push people towards php5 (one of which was the EOL of php4), which has started to pay off. I'd guess that if they wanted to, they could switch to PDO with Drupal 7 and not hurt themselves too much. When I spoke with Dries about this issue, one of the big holdbacks wasn't PHP 4 but actually MySQL 3. With Drupal 6, MySQL 3 is no longer supported, so they can actually do some nicer stuff (like foreign keys), etc. I am sure that with PHP5 things will improve as well. Sincerely, Joshua D. Drake Let me just sneak in a quick rant here, from somebody who really doesn't matter. We run Drupal for our corporate intranet (currently being built) and we use PostgreSQL as the backend. Some of the modules and things don't work perfectly, but Drupal supported it and that made me happy enough to work with it. Now after reading this garbage, I'm extremely disappointed. Completely dropping PostgreSQL capability might not affect them too much in the long run, because a large portion of their user base is using MySQL, but it would send a message to those of us who believe in choice. I'm afraid that they're choosing the route of convenience over their users, and every time I think about it I want to go looking for replacements. I run my personal site on Drupal+PostgreSQL. If Drupal drops PG support, I'll switch the front end. I'm not switching the back end. I'm also planning a small enterprise that I was originally considering using Drupal for. I'm now more seriously considering Bricolage. However, read on ... It'd be easier to build Drupal to only run on MySQL, but then again it'd be easy to build PostgreSQL to only run on Linux and forget about the Windows users. 
I know it's not their duty to make Drupal work with PostgreSQL, but if they drop it like they're talking about, I'll be making a push here and to everyone I know who uses Drupal to switch to another system, whether they're running Postgres or not. If Drupal 6 absolutely doesn't support Postgres, then I'm dropping my Drupal 5 install on the spot. This is a cold move, Drupal, and you should be ashamed. I made a post on the drupal-devel list to this effect. I got chewed out for flaming Karoly ... who's obviously some big Drupal code guru. Frankly, every time this topic comes up, it's initiated by Karoly, and I've lost patience with the crap, so I unsubscribed. If I can get my life back in order, I'll re-subscribe some time in Feb, and hopefully start to do something productive, like contribute testing and patches. Sorry, I'll end the rant here. BTW, I'm a PHP developer who uses PostgreSQL almost exclusively, and I'm on this list as well as other Postgres lists constantly (even if as a reader most of the time). If they have this big of an issue, why not ask for help? If you read through the thread, it's just Karoly and a few other minor players in the Drupal community. Many people have stepped up and said, "I _do_ test on PostgreSQL, so what are you complaining about?" As best I can tell, Karoly writes patches, and when they don't work on PostgreSQL and therefore don't get committed right away, he starts this argument up on the Drupal lists yet again. The guy is a whiner who has a personal axe to grind and seems unable to accept that Drupal wants to run on more than just MySQL. If he loves MySQL so much, he should join a project that only supports MySQL and leave the Drupal people to their work. There's a LOT of effort in the Drupal community to build code abstractions that will make the system database-agnostic, and Karoly's constant whining is simply counterproductive. To a large degree, I think Karoly has blown the situation out of proportion. 
Look at how it affects _this_ list every time he starts bitching, for example. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] advocacy: drupal and PostgreSQL
In response to Tom Hart [EMAIL PROTECTED]: Bill Moran wrote: [snip] To a large degree, I think Karoly has blown the situation out of proportion. Look at how it affects _this_ list every time he starts bitching, for example. Is it just Karoly (chx) who has all these things to say about pg? He's just one person on the Drupal team. Has anybody else in the core team spoken out on this subject? Last time this came up (which was in Dec, I believe) a few other core members jumped in eventually and said, "No, we're keeping PG. The goal of Drupal is to be database agnostic, so dropping PG is counterproductive." To which Karoly responded that he didn't want to _drop_ PG support, he just wanted to drop PG support ... or something equally nonsensical. The guy sets my jerk alarms ringing like a 5-alarm fire. He doesn't play well with others, he constantly starts fights, and he threatens to take his ball and go home every time he loses. I don't care how much code he writes, I don't think he's worth the headache. Let's keep in mind as well that this doesn't only affect pg users but any other database that Drupal supports or plans on supporting. Drupal is pretty popular, and I expect there are a number of organizations that don't fit in their mold of the ideal Drupal user. As I said, I get the impression that most of the Drupal developers get this, and they have mentioned more than once that Drupal's design goal is to be database-agnostic. It just seems to be Karoly and a few people here and there that he's able to incite into riot. I'd almost consider trying to take Drupal and create a derivative product and build in the pg and Oracle and MSSQL, etc. support myself, but if the Drupal team really pulls a messed up move like this, I really don't want to have anything to do with them anymore. 
It's not that I'm that huge of a pg nut (I used MySQL for a while myself), but any team that can turn its back on that many of its users to make their lives a little easier isn't in it for the right reasons (the advancement of technology, computing as a science, etc.). I am literally astonished that they would even consider telling even 1% of their users, "Take off, you're too much work." How many Drupal+Postgres users are large corporations, or regular donors? What about code contributors? How many people are they looking at pissing off with a move like this? I'm upset with the community. The other core members need to stand up to Karoly and say, "Your opinions are not those of the community, and we'll ban you from the lists if you continue to start this fight over and over again." -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Sun acquires MySQL
In response to dvanatta [EMAIL PROTECTED]: What's up with 3 of the 7 being from Pennsylvania? What's the connection? Well, as everyone knows, Pennsylvania is a haven for brilliant people. In fact, simply living in Pennsylvania makes you smarter. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] large table vacuum issues
Ed L. [EMAIL PROTECTED] wrote: On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 ... First of all, update your 8.1 install to 8.1.10. Failing to keep up with bug fixes is negligent. Who knows, you might be getting bitten by a bug that was fixed between 8.1.2 and 8.1.10. Could be. But like you said, who knows. In some environments, downtime for upgrading costs money (and more), too, sometimes even enough to make it negligent to take downtime to keep up with bug fixes (and of course, the new bugs), which may or may not be a factor at hand. Upgrades along the 8.1.x branch take something on the order of 5 minutes (if you're meticulous and serialize the process). If you haven't set yourself up so you can schedule 5 minutes of downtime once a month or so, then the negligence occurred much earlier than at the failure to upgrade. While the time required to restart a DB may be negligible, there are often upstream/downstream dependencies that greatly expand the actual downtime for the customer. Like what? The point of the double-dot branch is that upgrades don't affect dependencies. How much would downtime need to cost before you thought it negligent to upgrade immediately? It's a tradeoff, not well supported by simple pronouncements, one the customer and provider are best qualified to make. Not really. Unscheduled downtime is _always_ more expensive than scheduled downtime. Scheduled downtime isn't going to put you in breach of contract if you've got an uptime guarantee. If you're really in a situation where you need 100% uptime, then you're still negligent for not having something like Slony to allow you to switch production to another server so you can alternate maintenance between the two. This is something like the RAID 5 argument: no matter how you argue it, it's a bad idea. 
If you claim you can't afford to buy more hardware, then you made a mistake in pricing out your product to your client. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] [OT] Slony (initial) Replication - Slow
In response to Ow Mun Heng [EMAIL PROTECTED]: I'm just getting my feet wet with Slony, and during the setup of the slave I did a dump and restore of the master DB to the slave DB. However, during the startup of Slony, I noticed that it issues a truncate command to the (to-be) replicated table. Hence, this means that there's no need for me to do a dump/restore in the 1st place. Can someone confirm this? Confirmed. It's how Slony is designed to work. It _is_ taking a long time (for Slony) to do the \copy (~60GB in multiple tables being replicated, including (on the fly) index creation) 1) It only needs to be done once 2) You can remove the indexes from the replica and add them back in after the initial sync is complete. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
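Point 2 above can be sketched as follows; a minimal example assuming a replicated table named orders with a single secondary index (both names are hypothetical, not from the original thread):

```sql
-- On the replica, before Slony's initial COPY: drop secondary
-- indexes so the bulk load doesn't pay for row-by-row index
-- maintenance during the \copy.
DROP INDEX orders_created_idx;

-- ... let Slony complete the initial sync ...

-- Afterwards, rebuild the index in a single pass, which is far
-- faster than maintaining it incrementally during the load.
CREATE INDEX orders_created_idx ON orders (created_ts);
```

Note that indexes backing primary keys or unique constraints used by Slony itself should be left in place; only secondary indexes are safe candidates for this trick.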
Re: [GENERAL] Is there PHP mysql_real_escape_string for postgresql?
In response to Erik Jones [EMAIL PROTECTED]: On Dec 17, 2007, at 8:37 PM, [EMAIL PROTECTED] wrote: In php is there a postgresql version of mysql_real_escape_string() ? You have both pg_escape_string and pg_escape_bytea available. Is there a mysql_fake_escape_string()? Should PostgreSQL have a pg_pretend_to_escape_string() that effectively does nothing? -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
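As an aside not from the original exchange: besides the client-side pg_escape_string, PostgreSQL can also do the quoting server-side with quote_literal(), which is mainly useful when building dynamic SQL inside PL/pgSQL functions:

```sql
-- quote_literal() wraps the value in single quotes and doubles any
-- embedded quote characters, producing a string that is safe to
-- splice into dynamically constructed SQL:
SELECT quote_literal('O''Reilly');
-- returns: 'O''Reilly'
```

For application code, parameterized queries remain preferable to any form of manual escaping, since the value never passes through the SQL parser as text.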
Re: [GENERAL] multiple version installation in the same machine ????
In response to Josh Harrison [EMAIL PROTECTED]: Hi, I have Postgres version 7.4 and version 8.3 installed on my system. 7.4 uses port 5432 and 8.3 uses port 5433. I started 7.4 and the database is running fine. Now I started the database server in version 8.3 and it started fine. pg_ctl -D /export/home/josh/postgres8.3/pgsql/data start -l log8.3.log server starting -sh-3.00$ pg_ctl status pg_ctl: server is running (PID: 4408) /usr4/postgres8.3/bin/postgres -D /export/home/josh/postgres8.3/pgsql/data But when I type psql -l I get this error: -sh-3.00$ psql -l psql: FATAL: database postgres does not exist Why? Is it not possible to have multiple version installations on the same machine (on different ports)? PostgreSQL 7.4 doesn't install a postgres database by default. Try explicitly connecting to template1. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
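A minimal sketch of the workaround, assuming the 7.4 server is the one answering on the default port (connect from the shell with psql -p 5432 template1):

```sql
-- template1 always exists in every cluster, so connecting there
-- always works. Optionally, create the 'postgres' database that
-- newer clients expect by default, so a plain `psql` or `psql -l`
-- against the 7.4 server succeeds from then on:
CREATE DATABASE postgres;
```

The error arises because an 8.x psql defaults to connecting to a database named postgres, which a 7.4-initialized cluster never created.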
Re: [GENERAL] spreadsheet interface
In response to hjenkins [EMAIL PROTECTED]: Some people in my workplace are asking if there exists a GUI that will allow cut-and-paste of multiple cells directly from (and, preferably, directly to) spreadsheets. pgAdmin III, PGAccess, and TOra don't seem to. Any suggestions? To add on to Thomas' comment. You can also install OpenOffice.org with the pgsql ODBC driver and use the OOo spreadsheet to access data directly. I haven't done this, personally, so I can't vouch for how well it works. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] install problem
Hey, I am working on testing our Windows version of software which uses Postgres, on a machine with XP Pro and 512 MB of memory. The very first time I installed, it was fine except for the fact it was only listening on localhost (127.0.0.1) and I need it listening on its IP address. Since then, I have not been able to re-install. I have tried thoroughly cleaning the machine and registry. The issue always comes back to a 1920 "do you have permissions" error in the event viewer, and it always fails to start the service. I even wiped away my test box for our product and installed a fresh copy of Windows on top of my old version. Zip :-) Can someone tell me what I am doing wrong? We have an end-of-the-year rush and we have a number of people worldwide who would be using this once we can actually test it. Bill ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] postgres cannot access the server configuration file
Sebastien ARBOGAST [EMAIL PROTECTED] wrote: I'm trying to start postgreSQL server on my Macbook Pro. I've installed it using packages available here: http://www.kyngchaos.com/wiki/software:postgres But when I try to start up the server running sudo SystemStarter start PostgreSQL, I get the following message: postgres cannot access the server configuration file /usr/local/pgsql/data/postgresql.conf: No such file or directory And as a matter of fact, there is no such file in this directory. Do I have to create it manually? You need to run initdb to create the directory: http://www.postgresql.org/docs/8.3/static/app-initdb.html -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] postgres cannot access the server configuration file
Sebastien ARBOGAST [EMAIL PROTECTED] wrote: 2007/12/15, Bill Moran [EMAIL PROTECTED]: Sebastien ARBOGAST [EMAIL PROTECTED] wrote: I'm trying to start postgreSQL server on my Macbook Pro. I've installed it using packages available here: http://www.kyngchaos.com/wiki/software:postgres But when I try to start up the server running sudo SystemStarter start PostgreSQL, I get the following message: postgres cannot access the server configuration file /usr/local/pgsql/data/postgresql.conf: No such file or directory And as a matter of fact, there is no such file in this directory. Do I have to create it manually? You need to run initdb to create the directory: http://www.postgresql.org/docs/8.3/static/app-initdb.html The problem is that I need the password of the postgres user that has been created automatically for me. I've tried postgres but it doesn't seem to work. And since I can't see the user in my Preference Pane, I can't change his password. Please don't top-post. And please don't respond personally to email that was originated on the mailing list. I've returned pgsql-general@postgresql.org to the list of recipients. If you installed the software, you obviously have root access, so just change the postgres password to something you know. While I'm not familiar with the Mac OS installation procedure, I'd assume that account was created without login capability (probably without a password at all) which is good, sound security practice. Software shouldn't create users with known passwords. You can also use sudo to switch to the postgres user without needing the password for the postgres user (although it will probably ask you for the root password again -- not entirely sure how Mac OS is set up by default) This approach has become pretty much par for the course on modern POSIX systems. 
Try: sudo -u postgres initdb -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Need LIMIT and ORDER BY for UPDATE
In response to D. Dante Lorenso [EMAIL PROTECTED]: Bill Moran wrote: D. Dante Lorenso [EMAIL PROTECTED] wrote: All, I'd really like to have ORDER BY and LIMIT for UPDATE and DELETE commands. Is this possible? UPDATE invoice i SET reserve_ts = NOW() + '1 hour'::timestamp FROM account a WHERE a.acct_id = i.acct_id AND i.reserve_ts < NOW() AND a.status = 'A' AND i.is_paid IS FALSE ORDER BY i.create_ts ASC LIMIT 1 RETURNING invoice_id; This query would find JUST ONE invoice record which is not paid and reserve the right to operate on the row using the 'reserve_ts' column for all active accounts. The one row would be the oldest invoice matching the criteria. Only that one row would be updated and the invoice_id of the updated row (if any) would be returned. Running a query like this over and over would pop just one record off the queue and would guarantee an atomic reservation. While I'm not going to argue as to whether your suggestion would be a good idea or not, I will suggest you look at SELECT FOR UPDATE, which will allow you to do what you desire. UPDATE invoice SET reserve_ts = NOW() + '1 hour'::interval WHERE invoice_id = ( SELECT invoice_id FROM invoice i, account a WHERE a.acct_id = i.acct_id AND i.reserve_ts < NOW() AND a.status = 'A' AND i.is_paid IS FALSE ORDER BY i.create_ts ASC LIMIT 1 FOR UPDATE ) RETURNING invoice_id; Does this do the same thing while still remaining a single atomic query that will guarantee no race conditions during the inner select/update? ERROR: SELECT FOR UPDATE/SHARE is not allowed in subqueries Guess not. 
BEGIN; SELECT invoice_id FROM invoice i, account a WHERE a.acct_id = i.acct_id AND i.reserve_ts < NOW() AND a.status = 'A' AND i.is_paid IS FALSE ORDER BY i.create_ts ASC LIMIT 1 FOR UPDATE; UPDATE invoice SET reserve_ts = NOW() + '1 hour'::interval WHERE invoice_id = [previously selected value]; COMMIT; And before you start asking a lot of "won't this x or y" questions, please read the docs: http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE Then feel free to ask more questions. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Need LIMIT and ORDER BY for UPDATE
D. Dante Lorenso [EMAIL PROTECTED] wrote: All, I'd really like to have ORDER BY and LIMIT for UPDATE and DELETE commands. Is this possible? UPDATE invoice i SET reserve_ts = NOW() + '1 hour'::timestamp FROM account a WHERE a.acct_id = i.acct_id AND i.reserve_ts < NOW() AND a.status = 'A' AND i.is_paid IS FALSE ORDER BY i.create_ts ASC LIMIT 1 RETURNING invoice_id; This query would find JUST ONE invoice record which is not paid and reserve the right to operate on the row using the 'reserve_ts' column for all active accounts. The one row would be the oldest invoice matching the criteria. Only that one row would be updated and the invoice_id of the updated row (if any) would be returned. Running a query like this over and over would pop just one record off the queue and would guarantee an atomic reservation. While I'm not going to argue as to whether your suggestion would be a good idea or not, I will suggest you look at SELECT FOR UPDATE, which will allow you to do what you desire. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Hijack!
In response to Gregory Williamson [EMAIL PROTECTED]: -Original Message- From: [EMAIL PROTECTED] on behalf of Joshua D. Drake Sent: Tue 12/11/2007 9:43 AM To: [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Hijack! -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tue, 11 Dec 2007 16:31:40 + Raymond O'Donnell [EMAIL PROTECTED] wrote: // Please note in particular the following points of netiquette: * Don't top-post, as it makes for confusing reading. * Don't start a new thread by replying to an old one, because [insert suitable technical explanation here]. Failure to observe the above may result in your question going unanswered. // O.k., this might be a bit snooty, but frankly it is almost 2008. If you are still a top poster, you obviously don't care enough about the content of the people you are replying to to avoid top-posting. However, I would also note that in the Windows world it is very common to top-post. I am constantly retraining very smart, just very ignorant, customers. * Not all mail clients deal well with inline/bottom quoting (manually added to lines here since my mail reader does not do so automatically -- imagine doing so for a complex quote!) I recommend finding a better mail program. There are lots out there. Being forced to use substandard software in this day and age is a crime. * Top posting is very common in companies with lots of BlackBerry (etc.) users, since they seem to see only the tops easily. Illicit drug use is very common in many areas. Denigration of women simply because they are women is common in many parts of the world. Crying "everyone else is doing it" is not a valid argument in my book. * My mail client *always* starts at the top of the message. For rapid/internal mails, top posting works better because the answer/most recent is always at the top. Complex messages do deserve in-posting, but it's not always easy, especially if you have to do it manually. Does your mail browser always start at the bottom? 
I always see the top of a message first. Simple threads work very well this way -- complicated ones collapse under top-posting. This is a tired, overused argument that has little value. * A lot of us have to use whatever the company provides as a mail server. Are you saying your mail server forces you to top post? That's a new one. Exchange sucks, but I'd rather not quit my job just because _you_ have a problem reading mail that does not conform to your expectations to a T. And there is a limit to how much time I want to spend manually formatting your mail to respond to it. There's a limit to the amount of time I'm willing to spend trying to make heads or tails of an incomprehensible email. I think I deleted over 100 emails last week after seeing how badly formatted they were, even though I probably had the expertise to offer helpful information. I don't complain about people top-posting because I don't like it. I complain because it makes it more difficult for me to help, and thus less likely that I will, and I know that other knowledgeable people feel the same way. I complain about top-posting because I know that the person is less likely to get helpful replies if they format their email poorly. Note that a lot of PostGIS mailing list posts are top-posted and the complaint rate is vanishingly small. Yet somehow business clanks on. Imagine that! And I can't even use Exchange/Outlook -- the web interface to Micro$oft really sucks. Again, you're asking a community to offer you free help in spite of the fact that your tools suck. I'm not saying nobody will do it; all I'm saying is that if you make it too difficult for people to help, they won't. * Try to see the world from a perspective other than your own (admittedly superior) one! Not everyone is so advanced. I do see it from other perspectives. 
I can still see it from the perspective of a Bill Moran from 10 years ago who got chewed out for top-posting, because I didn't know anything and didn't get very good help because I didn't formulate good questions. That's a Bill Moran who learned _because_ people pointed out what I was doing wrong. I'm trying to pass the favor on when I point out problems with folks' emails. I'm not trying to be an asshole -- that happens naturally. * Get a life Of course. How about: 1) I'll stop replying to emails that are formatted too badly to understand. 2) You accept that the rules of this community are no top posting, stop dragging this discussion out, and accept that top-posted emails won't be responded to. Since nobody smart will offer advice on how to better format emails any more, the newbies will remain ignorant and never learn. That's obviously the best thing we can do for the community. Actually, I'd rather just continue to politely point out the rules of the list to newbies and help the world become a better place.
Re: [GENERAL] top posting
In response to Collin Kidder [EMAIL PROTECTED]: Geoffrey wrote: Collin Kidder wrote: I have to suffer through dealing with people like the two of you quoted above. You can deal with people who'd like to top post. Anything else is just being a spoiled baby who can't deal with minor issues. If all the energy spent crying about top posting were used to fuel cities, none of us would be paying for power right now. Sorry to be so blunt, but it really irritates me when people cry like 4-year-olds about top posting. It's not that bad, get over it. If it's not brought to the attention of the masses, then it will simply grow, and it simply is not the way it's done on this list. Get used to it. Now who's doing the 4-year-old crying?? Yes, I'm bitching, crying, or whatever you'd like to call it. But you notice, I'm still attempting to follow the proper posting etiquette for this list. However, I do not see any actual valid reason that top posting cannot ever be acceptable, except that some people are way too stuck in a mental rut and refuse to allow for anything other than their way. This is called trolling. Whether or not you are doing it on purpose is irrelevant. The effect is still the same, even if you do it accidentally. The point has been brought up again and again and again: top posting makes it difficult for the veterans on this list to understand and respond to your email. As a result, in order to get the best possible response, DO NOT TOP POST. Somehow, you continue to bring this back around to how we hate top-posting and despise top-posters and whatever else it is you're saying. I'm not aware of _anyone_ ever being banned or anything horrible as a result of top-posting. The worst thing that happens is that busy people begin ignoring the thread, and this is what those of us who say "please don't top-post" are trying to avoid. If you want to turn it into some personal war or something, please don't do it on the list. 
-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Transaction problem
In response to x asasaxax [EMAIL PROTECTED]: It's just using a constraint, then? There's no problem if two sessions decrease the number and this number goes to less than or equal to zero? I'm programming with PHP. BEGIN; SELECT quantity FROM products WHERE productid=[productid] FOR UPDATE; [Check in PHP to ensure enough product exists for this purchase] UPDATE products SET quantity=[new quantity after purchase] WHERE productid=[productid]; [... any other table updates you need to do for this transaction ...] COMMIT WORK; SELECT ... FOR UPDATE will prevent other transactions from locking this row until this transaction completes. It guarantees that only 1 transaction can modify a particular row at a time. See the docs for more details: http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE http://www.postgresql.org/docs/8.1/static/explicit-locking.html 2007/12/3, Cesar Alvarez [EMAIL PROTECTED]: What are you programming with? Are you using Npgsql? Regards, Cesar Alvarez. Hi everyone, I would like to know how I can do a simple transaction for this situation: I have n products in a certain row of a table. When the user buys a product, the quantity of this product will be decreased. The user can only buy a product that has a quantity n > 0. This means that when the user sends the product confirmation to the system, the db will decrease the product quantity with a transaction if the number of products in stock is greater than zero. Does anyone know how I can do that with Postgres? Thanks a lot. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] PostgreSQL Beta4 released
In response to Joshua D. Drake [EMAIL PROTECTED]: Thanks to all the testing, feedback and bug reports the community has performed with the current betas, we now have our fourth beta of 8.4. I assume you meant 8.3. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Avoid huge performance loss on string concatenation
-12-04' and dok.kuupaev||dok.kellaaeg BETWEEN '2007-11-01' AND '2007-12-0423 59' so it runs fast ? Andrus. PostgreSQL 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Killing a session on windows
Use select pg_cancel_backend(pid) instead -- we have to do this periodically when queries get timed out by the web server but Postgres doesn't notice / doesn't get notified... - Bill -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Howard Cole Sent: Thursday, November 29, 2007 5:55 AM To: 'PgSql General' Subject: [GENERAL] Killing a session on windows I have a database I want to drop on a windows server. Unfortunately I cannot restart postgres because it is running several live database. To kill the offending session, I tried select * from pg_stat_activity to find the PID of the session, and then tried to kill it with command line: taskkill /f /pid 1234 This appeared to kill the session, but postgres still thinks the session is live with the same process id. Using the SysInternals process explorer - there doesn't appear to be a process with the given ID. How can I get postgres to drop this session? Thanks Howard Cole www.selestial.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
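A sketch of that approach (assuming an 8.x-era pg_stat_activity, where the backend PID column is named procpid; the WHERE clause shown is a hypothetical filter for the stuck sessions):

```sql
-- Find the offending backends and ask each to cancel its current
-- query. pg_cancel_backend() signals the backend from inside the
-- server, so PostgreSQL's bookkeeping of its child processes stays
-- consistent -- unlike killing the process from the OS, which left
-- the phantom session described above.
SELECT pg_cancel_backend(procpid)
  FROM pg_stat_activity
 WHERE datname = 'db_to_drop';   -- hypothetical filter
```

On Windows especially, signaling through the server is the only safe option, since there is no equivalent of a targeted Unix signal that PostgreSQL will handle gracefully.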
Re: [GENERAL] Calculation for Max_FSM_pages : Any rules of thumb?
In response to Ow Mun Heng [EMAIL PROTECTED]: Even with the regular vacuuming, and even a vacuum full (on my test DB), I still see that perhaps something is wrong (from the below). (I got this gem from the mailing list archives.)

hmxmms=> SELECT c.relname, c.reltuples::bigint as rowcnt,
    pg_stat_get_tuples_inserted(c.oid) AS inserted,
    pg_stat_get_tuples_updated(c.oid) AS updated,
    pg_stat_get_tuples_deleted(c.oid) AS deleted
  FROM pg_class c
  WHERE c.relkind = 'r'::char
  GROUP BY c.oid, c.relname, c.reltuples
  HAVING pg_stat_get_tuples_updated(c.oid) + pg_stat_get_tuples_deleted(c.oid) > 1000
  ORDER BY pg_stat_get_tuples_updated(c.oid) + pg_stat_get_tuples_deleted(c.oid) DESC;

   relname    |  rowcnt  | inserted | updated | deleted
--------------+----------+----------+---------+----------
 tst_r        | 11971691 |        0 |       0 | 22390528
 pg_statistic |     1465 |      280 |    7716 |      153
 dr_ns        |  2305571 |     1959 |       0 |     1922
 pg_attribute |     3787 |     1403 |     184 |     1292

No matter how many times I vacuum/full, the deleted number still doesn't go down. Are you sure you're interpreting that number correctly? I took it to mean a counter of the number of delete operations since server start. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
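Bill's reading is right: the pg_stat_get_tuples_* functions return cumulative counters since the statistics were last reset, not live dead-row counts, so vacuuming cannot make them shrink. A hedged illustration (the second query assumes 8.3 or later, where pg_stat_user_tables gained an n_dead_tup column; the table name is from the output above):

```sql
-- Cumulative counter: only ever grows (until pg_stat_reset()),
-- regardless of how often the table is vacuumed.
SELECT pg_stat_get_tuples_deleted(c.oid) AS deletes_since_stats_reset
  FROM pg_class c
 WHERE c.relname = 'tst_r';

-- What vacuum actually reclaims is dead tuples; on 8.3+ this is
-- the number to watch instead:
SELECT relname, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'tst_r';
```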
Re: [GENERAL] Calculation for Max_FSM_pages : Any rules of thumb?
In response to Ow Mun Heng [EMAIL PROTECTED]: On Thu, 2007-11-08 at 12:02 -0500, Bill Moran wrote: Frequently, when people ask for help because they've exceeded max_fsm*, it's because they're not paying attention to their systems, and therefore the problem has been occurring for a while before it got so bad that they couldn't ignore it. As a result, a full vacuum is frequently a necessity. Folks who are monitoring their databases closely don't hit this problem nearly as often. How does one monitor it closely anyway? The warning comes when one does a vacuum verbose, and with autovacuum turned on, I don't even see it anywhere. 1) Run vacuum verbose from cron on a regular basis and have the output emailed to you. 2) Capture and graph (I use mrtg) various stats that would indicate to you that something is wrong. Some suggestions are graphing the output of pg_database_size() and various stuff captured from the pg_buffercache addon. I also graph transactions/second and other stats, but those are useful for detecting _other_ problems, unrelated to vacuuming. It's amazing to me how many people just throw up a database and expect it to just magically work forever. Actually, this isn't isolated to databases ... I've seen people with fileservers run around one day saying the fileserver is full, someone delete some files! If it's a fileserver, why aren't you monitoring disk usage so you see this coming? If it's a database server, you should be monitoring critical stats on it. Then you can throw out all those silly rules of thumb and use some actual data! -- Bill Moran http://www.potentialtech.com
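Bill's point 1) might look like this as a crontab fragment (database name, schedule, and address are all placeholders):

```
# Nightly at 03:00: vacuum with verbose output, mailed to the DBA.
# The free-space-map summary is in the last few lines of the output.
0 3 * * *  psql -d mydb -c 'VACUUM VERBOSE' 2>&1 | mail -s 'nightly vacuum report' dba@example.com
```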
Re: [GENERAL] Calculation for Max_FSM_pages : Any rules of thumb?
In response to Vivek Khera [EMAIL PROTECTED]: On Nov 1, 2007, at 8:51 PM, Ow Mun Heng wrote: Another question is, based on what I've read in the archives (in my laptop.. no inet conn @ work): since I've overrun my max_FSM, I'm basically screwed and will have to do a vacuum verbose FULL on the entire DB. Crap.. I've seen this repeated many times as well, and I can't think of a really good reason why this should be true. It's not inherently true, it's just likely. Once you increase max_fsm_pages, won't the very next regular vacuum find all the free space in pages and add them to the map anyway? Yes. I.e., you've not lost any free space once the next regular vacuum runs. At worst, you've got a slightly bloated table because you allocated more pages rather than re-using some, but is that worth a full vacuum? The situation you just described is the reason I recommend a full vacuum after such a situation has occurred. No, it's not required in all cases, but it's a lot easier to recommend than the research required to determine whether or not your table bloat is excessive enough to warrant it. If you can make the time to do the full vacuum, it's probably worth it, just for peace of mind. If it's difficult to schedule a full vacuum, then you need to carefully review various page usages to see if any individual tables are worth it and/or all kinds of careful consideration. As a result, I recommend a full vacuum, and if the person complains that they can't schedule it, _then_ I go into the details of how to figure out what else can/should be done. So I guess I'm recommending it to make my own life easier :) I don't think it will be unless you're *way* under the fsm pages needed and have been for a long time. Frequently, when people ask for help because they've exceeded max_fsm*, it's because they're not paying attention to their systems, and therefore the problem has been occurring for a while before it got so bad that they couldn't ignore it. 
As a result, a full vacuum is frequently a necessity. Folks who are monitoring their databases closely don't hit this problem nearly as often. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] INSERT performance deteriorates quickly during a large import
with Red Hat Enterprise Linux ES release 4 (Linux 2.6.9-42.0.3.ELsmp) on 2xDual Core AMD Opteron(tm) Processor 270 (4x2GHz logical CPUs) with 2GB RAM While 2G is better, that's still not a lot of RAM if you're trying to run 2 DB servers and a web server on a single system. If you haven't tuned PG to take advantage of it, then it won't help much anyway. Additionally, you've neglected to mention the disk subsystem on this machine as well. Is it running cheapo SATA drives because the price/gig is right? * both servers run in x86_64 mode, PostgreSQL footprint in memory stays relatively small, Of course it does, because you've told it not to use more than 8M of RAM. CPU usage maxes out on import, there is no resource starvation in any way You do realize that you're contradicting yourself here, right? The advice provided by others is good as well, so I won't repeat it. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: DB on a ramdisk (was Re: [GENERAL] Temporary, In-memory Postgres DB?)
In response to Gauthier, Dave [EMAIL PROTECTED]: One question I had earlier that I don't think got answered was how to undo an initdb. dropdb drops a DB, but how do I undo an initdb? rm -rf the data directory you passed to initdb. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Syntax error in a large COPY
In response to Thomas Kellerer [EMAIL PROTECTED]: Tom Lane, 07.11.2007 06:14: Thomas Kellerer [EMAIL PROTECTED] writes: If everyone simply top-posted, there would be no need for me to scroll down, just to find a two line answer below a forty line quote - which I personally find more irritating than top-posting. I think you're ignoring my basic point, which was that people shouldn't be quoting forty lines' worth in the first place. *Especially* not if they only have two lines to contribute. No, I did get your point. My point is: with top-posting I don't care how many lines were repeated because I don't have to scroll. Considering there is an RFC that recommends inline posting over top-posting (http://tools.ietf.org/html/rfc1855), and considering the fact that this topic has been beat to death on dozens of mailing lists and the predominant preference is _not_ for top-posting -- perhaps you should either follow the preferences of the group, or leave the group. But this horse has been beat to death before... Obviously not, as it keeps coming back to life. I guess it's an undead horse? -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Postgresql simple query performance question
In response to SHARMILA JOTHIRAJAH [EMAIL PROTECTED]: Hi We are in the process of testing for migration of our database from Oracle to Postgresql. I have a simple query: Select count(*) from foo This is asked a lot. The quick answer is that PostgreSQL's method of MVCC makes it impossible to make this query fast. Perhaps, someday, some brilliant developer will come up with an optimization, but that hasn't happened yet. There may be some tweaks you can make to your tuning, see inline below. However, if you really need a fast, accurate count of rows in that table, I recommend you create a trigger to track it. This table has 29384048 rows and is indexed on foo_id. The tables are vacuumed and the explain plan for postgresql is

QUERY PLAN
------------------------------------------------------------------
Aggregate (cost=1194020.60..1194020.61 rows=1 width=0) (actual time=68797.280..68797.280 rows=1 loops=1)
  -> Seq Scan on foo (cost=0.00..1120560.48 rows=29384048 width=0) (actual time=0.232..60657.948 rows=29384048 loops=1)
Total runtime: 68797.358 ms

The explain plan for oracle is

OPERATION            OBJECT      ACCESS_PREDICATES  FILTER_PREDICATES
SELECT STATEMENT ()  (null)      (null)             (null)
SORT (AGGREGATE)     (null)      (null)             (null)
INDEX (FULL SCAN)    foo_IDX_ID  (null)             (null)

Oracle uses the index for the count(*) query in this case. This query in Oracle takes only 5 sec and in postgresql it takes 1 min 10 sec. The same query in oracle without the index and a full table scan (like in postgresql) has the explain plan below, and it takes 34 sec: select /*+ full(foo1) */ count(*) from foo1

OPERATION            OBJECT  ACCESS_PREDICATES  FILTER_PREDICATES
SELECT STATEMENT ()  (null)  (null)             (null)
SORT (AGGREGATE)     (null)  (null)             (null)
TABLE ACCESS (FULL)  foo     (null)             (null)

In short the query Select count(*) from foo takes the following time: Postgresql - 1m 10 sec; Oracle (index scan) - 5 sec; Oracle (full table scan) - 34 sec. How can I speed up this query in postgresql? The other postgres settings are postgresql max_connections = 100 shared_buffers = 5 How much memory does this system have? 
What version of PostgreSQL are you using? If you're using an 8.X version and have more than 2G of RAM, this value is likely too low. Start with 1/4 the available RAM and tune from there. temp_buffers = 5000 work_mem = 16384 maintenance_work_mem = 262144 fsync = on wal_sync_method = fsync effective_cache_size = 30 random_page_cost = 4 cpu_tuple_cost = 0.01 cpu_index_tuple_cost = 0.001 cpu_operator_cost = 0.0025 Is there any tuning that needs to be done on the OS or database side? I had attached the iostat and vmstat results of postgresql -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
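A sketch of the trigger-maintained count Bill suggests (table and function names are illustrative; note the single counter row becomes an update hotspot under heavy concurrent writes):

```sql
CREATE TABLE foo_rowcount (cnt bigint NOT NULL);
INSERT INTO foo_rowcount SELECT count(*) FROM foo;  -- seed once

CREATE FUNCTION foo_count_trig() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE foo_rowcount SET cnt = cnt + 1;
  ELSE  -- DELETE
    UPDATE foo_rowcount SET cnt = cnt - 1;
  END IF;
  RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_count AFTER INSERT OR DELETE ON foo
  FOR EACH ROW EXECUTE PROCEDURE foo_count_trig();

-- Now this is a single-row read instead of a 29M-row scan:
SELECT cnt FROM foo_rowcount;
```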
Re: [GENERAL] Postgresql simple query performance question
In response to Reg Me Please [EMAIL PROTECTED]: I have no doubt you're right, Pavel. But why not? It could be a simple enhancement. It's not simple. Do some searches on the mailing lists and you will find discussion of why it's difficult to do. On Tuesday 06 November 2007 15:11:02 Pavel Stehule wrote: Hello PostgreSQL doesn't use an index for COUNT(*): http://www.varlena.com/GeneralBits/18.php http://sql-info.de/en/postgresql/postgres-gotchas.html#1_7 Regards Pavel Stehule On 06/11/2007, SHARMILA JOTHIRAJAH [EMAIL PROTECTED] wrote: Hi We are in the process of testing for migration of our database from Oracle to Postgresql. I have a simple query Select count(*) from foo This table has 29384048 rows and is indexed on foo_id In short the query Select count(*) from foo takes the following time: Postgresql - 1m 10 sec Oracle (index scan) - 5 sec Oracle (full table scan) - 34 sec How can I speed up this query in postgresql? 
The other postgres settings are postgresql max_connections = 100 shared_buffers = 5 temp_buffers = 5000 work_mem = 16384 maintenance_work_mem = 262144 fsync = on wal_sync_method = fsync effective_cache_size = 30 random_page_cost = 4 cpu_tuple_cost = 0.01 cpu_index_tuple_cost = 0.001 cpu_operator_cost = 0.0025 Is there any tuning that needs to be done on the OS or database side? I had attached the iostat and vmstat results of postgresql Thanks -- Reg me Please -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Postgresql simple query performance question
In response to André Volpato [EMAIL PROTECTED]: Richard Huxton wrote: Reg Me Please wrote: While I would not spend resources in fine tuning the count(*), I would spend some to understand why and how the other ones do it better. Just to be better. The problem is well understood, and there is extensive discussion in the mailing list archives. The basic problem is that with PG's implementation of MVCC the indexes don't have row visibility information. The simple solution of adding it to every index entry would increase index size substantially, imposing costs on every index access and update. There's a thread in -hackers called Visibility map thoughts that is looking at the situation again and if/how to implement visibility information in a compact form. Remember that you can always use serial fields to count a table, like: alter table foo add id serial; select id from foo order by id desc limit 1; This should return the same value as count(*), in a few msecs. I don't think so. What kind of accuracy do you have when rows are deleted? Also, sequences are not transactional, so rolled-back transactions will increment the sequence without actually adding rows. -- Bill Moran http://www.potentialtech.com
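Bill's two objections can be seen directly (a sketch; run against a scratch database):

```sql
CREATE TABLE t (id serial PRIMARY KEY);
INSERT INTO t DEFAULT VALUES;  -- id 1
INSERT INTO t DEFAULT VALUES;  -- id 2
DELETE FROM t WHERE id = 1;    -- count(*) is now 1, but max(id) is still 2

BEGIN;
INSERT INTO t DEFAULT VALUES;  -- consumes id 3 from the sequence
ROLLBACK;                      -- the row is gone, but the sequence stays at 3
-- The next successful insert gets id 4, widening the gap further.
```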
Re: [GENERAL] Calculation for Max_FSM_pages : Any rules of thumb?
Ow Mun Heng [EMAIL PROTECTED] wrote: I just ran a vacuum verbose on the entire DB and this came out: number of page slots needed (274144) exceeds max_fsm_pages (153600) Hence, I've changed the max to 400,000 (pulled it straight out of the air). How does one calculate the number needed anyway? It's not simple. Every update or delete creates a dead tuple that needs to be tracked by an fsm entry. So it depends on how frequently your database is changing in between vacuum runs. In my experience, the best bet is to do vacuum verbose on a regular basis and get a feel for what you need. Every database load is different. Another question is, based on what I've read in the archives (in my laptop.. no inet conn @ work): since I've overrun my max_FSM, I'm basically screwed and will have to do a vacuum verbose FULL on the entire DB. Crap.. You don't _need_ to. But it's generally a good idea to get table bloat reduced. (I was playing with pgfouine and then I found the above piece of advice) I'm planning to run vacuum verbose full tonight/over the weekend. (is this sane?) Thanks for the advice.. vacuum full is sane, if that's what you mean. The only problem is that it locks tables while working on them, so you have to take into account what other workload might be blocked while vacuum full is working, and how long vacuum full is liable to take. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] (Never?) Kill Postmaster?
In response to Stefan Schwarzer [EMAIL PROTECTED]: Hi there, I read dozens of times the TIP 2: Don't 'kill -9' the postmaster... Now, what am I supposed to do if I launched a query which takes ages, and which I want to interrupt? Thanks for any advice, Tracing through the other threads, this is obviously not the regular wayward query, but one that won't die by the normal methods. Unfortunately, I came across this recently, and the only solution I found was to do a pg_ctl restart -m i (Yes, I tried -m f first). Luckily, the db in question was such that the front ends didn't suffer horribly from this and reconnected, and that the database finished up its recovery in a timely manner. Hopefully, I can generate a reproducible example so I can file a bug, but haven't gotten that far with it yet. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Indexes Primary Keys (based on the same columns)
In response to Joshua D. Drake [EMAIL PROTECTED]: Ow Mun Heng wrote: I'm wondering if what I'm doing is redundant. I have a primary key on columns (A,B,C,D) and I've also defined an index based on the same columns (A,B,C,D) and sometimes in the query explain, I see the pkey being used for the scan instead of the index. So.. That made me think perhaps the additional index on the _same_ parameter is redundant. A primary key creates an index so having a second index with the same definition is redundant. Note the same definition. Since this is a multi-column index, there may be some advantage gained by having indexes defined slightly differently. I.e., your PK is (ABCD) but you have an additional index on (DCBA) Whether or not this is actually helpful depends on the nature of the queries you run. -- Bill Moran http://www.potentialtech.com
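A sketch of the non-redundant variant Bill describes (names are illustrative): the PK index serves queries that constrain a leading prefix of (a,b,c,d), while a reversed index serves queries that constrain d (or d,c) without a.

```sql
CREATE TABLE t (
  a int, b int, c int, d int,
  PRIMARY KEY (a, b, c, d)   -- implicitly creates a unique index on (a,b,c,d)
);
-- Useful only if some queries filter on d/c without a/b:
CREATE INDEX t_dcba_idx ON t (d, c, b, a);
-- An exact duplicate like the one below adds nothing but write overhead:
-- CREATE INDEX t_abcd_idx ON t (a, b, c, d);
```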
Re: [GENERAL] looking for some real world performance numbers
In response to Gregory Stark [EMAIL PROTECTED]: Tom Lane [EMAIL PROTECTED] writes: Thomas Kellerer [EMAIL PROTECTED] writes: Where else do they want to store relational data than in a RDBMS? Indeed. It seems like we can hardly answer the OP's question without asking compared to what? If they're afraid an RDBMS won't scale, what have they got in mind that they are so certain will scale? I suspect they're misapplying the lesson Google taught everyone. Namely that domain-specific solutions can provide much better performance than general-purpose software. Google might not use an RDBMS to store their search index (which doesn't need any of the ACID guarantees and needs all kinds of parallelism and lossy algorithms which SQL and RDBMSes in general don't excel at), but on the other hand I would be quite surprised if they stored their Adsense or other more normal use data structures in anything but a bog-standard SQL database. Google also has enough high-calibre people that they can probably re-invent the concept of an RDBMS if they want to. Yet they don't. I know a particular Googleite who's a PostgreSQL buff and is bummed that they use MySQL all over the place. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] keeping an index in memory
Rajarshi Guha [EMAIL PROTECTED] wrote: Hi, relating to my previous queries on doing spatial searches on 10M rows, it seems that most of my queries return within 2 minutes. Generally this is not too bad, though faster is always better. Interestingly, it appears that the CUBE index for the table in question is about 3GB (the table itself is about 14GB). Not knowing the details of the postgres internals, I assume that when a query tries to use the index, it will need to read a 3GB file. Is this a correct assumption? In such a situation, is there a way to keep the index in memory? My machine has 8GB installed and currently has about 7.4GB free RAM (64 bit linux 2.6.9) Free or cached/buffered? Your OS should be using most of that to buffer disk blocks. A side effect of the size of the index is that if I do a query that performs a seq scan (say using cube_distance) it takes longer than when an index is used, but not significantly longer. And this is on a 10M row table. What strategies do people follow when the index becomes very big? What version of PG are you using and what is your shared_buffers setting? With 8G of RAM, you should start with shared_buffers around 2 - 3G, if you're using a modern version of PG. With that much shared memory, a large portion of that index should stay in RAM, as long as it's being used often enough that PG doesn't swap it for other data. -- Bill Moran http://www.potentialtech.com
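As a postgresql.conf fragment, Bill's suggestion might look like this (values are starting points for an 8GB box, not tuned numbers; very old releases want these expressed in 8kB pages rather than with unit suffixes):

```
shared_buffers = 2GB          # PG's own cache; start around 1/4 of RAM and adjust
effective_cache_size = 6GB    # planner hint: how much the OS is likely caching for us
```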
Re: [GENERAL] looking for some real world performance numbers
snacktime [EMAIL PROTECTED] wrote: I'm working through the architecture design for a new product. We have a small group working on this. It's a web app that will be using ruby on rails. The challenge I'm running into is that the latest conventional wisdom seems to be that since obviously databases don't scale on the web, you should just not use them at all. Who are the people saying this? It doesn't sound very wise to me. Where are they proposing to put the data, if not in a database? That's what I'd like to know. I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data because eventually it won't scale. What is _their_ evidence? And of course we don't even have version 1 of our product out of the door. E.S. Raymond's The Art of UNIX Programming: Rule #15: Write a prototype before you optimize. Nothing is funnier than watching people try to performance-optimize software that hasn't even been written yet. Very few people are smart enough to know where the performance bottlenecks will be before they've coded anything. If they insist on doing it wrong, at least you'll have a good laugh. I'll admit we do have a very good chance of actually getting tons of traffic, but my position is to use a rdbms for relational data, and then if and when it won't scale any more, deal with it then. That's sane. So what would really help me is some real world numbers on how postgresql is doing in the wild under pressure. If anyone cares to throw some out I would really appreciate it. http://people.freebsd.org/~kris/scaling/ Lots of interesting graphs on that page ... most of them seem to indicate that RDBMSs scale rather nicely. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] a failover scenario
In response to Tomi N/A [EMAIL PROTECTED]: I am considering pgsql as the RDBMS in a project with the following constraints: - there's a master and reserve instance of the RDBMS on every remote location - there's a master and reserve instance of the RDBMS on a central location - the connections are DSL connections and therefore unreliable - all changes have to be propagated to all servers (multimaster replication) - if the connection between a remote location and the central location fails, the local server continues working and resynchronizes with the central server when the connection is restored - if any master fails, the reserve instance takes over and the rest of the system acts as though nothing happened The master/reserve instance is, from what I read, standard functionality, but I'm not so sure about the resynchronization part of a failed link...I imagine something like WAL shipping might be of use here, but it's just an uneducated guess. Does code exist to support this on pgsql or is it considered application specific functionality? Do other RDBMSs support similar functionality? I don't know of any system that will just hand you those capabilities. Every multi-master system I've ever heard of requires high-speed links between the masters, otherwise the synchronization is far too slow to be usable. I believe you could do what you want in the application. PostgreSQL 8.3 will have a native UUID type, which will help with managing conflicts between multiple masters. If you can define clear rules on how to manage conflicts, that can be done automatically. If the rules aren't so clear, you'll need an interface where a human can manage conflicts. With triggers and LISTEN/NOTIFY, you can put together an app that handles replicating data when tables experience changes. 
From there, you'll need to structure your schema so such an app can detect conflicts (create last_updated timestamps on all tables, and ensure that primary keys include a UUID or other mechanism to guarantee uniqueness) and design some sort of queue mechanism to ensure updates can wait while network problems are resolved. How much effort such a thing requires is dependent on how complex the data is. If it's a sales database (for example) it's not all that hard, since there aren't typical cases where two people are simultaneously updating the same record. I know, for example, that the PA gaming commission is putting something like this together for the race tracks. Each track has handheld devices that are used to record bets/payouts, etc. These devices can't be connected all the time, but a sync system is pretty easy because all they ever do is _add_ new records. Thus, you assign each handheld a unique device ID, and that's part of the primary key for each table, so there's no chance of conflict. Sounds like a fun and challenging project. I'm jealous. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
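A sketch of the schema conventions described above (table and column names are illustrative; the native uuid type arrives in 8.3, as noted earlier in the thread):

```sql
CREATE TABLE orders (
  site_id      integer     NOT NULL,               -- which location wrote the row
  order_uuid   uuid        NOT NULL,               -- globally unique across sites
  last_updated timestamptz NOT NULL DEFAULT now(), -- basis for conflict detection
  payload      text,                               -- stand-in for the real columns
  PRIMARY KEY (site_id, order_uuid)                -- no two sites can collide
);
```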
Re: [GENERAL] Possibilities of Fine Grained Access Control?
In response to Uden van Hout [EMAIL PROTECTED]: I need a solution for the following: With all data stored in the same schema, consumers may only retrieve and update data that is relevant to them. At the same time, users in our own company need to see all data. Is a solution similar to Oracle's Virtual Private Database possible with PostgreSQL, as this is precisely what we need? Not familiar with Oracle's solution, but ... Without knowing the details, it's difficult to be sure if PostgreSQL's native security meets your needs. You can assign read/write/create permissions to databases, schemas, tables, and other objects: http://www.postgresql.org/docs/8.2/static/sql-grant.html This falls short if you need permissions at the row or column level, which PG doesn't support natively (unless this has been added in 8.3 and I simply haven't seen the announcement). For that, the best approach I know of for you is Veil: http://veil.projects.postgresql.org/curdocs/index.html -- Bill Moran http://www.potentialtech.com
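For simple cases, something far short of Veil can approximate row filtering: hide the table behind a view keyed on current_user and grant access only to the view (a sketch; role and column names are illustrative, and plain views are not leak-proof against hostile functions):

```sql
CREATE TABLE accounts (owner name NOT NULL, balance numeric);

CREATE VIEW my_accounts AS
  SELECT * FROM accounts WHERE owner = current_user;

REVOKE ALL ON accounts FROM PUBLIC;           -- consumers can't touch the base table
GRANT SELECT ON my_accounts TO consumer_role; -- consumers see only their own rows
GRANT SELECT ON accounts TO staff_role;       -- in-house users see everything
```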
Re: [GENERAL] reporting tools
Geoffrey [EMAIL PROTECTED] wrote: Andrus wrote: I guess I should have noted that we will need to run this on Linux clients. Geoffrey, You can run FYIReporting engine in Linux using MONO ( www.go-mono.com ) Thanks, we're looking for something that will run natively on Linux. I read this, almost deleted it, read it again ... Just in case there's confusion, MONO + FYIReporting _is_ native on Linux. At least, as much so as Java on Linux is. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Need advice on keeping backup database up to date
In response to Matthew Wilson [EMAIL PROTECTED]: I have been using postgresql for my web application for a while now and it has been great. I want to set up a separate emergency failover server and database in a different data center. In the event that my primary data center becomes inaccessible, I want to update a DNS record and then redirect visitors to the backup data center. I am trying to figure out how I can keep the postgresql database in the backup data center as up to date as possible. The ideal solution would keep the backup database updated in real time as the primary database changes. I need advice about the best way to accomplish this. This sounds like a textbook case for Slony. Slony will not guarantee that your database is an up-to-the-second copy, but that's part of the beauty of it. The _only_ way you can guarantee that two databases in different datacenters are perfectly synchronized at all times is not to let an application move forward until it has received confirmation from both databases that a transaction has completed -- and doing that will absolutely kill performance. Slony will make a best effort. If traffic is low, it will keep the two within a fraction of a second of each other. If traffic gets busy, the backup will get behind, but when things slow down again, Slony will get them caught up. As long as your average database traffic does not exceed the available bandwidth, all will be well. Slony will also let you pick and choose which tables you want to replicate. This can optimize things, as it's not normally worthwhile to replicate things like session tables, and they usually eat up a lot of bandwidth. http://www.slony.info -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: Michael Glaesemann [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 3:25 PM To: Bill Bartlett Cc: 'Andreas Kretschmer'; pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off (Makes me have to think twice about raising any _real_ issues though, like why my VACUUMs periodically keep getting into lock contentions with my JDBC connections and ultimately causing me to have to shut down Postgres w/ pg_ctl stop -m immediate, or how to properly handle postmaster errors that don't appear to be documented anywhere except in the source code... [No, it's not the absolutely most recent version of Postgres; No, I can't upgrade it.]) This isn't fair, Bill. Your original question concerned posts your email client has trouble processing, which isn't the primary topic of the list. You also knew that it was somewhat contentious, given that you've made comments such as Believe me, I'm not defending Outlook, so you probably weren't overly surprised at some of the responses you got. Also note that the responses you got were attempts to solve your problem: I don't see any that only belittled your choice of software. Most people didn't completely read my email and thus unfortunately completely missed the point, in many cases seemingly because they were too quick to jump on my use of Outlook as an email client (thus assuming I was just one of those terrible horrible know-nothing Windows users). It's obvious from 3 years of reading these forums that there is a Linux Postgres users good, Windows Postgres users bad bias by many (but not all) of the members rather than them realizing that 'we' are all Postgres users, 'those' other horrible people are MySQL / Oracle / etc users. I forgot that by posting about a header flag which could be traced back to a Microsoft product, I'd be stepping right smack into that muck. 
Rereading my original email, you can see that the problem I was trying to address was not my choice of email software but rather that several people in these Postgres listservs (like other people in other listservs) were intentionally misusing a specific header flag that is used by specific email programs (only the various Outlook-related ones, as far as I've been able to determine) to highlight messages for special handling. Granted that they were using it for joking purposes, but after a while, all jokes get old. (After spending 4 days in a row dealing with significant Postgres system crashes on several different servers, old just happened to rub me the wrong way whereas I normally just grit my teeth and ignore it.) After researching this flag further (after seeing the feedback I got on this forum), I've discovered that this type of misuse is frequently used, and even recommended on many Linux-oriented web sites as a means, to annoy Outlook-based users (as a means to annoy Windows users). As I mentioned above, I think in a forum such as this, where we need to all be Postgres users, I don't think it's appropriate to intentionally annoy any of our group. (That may not have been the intent, but after seeing the many Linux-oriented web sites and forums recommending its use for specifically this purpose, now it does annoy me even more.) (Just for the record, not that I should have to justify my background and biases [or hopefully lack thereof] to the group: I gave up fighting platform wars a LOOONG time ago, back when I used to try to get the corporate world to bring in Macs instead of Windows 3.x. Now I generally use the best tool for the job, or the tools I have to use when that's not an option. 
Insofar as systems and OS's, I am currently handling 140+ servers running a pretty much even split between Win2K3 and various versions of Linux (primarily several SuSE 9.x versions, a few Red Hats and at least one Debian), with VMware instances of 2 SuSE servers running inside this specific XP development desktop, managing pretty much everything remotely via SSH and bash (via Cygwin on the Windows servers). I may be using Windows on my desktop, but I don't think I'd put myself into the category of being merely one of those "terrible horrible know-nothing Windows users".)

If you *are* having issues with PostgreSQL, it would behoove you to at least post them and see what kind of response you get, rather than judge the list as a whole due to the response you got to an off-topic post. Many of the people on the lists have been here for years and have gotten lots of helpful advice, which is why they've stuck around, and there are many others who are happy to share their advice and experience. You never know: you might be pleasantly surprised.

I agree that the majority of the responses that I've seen over the last 3 years of reading several of the Postgres listservs were indeed attempts to help (including several replies that I've
[GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
Quick request to the group: we have several members who include bogus or humorous X-Message-Flag headers in their email messages. Could I request that you _please_ turn them off? Because they come through as flagged messages in Outlook, it throws off my email rules processing and the messages end up in the wrong groups. (With the volume of email these days, I definitely need all the assistance I can get from things like rules processing to attempt to stay on top of it.) (Yes, I understand people's personal preferences for not liking Windows or Outlook or Microsoft, but that isn't going to change the applications that I need to use for my day-to-day work. Feel free to continue posting your feelings in email signatures though -- never can tell if it will indeed influence a change ...) Much appreciated! - Bill ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of A. Kretschmer Sent: Friday, October 05, 2007 10:57 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

On Fri, 05 Oct 2007 at 10:05:32 -0400, Bill Bartlett wrote the following: Quick request to the group: we have several members who include bogus or humorous X-Message-Flag headers in their email messages. Could I request that you _please_ turn them off? Because they come through as

Do you mean me?

Not specifically -- yours just happened to be the one that triggered it, but no, I see this in postings from other people as well. (Didn't mean to single you out though.)

flagged messages in Outlook, it throws off my email rules processing and the messages end up in the wrong groups. (With the volume of

This header is a special Outlook feature. If this header really makes problems in _this_ software, then I think _this_ software is broken. But hey, that's hardly news ;-)

See other reply -- I'm not complaining that Outlook is broken, but rather that the misuse of this flag slows down my workflow. The issue is that because this flag is useful when used properly (e.g., for mail rules like "take all messages flagged for followup and move them to specific other folders for different types of followup"), having the flag added to emails just for the sake of putting in a humorous message just gets in the way. (I keep needing to go fish the incorrectly-flagged messages back out of the various followup folders before I can completely follow the threads in the Postgres listservs.)

And yes: there is other email software available without these problems. Including on Windows.

Hmm ... I still use Pine on some of my really old Linux boxes -- does this count? (Useful for reading CRON output...)
(Yes, I understand people's personal preferences for not liking Windows or Outlook or Microsoft, but that isn't going to change the applications that I need to use for my day-to-day work. Feel free to continue Okay, i like this list and i like the people here. I will try to disable this special header line for this and only this list (okay, for all postgresql-lists). Let me try, i'm not sure how to disable this header-line only for [EMAIL PROTECTED] Please, don't beat me if this header are still in this message, the problem is work in progress... i'm not sure if i change the config properly. Yes, it's working properly now. MUCH thanks! Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: - Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: Michael Glaesemann [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 10:45 AM To: Bill Bartlett Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

On Oct 5, 2007, at 9:05, Bill Bartlett wrote: Quick request to the group: we have several members who include bogus or humorous X-Message-Flag headers in their email messages. Could I request that you _please_ turn them off?

In all practicality, a request like this is futile:

* Given the number of people on the list, you'll never get everybody to remove bogus X-Message-Flag headers
* Given it's an X- header, doesn't that mean the meaning of the value is implementation dependent? What's bogus wrt Outlook may not be wrt another mail system or client
* Doesn't this indicate that Outlook is broken (for some values of broken)?

Actually, no -- this is why I listed a specific X- header (X-Message-Flag) rather than simply saying "Hey, would everyone please turn off their X- headers." This specific X- header is designed to have Outlook flag a message and display an extra line of text with the flag comment above the email. Since this is generally only used to flag messages for followup actions, having messages come across with the header already embedded in them simply serves to add an extra distraction to the already-too-many bits into which I need to slice my time. (Not that I want more time for things like having to spend 3 hrs yesterday regenerating sequences after diagnosing a database crash [bad disk controller on an old SuSE box] and restoring from backup, but that's my real world...)

Because they come through as flagged messages in Outlook, it throws off my email rules processing and the messages end up in the wrong groups. (With the volume of email these days, I definitely need all the assistance I can get from things like rules processing to attempt to stay on top of it.)

I sympathize.
there *is* a lot of email traffic these days (and not just from the lists). But rather than request that others bend to your rules, I'd think a better solution would be to find (or develop) tools that do what you want. Whether that means better rule handling or better understanding of various headers, it sounds like Outlook isn't doing the job for you. Perhaps a hybrid approach would be helpful: use another email client for mailing lists and Outlook otherwise.

See above -- the problem isn't with Outlook as a mail client. Outlook is doing exactly what it's supposed to do when it sees this X- header: highlighting it and flagging the message for special handling. The issue is with the headers being used (or misused) as they are. Believe me, I'm not defending Outlook; however, that's what I (and many other people) use -- it's just a tool to get a job done. (For all you non-Outlook people out there, since you aren't seeing the messages anyway, most of them are generally humorous messages like 'Windows is not the answer. Windows is the question and the answer is no!' or 'Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)'. Yes, they were funny 2 years ago when I first saw them, but now it's gotten a bit old -- sorry. [No, I'm not meaning to single out Andreas -- his just happened to be the first two that I saw in my "listserv items of useful Postgresql stuff to keep for reference" folder.])

Hope this gives you some ideas.

Michael Glaesemann grzm seespotcode net ---(end of broadcast)--- TIP 6: explain analyze is your friend
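For readers following along who have never seen one of these, the header under discussion is a single line in the message headers; Outlook renders its text as a follow-up banner above the message body. Using one of the joke values quoted above, it looks roughly like this:

```
X-Message-Flag: Windows is not the answer. Windows is the question and the answer is no!
```

Used "properly" (flagging a message for follow-up action), the text would instead be something a rule could act on, which is why bogus values end up misrouting Bill's mail.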
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 12:30 PM To: Bill Bartlett Cc: A. Kretschmer; pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

On 10/5/07, Bill Bartlett [EMAIL PROTECTED] wrote: Hmm ... I still use Pine on some of my really old Linux boxes -- does this count? (Useful for reading CRON output...)

If you need / want the familiar interface of pine on a modern linux box, look for cone. Note that there's also an open source implementation of pico called nano. Just FYI. I use both, quite a bit.

I just started using nano, although I still tend to use vi a lot more than I should. (Never got into the whole emacs thing tho.) I hadn't heard of cone, so I'll have to take a look for that. Thanks for the tip! (Long live the console!)

- Bill ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andreas Kretschmer Sent: Friday, October 05, 2007 12:55 PM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

Bill Bartlett [EMAIL PROTECTED] wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of A. Kretschmer Sent: Friday, October 05, 2007 10:57 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

On Fri, 05 Oct 2007 at 10:05:32 -0400, Bill Bartlett wrote the following: Quick request to the group: we have several members who include bogus or humorous X-Message-Flag headers in their email messages. Could I request that you _please_ turn them off? Because they come through as

Outlook is a PITA. As you can see, it breaks the lines in quotings. Hard to read, because many mail readers color the different quoting levels. Disgusting.

Agreed -- never said it wasn't. But putting time into changing to something else is time that I just won't have for the foreseeable future.

See other reply -- I'm not complaining that Outlook is broken but rather that the misuse of this flag slows down my workflow. The issue is that

I think you have many other problems with this crappy software. For instance, Outlook can't group messages by Message-IDs (threading). For mailing lists, IMHO a no-go.

Agreed -- but I'm not debating the merits of Outlook. It's a tool that does a job, and for all its faults, for what I need right now and for the very limited time I have available, it does the job.

And yes: there is other email software available without these problems. Including on Windows.

Hmm ... I still use Pine on some of my really old Linux boxes -- does this count? (Useful for reading CRON output...)

Pine isn't a solution; mutt is a solution, for instance.
KMail is a solution, and Thunderbird, both for Linux and Windows. (And any other software; I'm a mutt user.)

Again, for what I need and how I need to use it, pine is a perfectly good solution that works (although in looking at cone, that definitely looks like a much better solution). For the specific cases where I use pine, neither KMail nor Thunderbird is a workable solution at all, since I need to use a console-mode mail reader running on remote servers accessed via SSH, without being able to forward X, so anything graphical is out. (I always need to match an appropriate solution to the problem, and part of that is realizing that one single solution does not always fit every problem.)

Please, don't beat me if this header is still in this message, the problem is work in progress... i'm not sure if i changed the config properly.

Yes, it's working properly now. MUCH thanks!

No problem, but i'm not sure if this is the right way. YOU can't force all other people to omit things that make problems in your crappy software. And: YOU give up a nice feature in modern software...

Never mind -- it's not that big a deal, certainly not as much as everyone has turned it into. I assumed that this would be a simple request about a simple problem, but I didn't realize I'd have to spend so much time trying to justify to so many people on the listserv the tools I have to use to do my job. (btw: The header is back on in this reply, so there must be some other setting somewhere in your mail program that needs to be tweaked. But don't worry about trying to change anything else -- my effort involved in correcting the misdirected emails is probably less than your effort in trying to configure different headers for different groups, so I wouldn't want you to waste any more time on it.)

Anyway -- to those people still slogging through this thread who haven't moved on to something more productive: no more emails on this please; I'm done with this subject.
(Makes me have to think twice about raising any _real_ issues though, like why my VACUUMs periodically keep getting into lock contention with my JDBC connections and ultimately causing me to have to shut down Postgres w/ pg_ctl stop -m immediate, or how to properly handle postmaster errors that don't appear to be documented anywhere except in the source code... [No, it's not the absolutely most recent version of Postgres; no, I can't upgrade it.])

(Sorry, English isn't my native language and I know my English is bad.)

Andreas
-- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly. (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Re: [GENERAL] Large Result and Memory Limit
In response to Scott Marlowe [EMAIL PROTECTED]: On 10/4/07, Mike Ginsburg [EMAIL PROTECTED] wrote: export it. Memory limit is a major concern, but the query for one row returns a result set too large and PHP fails.

If this is true, and one single db row makes php exceed its memory limit just by returning it, you've done something very, very wrong in your design.

I work with Mike, and I'm not sure there's something very wrong in the design. The problem we're looking at is an export. This big TEXT field is nothing more than a staging area. The export itself can take several hours to run, so we background the process, and store the result in a big TEXT field (it ends up being a CSV). Later, the user can log in and download the exported file via the web interface. (But it's unlikely that anyone is going to click export, then wait 2 hours for their browser to refresh :) The process is: web interface - request export - get an estimate on completion time - come back later and check the status - download if complete. It's difficult (maybe impossible) to materialize the data on a schedule, since the user has a number of options for how to export the data (filters, etc.), so we chose to do it on-demand.

You'd probably be better served using either a plain text file system to store these things, or large objects in postgresql.

We have multiple web servers with a load balancer, so saving the result to a file doesn't really work. We could put shared storage in place, but I'm still not convinced that's the best fix for this. Large objects are one option we were considering. The problem is they don't work with Slony, and our redundancy relies on Slony. Granted, we might want to just make this a non-replicated table, since it's only a staging area anyway, but we're looking for a better solution.
My (specific) question is whether or not anyone has experimented with putting something like a CSV file in a table with one tuple for each row, and compared performance, etc to putting it in a large object? But if you're stuffing ~8 megs worth of csv text data into a single row* you're probably not using a very relational layout of your data. And you're losing all the advantages (checking your data for consistency and such) that a relational db could give you. Actually, we _are_ using lots of tables with lots of relations and foreign keys and triggers and stored procedures and all sorts of other stuff. That's why it takes multiple hours to flatten everything (is there a better term to describe the process of turning relational data into a single flat file?) * Note that I'm assuming a few things up there. 1: php uses about 2:1 memory to store data it's holding, roughly. If you're set to 16 Meg max, I'm assuming your return set is 8Meg or larger. I'm not sure, but that's probably correct. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
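The one-tuple-per-chunk idea Bill asks about can be prototyped outside the database: split the flattened CSV into fixed-size, ordered chunks (one per row of a hypothetical staging table keyed on a sequence number), and reassemble on download. A minimal sketch; the 8 kB chunk size and the function names are illustrative, not from the thread:

```python
def chunk_csv(data: str, chunk_size: int = 8192):
    """Split a large CSV export into (seq, chunk) pairs, one per
    staging-table row, preserving order via the sequence number."""
    return [(seq, data[off:off + chunk_size])
            for seq, off in enumerate(range(0, len(data), chunk_size))]

def reassemble(rows):
    """Recover the original export from rows fetched in any order,
    e.g. the result of a SELECT without an ORDER BY."""
    return "".join(chunk for _, chunk in sorted(rows))
```

Each (seq, chunk) pair would become one row in the staging table; streaming the chunks back in sequence order means neither PHP nor the driver ever has to hold the whole multi-megabyte export in memory at once, which was the original failure mode.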
Re: [GENERAL] more problems with count(*) on large table
In response to Mike Charnoky [EMAIL PROTECTED]: The db server is pretty beefy: 2x Xeon 3.20 GHz with 6G RAM. The io subsystem is a 550G 4-disk SATA 150 RAID 10 array connected via a 3ware 9500S-8 controller (Seagate Nearline ST3400632NS drives). Currently, shared_buffers is set to 50000 (nearly 400M).

The advice on 8.x systems has been to start with 1/4-1/3 of the available RAM on the system, and fine-tune from there. Unless there are other (non-postgresql) functions this machine serves, you should probably up shared_buffers to about 2G. From there, you may find that your workload benefits from even more, or possibly less, but 400M seems pretty small for a 6G system.

As for the data stored in this large table, there are 15 columns. Each row takes roughly 134 bytes to store, not counting the index. So, for one day's worth of data we are talking about 1.5G/day (1.8G with the index). That's about 11.5M rows/day. Although the data isn't stored exactly sequentially by the indexed time field, it is pretty close.

How much other data is this server pushing around? If there's only that one table in that one database, then something is wrong, as that whole thing should be in the filesystem cache all the time. Otherwise, you have to consider what other operations may be needing memory and moving those tables out of the way.

If it takes PG ~40 minutes to count(*) one day's worth of records, the avg throughput is 786k/s. Watching iostat during the count(*) operation, I see average read speeds in the range of 1100-1500k/s.

Could be a lot of fragmentation of that table. Keep in mind that if you're deleting records occasionally, that free space will get reused, which means an insert might not insert sequentially, it might go all over the table.

I guess I would expect postgres to perform a count(*) faster.
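The 1/4-to-1/3-of-RAM starting point mentioned above is easy to turn into concrete numbers for this box. A back-of-envelope sketch, assuming the default 8 kB block size (shared_buffers was specified in 8 kB pages in this era of postgresql.conf):

```python
RAM_BYTES = 6 * 1024**3   # the 6G server described above
PAGE_SIZE = 8192          # default PostgreSQL block size

low = int(RAM_BYTES * 0.25 // PAGE_SIZE)   # 1/4 of RAM, in pages
high = RAM_BYTES // 3 // PAGE_SIZE         # 1/3 of RAM, in pages
print(low, high)  # 196608 262144, i.e. 1.5G to 2G of shared_buffers
```

Against those numbers, the ~400M currently configured is well below the recommended starting range, which is why Bill suggests raising it to about 2G.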
When I run benchmarks on the machine with hdparm (with the db insert process running), I see the disk averages 80MB/sec for reads:

# hdparm -tT /dev/sdb1
/dev/sdb1:
 Timing cached reads:        3884 MB in 2.00 seconds = 1942.85 MB/sec
 Timing buffered disk reads:  248 MB in 3.01 seconds =   82.49 MB/sec

Maybe PG has to do a lot of random disk access? I'm running bonnie++ now to get more detailed disk performance info. As Tomasz pointed out maybe using CLUSTER would help, but this probably takes a long time to perform.

If you can spare the time, give it a try to see if it helps.

Again, the only other thing happening with the db: a separate process is inserting data into this table. I have checkpoint_segments set to 64 so that pg is not constantly thrashing the disk with writes. The transaction log is on a separate disk.

Mike

Bill Moran wrote: In response to Mike Charnoky [EMAIL PROTECTED]: This is strange... count(*) operations over a period of one day's worth of data now take ~1-2 minutes to run or ~40 minutes. It seems that the first time the data is queried it takes about 40 minutes. If I try the query again, it finishes in 1-2 minutes!

This sounds like a caching issue. My guess at what's happening is that other operations are pushing this data out of the shared_buffers, so when you run it, the system has to pull a bunch of tuples off the disk to check them. If you run it again immediately, the tuples are still in memory, and it runs very fast. If this is the case, you can speed up things by adding RAM/shared_buffers, or by moving to faster disks. The RAM solution is going to give you the biggest performance improvement. However, if there's enough other data on this system, you may have difficulty getting enough RAM to mitigate the problem, in which case, faster disks are going to be your best bet. How much RAM do you have, and how much of it is allocated to shared_buffers? What's your IO subsystem look like?
---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bill Moran http://www.potentialtech.com
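Mike's random-access guess is supported by comparing the hdparm figure (82.49 MB/sec buffered sequential reads) with the iostat rates he reports during the count(*) (1100-1500 kB/sec). Rough arithmetic on those two numbers from the thread:

```python
sequential_mb_s = 82.49            # hdparm buffered disk read
observed_kb_s = (1100 + 1500) / 2  # midpoint of iostat reads during count(*)

# fraction of the disk's sequential bandwidth the scan actually achieves
fraction = (observed_kb_s / 1024) / sequential_mb_s
print(f"{fraction:.1%}")  # 1.5%
```

Achieving under 2% of sequential bandwidth is the classic signature of a seek-bound (random I/O) workload, which is also why CLUSTERing the table on the evtime index was suggested.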
Re: [GENERAL] more problems with count(*) on large table
In response to Mike Charnoky [EMAIL PROTECTED]: This is strange... count(*) operations over a period of one day's worth of data now take ~1-2 minutes to run or ~40 minutes. It seems that the first time the data is queried it takes about 40 minutes. If I try the query again, it finishes in 1-2 minutes! This sounds like a caching issue. My guess at what's happening is that other operations are pushing this data out of the shared_buffers, so when you run it, the system has to pull a bunch of tuples off the disk to check them. If you run it again immediately, the tuples are still in memory, and it runs very fast. If this is the case, you can speed up things by adding RAM/shared_buffers, or by moving to faster disks. The RAM solution is going to give you the biggest performance improvement. However, if there's enough other data on this system, you may have difficulty getting enough RAM to mitigate the problem, in which case, faster disks are going to be your best bet. How much RAM do you have, and how much of it is allocated to shared_buffers? What's your IO subsystem look like? Again, nothing else is happening on this db server except for a constant insertion into this table and a few others. I have done set statistics 100 for the evtime field in this table. Here is the output from EXPLAIN ANALYZE. This is the same query run back to back, first time takes 42 minutes, second time takes less than 2 minutes! 
mydb=# explain analyze select count(*) from prediction_accuracy where evtime between '2007-09-29' and '2007-09-30';

 Aggregate (cost=3.02..3.03 rows=1 width=0) (actual time=2549854.351..2549854.352 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy (cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892 rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with time zone))
 Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms

mydb=# explain analyze select count(*) from prediction_accuracy where evtime between '2007-09-29' and '2007-09-30';

 Aggregate (cost=3.02..3.03 rows=1 width=0) (actual time=111200.943..111200.944 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy (cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483 rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with time zone))
 Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms

Mike

Gregory Stark wrote: Mike Charnoky [EMAIL PROTECTED] writes: I altered the table in question, with set statistics 100 on the timestamp column, then ran analyze. This seemed to help somewhat. Now, queries don't seem to hang, but it still takes a long time to do the count:

* where evtime between '2007-09-26' and '2007-09-27' took 37 minutes to run (result was ~12 million)
* where evtime between '2007-09-25' and '2007-09-26' took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I have seen this query execute in 2 minutes.

And the explain analyze for these? Are you still sure it's certain date ranges which are consistently problems and others are consistently fast? Or could it be something unrelated.
---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Bill Moran http://www.potentialtech.com
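The two EXPLAIN ANALYZE runs above quantify the caching effect Bill described: identical plan, identical row count, wildly different wall time. Using the figures from the output:

```python
cold_ms = 2549854.411   # first run: tuples pulled from disk
warm_ms = 111201.000    # second run: tuples still in memory
rows = 11_423_786       # rows scanned in both runs

speedup = cold_ms / warm_ms
rate = rows / (warm_ms / 1000)  # rows per second on the warm run
print(round(speedup, 1))  # 22.9
```

A ~23x difference between a cold and warm run of the same index scan is exactly the shared_buffers / filesystem-cache eviction story from earlier in the thread; even the warm run only manages on the order of 100k rows/sec.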
Re: [GENERAL] more problems with count(*) on large table
In response to Mike Charnoky [EMAIL PROTECTED]: The autovacuum is turned on. Since this is pg8.1, I don't know when the table was actually last vacuumed. I *did* run analyze on the table, though. Also, nothing has been deleted in this table... so vacuum should have no effect, right?

Updated rows also produce dead tuples that require vacuuming. If the table is insert only, you don't need vacuum.

-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] How unsubscrib this list ?
In response to Gerson Machado [EMAIL PROTECTED]: How unsubscrib this list ?

The information is in the mail headers:

List-Archive: http://archives.postgresql.org/pgsql-general
List-Help: mailto:[EMAIL PROTECTED]
List-ID: pgsql-general.postgresql.org
List-Owner: mailto:[EMAIL PROTECTED]
List-Post: mailto:pgsql-general@postgresql.org
List-Subscribe: mailto:[EMAIL PROTECTED]
List-Unsubscribe: mailto:[EMAIL PROTECTED]

-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Help tuning a large table off disk and into RAM
In response to James Williams [EMAIL PROTECTED]: I'm stuck trying to tune a big-ish postgres db and wondering if anyone has any pointers. I cannot get Postgres to make good use of plenty of available RAM and stop thrashing the disks. One main table: ~30 million rows, 20 columns, all integer, smallint or char(2). Most have an index. It's a table for holding webserver logs. The main table is all foreign key ids. Row size is ~100 bytes. The typical query is an aggregate over a large number of rows (~25%, say): SELECT COUNT(*), COUNT(DISTINCT user_id) FROM table WHERE epoch ... AND epoch ... AND country = ... The box has 4 x Opterons, 4Gb RAM, five 15k rpm disks, RAID 5. We wanted fast query/lookup. We know we can get fast disk IO. Running a typical query like the above seems to:

* hardly tax a single CPU
* leave plenty of RAM free
* thrash the disks about

The last is based mostly on the observation that another tiddly unrelated mysql db which normally runs fast grinds to a halt when we're querying the postgres db (and cpu, memory appear to have spare capacity). We've currently got these settings, and have tried doubling/halving them, restarted and benchmarked a test query. They don't appear to materially alter our query time:

shared_buffers = 128MB

shared_buffers = 1.5GB Unless you've got a lot of stuff other than PostgreSQL on this machine.

temp_buffers = 160MB
work_mem = 200MB
max_stack_depth = 7MB

These look reasonable, although I can't be sure without more details.

We're less concerned about insert speed. Typically 1 or 2 users, but want fast queries. Perhaps a little extreme, but I'm trying to find a way to express this in a way that Postgres understands: * Load this table, and one or two indexes (epoch, user_id) into RAM.

Give it enough shared_buffers and it will do that. You're estimating the size of your table @ 3G (try a pg_relation_size() on it to get an actual size). If you really want to get _all_ of it in all the time, you're probably going to need to add RAM to the machine.
With 8G, you could allocate about 3G to shared_buffers, but that would be ignoring the size of indexes. However, I think you'll be surprised how much performance improves with 1.5G of shared_buffers. You may not need any more. 128M is really forcing PG to work within limited space.

* All of the table, all of those indexes.
* Keep them there, but keep a disk based backup for integrity.
* Run all selects against the in RAM copy. Always.

This is what PG does if you allocate enough shared_buffers.

-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
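The 3G table-size estimate in the reply above follows directly from the figures James gave (~30 million rows at ~100 bytes each); pg_relation_size() would give the real number, but the envelope math is:

```python
rows = 30_000_000
row_bytes = 100   # all integer/smallint/char(2) columns, per the post

table_gb = rows * row_bytes / 1024**3
print(f"{table_gb:.2f} GB")  # 2.79 GB
```

Roughly 2.8G of heap before counting any of the 20 indexes, which is why 128M of shared_buffers cannot come close to pinning the working set, and why even 1.5G should be a dramatic improvement.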
Re: [GENERAL] Building Windows fat clients
I haven't used it in a year or so, although I've heard that it only keeps getting better / more mature / more complete. According to their Supported Platforms page ( http://www.mono-project.com/Supported_Platforms ), their current list of supported OS's is: * Linux * Mac OS X * Sun Solaris * BSD - OpenBSD, FreeBSD, NetBSD * Microsoft Windows One of these days I'm hoping to have some time to get back to using it (instead of just using Java for cross-platform work). - Bill -Original Message- From: Martin Gainty [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 20, 2000 11:00 AM To: Bill Bartlett; 'johnf'; pgsql-general@postgresql.org Subject: Re: [GENERAL] Building Windows fat clients Glad to hear that I knew Novell *was* working on a port to BSD But I havent heard about ports to other Nix platforms? M-- - Original Message - From: Bill Bartlett [EMAIL PROTECTED] To: 'Martin Gainty' [EMAIL PROTECTED]; 'johnf' [EMAIL PROTECTED]; pgsql-general@postgresql.org Sent: Wednesday, September 19, 2007 10:42 PM Subject: Re: [GENERAL] Building Windows fat clients -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Gainty Sent: Tuesday, September 19, 2000 5:58 PM To: johnf; pgsql-general@postgresql.org Subject: Re: [GENERAL] Building Windows fat clients Hello Guys Using C# means .NET framework will need to be installed and your webapp will only work with Microsoft OS Not entirely true. The Mono project ( www.mono-project.com ) has implemented a decent amount of the .NET Framework in a cross-platform environment, including much of ASP.NET. Be aware scripting languages such as PHP and Python will necessitate that you acquire all of the libraries for your web app.. 
As long as you stay mainstream you should be ok But if you have specific requirements for XYZ Db that nobody supports or protocols or device drivers that nobody has written you'll have to write the libraries yourself Martin-- - Original Message - From: johnf [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Wednesday, September 19, 2007 5:20 PM Subject: Re: [GENERAL] Building Windows fat clients On Wednesday 19 September 2007 10:19, Scott Ribe wrote: I'm asking this group because we tend to think alike wrt to data modeling and separation of concerns ;-) Any recommendations on ORM libraries for new Windows development? The last time I started anything from scratch was over 10 years ago, and the state of the art seemed to be to smash everything together into event handlers on GUI objects. Ugh. I pulled the M of the MVC out into separate coherent classes and implemented a *very* simple ORM, leaving the VC mostly conflated in the event handlers--which is not too bad since this app will never need to be cross-platform. So the dev tool was discontinued, some closed-source libraries are getting less and less compatible by the year, and we're going to rewrite. Where to start? It's a custom Windows-only app, only installed at one site. Using .NET would be fine. C# or C++ would be most-preferred language choices, although we could suck it up and use Java. I don't want to put VB on the table. Leaning toward Visual Studio .NET because I know it will be around (in whatever morphed form) for a while; but also considering Borland's supposedly revitalized C++ tools because I used C++ Builder with success back when MS C++ compilers were still awful. I should probably mention that the Windows apps, with the exception of one complicated explore customer's entire history here screen, are pretty simple; the complexity is in reports and stored procedures. Suggestions where to start? If you like python you might want to check www.dabodev.com. Dabo was designed to access data. 
-- John Fabiani ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] autovacuum
In response to Robert Fitzpatrick [EMAIL PROTECTED]: I have a large database used with our mail filter. The pg_dumpall results in about 3GB with this being the only database in the system besides templates and postgres. I do a vacuum every night after backup and it takes about an hour, is this normal for this size db? "Normal" is relative. If it's taking an hour to vacuum 3G, I would say that either your hardware is undersized/badly configured, or you're not vacuuming often enough. I also have autovacuum enabled and when it is running during the day, our mail queues will tend to fill up with slow response to the server. Should I have autovacuum on even if I am vacuuming the db every night? I'm not aware of any problems with autovacuum and scheduled vacuum working together. That doesn't mean you're vacuuming often enough, however. Switch your nightly vacuum to VACUUM VERBOSE and capture the output to see how much work it has to do. Are your fsm settings high enough? Let me know if you need more specifics. Just trying to get some feedback on if my vacuum is taking too long or if both are necessary...thanks for the help! How much RAM does the system have? What's your shared_buffers setting? What's your maintenance_work_mem set to? -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] autovacuum
In response to Robert Fitzpatrick [EMAIL PROTECTED]: On Thu, 2007-09-20 at 13:22 -0400, Bill Moran wrote: That doesn't mean you're vacuuming often enough, however. Switch your nightly vacuum to vacuum verbose and capture the output to see how much work it has to do. Are your fsm settings high enough? <snip> INFO: free space map contains 30078 pages in 41 relations DETAIL: A total of 30304 page slots are in use (including overhead). 30304 page slots are required to track all free space. Current limits are: 204800 page slots, 1000 relations, using 1265 kB. This was what I was most concerned about. If your FSM settings are too low, vacuum won't be able to fully do its job. But it looks like you're OK here. -- Bill Moran http://www.potentialtech.com
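The check Bill applies to the VACUUM VERBOSE output above is mechanical: the DETAIL lines report how many page slots and relations the free space map needs, and the "Current limits" line reports the configured maxima. A small sketch of that comparison (the function name is illustrative, not any PostgreSQL API):

```python
def fsm_ok(slots_required: int, max_fsm_pages: int,
           relations: int, max_fsm_relations: int) -> bool:
    """Return True if the configured FSM limits cover what vacuum reported.

    slots_required / relations come from the DETAIL lines of VACUUM VERBOSE;
    max_fsm_pages / max_fsm_relations from the "Current limits" line.
    """
    return slots_required <= max_fsm_pages and relations <= max_fsm_relations

# Figures from the report above: 30304 slots required, 41 relations,
# against limits of 204800 page slots and 1000 relations.
print(fsm_ok(30304, 204800, 41, 1000))  # True
```

When this returns False for either limit, vacuum cannot track all free space and tables will bloat regardless of how often you vacuum.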
Re: [GENERAL] autovacuum
In response to Robert Fitzpatrick [EMAIL PROTECTED]: On Thu, 2007-09-20 at 13:22 -0400, Bill Moran wrote: In response to Robert Fitzpatrick [EMAIL PROTECTED]: I have a large database used with our mail filter. The pg_dumpall results in about 3GB with this being the only database in the system besides templates and postgres. I do a vacuum every night after backup and it takes about an hour, is this normal for this size db? "Normal" is relative. If it's taking an hour to vacuum 3G, I would say that either your hardware is undersized/badly configured, or you're not vacuuming often enough. It is a dual P4 processor Supermicro server with 2GB of RAM, so I will need to go over the configuration then? I didn't think it should take so long... Why does everyone leave out the IO subsystem? It's almost as if many people don't realize that disks exist ... With 2G of RAM, and a DB that's about 3G, there's at least a G of database data _not_ in memory at any time. As a result, disk speed is important, and _could_ be part of your problem. You're not using RAID 5, are you? Let me know if you need more specifics. Just trying to get some feedback on if my vacuum is taking too long or if both are necessary...thanks for the help! How much RAM does the system have? What's your shared_buffers setting? What's your maintenance_work_mem set to? Yes, this is the first time I've had to do any tuning to pgsql, so I most likely need help in this area. This is 8.2.4 on a FreeBSD 6.2 server...here are those settings currently below. I also had to tweak BSD loader.conf to allow the changes to work... max_connections = 250 max_fsm_pages = 204800 shared_buffers = 128MB Unless this machine runs programs other than PostgreSQL, raise this to about 650MB. You might get better performance from even higher values. The rule of thumb is to allocate 1/4 - 1/3 of the available RAM to shared_buffers ... subtract the RAM that other programs are using first.
effective_cache_size = 256MB More like 1300MB (again, unless this machine is doing other things) work_mem = 64MB maintenance_work_mem = 256MB mx1# cat /etc/loader.conf kern.ipc.semmni=256 kern.ipc.semmns=512 kern.ipc.semmnu=256 mx1# cat /etc/sysctl.conf # tuning for PostgreSQL kern.ipc.shm_use_phys=1 kern.ipc.shmmax=1073741824 kern.ipc.shmall=262144 kern.ipc.semmsl=512 kern.ipc.semmap=256 If I don't have it listed above, then it is default settings for anything else. Watch the system during vacuum to see if it's blocking on IO or CPU. systat, vmstat, iostat, and top (use 'm' to switch views) are all good utilities to check on this. Another possibility is that autovac isn't configured correctly. Watch your PostgreSQL logs to see if it's running at all. If it is, turn up the logging level until it tells you which tables it's vacuuming. You may have to tweak the thresholds to make it more aggressive. -- Bill Moran http://www.potentialtech.com
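The sizing advice in this thread is simple arithmetic: shared_buffers at roughly 1/3 of the RAM left over after other programs, and effective_cache_size near the rest. A hedged sketch of that rule of thumb (the fractions and the ~100MB headroom are taken from the advice quoted above, not from any official formula):

```python
def suggest_settings(total_ram_mb: int, other_apps_mb: int = 0) -> dict:
    """Rough starting points per the rule of thumb in this thread:
    shared_buffers ~ 1/3 of RAM not used by other programs,
    effective_cache_size ~ most of what remains for the OS disk cache."""
    avail = total_ram_mb - other_apps_mb
    shared_buffers = avail // 3
    return {
        "shared_buffers_mb": shared_buffers,
        # leave ~100MB headroom for the OS itself
        "effective_cache_size_mb": avail - shared_buffers - 100,
    }

# A 2GB box dedicated to PostgreSQL, as in the report above:
print(suggest_settings(2048))
# {'shared_buffers_mb': 682, 'effective_cache_size_mb': 1266}
```

These are starting points to benchmark from, not final values; Bill's "about 650MB" and "more like 1300MB" figures above fall out of the same arithmetic.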
[GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
I'm just starting with pgcrypto, and I'm curious whether it's needed/recommended to use an initialization vector/value (IV) with the pgp_sym_encrypt() function. The docs hint that an IV is used automatically ("2. Data is prefixed with block of random bytes. This is equal to using random IV."), but encrypting plain text that starts the same seems to result in initial common cipher text, so I'm not clear. So, I'm currently generating a substring of an MD5 hash of a few items and prepending that to the plain text I need to encrypt as the IV. Then when I decrypt I remove that prefix. BTW, this is for credit card storage, which is a business requirement. Besides following the PCI DSS and external audit procedures, the plan is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a credit card storage server. The server and db are SSL only and the key is passed from the application and never stored anyplace (except in memcached on other servers during the session). The key is a user's plain text password plus an application-specific secret. So, each row has its own key. Passwords must be changed periodically, etc. I'd welcome any comments or recommendations from others that have implemented something similar. Thanks, -- Bill Moseley [EMAIL PROTECTED]
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Filip Rembiałkowski [EMAIL PROTECTED] wrote: 2007/9/18, Joshua D. Drake [EMAIL PROTECTED]: If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If you do not overrun your max_fsm_pages, yes vacuum analyze can deal with the issue. Are you sure? I have a situation where the above is not true. Postgres version 8.1.8, while VACUUM VERBOSE says: INFO: free space map contains 2329221 pages in 490 relations DETAIL: A total of 2345744 page slots are in use (including overhead). 2345744 page slots are required to track all free space. Current limits are: 1000 page slots, 1000 relations, using 58698 KB. ... and we have a constant problem with index bloat and need to REINDEX frequently. The database is very redundant and has quite a high data retention rate (it's an ecommerce site). I've been involved in a number of the discussions on this, and I think part of the confusion stems from the fact that "index bloat" is an ambiguous term. If the index gets large enough that it no longer fits in shared memory, and reindexing it will reduce its size to where it _will_ fit in shared memory, then the index _could_ be said to be bloated. However, an equally valid solution to that problem is to increase the amount of shared memory available (possibly by adding RAM). Unfortunately, folks like Phoenix are looking for yes/no answers, and with many of these questions, the _correct_ answer is "it depends on your workload." If you find that reindexing improves performance, then you should investigate further. Depending on the exact nature of the problem, there are many possible solutions, three that come to mind: * Add RAM/SHM * REINDEX on a regular schedule * (with newer version) reduce the fill factor and REINDEX -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
In response to Phoenix Kiula [EMAIL PROTECTED]: If you find that reindexing improves performance, then you should investigate further. Depending on the exact nature of the problem, there are many possible solutions, three that come to mind: * Add RAM/SHM Can I add SHM merely by managing the entry in sysctl.conf? My current values: kernel.shmmax = 536870912 kernel.shmall = 536870912 These values define the max allowed. They exist to keep poorly written applications from sucking up all the available memory. Setting them higher than is needed does not cause any problems, unless a greedy or poorly-written application grabs all that memory. My shared_buffers in postgresql.conf is 20000. From the website http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax should be sharedbuffer*8192, so I suppose my shmmax can be much lower than the above, but I raised it for performance. Am I wrong to do so? It's completely impossible to tell without knowing more about your physical hardware. The rule of thumb is 1/3 physical RAM to start, then adjust if more or less seems to help. That advice is for versions of PG >= 8. If you're still running a 7.X version, upgrade. How much RAM does this system have in it? Unless you have other applications running on this system using RAM, you should allocate more of it to shared_buffers. If 160M is 1/3 your RAM, you probably need to add more RAM. How big is your database? If it's possible to fit it all in shared_buffers, that will give you the best performance. * REINDEX on a regular schedule This is sadly not really feasible, because we need to offer a 100% availability website. REINDEX does not work concurrently so it is not really an option for us. My max_fsm_pages and max_fsm_relations are way above the numbers that come up after the VACUUM ANALYZE VERBOSE run. Hence my comment about "depending on your workload" and investigating the situation to determine the best solution.
But still, the autovacuum stuff seems like it is not working at all. Some related entries in the conf file: autovacuum = on autovacuum_vacuum_cost_delay = 20 vacuum_cost_delay = 20 autovacuum_naptime = 30 stats_start_collector = on stats_row_level = on autovacuum_vacuum_threshold = 80 autovacuum_analyze_threshold = 80 And yet, the db often slows down, at which point I manually log in and run a manual VACUUM ANALYZE and it seems fine for some more time. Sometimes, I also restart pgsql and that seems to help for a while. You don't mention *_scale_factor settings. Those are going to be important as well. Based on your symptoms, it sounds like autovacuum is not getting those tables vacuumed enough. I recommend raising the debug level and watching the logs to see if autovacuum is actually getting tables vacuumed. Consider lowering your *_scale_factor values if not. Or even reducing autovacuum_naptime. Another piece of advice on these forums is to watch vmstat 1, without actually specifying how to draw inferences from it. The output of it is coming up at decent rates, as follows:

procs ---memory-- ---swap-- -io --system-- cpu
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 0 29124 110760 108980 346773601 206 1400 4 2 1 85 12
 0 0 29124 110632 108980 346773600 0 0 1052 108 0 0 100 0
 2 0 29124 108840 108980 346773600 0 0 1112 299 1 1 98 0
 1 0 29124 109288 108980 346773600 0 0 1073 319 2 1 98 0

Explaining how to interpret the output of this command and determine what to do with it is not something easily done in a short paragraph. However, it looks like you've got a lot of RAM being used for the disk cache. That memory would probably be better used as shared_buffers, so I suggest you increase that value considerably. * (with newer version) reduce the fill factor and REINDEX I think some of my tables are updated very frequently so a smaller fill factor will be nice. How can I find the current fill factor on my tables?
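Bill refers to *_scale_factor settings that the config quoted above omits; for orientation, they live in postgresql.conf alongside the threshold settings already shown. A hypothetical fragment (values are illustrative only, not recommendations):

```ini
# Illustrative values only -- tune to your workload
autovacuum_vacuum_scale_factor = 0.2    # vacuum when dead rows exceed ~20% of a table
autovacuum_analyze_scale_factor = 0.1   # analyze when changed rows exceed ~10% of a table
autovacuum_naptime = 30                 # seconds between autovacuum checks (as set above)
```

Lowering the scale factors makes autovacuum trigger after proportionally fewer row changes, which is the knob Bill suggests turning if the logs show tables are not being vacuumed often enough.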
Also, is there some method or science to calculating a decent fill factor -- size of table, number of indexes, frequency of updates, and such? We have one major table which faces a lot of INSERTs and UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10 million). I'm not sure how to find the current value, but a smaller fill factor on busy tables should lead to less fragmentation, thus more efficient indexes over time. Keep in mind that a smaller fill factor will also lead to larger indexes initially. -- Bill Moran http://www.potentialtech.com
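The tradeoff Bill describes can be put in rough numbers: an index built at fillfactor f starts about 100/f times larger than one packed at 100%, in exchange for free space that absorbs updates before pages split. A toy calculation of that proportionality (not PostgreSQL internals, just the ratio):

```python
def initial_index_size(size_at_100: float, fillfactor: int) -> float:
    """Approximate initial index size when pages are only `fillfactor`% full.

    The same entries spread over pages that are fillfactor% full need
    100/fillfactor times as many pages as fully packed pages.
    """
    if not 10 <= fillfactor <= 100:
        raise ValueError("fillfactor must be between 10 and 100")
    return size_at_100 * 100 / fillfactor

# A 600 MB index rebuilt at fillfactor 70 starts at roughly 857 MB:
print(round(initial_index_size(600, 70)))  # 857
```

So a very low fillfactor trades a substantially larger (and less cacheable) index up front against slower fragmentation later; where the break-even sits depends on the update-to-select ratio of the table.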
index fillfactor (was Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER)
In response to Phoenix Kiula [EMAIL PROTECTED]: Thanks for a very informative post! One question: I'm not sure how to find the current value, but a smaller fill factor on busy tables should lead to less fragmentation, thus more efficient indexes over time. Keep in mind that a smaller fill factor will also lead to larger indexes initially. What constitutes a small fill factor? Would 70 be good? Unfortunately, I can't say. I have not yet had the opportunity to experiment with different fillfactors, so I can only speak in vague estimations on this topic. I guess my current must have been the default, which the manual says is 100. I expect it's at the default, but the docs say that is 90%: http://www.postgresql.org/docs/8.2/static/sql-createindex.html Where did you see 100? Or did you mean a really small fill factor like 20? In this context, what is "packing" in the manual -- is that some kind of compression? Hopefully, someone more knowledgeable will chime in with some wise suggestions. Barring that, I can only suggest you experiment to find what works for your workload, but don't rule out the possibility that extremely low fillfactor values might work well for you. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
On Tue, Sep 18, 2007 at 11:26:17AM +0300, Marko Kreen wrote: A few bytes being the same is normal. Those are the PGP packet header, telling this is a symmetrically encrypted session key packet, with length X plus some more details. I see. So, you are saying there's no need to generate my own IV to prepend to the plain text before encrypting. If more than a few bytes are the same, and if the salt is not randomised, it _could_ be a sign of a problem. Either a pgcrypto bug or a failure to initialize the random generator. If you suspect a problem, please send me a few example encryptions with keys and your setup details (postgres version, openssl or not, OS version). No, it was only a few bytes that were similar, so the headers explain that. Besides following the PCI DSS and external audit procedures, the plan is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a credit card storage server. The server and db are SSL only and the key is passed from the application and never stored anyplace (except in memcached on other servers during the session). The key is a user's plain text password plus an application-specific secret. So, each row has its own key. Passwords must be changed periodically, etc. I don't know the details of your setup, but I strongly suggest you look into using public-key crypto. That allows you to use separate keys for encryption and decryption. So on a webserver where users only input credit cards, you keep only public keys, so anybody cracking it won't be able to decrypt the data. I need to look at that more. But I've seen that suggested where one needs to decrypt the data at a later time. We don't have that need. Our plan was to never store any keys. Every user must log in to the application with a password. Their account passwords are only stored hashed on disk, so we don't know their passwords. The plan is to encrypt their plain-text password with a secret known by the application only and stored into memcached.
It's this plain-text password that will be sent to a separate server to encrypt (and decrypt) their credit card data when the user makes a transaction. We only need to store the credit card data to allow subsequent charges to their card on file -- and that only happens when a user logs in and processes a transaction. We don't have any way to decrypt the data without this password stored in the session. If someone hacks an application server they could pluck active users' passwords from memcached and also find the application's secret word. Then if they also hacked the credit card server they could decrypt the data using passwords they were able to sniff. See any glaring holes? Thanks for the help! -- Bill Moseley [EMAIL PROTECTED]
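The per-row key scheme described above (user password combined with an application secret, with no decryption key ever stored) can be sketched with a standard key-derivation function. This is a hypothetical illustration using PBKDF2 from Python's standard library, not what pgcrypto does internally; `APP_SECRET` and the salt are placeholder values:

```python
import hashlib

APP_SECRET = b"application-specific secret"  # hypothetical; lives only in app config

def derive_row_key(user_password: str, salt: bytes) -> bytes:
    """Derive a per-user encryption key from the user's plain-text password
    plus an application secret, so no usable key is ever stored on disk."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        user_password.encode() + APP_SECRET,
        salt,
        100_000,  # iteration count; tune to your hardware
    )

k1 = derive_row_key("hunter2", b"per-user-salt")
k2 = derive_row_key("hunter3", b"per-user-salt")
assert k1 != k2        # different passwords -> different keys
assert len(k1) == 32   # 256-bit key, a suitable size for AES-256
```

A proper KDF also makes brute-forcing sniffed material expensive; the design still inherits the weakness the poster notes, that anyone holding both the session password and the application secret can derive the key.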
Re: [GENERAL] constraints on two tables
In response to finecur [EMAIL PROTECTED]: On Sep 10, 9:55 pm, finecur [EMAIL PROTECTED] wrote: Hi Here is my table: Table School ( id integer, name text ); Table Department ( id integer, school_id integer references school(id), name text ); Table Course ( department_id integer references department(id), name text, course_number text ) I would like to make the course_number unique, but not in the course table, nor in department. I would like to make the course_number unique in the scope of School. So, you may see the same course_number in the Course table, but (school_id, course_number) should always be unique. How can I make the constraint? ALTER TABLE Course ADD PRIMARY KEY (school_id, course_number); and add the school_id column to the Course table. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Security Advances in Postgresql over other RDBMS
In response to Jasbinder Singh Bali [EMAIL PROTECTED]: Hi, The way postgres has the concept of host-based authentication, is this a step forward over other RDBMSes like SQL Server and Oracle? Host-based auth has been around a long time. It's just another layer. Some folks even consider HBA antiquated, which may be why MS doesn't use it. It's difficult to use if you've got a DHCP-driven network where IPs change a lot. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Markus Schiltknecht [EMAIL PROTECTED] wrote: Hi, Bill Moran wrote: While true, I feel those applications are the exception, not the rule. Most DBs these days are the blogs and the image galleries, etc. And those don't need or want the overhead associated with synchronous replication. Uhm.. do blogs and image galleries need replication at all? Ever read anything on how MySpace is laid out? The big ones need replication to handle the traffic. I'm thinking more of the business critical applications, where high availability is a real demand - and where your data *should* better be distributed among multiple data centers just to avoid a single point of failure. Agreed. Judged by importance, they're bigger. Judged by raw numbers, I suspect that they're less prominent than the blogs and image galleries. I guess it depends on your viewpoint. <rant>for most other stuff MySQL is good enough</rant> I hate when people say that. Any company that says that is of limited viability in my opinion. For one thing, saying it's good enough is setting yourself up for trouble when your company expands its requirements. It's synchronous for the reason you describe, but it's asynchronous because a query that has returned successfully is not _guaranteed_ to be committed everywhere yet. Seems like we're dealing with a limitation in the terminology :) Certainly! But sync and async replication are such well known and used terms... on the other hand, I certainly agree that in Postgres-R, the nodes do not process transactions synchronously, but asynchronously. Good point. Maybe it's really better to speak of eager and lazy replication, as in some literature (namely the initial Postgres-R paper of Bettina Kemme). This could potentially be a problem on (for example) a web application, where a particular user's experience may be load-balanced to another node at any time. Of course, you just have to write the application with that knowledge. IMO, such heavily dynamic load-balancing is rarely useful.
With application support, it's easily doable: let the first transaction on node A query the (global) transaction identifier and after connecting to the next node B, ask that to wait until that transaction has committed. It gets a little harder without application support: the load balancer would have to keep track of sessions and their last (writing) transaction. Again, thank you for pointing this out. Keep up the good work. I wish I had some spare cycles to work on Postgres-R -- it seems like a very interesting problem. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Export data to MS Excel
For quick/simple table format reports, you can just use psql to create the output in HTML format, then import that directly into Excel. For example, I use the following psql line to generate an HTML-format report of server IP information; this file can then be directly opened in Excel. (Excel 2002 and later will open and parse most HTML format files without even needing to import them -- just open the file.) psql -H -P tableattr='cellspacing=0 cellpadding=6' -P title='Server IP Information' -f get_server_ip_info.sql -o get_server_ip_info.html Alternatively, you can bury the formatting commands in the SQL file itself -- this is handy if the formatting is longer or doesn't change, or you want different formats or titles for different reports. For example: list_of_unsigned_images_report.sql: \pset format html \pset title 'List of Unsigned Images' \pset tableattr 'cellspacing=0 cellpadding=6' SELECT p.last_name, p.first_name, [blah blah blah]... - Bill -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Phoenix Kiula Sent: Saturday, September 01, 2007 5:24 AM To: Ashish Karalkar Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Export data to MS Excel On 01/09/07, Ashish Karalkar [EMAIL PROTECTED] wrote: Hello All, I want to export data from PostgreSQL tables to MS Excel. Is there any way? Sure, write SQL in a program (php, perl, jsp, asp) to dump the tables in HTML <table><tr><td> rows format. Then import that HTML page into Excel from Tools -> Data Sources. Once you save the xls file, you can always just refresh the data because it already points to your program. I guess there must be other ways, but this one works fine for me. Excel has a limit of 65,700 rows (or thereabouts) so it's not a huge deal for a db like pgsql.
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
chris smith [EMAIL PROTECTED] wrote: Ever read anything on how myspace is laid out? The big ones need replication to handle the traffic. Actually no. http://highscalability.com/livejournal-architecture Using MySQL replication only takes you so far. (Yeah, it's MySQL but the point is valid regardless.) You can't keep adding read slaves and scale. A lot use sharding now to keep scaling (limiting to X users/accounts per database system and just keep adding more database servers for the next X accounts). I got the impression that they hadn't moved _all_ of their DB needs to sharding. Just the ones that exceeded the scalability of replication, but they don't explicitly say, IIRC. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
In response to Markus Schiltknecht [EMAIL PROTECTED]: Hi, Bill Moran wrote: First off, clustering is a word that is too vague to be useful, so I'll stop using it. There's multi-master replication, where every database is read-write, then there's master-slave replication, where only one server is read-write and the rest are read-only. You can add failover capabilities to master-slave replication. Then there's synchronous replication, where all servers are guaranteed to get updates at the same time. And asynchronous replication, where other servers may take a while to get updates. These descriptions aren't really specific to PostgreSQL -- every database replication system has to make design decisions about which approaches to support. Good explanation! Synchronous replication is only really used when two servers are right next to each other with a high-speed link (probably gigabit) between them. Why is that so? There's certainly very valuable data which would gain from an inter-continental database system. For money transfers, for example, I'd rather wait half a second for a round trip around the world, to make sure the RDBS does not 'loose' my money. While true, I feel those applications are the exception, not the rule. Most DBs these days are the blogs and the image galleries, etc. And those don't need or want the overhead associated with synchronous replication. PostgreSQL-R is in development, and targeted to allow multi-master, asynchronous replication without rewriting your application. As far as I know, it works, but it's still beta. Sorry, this is nitpicking, but for some reason (see current naming discussion on -advocacy :-) ), it's Postgres-R. Sorry. Additionally, Postgres-R is considered to be a *synchronous* replication system, because once you get your commit confirmation, your transaction is guaranteed to be deliverable and *committable* on all running nodes (i.e. it's durable and consistent). 
Or put it another way: asynchronous systems have to deal with conflicting, but already committed transactions - Postgres-R does not. I find that line fuzzy. It's synchronous for the reason you describe, but it's asynchronous because a query that has returned successfully is not _guaranteed_ to be committed everywhere yet. Seems like we're dealing with a limitation in the terminology :) Certainly, this is slightly less restrictive than saying that a transaction needs to be *committed* on all nodes, before confirming the commit to the client. But as long as a database session is tied to a node, this optimization does not alter any transactional semantics. And despite that limitation, which is mostly the case in reality anyway, I still consider this to be synchronous replication. This could potentially be a problem on (for example) a web application, where a particular user's experience may be load-balanced to another node at any time. Of course, you just have to write the application with that knowledge. [ To get a strictly synchronous system with Postgres-R, you'd have to delay read only transactions on a node which hasn't applied all remote transactions, yet. In most cases, that's unwanted. Instead, a consistent snapshot is enough, just as if the transaction started *before* the remote ones which still need to be applied. ] Agreed. BTW: does anyone know of a link that describes these high-level concepts? If not, I think I'll write this up formally and post it. Hm.. somewhen before 8.3 was released, we had lots of discussions on -docs about the high availability and replication section of the PostgreSQL documentation. I'd have liked to add these fundamental concepts, but Bruce - rightly - wanted to keep focused on existing solutions. And unfortunately, most existing solutions are async, single-master. So explaining all these wonderful theoretic concepts only to state that there are no real solutions would have been silly. 
Someone else posted a link, and the docs look pretty comprehensive at this point ... enough so that I'm not going to bother writing up my own explanation. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Can this function be declared IMMUTABLE?
In response to [EMAIL PROTECTED]: Hello, I have a question about whether I can safely declare a function IMMUTABLE. Citing the PostgreSQL documentation under Function Volatility Categories in the section on Extending SQL: It is generally unwise to select from database tables within an IMMUTABLE function at all, since the immutability will be broken if the table contents ever change. Well, I am considering a function that does select from a table, but the table contents change extremely infrequently (the table is practically a list of constants). Would it be safe to declare the function IMMUTABLE provided that the table itself is endowed with a trigger that will drop and recreate the function any time the table contents are modified? In this way, it seems that the database would gain the performance benefit of an immutable function for the long stretches of time in between changes to the table. Is this a table that will only change during upgrades/maintenance? If so, then IMMUTABLE is probably safe, as the table will change under controlled circumstances. The ultimate test is this: what happens if the function is declared IMMUTABLE and the data _does_ change? If the result of such a scenario is acceptable, then you can probably use IMMUTABLE. Another rule to take into account is the Law of Premature Optimization. The law states that trying to optimize too soon will cause pain. Have you determined that the extra performance gain that IMMUTABLE will give you is even necessary? If not, then start out with a more conservative approach and revisit the immutability question _if_ you see performance issues. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
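A minimal sketch of the scheme being discussed (the table and function names are made up for illustration; whether this is actually safe depends on how and when the table changes, as noted above):

```sql
-- A table of near-constants (hypothetical example).
CREATE TABLE unit_factors (
    unit   text    PRIMARY KEY,
    factor numeric NOT NULL
);

-- The function reads from the table, yet is declared IMMUTABLE anyway.
-- The poster's proposal is to DROP and re-CREATE this function from a
-- trigger on unit_factors whenever the table is modified, so that the
-- definition is "refreshed" on every change.
CREATE OR REPLACE FUNCTION to_base_units(numeric, text)
RETURNS numeric AS $$
    SELECT $1 * factor FROM unit_factors WHERE unit = $2;
$$ LANGUAGE sql IMMUTABLE;
```

The risk the documentation warns about remains: results the planner has already folded into a plan (or into an index expression) before the trigger fires can still reflect the old table contents.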
Re: [GENERAL] Turning off atime on PostgreSQL DB volumes
In response to Keaton Adams [EMAIL PROTECTED]: After reading several articles on the performance drag that Linux atime has on file systems we would like to mount our DB volumes with the noatime parameter to see just what type of a performance gain we will achieve. Does PostgreSQL use atime in any way when reading/writing data? If we turn off/disable atime on the DB volumes will that cause any type of issue at all with PostgreSQL 8.1 on Red Hat Enterprise Linux? I frequently run with noatime and have never noticed any problems. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
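For reference, noatime is just a mount option, so the change is confined to the filesystem layer; PostgreSQL itself never consults atime. A sketch (the device, mount point, and filesystem type are placeholders for your own setup):

```
# /etc/fstab -- example entry; device, mount point and fstype are placeholders
/dev/sdb1   /var/lib/pgsql   ext3   defaults,noatime   0   2
```

An already-mounted volume can usually be switched without a reboot via `mount -o remount,noatime /var/lib/pgsql`.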
Re: [GENERAL] EXPLAIN ANALYZE not considering primary and unique indices!
In response to Sanjay [EMAIL PROTECTED]: Hi All, Say I have a simple table WEBSITE(website_id int4 PRIMARY KEY, name VARCHAR(30)). While I try this: EXPLAIN ANALYZE SELECT * FROM WEBSITE WHERE website_id = 1 the output is:

Seq Scan on website (cost=0.00..1.31 rows=1 width=162) (actual time=0.047..0.051 rows=1 loops=1)
  Filter: (website_id = 1)
Total runtime: 0.102 ms

Wondering why it is not using the index, which would have been automatically created for the primary key. Because PG thinks the seq scan is faster than an index scan. Depending on other factors, it may be right. If there's only a few rows in the table, then a seq scan is going to be faster than scanning the index, only to grab most of the table into memory anyway. If it's wrong, it's either because your analyze data isn't up to date, or your tuning parameters don't match your hardware. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
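One way to check whether the planner's choice is actually right: a diagnostic session sketch (enable_seqscan is a real planner setting, but don't leave it off in production):

```sql
-- Make sure the statistics are current first.
ANALYZE website;

-- Temporarily discourage sequential scans so the planner is forced to
-- show the index-scan alternative, then compare cost and actual time
-- against the seq-scan plan.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM website WHERE website_id = 1;
RESET enable_seqscan;
```

If the index-scan plan comes out slower (as it usually will on a table this small), the seq scan was the right call all along.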
Re: [GENERAL] Geographic High-Availability/Replication
In response to Markus Schiltknecht [EMAIL PROTECTED]: Hi, Bill Moran wrote: I'm curious as to how Postgres-R would handle a situation where the constant throughput exceeded the processing speed of one of the nodes. Well, what do you expect to happen? This case is easily detectable, but I can only see two possible solutions: either stop the node which is too slow or stop accepting new transactions for a while. It appears as if I miscommunicated my point. I'm not expecting PostgreSQL-R to break the laws of physics or anything, I'm just curious how it reacts. This is the difference between software that will be really great one day, and software that is great now. Great now would mean the system would notice that it's too far behind and Do The Right Thing automatically. I'm not exactly sure what The Right Thing is, but my first guess would be to force the hopelessly slow node out of the cluster. I expect this would be non-trivial, as you'd have to have a way to ensure it was a problem isolated to a single (or few) nodes, and not just the whole cluster getting hit with unexpected traffic. This technique is not meant to allow nodes to lag behind several thousands of transactions - that should better be avoided. Rather it's meant to decrease the commit delay necessary for synchronous replication. Of course not, that's why the behaviour when that non-ideal situation occurs is so interesting. How does PostgreSQL-R fail? PostgreSQL fails wonderfully: A hardware crash will usually result in a system that can recover without operator intervention. In a system like PostgreSQL-R, the failure scenarios are more numerous, and probably more complicated. I can see your system working if it's just spike loads and the slow nodes can catch up during slow periods, but I'm wondering about the scenarios where an admin has underestimated the hardware requirements and one or more nodes is unable to keep up. 
Please keep in mind, that replication per se does not speed your database up, it rather adds a layer of reliability, which *costs* some performance. To increase the transactional throughput you would need to add partitioning to the mix. Or you could try to make use of the gained reliability and abandon WAL - you won't need that as long as at least one replica is running - that should increase the single node's throughput and therefore the cluster's throughput, too. I understand. I'm not asking it to do something it's not designed to. At least, I don't _think_ I am. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Geographic High-Availability/Replication
Markus Schiltknecht [EMAIL PROTECTED] wrote: Hi, Gregory Stark wrote: Only if your application is single-threaded. By single-threaded I don't refer to operating system threads but to the architecture. If you're processing a large batch file handling records one by one and waiting for each commit before proceeding then it's single threaded. If you have a hundred independent clients on separate connections doing separate things then each one of them could get 6tps. Which you have will depend on your application and your needs, it may not be something you can change. Correct. Plus, as in the implementation of Postgres-R, performance is *not* bound to the slowest node. Instead, every node can process transactions at its own speed. Slower nodes might then have to queue transactions from those until they catch up again. I'm curious as to how Postgres-R would handle a situation where the constant throughput exceeded the processing speed of one of the nodes. I can see your system working if it's just spike loads and the slow nodes can catch up during slow periods, but I'm wondering about the scenarios where an admin has underestimated the hardware requirements and one or more nodes is unable to keep up. Just musing, really. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Phoenix Kiula [EMAIL PROTECTED] wrote: We're moving from MySQL to PG, a move I am rather enjoying, but we're currently running both databases. As we web-enable our financial services in fifteen countries, I would like to recommend to the team that we move entirely to PG. In doing research on big installations of the two databases, I read this from a MySQL senior exec on Slashdot: Senior MySQL exec means this is a marketing blurb, which means it's exaggerated, lacking any honest assessment of challenges and difficulties, and possibly an outright lie. I've no doubt that MySQL can do clusters if you know what you're doing, but if you want the facts, you're going to have to look deeper than that obviously biased quote. I seem to remember a forum thread with someone having considerable difficulty with MySQL cluster, and actual MySQL employees jumping in to try to help and no solution ever found. Anyone have that link lying around? In any event, replication is a large and complex topic. To do it well takes research, planning, and know-how. Anyone who tells you their solution will just drop in and work is either lying or charging you a bunch of money for their consultants to investigate your scenario and set it up for you. First off, clustering is a word that is too vague to be useful, so I'll stop using it. There's multi-master replication, where every database is read-write, then there's master-slave replication, where only one server is read-write and the rest are read-only. You can add failover capabilities to master-slave replication. Then there's synchronous replication, where all servers are guaranteed to get updates at the same time. And asynchronous replication, where other servers may take a while to get updates. These descriptions aren't really specific to PostgreSQL -- every database replication system has to make design decisions about which approaches to support. PostgreSQL has some built-in features to allow synchronous multi-master database replication. 
Two-phase commit allows you to reliably commit transactions to multiple servers concurrently, but it requires support at the application level, which will require you to rewrite any existing applications. Pgcluster is multi-master synchronous replication, but I believe it's still in beta. Note that no synchronous replication system works well over geographically large distances. The time required for the masters to synchronize over (for example) the Internet kills performance to the point of uselessness. Again, this is not a PostgreSQL problem; MSSQL suffers the same problem. Synchronous replication is only really used when two servers are right next to each other with a high-speed link (probably gigabit) between them. PostgreSQL-R is in development, and targeted to allow multi-master, asynchronous replication without rewriting your application. As far as I know, it works, but it's still beta. pgpool supports multi-master synchronous replication as well as failover. Slony supports master-slave asynchronous replication and works _very_ well over long distances (such as from an east coast to a west coast datacenter). Once you've looked at your requirements, start looking at the tool that matches those requirements, and I think you'll find what you need. BTW: does anyone know of a link that describes these high-level concepts? If not, I think I'll write this up formally and post it. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
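To illustrate the application-level burden two-phase commit imposes, here is the SQL shape of it (the transaction identifier and table are made up for the example; real code must also track and resolve in-doubt transactions after a crash):

```sql
-- On each participating server, the application does the same work,
-- then prepares instead of committing:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
PREPARE TRANSACTION 'xfer-42';  -- now durable on disk, but in doubt

-- Only after *every* server has prepared successfully:
COMMIT PREPARED 'xfer-42';

-- If any server failed to prepare, the application must instead issue
-- ROLLBACK PREPARED 'xfer-42' on the servers that did prepare.
```

This coordination logic is exactly the part that has to be written into (or retrofitted onto) the application, which is why two-phase commit is not a drop-in replication solution.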
Re: [GENERAL] Apache + PHP + Postgres Interaction
In response to Hannes Dorbath [EMAIL PROTECTED]: On 24.08.2007 02:43, Bill Moran wrote: Hannes Dorbath [EMAIL PROTECTED] wrote: Bill Moran wrote: I guess I just feel that broken is a bit of a harsh term. If your expectations are for full-blown connection management from pconnect(), then you will be disappointed. If you take it for what it is: persistent connections, then those limitations would be expected. It's broken because persistent connections get randomly garbage collected where they should not. So broken in the sense of bugged. Expect connections to die for no reason, especially under load. It's funny that you should mention that, since I haven't seen that behaviour in 18 months of load testing over a dozen servers. Please reply to the list as well. Your reply to me did not have the list in the CC. How did you verify that? It will spawn a new connection silently, if the old got dropped. Did you really verify your logs, that you don't get more new connections than Apache spawns workers? This might not be noticeable for you, if you are running Apache. In a FCGI environment where you have a fixed amount of workers, you notice new connections, as there should not be any. As I stated in the other reply to an email that looked similar to this one -- I'm not sure I understand the behaviour you're trying to describe. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] PG Seg Faults Performing a Query
I'm a bit out of my depth with using these debugging tools and interpreting their results, but I think the problem is due to the output being just too big for interactive display. Using the same query with tighter limits in the WHERE clause works perfectly. When I changed the SQL script to write output into a table it worked with the same query using even looser limits in the WHERE clause. So sending output to a table instead of to the monitor when the queries produce a large amount of output is reliable, faster and doesn't tie up the machine. I tried using strace, but it produced so much output (which, unfortunately, I couldn't understand anyway) that I don't think this would do me any good. I don't want to bug the PostgreSQL list with a problem that's probably not a PostgreSQL one, but if someone here would be willing to help me track down this apparent popen or fork problem I'd appreciate it. However, I managed to get the results I needed, so we could also call this fixed via workaround. Thanks for the help, Tom and others! - Bill Thoen Tom Lane wrote: Bill Thoen [EMAIL PROTECTED] writes:

(gdb) bt
#0 0x003054264571 in fputc () from /lib64/libc.so.6
#1 0x0040dbc2 in print_aligned_text (title=0x0, headers=0x5665d0, cells=0x2f8fc010, footers=0x557c90, opt_align=0x557ef0 'l' repeats 18 times, rr, 'l' repeats 12 times, rl lll, opt_tuples_only=0 '\0', opt_numeric_locale=0 '\0', opt_border=1, encoding=8, fout=0x0) at print.c:448
#2 0x0040f0eb in printTable (title=0x0, headers=0x5665d0, cells=0x2f8fc010, footers=0x557c90, align=0x557ef0 'l' repeats 18 times, rr, 'l' repeats 12 times, rl lll, opt=0x7fff3e3be8c0, fout=0x3054442760, flog=0x0) at print.c:1551

OK, so the problem is that print_aligned_text is being passed fout = NULL. Since that wasn't what was passed to printTable, the conclusion must be that PageOutput() was called and returned NULL --- that is, that its popen() call failed. Obviously we should put in some sort of check for that. 
I can see three reasonable responses: either make psql abort entirely (akin to its out-of-memory behavior), or have it fall back to not using the pager, either silently or after printing an error message. Any thoughts which way to jump? Meanwhile, the question Bill needs to look into is why popen() is failing for him. I'm guessing it's a fork() failure at bottom, but why so consistent? strace'ing the psql run might provide some more info. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
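Since the pager is only involved for interactive output, the crash can be sidestepped inside psql itself while the fix is pending. A couple of options, using standard psql meta-commands (the output filename is just an example):

```
-- Turn the pager off for this session:
\pset pager off

-- Or redirect query results to a file instead of the screen:
\o /tmp/results.txt
SELECT a.* FROM compliance_2006 a, ers_regions b WHERE ...;
\o
```

Either way psql never calls popen() for a pager, which matches the poster's observation that writing output somewhere other than the terminal avoids the failure.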
Re: [GENERAL] Add Column BEFORE/AFTER another column
In response to Matthew [EMAIL PROTECTED]: Hi gang, In MySQL it is possible to add a column before/after another column. I have not been able to find such syntax in the Postgres manual. Does this ability exist? It does not. Is my only solution to create a new table with the new column, copy the data, delete the old table and rename the new one? The fact that you're asking this question is a hint that you're using unsafe coding practices that may bite you at some point. The order of columns is not guaranteed to be the same as when you do CREATE TABLE, it's just coincidence that they usually are. To get your columns in a specific order, specify the column names in that order in your SELECT statement. The SQL standard doesn't provide for any other way to guarantee column order, and neither does Postgres. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
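The usual workarounds, sketched with hypothetical table and column names: add the column normally, then pin the presentation order with an explicit column list, or with a view for code that insists on SELECT *:

```sql
ALTER TABLE customers ADD COLUMN middle_name text;

-- Column order is a presentation concern, so state it explicitly:
SELECT id, first_name, middle_name, last_name FROM customers;

-- Or wrap the preferred order in a view for legacy SELECT * queries:
CREATE VIEW customers_ordered AS
    SELECT id, first_name, middle_name, last_name FROM customers;
```

This avoids the copy/drop/rename dance entirely, and keeps working even if the physical column order ever differs from what CREATE TABLE specified.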
Re: [GENERAL] Apache + PHP + Postgres Interaction
In response to Joshua D. Drake [EMAIL PROTECTED]: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Max Zorloff wrote: Hello. I have a subject setup and a few questions. The first one is this. PHP establishes a connection to the Postgres database through pg_pconnect(). Don't use pconnect. Use pgbouncer or pgpool. Then it runs some query, then the script returns, leaving the persistent connection hanging. But the trouble is that in this case any query takes significantly more time to execute than in the case of one PHP script running the same query with different parameters for N times. How can I achieve the same performance in the first case? Persistent connections help but not enough - the queries are still 10 times slower than they would be on the 2nd time. Well you haven't given us any indication of data set or what you are trying to do. However, I can tell you, don't use pconnect, its broke ;) Broke? How do you figure? I'm not trying to argue the advantages of a connection pooler such as pgpool, but, in my tests, pconnect() does exactly what it's supposed to do: reuse existing connections. In our tests, we saw a 2x speed improvement over connect(). Again, I understand that pgpool will do even better ... Also, I'm curious as to whether he's timing the actual _query_ or the entire script execution. If you're running a script multiple times to get multiple queries, most of your time is going to be tied up in PHP's parsing and startup -- unless I misunderstood the question. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Apache + PHP + Postgres Interaction
In response to Josh Trutwin [EMAIL PROTECTED]: On Thu, 23 Aug 2007 13:29:46 -0400 Bill Moran [EMAIL PROTECTED] wrote: Well you haven't given us any indication of data set or what you are trying to do. However, I can tell you, don't use pconnect, its broke ;) Broke? How do you figure? I asked that question earlier this month - this thread has some interesting discussion on pconnect: http://archives.postgresql.org/pgsql-general/2007-08/msg00602.php Thanks to you and Erik for the link. Not sure how I missed that thread. I guess I just feel that broken is a bit of a harsh term. If your expectations are for full-blown connection management from pconnect(), then you will be disappointed. If you take it for what it is: persistent connections, then those limitations would be expected. *shrug* I'm just glad there aren't any unknown problems waiting to bite me ... -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Geographic High-Availability/Replication
Decibel! [EMAIL PROTECTED] wrote: On Aug 22, 2007, at 3:37 PM, Joshua D. Drake wrote: You can not do multi master cross continent reliably. I'm pretty sure that credit card processors and some other companies do it... it just costs a LOT to actually do it well. Isn't this sort of requirement the entire reason for 2-phase commit? -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Seeking datacenter PITR backup procedures [RESENDING]
Decibel! [EMAIL PROTECTED] wrote: On Aug 19, 2007, at 7:23 AM, Bill Moran wrote: Assumptions: a. After pg_stop_backup(), Pg immediately recycles log files and hence wal logs can be copied to backup. This is a clean start. I don't believe so. AFAIK, all pg_stop_backup() does is remove the marker that pg_start_backup() put in place to tell the recovery process when the filesystem backup started. I'm pretty certain that's not the case. For a PITR to ensure that data is back to a consistent state after a recovery, it has to replay all the transactions that took place between pg_start_backup and pg_stop_backup; so it needs to know when pg_stop_backup() was actually run. Sounds likely ... but I don't believe that forces any specific log cycling activity, like the OP suggested. Be nice if someone who knew for sure would chime in ;) By not backing up pg_xlog, you are going to be behind by however many transactions are in the most recent transaction log that has not yet been archived. Depending on how often your databases are updated, this is likely acceptable. If you need anything more timely than that, you'll probably want to implement Slony or some other replication system. Just keep in mind that Slony is *not* a backup solution (though you could possibly argue that its log shipping is). True. This rides the fine line of the difference between an HA setup and backup. Specifically: HA won't allow you to recover from user error. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
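For context, the base-backup procedure being discussed looks roughly like this on 8.x (the label, paths, and tar invocation are placeholders; pg_start_backup/pg_stop_backup are the real functions):

```sql
-- 1. Tell the server a filesystem backup is starting:
SELECT pg_start_backup('nightly');

-- 2. (Outside SQL.) Copy the data directory with your tool of choice,
--    e.g.  tar -czf /backup/base.tar.gz /var/lib/pgsql/data
--    pg_xlog can be excluded if WAL archiving is configured to cover it.

-- 3. Mark the end of the backup; WAL written between these two calls is
--    what makes the (inconsistent) filesystem copy recoverable:
SELECT pg_stop_backup();
```

The debate above is about what happens to the WAL segments at step 3; the one thing both posters agree on is that recovery must replay everything between the two calls.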
Re: [GENERAL] PG Seg Faults Performing a Query
On Tue, Aug 21, 2007 at 04:38:42PM -0500, Scott Marlowe wrote: On 8/21/07, Bill Thoen [EMAIL PROTECTED] wrote: How would you suggest I try to track down this problem? I run the following query: SELECT a.* FROM compliance_2006 a, ers_regions b WHERE a.fips_st_cd=b.fips_st AND a.fips_cnty_cd=b.fips_cou AND b.region =1 AND a.fips_st_cd='17' AND a.fips_cnty_cd='003'; and it works. But when I try this: SELECT a.* FROM compliance_2006 a, ers_regions b WHERE a.fips_st_cd=b.fips_st AND a.fips_cnty_cd=b.fips_cou AND b.region =1 AND a.fips_st_cd='17' ; psql dies with the message: Segmentation Fault. so the client psql is what's dying, right? In that case you likely are getting too big a result set for psql to handle at once. Try declaring a cursor to hold your query and fetching 100 or 1000 or so rows at a time. Just guessing. What's the exact text of the error message? The exact message was: Segmentation Fault. But the table compliance_2006 is very big (18 million plus records) so I'll try that cursor idea. But even so, an error like that makes me think that something's broken. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
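The cursor suggestion, sketched against the query from the post (the cursor name and fetch size are arbitrary; cursors must run inside a transaction block):

```sql
BEGIN;
DECLARE big_cur CURSOR FOR
    SELECT a.*
    FROM compliance_2006 a, ers_regions b
    WHERE a.fips_st_cd = b.fips_st
      AND a.fips_cnty_cd = b.fips_cou
      AND b.region = 1
      AND a.fips_st_cd = '17';

FETCH 1000 FROM big_cur;   -- repeat until no rows come back
CLOSE big_cur;
COMMIT;
```

This keeps psql's in-memory result set down to one batch at a time, instead of buffering the entire multi-million-row result before printing anything.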
Re: [GENERAL] PG Seg Faults Performing a Query
On Wed, Aug 22, 2007 at 09:46:21AM +1200, Andrej Ricnik-Bay wrote: On 8/22/07, Bill Thoen [EMAIL PROTECTED] wrote: How would you suggest I try to track down this problem? Any suggestions? postgres version? Operating system? Anything in the log(s)? PostgreSQL Version is 8.1.5, running on Linux (Fedora Core 6). The last few lines in the Serverlog are: LOG: unexpected EOF on client connection LOG: transaction ID wrap limit is 1073746500, limited by database postgres LOG: transaction ID wrap limit is 1073746500, limited by database postgres (I ran VACUUM FULL after it crashed to make sure there was no loose disk space floating around, so that last line was probably from that.) I assume that bit about the transaction wrap limit is informational and not related to this problem. My PostgreSQL is working great for small SQL queries, even from my large table (18 million records). But when I ask it to retrieve anything that takes it more than 10 minutes to assemble, it crashes with this Segmentation Fault error. I get so little feedback, and I'm still pretty unfamiliar with PostgreSQL, that I don't even know where to begin. This version of PostgreSQL was compiled from source with support for various other packages needed for GIS support, but the tables I'm trying to extract data from contain no GIS information. So I believe that this operation is plain PostgreSQL. Any help you can offer as to how I can track down what's wrong would be greatly appreciated. If I can't get this to work and can only use small tables in PG, then its usefulness to me will be pretty limited. - Bill Thoen ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] PG Seg Faults Performing a Query
As you requested, here's what bt in gdb reports:

(gdb) bt
#0 0x003054264571 in fputc () from /lib64/libc.so.6
#1 0x0040dbd2 in print_aligned_text ()
#2 0x0040f10b in printTable ()
#3 0x0041020b in printQuery ()
#4 0x00407906 in SendQuery ()
#5 0x00409153 in MainLoop ()
#6 0x0040b16e in main ()

Please tell me what it means if you can and if I can fix this problem. Thanks, - Bill Thoen Alvaro Herrera wrote: Martijn van Oosterhout escribió: That said, it would be nice if it returned an error instead of crashing. In my opinion it isn't just a matter of would be nice. It is a possible bug that should be investigated. A look at a stack trace from the crashing process would be the first place to start. In order to do that, please set ulimit -c unlimited and rerun the query under psql. That should produce a core file. Then run gdb psql core and inside gdb, execute bt. Please send that output our way. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] do you have an easy example of postgis and mapserver?
On Tue, Aug 21, 2007 at 12:44:49PM +0200, Ottavio Campana wrote: I'm sorry this mail is not very in topic, but I hope you can help me. Just so you know, perhaps a better list to contact with this is the MapServer mailing list at http://lists.umn.edu/archives/mapserver-users.html, or maybe the PostGIS list at http://www.postgis.org/mailman/listinfo/postgis-users. I'm trying to learn how postgis and mapserver work together, but I cannot understand nearly anything. I mean, I read the documentation of postgis and I think I understood it, but I cannot do anything useful with it. I think before you try to do something customized with python you should get familiar with the mapserver MAP file, which contains all the instructions to map features from various sources like shape files, MapInfo TAB files, PostGIS data sources, and so on. An example for mapping a layer from a PostGIS source in a MAP file looks like this:

LAYER
  NAME states
  TYPE POLYGON
  STATUS ON
  CONNECTION "user=gisuser password=*** dbname=us_data host=localhost port=5432"
  CONNECTIONTYPE POSTGIS
  DATA "the_geom from states"
  CLASSITEM 'name'
  PROJECTION
    "+proj=latlong +datum=WGS84"
  END
  CLASS
    NAME "US States"
    OUTLINECOLOR 0 0 0
    COLOR 255 255 196
  END
END

The key bits here related to PostGIS are the CONNECTION, where you specify all the info needed to get access to your PostgreSQL database, the CONNECTIONTYPE which you set to POSTGIS, and the DATA directive in which you supply a string that contains the SQL expression that pulls out the geometry features you want to map. If you do this with python, then you would first need to create an instance of a mapObj, and then create a layerObj and add it to the map object, setting the properties for the connection. However, my knowledge of this approach is still very sketchy. Documentation is spread around, but the two sites you should peruse are the MapServer docs at: http://mapserver.gis.umn.edu/docs and the PostGIS docs at: http://postgis.refractions.net/docs/. 
- Bill Thoen ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] PG Seg Faults Performing a Query
How would you suggest I try to track down this problem? I run the following query: SELECT a.* FROM compliance_2006 a, ers_regions b WHERE a.fips_st_cd=b.fips_st AND a.fips_cnty_cd=b.fips_cou AND b.region =1 AND a.fips_st_cd='17' AND a.fips_cnty_cd='003'; and it works. But when I try this: SELECT a.* FROM compliance_2006 a, ers_regions b WHERE a.fips_st_cd=b.fips_st AND a.fips_cnty_cd=b.fips_cou AND b.region =1 AND a.fips_st_cd='17' ; psql dies with the message: Segmentation Fault. Any suggestions? ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Searching for Duplicates and Hosed the System
Something is really screwy here. I tried what looked like a simpler task than I tried when I started this message thread. The only unusual issue here is that the table compliance_2006 is rather big (over 18 million records). The table ers_regions is pretty small (about 3100 records) and all the WHERE fields are indexed (except e.region). Here's the not-too-complicated SQL: SELECT a.* FROM compliance_2006 a, ers_regions e WHERE a.fips_st_cd=e.fips_st AND a.fips_cnty_cd=e.fips_cou AND e.region=1; I ran VACUUM ANALYZE just before I launched this and there were no other postgres jobs running. I'm the only user as well. I also ran EXPLAIN prior to the run and got this:

Nested Loop (cost=11.71..28800.34 rows=7219 width=584)
  -> Seq Scan on ers_regions e (cost=0.00..71.90 rows=16 width=28)
       Filter: (region = 1)
  -> Bitmap Heap Scan on compliance_2006 a (cost=11.71..1788.76 rows=451 width=584)
       Recheck Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
       -> Bitmap Index Scan on key_tract (cost=0.00..11.71 rows=451 width=0)
            Index Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
(7 rows)

So I launched it to see what would happen. This resulted in a very busy disk drive for about an hour, and one by one things started dying on my FC6 Linux machine. First the mouse disappeared, then the cursor in the terminal window where I was running psql vanished, the clock stopped, and the keyboard stopped responding. Meanwhile, the disk drive thrashed on. Finally the screen saver kicked in and shortly thereafter I logged back in only to see the word Killed sitting there on the last line and all my machine's services were running again. Just no data output. 
I looked in the log and saw this: LOG: transaction ID wrap limit is 1073746500, limited by database postgres LOG: transaction ID wrap limit is 1073746500, limited by database postgres LOG: could not send data to client: Broken pipe I guess this is what killed it, but does it make sense that such a simple request wrapped around the transaction limit? Is the bad behavior of my machine while running this query consistent with an error like this, or is the machine broken and that's what caused PostgreSQL to crater? What should I set the wrap limit to? What suggestions would you make for tracking down the exact problem and fixing it? Any help would be appreciated - Bill Thoen Tom Lane wrote: Bill Thoen [EMAIL PROTECTED] writes: Tom, here's the explain results: Does this help explain what went wrong? (And yes, I think there will be a *lot* of groups.) explain select count(*) as count,fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr from compliance_2006 group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr order by 1 desc;

QUERY PLAN
Sort (cost=15119390.46..15123902.54 rows=1804832 width=160)
  Sort Key: count(*)
  -> GroupAggregate (cost=13782933.29..14301822.43 rows=1804832 width=160)
       -> Sort (cost=13782933.29..13828054.08 rows=18048318 width=160)
            Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
            -> Seq Scan on compliance_2006 (cost=0.00..1039927.18 rows=18048318 width=160)
(6 rows)

Hmm ... no, actually, that shows the planner doing the right thing for lotsa groups: picking GroupAggregate instead of HashAggregate. The estimated number of groups is 1804832, which might or might not have much to do with reality but in any case seems enough to keep it away from HashAggregate. Do you have autovacuum running, or a scheduled cronjob that runs ANALYZE or VACUUM ANALYZE? 
The only theory I can think of at this point is that your database statistics are more correct now than they were when you had the problem. If you try the query again, does it behave more sanely? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
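For what it's worth, those "transaction ID wrap limit" lines are informational messages that VACUUM logs, not errors, and the wrap limit is not something you set by hand. A sketch of how to check how far each database actually is from wraparound (assuming an ordinary psql session):

```sql
-- age() gives the number of transactions since each database's
-- datfrozenxid was set; trouble only starts near ~2 billion.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```

A database that is vacuumed regularly stays far below the limit, so the "Killed" result above almost certainly came from the out-of-memory kill, not from wraparound.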
Re: [GENERAL] Seeking datacenter PITR backup procedures [RESENDING]
to restore some databases before you trust it. Getting PITR working effectively is tricky, no matter how many questions you ask of knowledgeable people. You're going to want to have first- hand experience going through the process. HTH. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
[GENERAL] Searching for Duplicates and Hosed the System
I'm new to PostgreSQL and I ran into a problem I don't want to repeat. I have a database with a little more than 18 million records that takes up about 3GB. I need to check to see if there are duplicate records, so I tried a command like this:

SELECT count(*) AS count, fld1, fld2, fld3, fld4
FROM MyTable
GROUP BY fld1, fld2, fld3, fld4
ORDER BY 1 DESC;

I knew this would take some time, but what I didn't expect was that about an hour into the select, my mouse and keyboard locked up and also I couldn't log in from another computer via SSH. This is a Linux machine running Fedora Core 6 and PostgreSQL is 8.1.4. There's about 50GB free on the disc too. I finally had to shut the power off and reboot to regain control of my computer (that wasn't a good idea, either, but eventually I got everything working again). Is this normal behavior by PG with large databases? Did I misconfigure something? Does anyone know what might be wrong? - Bill Thoen
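If the goal is only to see the duplicates, a HAVING clause keeps the output down to just the repeated key combinations (the sort over 18 million rows still happens, but the result set stays small). A sketch using the same placeholder column names:

```sql
-- List only the fld1..fld4 combinations that appear more than once.
SELECT fld1, fld2, fld3, fld4, count(*) AS dup_count
FROM MyTable
GROUP BY fld1, fld2, fld3, fld4
HAVING count(*) > 1
ORDER BY dup_count DESC;
```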
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen [EMAIL PROTECTED] wrote: I'm new to PostgreSQL and I ran into problem I don't want to repeat. I have a database with a little more than 18 million records that takes up about 3GB. I need to check to see if there are duplicate records, so I tried a command like this: SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable GROUP BY fld1, fld2, fld3, fld4 ORDER BY 1 DESC; I knew this would take some time, but what I didn't expect was that about an hour into the select, my mouse and keyboard locked up and also I couldn't log in from another computer via SSH. This is a Linux machine running Fedora Core 6 and PostgresQL is 8.1.4. There's about 50GB free on the disc too. I finally had to shut the power off and reboot to regain control of my computer (that wasn't good idea, either, but eventually I got everything working again.) Is this normal behavior by PG with large databases? No. Something is wrong. Did I misconfigure something? Does anyone know what might be wrong? Possibly, but I would be more inclined to guess that your hardware is faulty and you encountered a RAM error, or the CPU overheated or something along those lines. I'm not familiar with Linux systems hard-locking like that unless there is a hardware issue. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Searching for Duplicates and Hosed the System
Tom, here's the explain results: Does this help explain what went wrong? (And yes, I think there will be a *lot* of groups.)

explain select count(*) as count, fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
from compliance_2006
group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
order by 1 desc;

QUERY PLAN
Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
  Sort Key: count(*)
  ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
        ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
              Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
              ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
(6 rows)

On Sun, Aug 19, 2007 at 01:19:51PM -0400, Tom Lane wrote: Bill Thoen [EMAIL PROTECTED] writes: I knew this would take some time, but what I didn't expect was that about an hour into the select, my mouse and keyboard locked up and also I couldn't log in from another computer via SSH. This is a Linux machine running Fedora Core 6 and PostgreSQL is 8.1.4. There's about 50GB free on the disc too. I finally had to shut the power off and reboot to regain control of my computer (that wasn't a good idea, either, but eventually I got everything working again). I've seen Fedora go nuts like that when it ran out of memory. Once it starts to swap heavily, performance goes into the tank; and once the kernel realizes it's in memory trouble, it starts to kill processes more or less at random. That might explain why ssh stopped working. One thing to do to make it more robust is to disable memory overcommit. I suspect also that configuring it with lots of swap space is counterproductive, because that just encourages the kernel to allow lots of swapping. I haven't actually experimented with that part though. 
As for why PG ran the system out of memory, I suspect that the planner drastically underestimated the number of groups to be created by your GROUP BY, and thought it could get away with a hash aggregation. We don't currently have any provision for spilling hash aggregation to disk, so if there's a very large number of groups the table just gets very big :-(. The planner is not supposed to choose hash agg if the estimated table size exceeds work_mem ... but if it had out-of-date statistics to work with it might have gotten the wrong answer. Have you ANALYZEd this table recently? What does EXPLAIN show as the estimated number of result rows? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
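A sketch of the check Tom is asking for, using the table name from the thread (ANALYZE refreshes the statistics, and the EXPLAIN then shows both the chosen aggregate strategy and the estimated group count):

```sql
ANALYZE compliance_2006;

EXPLAIN
SELECT count(*), fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
       field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
FROM compliance_2006
GROUP BY fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
         field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr;
-- GroupAggregate in the plan means sort-and-spill to disk;
-- HashAggregate means the planner expects all groups to fit in memory.
```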
Re: [GENERAL] Yet Another COUNT(*)...WHERE...question
In response to Rainer Bauer [EMAIL PROTECTED]: Trevor Talbot wrote: On 8/16/07, Rainer Bauer [EMAIL PROTECTED] wrote: But if you go to eBay, they always give you an accurate count. Even if the no. of items found is pretty large (example: http://search.ebay.com/new). And I'd bet money that they're using a full text search of some kind to get those results, which isn't remotely close to the same thing as a generic SELECT count(*). Without text search (but with a category restriction): http://collectibles.listings.ebay.com/_W0QQsacatZ1QQsocmdZListingItemList I only wanted to show a counter-example for a big site which uses pagination to display result sets and still reports accurate counts. Categories are still finite state: you can simply store a count for each category. Again, it's just a case of knowing your data and queries; it's not trying to solve a general infinite-possibilities situation. Consider this query with multiple WHERE conditions: http://search.ebay.com/ne-ol-an_W0QQfasiZ1QQfbdZ1QQfcdZ1QQfcidZ77QQfclZ3QQfmcZ1QQfrppZ50QQfsooZ1QQfsopZ1QQftidZ1QQpriceZ1QQsabdhiZ100QQsacurZ999QQsalicZQ2d15QQsaprchiZ5QQsatitleZQ28neQ2aQ2colQ2aQ2canQ2aQ29QQsojsZ0 My point is that whatever search criteria are involved, and however many items are found, eBay always returns the *accurate* number of items found. While I don't _want_ to argue with you ... I can't seem to help myself. How do you _know_ that's the exact number of items? There are 50 items on that page, the paginator at the bottom shows 97,686 pages, but there's no way (that I can find) to go to the _last_ page to ensure that said numbers are correct. It could simply be estimating the number of items and calculating the # of pages based on that. With 4 million items, being a few thousand off isn't anything anyone would notice. Before this drifts off: * I do know *why* count(*) is slow using Postgres. 
* I *think* that count(*) is fast on eBay because count is cheaper using Oracle (which eBay does: http://www.sun.com/customers/index.xml?c=ebay.xml). That could be possible, but it's still speculation at this point. If someone with Oracle-fu could say for sure one way or the other, that would be interesting ... Unless there's data on that sun.com page that provides more detail. It doesn't seem to be willing to load for me at this point ... * I realize that pagination for multi-million tuple results does not make sense. Then what is the point to this thread? Are we just shooting the breeze at this point? -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Yet Another COUNT(*)...WHERE...question
In response to Phoenix Kiula [EMAIL PROTECTED]: Yes, optimization. :) You don't need an exact count to tell someone that there's more data and they can go to it. In general, I agree. But my example of Amazon was only to illustrate the point about two queries and why they may be needed. I seem to see many more pages than you do, but in any case, Google and Amazon can afford to be less precise. Thanks for the suggestion of using EXPLAIN and parsing an approximation, but when you need to show a trader how many trades he has made, for instance, then approximation is not a possibility at all. Especially not if the numbers sway so wildly:

FIRSTDB=# explain select * from trades where t_id = 'kXjha';
QUERY PLAN
Bitmap Heap Scan on trades  (cost=15.77..1447.12 rows=374 width=224)
  Recheck Cond: ((t_id)::text = 'kXjha'::text)
  ->  Bitmap Index Scan on trades_tid_date  (cost=0.00..15.67 rows=374 width=0)
        Index Cond: ((t_id)::text = 'kXjha'::text)
(4 rows)

FIRSTDB=# select count(*) from trades where t_id = 'kXjha';
 count
-------
  3891
(1 row)

Could I do something so that the EXPLAIN showed up with slightly more close-to-accurate stats? The above query is just after a vacuum analyze! In the above case, you could probably materialize the data with a trigger that updates a counter in a separate table every time a new trade is added. This will give you 100% accurate results with _very_ fast response time. Part of the problem is that there's no one answer to your question; there are multiple approaches to solving it, depending on the details of the problem and the acceptable time/accuracy of the answers. Some basic approaches: 1) Materialize the data. MySQL actually does this automatically for you with MyISAM tables, which is why count(*) is so fast. But if you absolutely need fast, accurate counts, you can build your own triggers in PG. This is unlikely to be practical with all queries. 2) Estimate. 
The accuracy of estimates can vary wildly by query and how often the database is analyzed, etc. For something like "showing results 1 - 10 of about 50,000", estimates are great and fast, but for other cases, not acceptable. The good news is you can get a fast estimate from any query with no up-front work. 3) Accept that sometimes getting accurate answers is going to take time. Around here, we call it the "Orbitz technique", because when we discuss it, everyone thinks of the "please wait while I process your query" page you get from orbitz.com. You'd be surprised how willing your users are to wait, as long as they know they have to wait. 4) Throw more hardware at it. If you absolutely _must_have_ super-accurate results faster, then you may need to buy more RAM, faster disks and faster CPUs to accomplish it. 5) Come up with something revolutionary that nobody's ever thought of before. Good luck with this one. Of course, all of these ideas are only practical if you've already ensured that your system is properly tuned. Crappy values for shared_buffers and other tuning will lead you to waste time trying to redesign something that should work just fine, so verify all your configuration first. You may be able to get more acceptable estimates by increasing your statistics targets, for example. -- Bill Moran http://www.potentialtech.com
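For the trades example above, approach 1 might look roughly like this. It is only a sketch: trade_counts, bump_trade_count, and the trigger name are all invented for illustration, and a production version would need to handle two sessions inserting a brand-new t_id at the same time:

```sql
-- Counter table maintained alongside trades.
CREATE TABLE trade_counts (
    t_id text PRIMARY KEY,
    n    bigint NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION bump_trade_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE trade_counts SET n = n + 1 WHERE t_id = NEW.t_id;
        IF NOT FOUND THEN
            INSERT INTO trade_counts (t_id, n) VALUES (NEW.t_id, 1);
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE trade_counts SET n = n - 1 WHERE t_id = OLD.t_id;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trades_count_trg
    AFTER INSERT OR DELETE ON trades
    FOR EACH ROW EXECUTE PROCEDURE bump_trade_count();
```

After that, the trader's count is a primary-key lookup: SELECT n FROM trade_counts WHERE t_id = 'kXjha';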
Re: [GENERAL] Using PITR Backup and Recovery
In response to [EMAIL PROTECTED] [EMAIL PROTECTED]: We recently moved to PITR backup and recovery solution as defined in the documentation. Our basic setup executes the backup start command, and then takes a snapshot of the filesystem and backups wal files. However, we have database files ( not wal files ) that change while the system is in backup mode. This happens during every backup. Is this normal? Maybe. Not entirely sure I understand you're meaning, but ... My understanding is that pg_start_backup()'s purpose is to mark the database so it knows what the last complete transaction was before it was started. This doesn't prevent PostgreSQL from making changes to DB files, it just ensures that in the event of a restore, PG knows where to start as far as the data files and the WAL log are concerned. I'm curious as to why files would be changing if you made a filesystem snapshot, but that wouldn't be a problem with PostgreSQL, it would be a problem with the filesystem code. Or I could be misunderstanding what you mean. In any event, if database activity is occurring while the backup is running, PostgreSQL's data files will continue to change, but the archive of WAL logs will allow the system to recover from inconsistent changes during the recovery phase. I don't know if anyone's done extensive testing to know just how reliable PITR is, but it's worked every time for me. One caveat: you can't recover PITR data from an amd64 system to an i386 system :D PostgreSQL's data (and possibly the WAL logs as well) is architecture dependent. This can be a royal pain if you don't know about it and you have a mix of architectures. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
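For reference, the SQL side of the procedure being described is just two calls bracketing the filesystem copy; a minimal sketch (the label is an arbitrary placeholder):

```sql
-- Forces a checkpoint and marks the starting WAL location.
SELECT pg_start_backup('nightly_base_backup');

-- ... take the filesystem snapshot / copy the data directory here;
-- data files continuing to change during this window is expected ...

-- Marks the stop point and identifies the WAL segments the backup needs.
SELECT pg_stop_backup();
```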
Re: [GENERAL] Moving to postgresql and some ignorant questions
In response to Phoenix Kiula [EMAIL PROTECTED]: If you issue a BEGIN then nothing gets committed until you issue a COMMIT. If anything happens in the meantime then everything you've done since the BEGIN disappears. There are some cases where I would like to bunch queries into a transaction purely for speed purposes, but they're not interdependent for integrity. E.g., BEGIN TRANSACTION; UPDATE1; UPDATE2; UPDATE3; COMMIT; If UPDATE2 fails because it, say, violates a foreign key constraint, then so be it. I want UPDATE3 to go ahead. Is this not possible, or is there an option I can use to do these kind of independent-query transactions? That's not possible, by design. However, your application could keep track of which queries have run, and if one fails, start the transaction over without the failing query. But the rule is, if any query within the transaction errors, then all queries within the transaction are rolled back. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
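One refinement worth knowing about: since 8.0, SAVEPOINT offers a middle ground, keeping the speed of one big transaction while discarding only the statement that failed. A sketch using the placeholder statements from the example (the application still has to notice the error and issue the ROLLBACK TO SAVEPOINT itself):

```sql
BEGIN;
UPDATE ...;                 -- UPDATE1
SAVEPOINT sp;
UPDATE ...;                 -- UPDATE2: may violate a foreign key
-- if UPDATE2 errors, issue: ROLLBACK TO SAVEPOINT sp;
UPDATE ...;                 -- UPDATE3 proceeds either way
COMMIT;
```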
Re: [GENERAL] Problem Designing Index
In response to Alan J Batsford [EMAIL PROTECTED]: Hello, I'm doing some select statements on my table that look like:

SELECT * FROM table
WHERE (prod_num = '1234567' AND transaction_timestamp > '2007-07-18 21:29:57')
   OR prod_num > '1234567'
ORDER BY prod_num ASC, transaction_timestamp ASC LIMIT 1;

I've added two indices, one for prod_num and another for transaction_timestamp. This table has 151,000 rows and the above statement returns in less than a millisecond. If I change the above statement from '>' to '<' it takes 8 seconds to complete. Prod_num '1234567' is towards the end of the 151k rows. If I use a prod_num like '123' towards the front, the problem is reversed with '<' and '>'. I tried adding a third index that uses both prod_num and transaction_timestamp. The average performance at each end of the data for both '<' and '>' improved, but the problem wasn't resolved. Selects at the end of the data with '>' conditions (like the original statement) then become broken and take 500 ms to finish, which is unacceptable for the application. I did ANALYZE on the table with no effect. Is it possible to design an index that can account for all the scenarios? Thanks for any help you can provide. While it's difficult to be sure, I'm guessing you have either a hardware problem or a tuning problem -- but I don't think your indexes are a problem. Keep in mind that once PostgreSQL has determined which rows to return, it has to actually read all those rows off disk and send them to the client application. In my opinion, 8 seconds to read in over 100,000 rows isn't unreasonable (especially if those rows are wide). If 8 seconds is an unacceptable time, then you're liable to need hardware to fix it: more RAM to cache those rows, or faster disks, or both. However, this is just speculation. You didn't provide analyze output, table schema, hardware details, or configuration information ... so it's entirely possible that there is something else wrong. I'm just making an educated guess. 
-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Database Select Slow
In response to .ep [EMAIL PROTECTED]: On Aug 10, 9:42 pm, [EMAIL PROTECTED] (A. Kretschmer) wrote: am Fri, dem 10.08.2007, um 17:46:11 +0800 mailte carter ck folgendes: Hi all, I am facing a performance issue here. Whenever I do a count(*) on a table that contains about 300K records, it takes a few minutes to complete. Whereas my other application which is counting 500K records takes less than 10 seconds to complete. I have indexed all the essential columns and still it does not improve the speed. Indexes don't help in this case; a 'select count(*)' forces a seq scan. Do you really need this information? An estimate for the number of rows can be found in the system catalog (reltuples in pg_class, see http://www.postgresql.org/docs/current/interactive/catalog-pg-class.html). Hi, what if I need to do a count with a WHERE condition? E.g.,

SELECT count(*) FROM customers
WHERE cust_id = 'georgebush'
  AND created_on > current_date - interval '1 week';

Can I get the info about this from somewhere in the pg system tables as well? Queries like these are very common in most applications, so I'm hoping I can avoid the sequential scans! Many thanks for any tips. If you only need an estimate, you can do an EXPLAIN of the query and grep out the row count. The accuracy of this will vary depending on the statistics, but it's very fast and works with a query of any complexity. If you need fast, accurate counts, your best bet is to set up triggers on your tables to maintain counts in a separate table. This can be rather complex to set up, and you take a performance hit during inserts and updates, but I don't know of any other way to do it. -- Bill Moran http://www.potentialtech.com
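The catalog lookup A. Kretschmer mentions looks like this; note that it is a whole-table estimate maintained by VACUUM/ANALYZE and cannot honor a WHERE clause:

```sql
-- Planner's current row-count estimate for the table.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'customers';
```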
Re: [GENERAL] Asynchronous Query processing
In response to Brendon Sablinski [EMAIL PROTECTED]: I have a need to do asynchronous query processing. I am a little confused by the order and frequency of calls. Can someone tell me if the below is correct? The code comments are littered with my questions.

PQsendQueryParams(pgconn, "SELECT ...", ...);
while(1) {
  epoll/poll/select PQsocket(pgconn) [read]
  // call this each time poll indicates data is ready?
  PQconsumeInput(pgconn);

What poll? PQconsumeInput()/PQisBusy() _is_ the poll. Also, you should check the return code. If you get 0, something has gone wrong.

  // If isBusy returns 1, just poll again? Will this indicate 0 when I have a full PGresult set?
  // Or, can it split the results of a single statement across multiple result sets?
  if(PQisBusy()) continue;

Is this just experimenting? Note that by simply staying within the loop, you basically wrap asynchronous access with synchronous code. In your real code, you should return to whatever else the program has to work on.

  // I know the docs say keep calling getResult until it returns NULL. In my
  // case, I am wondering if that is needed. I only send one statement at a
  // time to the server. The connection is not listening for notifies either. Do
  // I still need to call getResult multiple times? I saw many examples online
  // that DO NOT call it multiple times; all of them executed a single statement.

If you know for a fact that there are no more statements in processing, there's no need to call PQgetResult() any more. However, if you know for a fact that there could never be more than a single query in the pipeline, it's unlikely that your code is written to handle asynchronous processing.

  res = PQgetResult(pgconn);
  process_results(res);
}

thanks, brendon -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] v8.2 ... command line interface on Windows
Couldn't this be offered pre-built as an alternative in the Windows installer or as a psql2.exe program? Seems like there might be more than a few PostgreSQL users running with US keyboard layouts under native Windows, and providing the prebuilt version along with the normal one might eliminate many of the postings like this that keep reappearing on these lists. - Bill -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Magnus Hagander Sent: Saturday, August 04, 2007 3:09 PM To: nac1967 Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] v8.2 ... command line interface on Windows nac1967 wrote: I am a PostgreSQL user, using the native windows version 8.2. I am new to PostgreSQL and am wondering if other users have suggestions regarding command line interfaces. I cannot for the life of me get readline to work on the Windows version. If you use IPython, for example, readline works fantastically with smart ctrl-P search of previous commands and tab completion. Does anyone know of a way either to get readline to work or another good command line interface? Readline only works with US keyboard layouts under native windows, that's why it's been turned off in the binary builds. You are left with the very limited commandline editing in the default windows command shell. If you are using only US keyboard layout, you can rebuild psql from source (using the MingW build system) with readline. You only need to rebuild psql - not libpq or the backend itself. //Magnus
Re: [GENERAL] v8.2 ... command line interface on Windows
We use the Cygwin version of the psql.exe program under Windows instead of the native psql.exe program, even though the rest of the PostgreSQL installation uses the standard native Windows versions of all the other PostgreSQL components (database, tools, etc.). (So before I get flamed, I want to clarify that the ONLY component of PostgreSQL that we use from Cygwin is psql.exe, and it's run from an alternate directory.) The main advantage that we get by using the Cygwin version of psql is that it runs in interactive mode regardless of how it is run, whereas the native psql program runs in non-interactive mode (showing almost no output, no prompts, no readline support, etc.) when run from most shell programs. We frequently run psql via a remote SSH connection or from the RXVT terminal program or from inside Emacs, and in all of these cases the native psql program runs in non-interactive mode whereas the Cygwin version of psql is fully functional. - Bill -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of nac1967 Sent: Saturday, August 04, 2007 3:01 PM To: pgsql-general@postgresql.org Subject: [GENERAL] v8.2 ... command line interface on Windows I am a PostgreSQL user, using the native windows version 8.2. I am new to PostgreSQL and am wondering if other users have suggestions regarding command line interfaces. I cannot for the life of me get readline to work on the Windows version. If you use IPython, for example, readline works fantastically with smart ctrl-P search of previous commands and tab completion. Does anyone know of a way either to get readline to work or another good command line interface? Thank you.
Re: [GENERAL] file-system snapshot under freebsd for backup
In response to ProAce [EMAIL PROTECTED]: I already put the data directory on hds san storage, but there is no snapshot license on it. Could I use mksnap_ffs under freebsd to make snapshot for backup ? Maybe. I'm confused by your question, but: * If the data directory is on an FFS2 volume on a FreeBSD machine, you can use mksnap_ffs to back it up, and that approach is actually recommended for PITR-type backups. * If the data is mounted via NFS or something similar and the filesystem is not FFS2, then snapshots are not available on FreeBSD. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] file-system snapshot under freebsd for backup
In response to ProAce [EMAIL PROTECTED]: I use SAN, not NAS. :) Because of some budget issues, I just have a basic SAN environment. For multipathing, I use geom_fox. For snapshots, I want to use mksnap_ffs, but I don't have any experience using mksnap_ffs to back up pgsql. Can anyone give me some advice? If you don't want to do PITR, read this: http://www.postgresql.org/docs/8.2/static/backup-file.html You can also use the mount command to make filesystem snapshots. Assuming your PG data directory is under the mount point /var/db:

mount -o snapshot -u /var/db/snapshot /var/db
mdconfig -a -t vnode -f /var/db/snapshot -u 4
mount -r /dev/md4 /mnt
[ ... do whatever you do to back up /mnt (which is the snapshot of /var/db) ... ]
umount /mnt
mdconfig -d -u 4
rm /var/db/snapshot

2007/8/2, Bill Moran [EMAIL PROTECTED]: Maybe. I'm confused by your question, but: * If the data directory is on an FFS2 volume on a FreeBSD machine, you can use mksnap_ffs to back it up, and that approach is actually recommended for PITR-type backups. * If the data is mounted via NFS or something similar and the filesystem is not FFS2, then snapshots are not available on FreeBSD. -- Bill Moran http://www.potentialtech.com -- Bill Moran http://www.potentialtech.com