Re: [GENERAL] entry log

2007-08-19 Thread Robin Helgelin
On 8/19/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
> As you mention, you could use a trigger instead of explicitly setting
> updated_at to DEFAULT, which might be more convenient because you
> don't need to remember to set the updated_at column explicitly on update.
>
> Whether or not this information is *interesting* is really up to the
> specifics of your application, rather than answerable in a general
> sense.

I'm thinking it's probably going to make more sense to have a
logging/history table. What's the use of seeing when an entry was
updated if you don't know what was updated anyway? :)

I guess that could be solved with triggers: each table would have a trigger
that fires on update and runs a stored procedure.
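Something along these lines might do it -- a minimal sketch, with made-up
table and column names, and assuming plpgsql is installed:

CREATE TABLE entries_log
(
    entry_id    integer,
    old_body    text,
    new_body    text,
    changed_at  timestamp with time zone NOT NULL DEFAULT current_timestamp
);

CREATE OR REPLACE FUNCTION log_entries_update() RETURNS trigger AS $$
BEGIN
    -- record the old and new values of the columns you care about
    INSERT INTO entries_log (entry_id, old_body, new_body)
    VALUES (OLD.id, OLD.body, NEW.body);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER entries_log_update
    AFTER UPDATE ON entries
    FOR EACH ROW EXECUTE PROCEDURE log_entries_update();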

-- 
regards,
Robin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Searching for Duplicates and Hosed the System

2007-08-19 Thread Tom Lane
Bill Thoen <[EMAIL PROTECTED]> writes:
> Tom, here's the "explain" results: Does this help explain what went wrong?
> (And yes, I think there will be a *lot* of groups.)

> explain select count(*) as count,fips_st_cd, fips_cnty_cd, farm_nbr,
> tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr from
> compliance_2006 group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
> field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr order by 1 desc;

>                                   QUERY PLAN
> ---------------------------------------------------------------------------
>  Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
>    Sort Key: count(*)
>    ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
>          ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
>                Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
>                ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
> (6 rows)

Hmm ... no, actually, that shows the planner doing the right thing for
lotsa groups: picking GroupAggregate instead of HashAggregate.  The
estimated number of groups is 1804832, which might or might not have
much to do with reality but in any case seems enough to keep it away
from HashAggregate.

Do you have autovacuum running, or a scheduled cronjob that runs ANALYZE
or VACUUM ANALYZE?  The only theory I can think of at this point is that
your database statistics are more correct now than they were when you
had the problem.

If you try the query again, does it behave more sanely?
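For instance, something like this (using your table from upthread) would
refresh the statistics and let you compare the plan it picks afterwards:

ANALYZE compliance_2006;

EXPLAIN
SELECT count(*) AS count, fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
       field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
FROM compliance_2006
GROUP BY fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
         field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
ORDER BY 1 DESC;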

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] tsearch2: plainto_tsquery() with OR?

2007-08-19 Thread Mike Rylander
On 8/9/07, cluster <[EMAIL PROTECTED]> wrote:
> Thanks for your response! Let me try to elaborate what I meant with my
> original post.
>
> If R is the set of words in the tsvector for a given table row and S is
> the set of keywords to search for (entered by e.g. a website user) I
> would like to receive all rows for which the intersection between R and
> S is nonempty. That is: the row should be returned if there is just SOME
> match. S does not necessarily need to be a subset of R.
>

You could just wrap up a simple query in an SQL function called
plainto_or_tsquery or the like.

CREATE OR REPLACE FUNCTION plainto_or_tsquery (TEXT) RETURNS tsquery AS $$
  SELECT to_tsquery( regexp_replace( $1, E'[\\s\'|:&()!]+','|','g') );
$$ LANGUAGE SQL STRICT IMMUTABLE;

Paste this into a PG database that has tsearch2 loaded (after creating
the above function, of course):

select
  rank_cd(to_tsvector('hi . there web 2.0'), plainto_or_tsquery('hello . web 2.0')),
  to_tsvector('hi . there web 2.0') @@ plainto_or_tsquery('hello . web 2.0') as matches;

> Furthermore I would like a measure for how "nonempty" the intersection
> is (we would call this measure "the rank").
> Example:
> For R = "three big houses" and S = "three small houses" the rank should
> be higher than for R = "three big houses" and S = "four small houses" as
> the first case has two words in common while the second case has only one.

Both rank() and rank_cd() work fine for all-ORed queries, full match
or otherwise.  The more "matchy", the better the rank.

>
> A version of plainto_tsquery() with a simple OR operator instead of AND
> would solve this problem somewhat elegantly:
> 1) I can now use the conventional "tsvector @@ tsquery" syntax in my
> WHERE clause as the "@@" operator will return true and thus include the
> row in the result. Example:
>select to_tsvector('simple', 'three small houses')
>   @@ 'four|big|houses'::tsquery;
> would return "true".
>

Um... it does.

forge=# select to_tsvector('simple', 'three small houses') @@ 'four|big|houses'::tsquery;
 ?column?
----------
 t
(1 row)

> 2) The rank() of the @@ operator is automatically higher when there is a
> good match.
>

Again, that's already the case.

forge=# select rank(to_tsvector('hi . there web 2.0'), plainto_or_tsquery('hello . web 2.0')),
               rank(to_tsvector('hi . there web 2.0'), plainto_or_tsquery('hi . web 2.0'));
   rank    |   rank
-----------+-----------
 0.0405285 | 0.0607927
(1 row)

The second is a better match; "hi" vs "hello" in the queries.

>
> An example where this OR-version of plainto_tsquery() could be useful is
> for websites using tags. Each website entry is associated with some tags
> and each user has defined some "tags of interest". The search should
> then return all website entries where there is a match (not necessarily
> complete) with the users tags of interest. Of course the best matching
> entries should be displayed top most.
>

See above.  Though, again, you'd need to put in a little work to make
sure everything is completely protected.  Probably less time than it's
taken you to discuss this so far, though.  And you'd want to create a
2-param version that could accept the correct tsearch2 config.
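A rough, untested sketch of that two-parameter variant, passing the tsearch2
configuration name straight through to to_tsquery():

CREATE OR REPLACE FUNCTION plainto_or_tsquery (TEXT, TEXT) RETURNS tsquery AS $$
  SELECT to_tsquery( $1, regexp_replace( $2, E'[\\s\'|:&()!]+','|','g') );
$$ LANGUAGE SQL STRICT IMMUTABLE;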

>
> I find it important that this function is a part of tsearch2 itself as:
> 1) The user can input arbitrary data. Also potentially harmful data if
> they are not escaped right.

That's not tsearch2's problem in particular.  You should be using
parameterized queries in your app (or applying the correct quoting
functions) for all data, not just directly user supplied strings.  All
data is user supplied at some level.
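Just to illustrate the placeholder idea at the SQL level (table and column
names here are made up; client libraries expose the same mechanism through
their own parameter APIs):

PREPARE ts_search (text) AS
  SELECT id, title
    FROM documents
   WHERE tsv @@ plainto_or_tsquery($1);

EXECUTE ts_search('web 2.0 houses');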

> 2) Special characters should be stripped in just the same way as
> to_tsvector() does it. E.g. stripping the dot in "Hi . there" but
> keeping it in "web 2.0". Only tsearch2 can do that in a clean consistent
> way - it would be fairly messy if some thirdparty or especially some
> website-developer-homecooked stripping functionality is used for this.
>

The simple example above uses to_tsquery to do that.  Try it out, and
if  you improve it please feel free to share.

On a more general note, IMO this should not be getting in the way of
integrating tsearch2 into core.  The example above shows how trivial
it is to do the "simple" thing, but that's probably not going to be
the "right" thing.  Any time you find yourself forcing your user to use
"x OR y OR z" to get a result is a time you should probably be
augmenting your tsvectors (or tsquerys now, with query rewriting) with
thesauri.

BTW, Google /does/ just AND everything.  They just don't tell you
about everything they're ANDing.

All this is, of course, disregarding the actual utility of an all-OR
query (not much in practice, IME) and the speed of such queries on
non-trivial datasets (not good -- essentially a scan per ORed
component).

Anyway, I hope that helps...

--miker

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [GENERAL] POSTGRE CRASH AND CURRVAL PROBLEM HELP!

2007-08-19 Thread Gregory Stark
"David Azevedo" <[EMAIL PROTECTED]> writes:

> 2007-08-16 13:49:43||/labrsangioro/forms/recepcao_temp.php||ERROR:  could
> not open relation 1663/1491040/1555634: Invalid argument

> 2007-08-16 14:12:36||/labrsangioro/forms/recepcao_temp.php||PANIC:  could
> not open file "pg_xlog/00010046" (log file 0, segment 70):
> Invalid argument

These look pretty bad. Can you confirm whether these files are really missing?

Some people have reported strange errors not entirely unlike these being
caused by virus firewalls which prevent Postgres from doing file operations it
expects to be able to do. Can you disable any virus firewalls on this machine,
at least temporarily or as far as local file operations are affected?

> I always used nextval to insert autoincrement values, and currval to get the
> inserted value.
>
> After I backed up and restored one of the databases (just that one, backup and
> restore on the same server), the currval function stopped working and I get
> the following message:

Is this the same server that is getting the errors above? If so, then following
an error your client will try to reconnect to the server, which means a new
connection that does not have the sequence value set.
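To illustrate: currval() is only defined once nextval() has been called on
that sequence in the *same* session, e.g.

SELECT nextval('public.cod_combo_seq');
SELECT currval('public.cod_combo_seq');   -- works, same session

-- whereas a fresh connection that runs only the currval() call gets
-- ERROR:  currval of sequence "cod_combo_seq" is not yet defined in this session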

Are there any other errors in your logs? If there were any other errors
earlier it might have prevented your nextval() from getting called.

How did you back it up and restore it? Are you referring to pg_dump/pg_restore
or some system backup?

Can you do \ds on the working and non-working database and compare the
results?
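If that is easier to diff than the psql output, the equivalent catalog query
is roughly:

SELECT n.nspname AS schema, c.relname AS seq_name
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'S'
 ORDER BY 1, 2;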


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] POSTGRE CRASH AND CURRVAL PROBLEM HELP!

2007-08-19 Thread Scott Marlowe
On 8/19/07, David Azevedo <[EMAIL PROTECTED]> wrote:
> Please, I need help desperately.
>
> I'm running PostgreSQL 8.1.9 on Windows 2003 Server Standard Edition Service
> Pack 2 (Intel Pentium 4 3.2 GHz, 1 GB RAM).
> I have 5 databases on this server and they all see activity all day
> (inserts, selects, etc.).
>
> I have 2 problems.
>
> First, Postgres crashes many times a day and we have to restart the
> service to keep working. I created a log using the database connection class
> in PHP; when a query raises an error it is logged to a txt file. I will show
> here the errors I'm getting, because I don't know what they mean.

On stable hardware with a stable OS, postgresql crashing once a year
is too often.

>
> 2007-08-16
> 13:49:43||/labrsangioro/forms/recepcao_temp.php||ERROR:
> could not open relation 1663/1491040/1555634: Invalid argument
> ||update tb_atendimento set total_faturamento='33.6', tipo_desconto='%',
> total_desconto='30' where cod_atendimento=2683
> 2007-08-16

This type of error is indicative of two possibilities.  Either the
hardware has issues, or the OS is doing something funny.

On a unix machine I would generally suspect the hardware, but on
Windows I tend to suspect some kind of anti-virus software holding a
lock on your files when the database goes to access them.

Turn off / remove / uninstall all anti-virus / firewall / email filter
software on the box.

However, it is quite likely that by now you have a corrupted data
store and will need to restore from a backup for reliability.

Oh, and buy more RAM for that box.  1 gig is REALLY small for a db server. :)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] POSTGRE CRASH AND CURRVAL PROBLEM HELP!

2007-08-19 Thread David Azevedo
Please, I need help desperately.

I'm running PostgreSQL 8.1.9 on Windows 2003 Server Standard Edition Service
Pack 2 (Intel Pentium 4 3.2 GHz, 1 GB RAM).
I have 5 databases on this server and they all see activity all day
(inserts, selects, etc.).

I have 2 problems.

First, Postgres crashes many times a day and we have to restart the
service to keep working. I created a log using the database connection class
in PHP; when a query raises an error it is logged to a txt file. I will show
here the errors I'm getting, because I don't know what they mean.

2007-08-16 13:49:43||/labrsangioro/forms/recepcao_temp.php||ERROR:  could
not open relation 1663/1491040/1555634: Invalid argument
||update tb_atendimento set total_faturamento='33.6', tipo_desconto='%',
total_desconto='30' where cod_atendimento=2683
2007-08-16 14:12:36||/labrsangioro/forms/recepcao_temp.php||PANIC:  could
not open file "pg_xlog/00010046" (log file 0, segment 70):
Invalid argument
||Update tb_atendimento set valor_subtotal='17', alteradopor='18',
alteradoem='2007-08-16 14:12:35', ultimaacao='del_ex' where
cod_atendimento=465
2007-08-16 16:26:29||/labrsangioro/forms/resultado_temp.php||ERROR:  could
not open relation 1663/1491040/2689: Invalid argument
||SELECT * FROM tb_campo_exame WHERE status_campo AND tb_exame_cod_exame=631
AND tipo <> 2 AND status_campo order by ordem
2007-08-17 8:05:40||/labrsangioro/forms/recepcao_temp.php||ERROR:  could not
open relation 1663/1491040/1555635: Invalid argument
||SELECT * FROM tb_atendimento WHERE status_atendimento AND
tb_paciente_cod_paciente=2843 AND tb_local_cod_local=1 order by
data_atendimento desc
2007-08-17 8:26:40||/labrsangioro/forms/recepcao_temp.php||ERROR:  could not
open relation 1663/1491040/2601: Invalid argument
||update tb_paciente set matricula='BP0010005809', endereco='', sexo='M',
diabetico='Sim' where cod_paciente=2848
2007-08-17 8:27:00||/labrsangioro/forms/recepcao_temp.php||ERROR:  could not
open relation 1663/1491040/2601: Invalid argument

I tried to reindex all the databases, but the error keeps occurring in all of
them.
As you can see, I'm getting the error "could not open relation
/xx/: Invalid argument" in queries that should execute normally.

How can I solve this problem?
Is this the cause of the Postgres crashes, or do these errors occur because
Postgres crashed and can't complete the query?
What would cause Postgres to crash on my server?



Now the second problem.

I always used nextval to insert autoincrement values, and currval to get the
inserted value.

After I backed up and restored one of the databases (just that one, backup and
restore on the same server), the currval function stopped working and I get
the following message:

2007-08-19 17:18:43||/labrsangioro/forms/cad_combo_temp.php||ERROR:  currval
of sequence "cod_combo_seq" is not yet defined in this session
||SELECT * from tb_dados_combo WHERE cod_combo=currval('
public.cod_combo_seq') AND desc_opcao='ss'

Notice that it always worked, and I always call nextval before calling currval.
Here is a sample of my code:

$query="INSERT INTO tb_atendimento VALUES(nextval('
public.tb_atendimento_cod_atendimento_seq'
),$convenio,$local,$cod_medico,$cod_paciente,'$obs_atendimento','0','$hoje_1','$ano_mes','$valor_total','$valor_subtotal','$urgencia','0','0','1','$horario_urg','$paciente','','0','$hoje_1','0','0','$total_desconto','$tipo_desconto',$cod_tabela,$Usuario,'$hoje_1',NULL,NULL,'$acao','$domicilio','f','1','$pago',NULL,NULL,'','','','',NULL,'','$data_entrega','$observacao')";
$db->exec_query($query);

$query="SELECT currval('public.tb_atendimento_cod_atendimento_seq') as
cod_atend FROM tb_atendimento";
$db->exec_query($query);
$currval_atend = $db->Regs();
$currval_atend = $currval_atend[0]['cod_atend'];


So I have backed up and restored my 5 databases; in 2 of them currval stopped
working, and in the other 3 it works normally.

I'm really lost. Someone help, please.

--


Re: [GENERAL] entry log

2007-08-19 Thread Michael Glaesemann


On Aug 19, 2007, at 14:04 , Robin Helgelin wrote:


When I started with MySQL I exploited their "bug" with timestamp
fields and always had an entered and updated field on my tables.

As I'm blissfully ignorant of MySQL's peculiarities, without a more  
detailed explanation of what you're trying to do, I'm not sure if  
this suggestion will help, but here I go anyway:


If you want created and updated timestamps, you can do something like  
this:


CREATE TABLE foos
(
foo text PRIMARY KEY
, created_at TIMESTAMP WITH TIME ZONE NOT NULL
DEFAULT CURRENT_TIMESTAMP
, updated_at TIMESTAMP WITH TIME ZONE NOT NULL
DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO foos (foo) VALUES ('a foo');
SELECT *
FROM foos;
  foo  |          created_at           |          updated_at
-------+-------------------------------+-------------------------------
 a foo | 2007-08-19 15:18:27.271103-05 | 2007-08-19 15:18:27.271103-05
(1 row)

UPDATE foos
SET updated_at = DEFAULT
, foo = 'foo b'
WHERE foo = 'a foo';
SELECT *
FROM foos;
  foo  |          created_at           |          updated_at
-------+-------------------------------+-------------------------------
 foo b | 2007-08-19 15:18:27.271103-05 | 2007-08-19 15:18:35.575783-05
(1 row)


My question, is this interesting information enough to save on the
table itself? If so, I guess this could easily be solved with a
trigger, however, should one instead create a log table and log
changes, etc?


As you mention, you could use a trigger instead of explicitly setting  
updated_at to DEFAULT, which might be more convenient because you  
don't need to remember to set the updated_at column explicitly on update.
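A minimal sketch of that trigger approach, assuming plpgsql is installed
(function and trigger names are arbitrary):

CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := current_timestamp;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foos_set_updated_at
    BEFORE UPDATE ON foos
    FOR EACH ROW EXECUTE PROCEDURE set_updated_at();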


Whether or not this information is *interesting* is really up to the  
specifics of your application, rather than answerable in a general  
sense.


Hope that helps.

Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Searching for Duplicates and Hosed the System

2007-08-19 Thread Bill Thoen
Tom, here's the "explain" results: Does this help explain what went wrong?
(And yes, I think there will be a *lot* of groups.)

explain select count(*) as count,fips_st_cd, fips_cnty_cd, farm_nbr,
tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr from
compliance_2006 group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr order by 1 desc;

                                  QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
   Sort Key: count(*)
   ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
         ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
               Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
               ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
(6 rows)



On Sun, Aug 19, 2007 at 01:19:51PM -0400, Tom Lane wrote:
> Bill Thoen <[EMAIL PROTECTED]> writes:
> > I knew this would take some time, but what I didn't expect was that about
> > an hour into the select, my mouse and keyboard locked up and also I
> > couldn't log in from another computer via SSH. This is a Linux machine
> > running Fedora Core 6 and PostgresQL is 8.1.4. There's about 50GB free on
> > the disc too.
> 
> > I finally had to shut the power off and reboot to regain control of my
> > computer (that wasn't good idea, either, but eventually I got everything
> > working again.)
> 
> I've seen Fedora go nuts like that when it ran out of memory.  Once it
> starts to swap heavily, performance goes into the tank; and once the
> kernel realizes it's in memory trouble, it starts to kill processes
> more or less at random.  That might explain why ssh stopped working.
> 
> One thing to do to make it more robust is to disable memory overcommit.
> I suspect also that configuring it with lots of swap space is
> counterproductive, because that just encourages the kernel to allow lots
> of swapping.  I haven't actually experimented with that part though.
> 
> As for why PG ran the system out of memory, I suspect that the planner
> drastically underestimated the number of groups to be created by your
> GROUP BY, and thought it could get away with a hash aggregation.  We
> don't currently have any provision for spilling hash aggregation to
> disk, so if there's a very large number of groups the table just gets
> very big :-(.  The planner is not supposed to choose hash agg if the
> estimated table size exceeds work_mem ... but if it had out-of-date
> statistics to work with it might have gotten the wrong answer.  Have
> you ANALYZEd this table recently?  What does EXPLAIN show as the
> estimated number of result rows?
> 
>   regards, tom lane
> 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Transactional DDL

2007-08-19 Thread Tino Wildenhain

Ron Mayer wrote:

Scott Marlowe wrote:

On 8/14/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:

Hi,
I read a few lines about SP compilation in postgres

http://searchoracle.techtarget.com/originalContent/0,289142,sid41_gci1179016,00.html

1. stored procedure compilation is transactional.
"You can recompile a stored procedure on a live system, and only
transactions starting after that compilation will see the changes," he said.
"Transactions in process can complete with the old version. Oracle just
blocks on the busy procedure."


Really?

When I tried it [1] - changing a function definition during the
middle of a long-running-query that used the function gave
me the surprising result that some rows were processed using
the old definition of the function and some with the new one.

The explanation from Tom [2] was that there was some good
reason function lookups used SnapshotNow.


Yes - if you want to see transactional DDL, put your
function change in a transaction. If you do that, you
will see that your other long-running transaction keeps seeing
the old definition the whole time (or until shortly after
you commit the function-changing transaction). This is
basically "read committed".

Regards
Tino

  Ron

[1] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00163.php

  I have a long query something like

select slow_function(col) from large_table;

  and half way through the query, in a separate connection, I

CREATE OR REPLACE slow_function 

  I was surprised to see that some of the rows in my select
  were processed by the old definition and some by the new.


[2] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00179.php

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[GENERAL] entry log

2007-08-19 Thread Robin Helgelin
Hi,

When I started with MySQL I exploited their "bug" with timestamp
fields and always had an entered and updated field on my tables.

My question, is this interesting information enough to save on the
table itself? If so, I guess this could easily be solved with a
trigger, however, should one instead create a log table and log
changes, etc?

Hints and tips are appreciated :)

-- 
regards,
Robin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Transactional DDL

2007-08-19 Thread Harpreet Dhaliwal
So is there really any version-control mechanism for functions in PostgreSQL,
or not?

~Harpreet

On 8/18/07, Ron Mayer <[EMAIL PROTECTED]> wrote:
>
> Scott Marlowe wrote:
> > On 8/14/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >> I read a few lines about SP compilation in postgres
> >>
> >>
> http://searchoracle.techtarget.com/originalContent/0,289142,sid41_gci1179016,00.html
> >>
> >> 1. stored procedure compilation is transactional.
> >> "You can recompile a stored procedure on a live system, and only
> >> transactions starting after that compilation will see the changes," he
> said.
> >> "Transactions in process can complete with the old version. Oracle just
> >> blocks on the busy procedure."
>
> Really?
>
> When I tried it [1] - changing a function definition during the
> middle of a long-running-query that used the function gave
> me the surprising result that some rows were processed using
> the old definition of the function and some with the new one.
>
> The explanation from Tom [2] was that there was some good
> reason function lookups used SnapshotNow.
>
>   Ron
>
>
>
>
>
>
> [1] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00163.php
>
>   I have a long query something like
>
> select slow_function(col) from large_table;
>
>   and half way through the query, in a separate connection, I
>
> CREATE OR REPLACE slow_function 
>
>   I was surprised to see that some of the rows in my select
>   were processed by the old definition and some by the new.
>
>
> [2] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00179.php
>
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
>
>http://archives.postgresql.org/
>


Re: [GENERAL] Searching for Duplicates and Hosed the System

2007-08-19 Thread Tom Lane
Bill Thoen <[EMAIL PROTECTED]> writes:
> I knew this would take some time, but what I didn't expect was that about
> an hour into the select, my mouse and keyboard locked up and also I
> couldn't log in from another computer via SSH. This is a Linux machine
> running Fedora Core 6 and PostgresQL is 8.1.4. There's about 50GB free on
> the disc too.

> I finally had to shut the power off and reboot to regain control of my
> computer (that wasn't good idea, either, but eventually I got everything
> working again.)

I've seen Fedora go nuts like that when it ran out of memory.  Once it
starts to swap heavily, performance goes into the tank; and once the
kernel realizes it's in memory trouble, it starts to kill processes
more or less at random.  That might explain why ssh stopped working.

One thing to do to make it more robust is to disable memory overcommit.
I suspect also that configuring it with lots of swap space is
counterproductive, because that just encourages the kernel to allow lots
of swapping.  I haven't actually experimented with that part though.

As for why PG ran the system out of memory, I suspect that the planner
drastically underestimated the number of groups to be created by your
GROUP BY, and thought it could get away with a hash aggregation.  We
don't currently have any provision for spilling hash aggregation to
disk, so if there's a very large number of groups the table just gets
very big :-(.  The planner is not supposed to choose hash agg if the
estimated table size exceeds work_mem ... but if it had out-of-date
statistics to work with it might have gotten the wrong answer.  Have
you ANALYZEd this table recently?  What does EXPLAIN show as the
estimated number of result rows?
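For example, something like this shows both your current work_mem and the
planner's estimate (the rows= figure on the topmost node of the plan):

SHOW work_mem;

EXPLAIN
SELECT count(*) AS count, fld1, fld2, fld3, fld4
  FROM MyTable
 GROUP BY fld1, fld2, fld3, fld4
 ORDER BY 1 DESC;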

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Webb Sprague
> The command you gave only shows currently
> active users, not the historic peak of connections, for instance. I'll
> keep digging through the manual but would love any nudges in the right
> direction, thanks!

Can you set up a snapshot in a cronjob?  It would still only be a sample
of a sample, but it might be enough?
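Something as crude as this, run from cron every minute or so, might already
be enough (table name is made up):

CREATE TABLE connection_samples
(
    sampled_at   timestamp with time zone NOT NULL DEFAULT current_timestamp,
    connections  integer NOT NULL
);

-- the cron job just runs:
INSERT INTO connection_samples (connections)
SELECT count(*) FROM pg_stat_activity;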

>
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
>
>http://archives.postgresql.org/
>

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Searching for Duplicates and Hosed the System

2007-08-19 Thread Bill Moran
Bill Thoen <[EMAIL PROTECTED]> wrote:
>
> I'm new to PostgreSQL and I ran into a problem I don't want to repeat. I have
> a database with a little more than 18 million records that takes up about
> 3GB. I need to check to see if there are duplicate records, so I tried a
> command like this:
> 
> SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable 
>   GROUP BY fld1, fld2, fld3, fld4
>   ORDER BY 1 DESC;
> 
> I knew this would take some time, but what I didn't expect was that about
> an hour into the select, my mouse and keyboard locked up and also I
> couldn't log in from another computer via SSH. This is a Linux machine
> running Fedora Core 6 and PostgresQL is 8.1.4. There's about 50GB free on
> the disc too.
> 
> I finally had to shut the power off and reboot to regain control of my
> computer (that wasn't good idea, either, but eventually I got everything
> working again.)
> 
> Is this normal behavior by PG with large databases?

No.  Something is wrong.

> Did I misconfigure
> something? Does anyone know what might be wrong?

Possibly, but I would be more inclined to guess that your hardware is
faulty and you encountered a RAM error, or the CPU overheated or
something along those lines.  I'm not familiar with Linux systems
hard-locking like that unless there is a hardware issue.

-- 
Bill Moran
http://www.potentialtech.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Searching for Duplicates and Hosed the System

2007-08-19 Thread Bill Thoen
I'm new to PostgreSQL and I ran into a problem I don't want to repeat. I have
a database with a little more than 18 million records that takes up about
3GB. I need to check to see if there are duplicate records, so I tried a
command like this:

SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable 
  GROUP BY fld1, fld2, fld3, fld4
  ORDER BY 1 DESC;

I knew this would take some time, but what I didn't expect was that about
an hour into the select, my mouse and keyboard locked up and also I
couldn't log in from another computer via SSH. This is a Linux machine
running Fedora Core 6 and PostgresQL is 8.1.4. There's about 50GB free on
the disc too.

I finally had to shut the power off and reboot to regain control of my
computer (that wasn't good idea, either, but eventually I got everything
working again.)

Is this normal behavior by PG with large databases? Did I misconfigure
something? Does anyone know what might be wrong?

- Bill Thoen


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] WAITING in PG_STATS_ACTIVITY

2007-08-19 Thread Tom Lane
[EMAIL PROTECTED] writes:
> when is WAITING in PG_STATS_ACTIVITY set to TRUE?
> When this connection is waiting on a lock, or are there any other
> reasons, like waiting on another resource?

Only when waiting on a lock.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] posgres tunning

2007-08-19 Thread Gavin M. Roy
We use PHP, but think of it as a universal PgSQL proxy.  If you connect to
a connection you set up in pgBouncer via psql, it looks like a normal
database.  Nothing is different in your code but where you connect (for us,
it's the same as our core DB server on a different server).  Let me know if
that answers your question; I'd be happy to elaborate further if needed.

On 8/19/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> On Jul 23, 5:18 am, [EMAIL PROTECTED] ("Gavin M. Roy") wrote:
> > You might want to look at pgBouncer to pool your drupal pgsql
> needs.  I've
> > found with 2000 needed connections, I can pool out to only 30 backends
> and
> > still push 8k transactions per second.
> >
>
>
>
> How do you use pgBouncer -- through an application developed in PHP or
> Perl? It would be lovely if you could share some info about this
> seemingly useful app, which comes with so little documentation on how
> to actually get going. The Skype site mentions the install bits, but
> not how to actually query the pgBouncer stuff instead of the database.
> It's very hard to come by some actual code. Could you please share some?
>
>
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
>


Re: [GENERAL] posgres tunning

2007-08-19 Thread Phoenix Kiula
On 19/08/07, Gavin M. Roy <[EMAIL PROTECTED]> wrote:
> We use PHP, but think of it as a universal PgSQL proxy.  If you connect to
> a connection you set up in pgBouncer via psql, it looks like a normal
> database.  Nothing is different in your code but where you connect (for us,
> it's the same as our core DB server on a different server).  Let me know if
> that answers your question; I'd be happy to elaborate further if needed.



Thanks much Gavin. That's useful info. Is it easy to set up without
breaking a live, production pgsql environment? I don't mind a quick
restart but significant downtime may not be an option.

I guess this may be too much, but would you have some PHP code you
could share? I like the idea of connecting to pgbouncer as though it
were a db, but it's unclear how that would get me to my tables in my
real db?

TIA!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Phoenix Kiula
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:
> No need to match. If you have 150 relations, 200 is a reasonable value.
> But once you get the proper output from the vacuum command, it tells you
> that as well (74 in my example above)



Found it! You get those words if you do a generic "vacuum verbose",
not a specific "vacuum verbose MYTABLE". In hindsight, the conf
variable is for the entire database, so it makes sense to do a generic
one!

Here is my output:

[--
INFO:  free space map contains 76059 pages in 32 relations
DETAIL:  A total of 136688 page slots are in use (including overhead).
136688 page slots are required to track all free space.
Current limits are:  150000 page slots, 200 relations, using 893 kB.
VACUUM
Time: 202065.807 ms
--]

Now, will this value of "136688" keep incrementing, or will autovacuum
keep it in check? I have increased my max_fsm_pages to 150,000 as you
can see.

Thanks for the "Monitoring Stats" link. Looks like pgsql is a bit more
involved. I was looking for information that would allow me to set my
"max_connections" well. The command you gave only shows currently
active users, not the historic peak of connections, for instance. I'll
keep digging through the manual but would love any nudges in the right
direction, thanks!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Magnus Hagander
Phoenix Kiula wrote:
> On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
>> Phoenix Kiula wrote:
> 
> ...snip
> 
>> There should be a line like this at the end of a "VACUUM VERBOSE" command:
>> INFO:  free space map contains 33 pages in 74 relations
>> DETAIL:  A total of 1184 page slots are in use (including overhead).
>> 1184 page slots are required to track all free space.
>> Current limits are:  153600 page slots, 1000 relations, using 965 kB.
>> VACUUM
>>
> 
> 
> I ran the vacuum analyze verbose again, and did not see anything like
> that. Should I run a vacuum alone?

It shows up in both variants for me.
Oh, hang on. I think it only shows up if you're logged in with a
superuser - table owner is not enough. Check that.


> In any case, in your example, which number would I take note of, and
> derive the max_fsm_pages from?

The 1184 number (the one for "page slots are required to track")


> I do notice this in my own output:
> 
>  There were 2959498 unused item pointers.
>  133616 pages contain useful free space.
>  0 pages are entirely empty.
> 
> Does this mean I should have over 133,616 in my max_fsm_pages. Should
> I set it up at 150,000 for example?

Probably not enough - that's for a single table, no?

> Secondly, the max_fsm_relations -- if I have about 150 "relations" in
> my database (relations as per PGSQL lingo) then can this figure be,
> say, 200? Or does this have to match max_fsm_pages?

No need to match. If you have 150 relations, 200 is a reasonable value.
But once you get the proper output from the vacuum command, it tells you
that as well (74 in my example above)
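Once you have that number, the postgresql.conf entries end up looking
something like this (the values below are only placeholders -- use at least
what VACUUM VERBOSE reports; both settings take effect only after a restart):

max_fsm_pages = 150000
max_fsm_relations = 200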

As for your other question, how to view connections: use "SELECT * FROM
pg_stat_activity". See
http://www.postgresql.org/docs/8.2/static/monitoring-stats.html.
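For a quick look at how close you are to the ceiling, something like:

SELECT count(*) AS current_connections FROM pg_stat_activity;
SHOW max_connections;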

//Magnus



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Phoenix Kiula
Btw, related to one my earlier questions: where can I see how many
connections are being made to the DB, what was the maximum number
attempted at any given time, and so on? The connections related info.

Thanks!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Phoenix Kiula
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:

...snip

> There should be a line like this at the end of a "VACUUM VERBOSE" command:
> INFO:  free space map contains 33 pages in 74 relations
> DETAIL:  A total of 1184 page slots are in use (including overhead).
> 1184 page slots are required to track all free space.
> Current limits are:  153600 page slots, 1000 relations, using 965 kB.
> VACUUM
>


I ran the vacuum analyze verbose again, and did not see anything like
that. Should I run a vacuum alone?

In any case, in your example, which number would I take note of, and
derive the max_fsm_pages from?

I do notice this in my own output:

 There were 2959498 unused item pointers.
 133616 pages contain useful free space.
 0 pages are entirely empty.

Does this mean I should have over 133,616 in my max_fsm_pages. Should
I set it up at 150,000 for example?

Secondly, the max_fsm_relations -- if I have about 150 "relations" in
my database (relations as per PGSQL lingo) then can this figure be,
say, 200? Or does this have to match max_fsm_pages?

Many thanks

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] query large amount of data in c++ using libpq

2007-08-19 Thread Henrik


On 19 Aug 2007, at 06:34, Felix Ji wrote:


Hi all,
I am using PQexecParams() to "SELECT" about 3 million records in C++,
and it takes several minutes to complete, with used memory
dramatically increased (about 200MB).
It seems that when I use PQexecParams(), I can't use the query result
before all the data is sent to the client.

Is there something like a server-side cursor in libpq?
Or is there any other way to do this in C++?
Some articles say that querying a large amount of data is
automatically handled well by pg, but I can't see how.


thanks

libpqxx is what you need. Then you can use cursors in C++.
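You can also get there with plain libpq by declaring the cursor yourself and
fetching in chunks, roughly like this (table name is just an example):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;
FETCH 10000 FROM big_cur;    -- repeat until it returns no rows
CLOSE big_cur;
COMMIT;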

cheers,
henke

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Magnus Hagander
Phoenix Kiula wrote:
> On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
>> Phoenix Kiula wrote:
>>> On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> 
> 
>>> should we do one (VACUUM FULL) now given that we've overrun our 
>>> max_fsm_pages?
>> Yes, but not until you've fixed it. And only once.
>>
> 
> 
> 
> Fixed what - the max_fsm_pages? That was my question: how to know what
> value to set for this. If the "vacuum verbose" won't give me the info
> you suggested because it is likely overlapping with autovacuum, should
> I temporarily turn autovacuum off and then run vacuum verbose? Also,
> while running vacuum full, any precautions to take?

Yeah, you can do that - or you can just trawl back through the logs to
find that information - it's there somewhere. grep would be helpful to
find it.

vacuum full will take out blocking locks on your database, so run it
during a maintenance window or at least during a low-traffic time.

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Seeking datacenter PITR backup procedures [RESENDING]

2007-08-19 Thread Bill Moran
"Joey K." <[EMAIL PROTECTED]> wrote:
>
> Greetings,

I'm guessing you're in a hurry or in a pinch that you need to repost
after one day on a weekend.  I was waiting to let someone more
knowledgeable answer, but I've had some experience with this, so
I'll answer to the best of my ability.

> We have several web applications with Pg 8.2.x running on isolated servers
> (~25). The database size on each machines (du -h pgdata) is ~2 GB. We have
> been using nightly filesystem backup (stop pg, tar backup to ftp, start pg)
> and it worked well.

Any reason why you haven't been using pg_dump?  There are a LOT of
drawbacks to doing filesystem level backups.  For example, you can't
restore to disparate hardware (a filesystem backup made from PG on
an i386 system won't work on an amd64 system, for example)

> We would like to move to PITR backups since the database size will increase
> moving forward and our current backup method might increase server
> downtimes.

How much do you expect it to increase?  2G is _tiny_ by modern
standards.  Even if you expect it to increase an order of magnitude,
it's still a reasonable size for pg_dump.

Some huge advantages to pg_dump:
*) architecture-neutral dumps
*) No need to stop the database
*) Extremely simple procedure for backup and restore
*) Human parseable backups (you may not even need to restore, just
   look through the data to see what was there in some cases)
*) Can restore a database without shutting down a server, thus you
   can move a database from one server to another (for example)
   without affecting work occurring on the second server.

> We have a central ftp backup server (yes, ftp :-) which we would like to use
> for weekly full and daily incremental PITR backups.
> 
> After reading the docs, PITR is still fuzzy. Our ideas for backup are (do
> not worry about the syntax),
> 
> ** START **
> 
> tmpwal = "/localhost/tmp"   # tmp space on server 1 for storing wal files
> before ftp
> Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"
> 
> Day 1:
> % psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata
> % psql pg_stop_backup()
> % ftp put pgdata.tar ftpserver:/server1/day1/pgdata
> % ftp put $tmpwal/* ftpserver:/server1/day1/wal
> % rm -f $tmpwal/* pgdata.tar
> 
> Day 2:
> % ftp put $tmpwal/* ftpserver:/server1/day2/wal
> % rm -f $tmpwal/*
> 
> Day 3:
> ...
> ...
> 
> Day 7:
> % rm -f $tmpwal/*
> Start over
>
> Recovery on server1 (skeleton commands),
> % rm -f $tmpwal/*

Why are you removing this day's WAL logs before recovery?  If the
disaster occurs close to your backup time, this will result in the
loss of an entire day's data changes.

> % mv pgdata pgdata.hosed
> % ftp get ftpbackup:/server1/day1/pgdata.tar  .
> % tar -xvf pgdata.tar
> % ftp get ftpbackup:/server1/day1/wal/*  $tmpwal
> % ftp get ftpbackup:/server1/day2/wal/*  $tmpwal
> .
> .
> % cp -r pgdata.hosed/pg_xlog pgdata/
> % echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
> % start pg (recovery begins)
> 
> ** END **

Again, how much WAL traffic are you generating?  Make sure you have
enough free space on the recovery system to hold all of the WAL logs
in the event you need to recover.

Note that this procedure does not do a good job of protecting you
from catastrophic hardware failure.  In the event that your RAID
system goes insane, you can lose as much as an entire day's worth
of updates, and there's no reason to.

Is there a reason you can't use (for example) NFS to transfer each WAL
log to the backup system on the fly?  This would better protect you
from IO system failures, and reduce the amount of disk space you need
on each server, since you don't need to keep WAL logs there.

> Assumptions:
> a. After pg_stop_backup(), Pg immediately recycles log files and hence wal
> logs can be copied to backup. This is a clean start.

I don't believe so.  AFAIK, all pg_stop_backup() does is remove the
marker that pg_start_backup() put in place to tell the recovery process
when the filesystem backup started.  By not backing up pg_xlog, you are
going to be behind by however many transactions are in the most recent
transaction log that has not yet been archived.  Depending on how often
your databases are updated, this is likely acceptable.  If you need
anything more timely than that, you'll probably want to implement
Slony or some other replication system.

> b. New wal files since (a) are incremental backups

Yes.

> We are not sure if WAL log filenames are unique and possibly overwrite older
> wal files during recovery.

They are unique.  Eventually they will recycle, but if you look at the
names and the number of possible names, you'll see that it will take a
long time before the names recycle.

> I'm seeking suggestions from others with experience performing PostgreSQL
> PITR backups from multiple servers to a central backup server.

Again, I recommend the use of NFS (or similar) for real-time backup
to protect you from hardware failure.

I _highly_ recommend y

Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Phoenix Kiula
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:
> > On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:


> > should we do one (VACUUM FULL) now given that we've overrun our 
> > max_fsm_pages?
>
> Yes, but not until you've fixed it. And only once.
>



Fixed what - the max_fsm_pages? That was my question: how to know what
value to set for this. If the "vacuum verbose" won't give me the info
you suggested because it is likely overlapping with autovacuum, should
I temporarily turn autovacuum off and then run vacuum verbose? Also,
while running vacuum full, any precautions to take?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-19 Thread Karsten Hilbert
On Sat, Aug 18, 2007 at 10:54:11PM -0700, Vance Maverick wrote:

> This is my experience with a Java client too.  Writing the data with
> PreparedStatement.setBinaryStream works great for long strings, but
> reading it with the complementary method ResultSet.getBinaryStream runs
> into the memory problem, killing the Java VM.

Again, I am observing this under Python with a libpq-based driver
running on Windows and during retrieval only.

Are we seeing a pattern ?

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] posgres tunning

2007-08-19 Thread [EMAIL PROTECTED]
On Jul 23, 5:18 am, [EMAIL PROTECTED] ("Gavin M. Roy") wrote:
> You might want to look at pgBouncer to pool your drupal pgsql needs.  I've
> found with 2000 needed connections, I can pool out to only 30 backends and
> still push 8k transactions per second.
>



How do you use pgBouncer -- through an application developed in PHP or
Perl? It would be lovely if you could share some info about this
seemingly useful app, which comes with so little documentation on how
to actually get going. The Skype site mentions the install bits, but
not how to actually query the pgBouncer stuff instead of the database.
It's very hard to come by some actual code. Could you please share some?
Many thanks!


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Seeking datacenter PITR backup procedures [RESENDING]

2007-08-19 Thread Hannes Dorbath
Joey K. wrote:
> After reading the docs, PITR is still fuzzy. Our ideas for backup are (do
> not worry about the syntax),

Maybe consider using LVM snapshots in conjunction with rdiff-backup as
an option as well. Simple to set up, simple to manage, no downtime, very
space efficient.


-- 
Best regards,
Hannes Dorbath

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Magnus Hagander
Phoenix Kiula wrote:
> On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
>> Phoenix Kiula wrote:
> 
>> There are ways to do this, but if you can't just use timeouts to expire
>> from the cache, things can become pretty complicated pretty fast. But
>> perhaps you can isolate some kinds of queries that can be cached for 
>> minutes, and keep the rest without caching?
> 
> 
> Thanks. In fact we need caching on a very specific part of our
> application, for only three queries which hit the DB hard with
> thousands of simultaneous SELECTs.
> 
> Do pgmemcache or pgbouncer allow for very specific usage? Both look
> way too complex. I don't mind the initial headachy setup and config,
> but then I would like the system to hum on its own, and the querying
> should be simple and intuitive.
> 
> I need a simple mechanism to query the cache, and invalidate a
> specific query in the cache when the underlying table is UPDATED so
> that the query gets cached afresh when issued later. (And a way to use
> this mechanism through PHP or Perl would be splendid).
> 
> TIA for any tips!

You can use LISTEN and NOTIFY to clear the cache, if you have many
clients that can cause cache invalidations. If you only have a single
app that can update the database, you can invalidate the cache from that
application's code directly (such as using asp.net output caching if you
were doing it in .net).
I've implemented the former a couple of times, but it does get a bit
complex. The second part would be easier, but I don't have any direct
pointers on that since it depends on the app development framework
you're using.
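The LISTEN/NOTIFY part itself is simple; the complexity is in the cache
bookkeeping around it. Roughly (channel name is made up):

-- each cache-holding client registers interest:
LISTEN traders_cache;

-- whatever updates the underlying table fires, in the same transaction:
NOTIFY traders_cache;

-- when the notification arrives, the listener drops its cached copies of
-- the affected queries so they get re-read from the database next time.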

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Magnus Hagander
Phoenix Kiula wrote:
> On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
>> Phoenix Kiula wrote:
> 
> .snipped
> 
>>> I can merrily increase the "max_fsm_pages" directive, but the manual
>>> also caveats that with "this can use more system V memory than
>>> available on your system". My full verbose vacuum info below includes
>>> the line:
>> Do you actually run VACUUM FULL, or do you just mean you run VACUUM over
>> the full database? If you run VACUUM FULL, you need to stop doing that
>> :-) However, you will need to run it at least once over the whole
>> database once you've fixed your max_fsm_pages setting.
> 
> 
> 
> No we only do a "vacuum analyze" when we do something manually.
> Otherwise, it's all "autovacuum". Never done  a "vacuum full" --

Ok. That's good.

> should we do one now given that we've overrun our max_fsm_pages?

Yes, but not until you've fixed it. And only once.

>>> [-
>>> INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
>>> rows and 0 dead rows; 3000 rows in sample, 2710124
>>> estimated total rows
>>> -]
>> There should be a line like this at the end of a "VACUUM VERBOSE" command:
>> INFO:  free space map contains 33 pages in 74 relations
>> DETAIL:  A total of 1184 page slots are in use (including overhead).
>> 1184 page slots are required to track all free space.
>> Current limits are:  153600 page slots, 1000 relations, using 965 kB.
>> VACUUM
>>
> 
> 
> Nope, there's no line that includes such useful info. The entire
> verbose output was included in my note. I did not see the words
> "Current limits are". Do I need to enable something in the conf file
> to get more verbose output? "debug2", "debug3" -- kind of stuff?

Strange. It comes out at level INFO, and you do see other stuff at INFO
level. Any chance this just got mixed up with an autovacuum run, so that
it ended up somewhere in the middle of your output? (i.e. that this "traders"
info is from autovac)

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Phoenix Kiula
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:

> There are ways to do this, but if you can't just use timeouts to expire
> from the cache, things can become pretty complicated pretty fast. But
> perhaps you can isolate some kinds of queries that can be cached for 
> minutes, and keep the rest without caching?


Thanks. In fact we need caching on a very specific part of our
application, for only three queries which hit the DB hard with
thousands of simultaneous SELECTs.

Do pgmemcache or pgbouncer allow for very specific usage? Both look
way too complex. I don't mind the initial headachy setup and config,
but then I would like the system to hum on its own, and the querying
should be simple and intuitive.

I need a simple mechanism to query the cache, and invalidate a
specific query in the cache when the underlying table is UPDATED so
that the query gets cached afresh when issued later. (And a way to use
this mechanism through PHP or Perl would be splendid).

TIA for any tips!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] WAITING in PG_STATS_ACTIVITY

2007-08-19 Thread ruediger . papke
Hi,

little question:
when is WAITING in PG_STATS_ACTIVITY set to TRUE?
When this connection is waiting on a lock, or are there any other
reasons, like waiting on another resource?
TIA
ruediger


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Phoenix Kiula
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:

.snipped

> > I can merrily increase the "max_fsm_pages" directive, but the manual
> > also caveats that with "this can use more system V memory than
> > available on your system". My full verbose vacuum info below includes
> > the line:
>
> Do you actually run VACUUM FULL, or do you just mean you run VACUUM over
> the full database? If you run VACUUM FULL, you need to stop doing that
> :-) However, you will need to run it at least once over the whole
> database once you've fixed your max_fsm_pages setting.



No we only do a "vacuum analyze" when we do something manually.
Otherwise, it's all "autovacuum". Never done  a "vacuum full" --
should we do one now given that we've overrun our max_fsm_pages?



> > [-
> > INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
> > rows and 0 dead rows; 3000 rows in sample, 2710124
> > estimated total rows
> > -]
>
> There should be a line like this at the end of a "VACUUM VERBOSE" command:
> INFO:  free space map contains 33 pages in 74 relations
> DETAIL:  A total of 1184 page slots are in use (including overhead).
> 1184 page slots are required to track all free space.
> Current limits are:  153600 page slots, 1000 relations, using 965 kB.
> VACUUM
>


Nope, there's no line that includes such useful info. The entire
verbose output was included in my note. I did not see the words
"Current limits are". Do I need to enable something in the conf file
to get more verbose output? "debug2", "debug3" -- kind of stuff?


TIA

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] server closed the connection unexpectedly

2007-08-19 Thread Magnus Hagander
Muhyiddin A.M Hayat wrote:
> Dear all,
>  
> i'm unable to connect postgres server with error :
>  
> C:\Program Files\PostgreSQL\8.2\bin>psql -U postgres siakad
> Password for user postgres:
> psql: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>  
>  
> pg_log :
>  
> 2007-08-19 03:00:50 LOG:  database system was shut down at 2007-08-19
> 02:58:26 Malay Peninsula Standard Time
> 2007-08-19 03:00:50 LOG:  checkpoint record is at 0/75A808
> 2007-08-19 03:00:50 LOG:  redo record is at 0/75A808; undo record is at
> 0/0; shutdown TRUE
> 2007-08-19 03:00:50 LOG:  next transaction ID: 0/1931; next OID: 16737
> 2007-08-19 03:00:50 LOG:  next MultiXactId: 1; next MultiXactOffset: 0
> 2007-08-19 03:00:51 LOG:  database system is ready
> 2007-08-19 03:01:43 LOG:  could not receive data from client: An
> operation was attempted on something that is not a socket.
>  
>  
> 2007-08-19 03:01:43 LOG:  incomplete startup packet
>  
> somebody help me please

This error is most likely some antivirus or firewall software that's
misbehaving. Try uninstalling any such software from the server (you can
try disabling it first, but often a complete uninstall is needed) and
try again.

//Magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Stefan Kaltenbrunner
Phoenix Kiula wrote:
> On 19/08/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:

[... ]

> Well based on some past posts, I looked into my pg_log stuff and found
> a number of these lines:
> 
> 
> [
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> ]
> 
> 
> Which suggests that our guess of running out of connections is the right one.
> 
> So, we have three options (to begin with) --
> 
> 1. Increase the number of max_connections. This seems to be a voodoo
> art and a complex calculation of database size (which in our case is
> difficult to predict; it grows very fast), hardware, and such. I
> cannot risk other apps running on this same machine.

This error is a sign that the OS(!) is running out of resources (or at
least won't allow PG to fork another process) - either you are hitting a
ulimit for the user PostgreSQL runs under, or you need to raise a kernel
setting to allow more processes. Increasing max_connections will NOT
help, because you are not even hitting the current limit yet ...
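
You can verify that easily enough by comparing the number of live
backends against the configured ceiling - a quick sketch:

-- how many backends are connected right now?
SELECT count(*) AS current_backends FROM pg_stat_activity;
-- versus the configured limit
SHOW max_connections;

If the count is well below max_connections when the fork errors show
up, it is the OS limit that is biting, not PostgreSQL.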

> 
> 2. Use connection pooling. I've found pgpool2 and pgbouncer from the
> Skype group. Does anyone have experience using either? The latter
> looks good, although we're usually skeptical about connection pooling
> in general (or is that just the mysqli_pconnect() hangover?)

pgbouncer works quite fine here.

Stefan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Magnus Hagander
Phoenix Kiula wrote:
> [Sorry for the length of this post. It stretched as I provided as much
> info as possible..]
> 
> So the rubber meets the road. We've put postgresql in a production
> environment with some heavy simultaneous usage. It works well in
> general, but often PG doesn't respond. How should I test what is going
> wrong? All tables are vacuumed, analyzed and such. INSERTS and UPDATEs
> are happening but not much, it's mostly SELECTs. Is PGSQL running out
> of connections? We can temporarily fix this by restarting pgsql but
> I'd like a more tenable solution.
> 
> Speculating that it could be some conf variable somewhere
> (max_fsm_pages in particular) I am including three things at the
> bottom of this post:
> 
> 1. Our PS output (for "postgres")
> 2. *Verbose* vacuum info for a table that shows max_fsm warning
> 3. Our postgresql.conf settings
> 
> My question 1 -- how should we test and tweak our production
> installation? Where should we look. In MySQL we could do a "show
> status" at the console and it would give a mountain of information.
> Then there was that handy little "tuning-primer" script that made it
> all come alive. I suppose this stuff is also available in pg_catalog
> but is there any website that goes in depth into HOW to tune, what
> different values mean, and such?
> 
> My question 2 -- in production, we're constantly seeing this message
> while vacuuming one table with less than 3 million rows, but one that
> we expect to keep growing:
> 
> [-
> WARNING:  relation "public.links" contains more than "max_fsm_pages"
> pages with useful free space
> HINT:  Consider compacting this relation or increasing the
> configuration parameter "max_fsm_pages".
> VACUUM
> -]
> 
> I can merrily increase the "max_fsm_pages" directive, but the manual
> also caveats that with "this can use more system V memory than
> available on your system". My full verbose vacuum info below includes
> the line:

Do you actually run VACUUM FULL, or do you just mean you run VACUUM over
the full database? If you run VACUUM FULL, you need to stop doing that
:-) However, you will need to run it at least once over the whole
database once you've fixed your max_fsm_pages setting.


> [-
> INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
> rows and 0 dead rows; 3000 rows in sample, 2710124
> estimated total rows
> -]

There should be a line like this at the end of a "VACUUM VERBOSE" command:
INFO:  free space map contains 33 pages in 74 relations
DETAIL:  A total of 1184 page slots are in use (including overhead).
1184 page slots are required to track all free space.
Current limits are:  153600 page slots, 1000 relations, using 965 kB.
VACUUM


(note that my numbers are for a more or less empty database. Yours will
be much higher)

If your database size is reasonably stable, pick a good value a bit
above the numbers suggested. If you expect it to grow a lot, add some
more overhead, but monitor this value.
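
As a very rough cross-check you can also sum the page counts of all
relations in pg_class. A sketch only - this counts *all* pages, most of
which won't have free space, so treat it as an upper bound, and run it
in each database and add the results:

-- upper bound on the page slots this database could ever need
SELECT sum(relpages) AS total_pages
  FROM pg_class
 WHERE relkind IN ('r', 'i', 't');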


> Does this mean my table needs nearly 200,000 pages, and that should be
> the setting of max_fsm_pages? This server is on a fairly common setup

No. It means that the table "traders" is using 199396 pages - most of
them have no reusable free space, so they don't need to be tracked in
the FSM.
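
If you want to see where that figure comes from, it should match the
page count recorded by the last VACUUM/ANALYZE - for example (a sketch):

-- relpages and reltuples are updated by VACUUM and ANALYZE
SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('traders', 'links');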


//Magnus


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Magnus Hagander
Phoenix Kiula wrote:

> 
> 
> Well based on some past posts, I looked into my pg_log stuff and found
> a number of these lines:
> 
> 
> [
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable
> ]
> 
> 
> Which suggests that our guess of running out of connections is the right one.

No, you're not running out of "connections". You are, however, running
over some kernel limit.

> So, we have three options (to begin with) --
> 
> 1. Increase the number of max_connections. This seems to be a voodoo
> art and a complex calculation of database size (which in our case is
> difficult to predict; it grows very fast), hardware, and such. I
> cannot risk other apps running on this same machine.

No. You are not yet reaching max_connections; that would give you an
error message that actually says so. This message indicates that the
account PostgreSQL runs under has a ulimit that restricts some
resource - in this case most likely the number of processes. And this
limit is not "compatible" with your settings for max_connections.
You need to find this ulimit and at least raise it, or remove it entirely.


> 2. Use connection pooling. I've found pgpool2 and pgbouncer from the
> Skype group. Does anyone have experience using either? The latter
> looks good, although we're usually skeptical about connection pooling
> in general (or is that just the mysqli_pconnect() hangover?)

Connection pooling, in general, is something that pretty much *every*
larger app will use. It may be implemented in the app (something
which has often been troublesome in PHP solutions, but it's certainly
the norm for Java or .Net apps) or in middleware like pgpool or
pgbouncer. There should be no need to be sceptical about it in general ;-)

Can't speak for either of those apps specifically, as I haven't used
them in production.


> 3. Use caching of queries. Memcache comes recommended, but there's a
> discussion as recently as Jan 2007 on this list about race conditions
> and such (most of which I don't quite understand) which cautions
> against its use. We do expect plenty of transactions and if something
> that has been updated is not very correctly and promptly invalidated
> in the cache, it has huge business repercussions for us.

There are ways to do this, but if you can't just use timeouts to expire
from the cache, things can become pretty complicated pretty fast. But
perhaps you can isolate some kinds of queries that can be cached for 
minutes, and keep the rest without caching?
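
One low-tech way to get the "cached for minutes" behaviour inside the
database is a small summary table that a cron job refreshes every few
minutes. Just a sketch with made-up names - adapt it to whichever
queries you manage to isolate:

-- hypothetical example: cache one expensive aggregate over "links"
CREATE TABLE links_summary_cache (
    cached_at   timestamptz NOT NULL DEFAULT now(),
    total_links bigint      NOT NULL
);

CREATE OR REPLACE FUNCTION refresh_links_summary() RETURNS void AS $$
    DELETE FROM links_summary_cache;
    INSERT INTO links_summary_cache (total_links)
        SELECT count(*) FROM links;
$$ LANGUAGE SQL;

-- cron, every five minutes:
--   psql -d yourdb -c "SELECT refresh_links_summary();"
-- readers then hit the tiny cache table instead of the big one:
SELECT total_links, cached_at FROM links_summary_cache;

The race-condition worries mostly go away because "stale by a few
minutes" is the agreed contract rather than an accident.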


//Magnus


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Martijn van Oosterhout
On Sun, Aug 19, 2007 at 05:15:34PM +0800, Phoenix Kiula wrote:
> Well based on some past posts, I looked into my pg_log stuff and found
> a number of these lines:
> [
> LOG:  could not fork new process for connection: Resource temporarily
> unavailable

Usually this message means that the *kernel* refused to let you fork a
new process. Some resource was unavailable. So this means (usually) one
of two things:

- You have a maximum on the number of processes on the system and
you're exceeding it
- You have a limit on the amount of memory. If you have overcommit
disabled, this may be causing the issue.

I couldn't quickly find details of your setup, but if you're running
Linux with overcommit disabled, make sure you have at least twice as
much swap space allocated as you have real memory.

Make sure you don't have lots of idle postgres processes lying around.
You'll have to provide more detail about your system before getting any
better recommendations.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [GENERAL] Postgresql performance in production environment

2007-08-19 Thread Phoenix Kiula
On 19/08/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> [Sorry for the length of this post. It stretched as I provided as much
> info as possible..]
>
> So the rubber meets the road. We've put postgresql in a production
> environment with some heavy simultaneous usage. It works well in
> general, but often PG doesn't respond. How should I test what is going
> wrong? All tables are vacuumed, analyzed and such. INSERTS and UPDATEs
> are happening but not much, it's mostly SELECTs. Is PGSQL running out
> of connections? We can temporarily fix this by restarting pgsql but
> I'd like a more tenable solution.
>
> Speculating that it could be some conf variable somewhere
> (max_fsm_pages in particular) I am including three things at the
> bottom of this post:
>
> 1. Our PS output (for "postgres")
> 2. *Verbose* vacuum info for a table that shows max_fsm warning
> 3. Our postgresql.conf settings
>
> My question 1 -- how should we test and tweak our production
> installation? Where should we look. In MySQL we could do a "show
> status" at the console and it would give a mountain of information.
> Then there was that handy little "tuning-primer" script that made it
> all come alive. I suppose this stuff is also available in pg_catalog
> but is there any website that goes in depth into HOW to tune, what
> different values mean, and such?
>
> My question 2 -- in production, we're constantly seeing this message
> while vacuuming one table with less than 3 million rows, but one that
> we expect to keep growing:
>
> [-
> WARNING:  relation "public.links" contains more than "max_fsm_pages"
> pages with useful free space
> HINT:  Consider compacting this relation or increasing the
> configuration parameter "max_fsm_pages".
> VACUUM
> -]
>
> I can merrily increase the "max_fsm_pages" directive, but the manual
> also caveats that with "this can use more system V memory than
> available on your system". My full verbose vacuum info below includes
> the line:
>
> [-
> INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
> rows and 0 dead rows; 3000 rows in sample, 2710124
> estimated total rows
> -]
>
> Does this mean my table needs nearly 200,000 pages, and that should be
> the setting of max_fsm_pages? This server is on a fairly common setup
> these days: Dual AMD Opterons, 4GB memory, SATA RAID 1, 250GB each. I
> don't mind letting postgres use up to 1GB of the memory for itself,
> but the rest is needed for others.
>
> From http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html
> , it seems the "max_fsm_relations" is about how many tables and
> indexes can be tracked in the free space map. Does this mean the
> number of actual tables and indexes in postgres databases, or
> instances of these tables? For example, if I only run 5 databases,
> each of which have about 10 tables and 20 indexes, then I have only
> 150 (5 * 30) actual "relations" in postgresql lingo. So my
> max_fsm_relations setting can be 150? (Which seems a little low
> compared to what I see online in several posts online).
>
> Assuming 150 is ok, that manual page, and other tweaking stuff such as -
> http://www.revsys.com/writings/postgresql-performance.html -- suggest
> that "max_fsm_pages" is even more critical. The manual says this
> should be at least 16 times that of max_fsm_relations, so in my
> example, it should be at least 150 * 16, which is about  2400. This
> seems abysmally low! If I up this figure to, say, 24000 instead, I
> still keep seeing the kinds of errors posted above.
>
>
> My question no. 3 -- for a SELECT-heavy database, is there any
> tried-and-tested caching tool that could be of use? I'd like to skip
> connection pooling if possible, but would be very interested in good
> caching products or contribs. I noticed pgmemcached (
> http://pgfoundry.org/projects/pgmemcache/ ) but it's in beta 1.2.
> Would love some thoughts from people who have used it...is it worth
> the effort?
>
>
> More of my info below. TIA for your thoughts and advice!
>
> -/Phoenix
>
>
>
>
> ==EXHIBIT 1: PS OUTPUT ==
> > ps auxnm | grep postgres
>   26 20665  0.0  0.0 11760  612 ?-Aug18   0:00
> postgres: logger process
>   26 20670  0.0  1.1 188684 48312 ?  -Aug18   0:00
> postgres: writer process
>   26 20671  0.0  0.0 12032  804 ?-Aug18   0:28
> postgres: stats collector process
>   26 14497  0.0  4.1 452108 172656 ? -02:05   0:02
> postgres: traders_traders traders 127.0.0.1(56204) VACUUM
>0  9444  0.0  0.0  5008  656 pts/0-02:53   0:00 grep postgres
>
>
>
>
> ==EXHIBIT 2: POSTGRES.CONF ==
> listen_addresses = 'localhost,*'
>
> max_connections = 250
> shared_buffers = 21000   # Not much more: http://snipr.com/pgperf
> effective_cache_size = 32000
> max_fsm_relations = 500
> max_fsm_pages = 6
> sort_mem = 4096   # Low when not needed: http://snipr.com/pgperf
> 

[GENERAL] Postgresql performance in production environment

2007-08-19 Thread Phoenix Kiula
[Sorry for the length of this post. It stretched as I provided as much
info as possible..]

So the rubber meets the road. We've put postgresql in a production
environment with some heavy simultaneous usage. It works well in
general, but often PG doesn't respond. How should I test what is going
wrong? All tables are vacuumed, analyzed and such. INSERTS and UPDATEs
are happening but not much, it's mostly SELECTs. Is PGSQL running out
of connections? We can temporarily fix this by restarting pgsql but
I'd like a more tenable solution.

Speculating that it could be some conf variable somewhere
(max_fsm_pages in particular) I am including three things at the
bottom of this post:

1. Our PS output (for "postgres")
2. *Verbose* vacuum info for a table that shows max_fsm warning
3. Our postgresql.conf settings

My question 1 -- how should we test and tweak our production
installation? Where should we look? In MySQL we could do a "show
status" at the console and it would give a mountain of information.
Then there was that handy little "tuning-primer" script that made it
all come alive. I suppose this stuff is also available in pg_catalog
but is there any website that goes in depth into HOW to tune, what
different values mean, and such?

My question 2 -- in production, we're constantly seeing this message
while vacuuming one table with less than 3 million rows, but one that
we expect to keep growing:

[-
WARNING:  relation "public.links" contains more than "max_fsm_pages"
pages with useful free space
HINT:  Consider compacting this relation or increasing the
configuration parameter "max_fsm_pages".
VACUUM
-]

I can merrily increase the "max_fsm_pages" directive, but the manual
also cautions that "this can use more System V memory than
available on your system". My full verbose vacuum info below includes
the line:

[-
INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live
rows and 0 dead rows; 3000 rows in sample, 2710124
estimated total rows
-]

Does this mean my table needs nearly 200,000 pages, and that should be
the setting of max_fsm_pages? This server is on a fairly common setup
these days: Dual AMD Opterons, 4GB memory, SATA RAID 1, 250GB each. I
don't mind letting postgres use up to 1GB of the memory for itself,
but the rest is needed for others.

From http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html,
it seems "max_fsm_relations" controls how many tables and indexes can
be tracked in the free space map. Does this mean the number of actual
tables and indexes in postgres databases, or instances of these
tables? For example, if I only run 5 databases, each of which has
about 10 tables and 20 indexes, then I have only 150 (5 * 30) actual
"relations" in postgresql lingo. So my max_fsm_relations setting can
be 150? (Which seems a little low compared to what I see in several
posts online.)
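
To sanity-check that count, I suppose I can just ask each database
directly (a rough sketch - system catalogs and TOAST tables count too,
so the real number is presumably a bit higher than my
back-of-the-envelope 150):

-- run in each database, then add up the results
SELECT count(*) AS relations
  FROM pg_class
 WHERE relkind IN ('r', 'i', 't');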

Assuming 150 is OK, that manual page and other tuning guides such as
http://www.revsys.com/writings/postgresql-performance.html suggest
that "max_fsm_pages" is even more critical. The manual says it should
be at least 16 times max_fsm_relations, so in my example it should be
at least 150 * 16 = 2400. This seems abysmally low! If I up this
figure to, say, 24000 instead, I still keep seeing the kinds of
errors posted above.


My question no. 3 -- for a SELECT-heavy database, is there any
tried-and-tested caching tool that could be of use? I'd like to skip
connection pooling if possible, but would be very interested in good
caching products or contribs. I noticed pgmemcache (
http://pgfoundry.org/projects/pgmemcache/ ), but it's at beta 1.2.
Would love some thoughts from people who have used it... Is it worth
the effort?


More of my info below. TIA for your thoughts and advice!

-/Phoenix




==EXHIBIT 1: PS OUTPUT ==
> ps auxnm | grep postgres
  26 20665  0.0  0.0 11760  612 ?-Aug18   0:00
postgres: logger process
  26 20670  0.0  1.1 188684 48312 ?  -Aug18   0:00
postgres: writer process
  26 20671  0.0  0.0 12032  804 ?-Aug18   0:28
postgres: stats collector process
  26 14497  0.0  4.1 452108 172656 ? -02:05   0:02
postgres: traders_traders traders 127.0.0.1(56204) VACUUM
   0  9444  0.0  0.0  5008  656 pts/0-02:53   0:00 grep postgres




==EXHIBIT 2: POSTGRES.CONF ==
listen_addresses = 'localhost,*'

max_connections = 250
shared_buffers = 21000   # Not much more: http://snipr.com/pgperf
effective_cache_size = 32000
max_fsm_relations = 500
max_fsm_pages = 6
sort_mem = 4096   # Low when not needed: http://snipr.com/pgperf
work_mem = 4096
temp_buffers = 4096
authentication_timeout = 10s
ssl = off

#VACUUM SETTINGS
autovacuum = on
vacuum_cost_delay = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 300
autovacuum_analyze_threshold = 100

#FOR BACKG