Re: [GENERAL] entry log
On 8/19/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
> As you mention, you could use a trigger instead of explicitly setting
> updated_at to DEFAULT, which might be more convenient because you
> don't need to remember to set the updated_at column explicitly on update.
>
> Whether or not this information is *interesting* is really up to the
> specifics of your application, rather than answerable in a general
> sense.

I'm thinking it's probably going to make more sense to have a
logging/history table. What's the use of seeing when an entry was
updated when you don't know what was updated anyway :). I guess that
could be solved with triggers: each table could have a trigger that
fires on update and runs a stored procedure.

--
regards,
Robin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not match
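[Editor's sketch of the trigger-driven history table Robin describes. All names here (foos_log, log_foos_change, the foo column) are illustrative, not from a tested setup:

```sql
-- Hypothetical audit table: one row per change, recording what changed.
CREATE TABLE foos_log (
    logged_at  timestamptz NOT NULL DEFAULT current_timestamp
  , operation  text        NOT NULL
  , old_foo    text
  , new_foo    text
);

-- Trigger function that records the old and new values on each UPDATE.
CREATE OR REPLACE FUNCTION log_foos_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO foos_log (operation, old_foo, new_foo)
    VALUES (TG_OP, OLD.foo, NEW.foo);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foos_audit
    AFTER UPDATE ON foos
    FOR EACH ROW EXECUTE PROCEDURE log_foos_change();
```

With this in place the table itself stays lean, and the log answers "what was updated, and when".]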
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen <[EMAIL PROTECTED]> writes:
> Tom, here's the "explain" results. Does this help explain what went
> wrong? (And yes, I think there will be a *lot* of groups.)

> explain select count(*) as count, fips_st_cd, fips_cnty_cd, farm_nbr,
> tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
> from compliance_2006
> group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,
> crop_cd, crop_status_cd, practice_cd, seq_nbr
> order by 1 desc;

>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
>    Sort Key: count(*)
>    ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
>          ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
>                Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
>                          field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
>                ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18
>                    rows=18048318 width=160)
> (6 rows)

Hmm ... no, actually, that shows the planner doing the right thing for
lotsa groups: picking GroupAggregate instead of HashAggregate. The
estimated number of groups is 1804832, which might or might not have
much to do with reality, but in any case seems enough to keep it away
from HashAggregate.

Do you have autovacuum running, or a scheduled cronjob that runs
ANALYZE or VACUUM ANALYZE? The only theory I can think of at this point
is that your database statistics are more correct now than they were
when you had the problem. If you try the query again, does it behave
more sanely?

			regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org/
Re: [GENERAL] tsearch2: plainto_tsquery() with OR?
On 8/9/07, cluster <[EMAIL PROTECTED]> wrote:
> Thanks for your response! Let me try to elaborate what I meant with my
> original post.
>
> If R is the set of words in the tsvector for a given table row and S is
> the set of keywords to search for (entered by e.g. a website user) I
> would like to receive all rows for which the intersection between R and
> S is nonempty. That is: the row should be returned if there is just
> SOME match. S does not necessarily need to be a subset of R.

You could just wrap up a simple query in an SQL function called
plainto_or_tsquery or the like:

CREATE OR REPLACE FUNCTION plainto_or_tsquery (TEXT) RETURNS tsquery AS $$
    SELECT to_tsquery( regexp_replace( $1, E'[\\s\'|:&()!]+', '|', 'g') );
$$ LANGUAGE SQL STRICT IMMUTABLE;

Paste this into a PG database that has tsearch2 loaded (after creating
the above function, of course):

select rank_cd(to_tsvector('hi . there web 2.0'),
               plainto_or_tsquery('hello . web 2.0')),
       to_tsvector('hi . there web 2.0')
           @@ plainto_or_tsquery('hello . web 2.0') as matches;

> Furthermore I would like a measure for how "nonempty" the intersection
> is (we would call this measure "the rank").
> Example:
> For R = "three big houses" and S = "three small houses" the rank should
> be higher than for R = "three big houses" and S = "four small houses",
> as the first case has two words in common while the second case has
> only one.

Both rank() and rank_cd() work fine for all-ORed queries, full match or
otherwise. The more "matchy", the better the rank.

> A version of plainto_tsquery() with a simple OR operator instead of AND
> would solve this problem somewhat elegantly:
> 1) I can now use the conventional "tsvector @@ tsquery" syntax in my
> WHERE clause, as the "@@" operator will return true and thus include
> the row in the result. Example:
>    select to_tsvector('simple', 'three small houses')
>        @@ 'four|big|houses'::tsquery;
> would return "true".

Um... it does.
forge=# select to_tsvector('simple', 'three small houses')
            @@ 'four|big|houses'::tsquery;
 ?column?
----------
 t
(1 row)

> 2) The rank() of the @@ operator is automatically higher when there is
> a good match.

Again, that's already the case.

forge=# select rank(to_tsvector('hi . there web 2.0'),
                    plainto_or_tsquery('hello . web 2.0')),
               rank(to_tsvector('hi . there web 2.0'),
                    plainto_or_tsquery('hi . web 2.0'));
   rank    |   rank
-----------+-----------
 0.0405285 | 0.0607927
(1 row)

The second is a better match; "hi" vs "hello" in the queries.

> An example where this OR-version of plainto_tsquery() could be useful
> is for websites using tags. Each website entry is associated with some
> tags and each user has defined some "tags of interest". The search
> should then return all website entries where there is a match (not
> necessarily complete) with the user's tags of interest. Of course the
> best matching entries should be displayed topmost.

See above. Though, again, you'd need to put in a little work to make
sure everything is completely protected. Probably less time than it's
taken you to discuss this so far, though. And you'd want to create a
2-param version that could accept the correct tsearch2 config.

> I find it important that this function is a part of tsearch2 itself as:
> 1) The user can input arbitrary data. Also potentially harmful data if
> it is not escaped right.

That's not tsearch2's problem in particular. You should be using
parameterized queries in your app (or applying the correct quoting
functions) for all data, not just directly user-supplied strings. All
data is user-supplied at some level.

> 2) Special characters should be stripped in just the same way as
> to_tsvector() does it. E.g. stripping the dot in "Hi . there" but
> keeping it in "web 2.0". Only tsearch2 can do that in a clean,
> consistent way - it would be fairly messy if some third-party or
> especially some website-developer-homecooked stripping functionality
> were used for this.
The simple example above uses to_tsquery to do that. Try it out, and if
you improve it please feel free to share.

On a more general note, IMO this should not be getting in the way of
integrating tsearch2 into core. The example above shows how trivial it
is to do the "simple" thing, but that's probably not going to be the
"right" thing. Any time you find yourself forcing your user to use "x
OR y OR z" to get a result is a time you should probably be augmenting
your tsvectors (or tsqueries now, with query rewriting) with thesauri.

BTW, Google /does/ just AND everything. They just don't tell you about
everything they're ANDing.

All this is, of course, disregarding the actual utility of an all-OR
query (not much in practice, IME) and the speed of such queries on
non-trivial datasets (not good -- essentially a scan per ORed
component).

Anyway, I hope that helps...

--miker
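[Editor's sketch of the 2-param variant miker mentions, passing a tsearch2 configuration name through to to_tsquery. The two-argument to_tsquery form and the 'default' config name are assumptions from the tsearch2-era API, not tested code:

```sql
-- Hypothetical 2-parameter variant: $1 names the tsearch2 configuration,
-- $2 is the user's raw query text.
CREATE OR REPLACE FUNCTION plainto_or_tsquery (TEXT, TEXT) RETURNS tsquery AS $$
    SELECT to_tsquery( $1, regexp_replace( $2, E'[\\s\'|:&()!]+', '|', 'g') );
$$ LANGUAGE SQL STRICT IMMUTABLE;

-- Illustrative usage:
-- SELECT to_tsvector('default', 'three small houses')
--     @@ plainto_or_tsquery('default', 'four big houses');
```
]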
Re: [GENERAL] POSTGRE CRASH AND CURRVAL PROBLEM HELP!
"David Azevedo" <[EMAIL PROTECTED]> writes:

> 2007-08-16 13:49:43||/labrsangioro/forms/recepcao_temp.php||ERROR: could
> not open relation 1663/1491040/1555634: Invalid argument

> 2007-08-16 14:12:36||/labrsangioro/forms/recepcao_temp.php||PANIC: could
> not open file "pg_xlog/00010046" (log file 0, segment 70):
> Invalid argument

These look pretty bad. Can you confirm whether these files are really
missing?

Some people have reported strange errors not entirely unlike these
being caused by virus firewalls which prevent Postgres from doing file
operations it expects to be able to do. Can you disable any virus
firewalls on this machine, at least temporarily or as far as local file
operations are affected?

> i always used nextval to insert autoincrement values, and currval to
> get the inserted value.
>
> after i backup one of the databases and restore ( just it, backup and
> restore in the same server ) the currval function stopped working and
> i get the following message:

Is this the same server that is getting the errors above? If so, then
following an error your application will try to reconnect; that will be
a new connection and will not have the sequence value set.

Are there any other errors in your logs? If there were any other errors
earlier they might have prevented your nextval() from getting called.

How did you back it up and restore it? Are you referring to
pg_dump/pg_restore or some system backup?

Can you do \ds on the working and non-working databases and compare the
results?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
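[Editor's note: Gregory's point that currval is per-session can be seen with a minimal sequence (the sequence name is illustrative):

```sql
CREATE SEQUENCE demo_seq;

SELECT nextval('demo_seq');   -- advances the sequence in this session
SELECT currval('demo_seq');   -- works: nextval has run in this session

-- In a *new* connection, before any nextval call there:
-- SELECT currval('demo_seq');
-- ERROR:  currval of sequence "demo_seq" is not yet defined in this session
```

So a reconnect after a crash, with no intervening nextval, produces exactly the error David is seeing.]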
Re: [GENERAL] POSTGRE CRASH AND CURRVAL PROBLEM HELP!
On 8/19/07, David Azevedo <[EMAIL PROTECTED]> wrote:
> Please, i need help desperately.
>
> Im running postgresql 8.1.9 on windows 2003 server standard edition
> service pack 2. intel pentium 4 3.2 - 1 gb ram
> I have 5 databases in this server and they all have action all day
> ( inserts, selects etc )
>
> i have 2 problems.
>
> First, postgres crashes many times in the day and we have to restart
> the service to keep working. i created a log using the database
> connection class with php. when a query makes an error it logs in a
> txt file. i will show here the errors that im getting and i dont know
> what they mean.

On stable hardware with a stable OS, postgresql crashing once a year is
too often.

> 2007-08-16 13:49:43||/labrsangioro/forms/recepcao_temp.php||ERROR:
> could not open relation 1663/1491040/1555634: Invalid argument
> ||update tb_atendimento set total_faturamento='33.6',
> tipo_desconto='%', total_desconto='30' where cod_atendimento=2683

This type of error is indicative of two possibilities: either the
hardware has issues, or the OS is doing something funny. On a unix
machine I would generally suspect the hardware, but on Windows I tend
to suspect some kind of anti-virus software holding a lock on your
files when the database goes to access them.

Turn off / remove / uninstall all anti-virus / firewall / email filter
software on the box.

However, it is quite likely that by now you have a corrupted data store
and will need to restore from a backup for reliability.

Oh, and buy more RAM for that box. 1 gig is REALLY small for a db
server. :)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq
[GENERAL] POSTGRE CRASH AND CURRVAL PROBLEM HELP!
Please, I need help desperately.

I'm running PostgreSQL 8.1.9 on Windows 2003 Server Standard Edition
Service Pack 2, Intel Pentium 4 3.2, 1 GB RAM. I have 5 databases on
this server and they all have action all day (inserts, selects, etc.).

I have 2 problems.

First, Postgres crashes many times a day and we have to restart the
service to keep working. I created a log using the database connection
class with PHP; when a query produces an error it is logged to a txt
file. I will show here the errors I'm getting; I don't know what they
mean.

2007-08-16 13:49:43||/labrsangioro/forms/recepcao_temp.php||ERROR: could
not open relation 1663/1491040/1555634: Invalid argument
||update tb_atendimento set total_faturamento='33.6', tipo_desconto='%',
total_desconto='30' where cod_atendimento=2683

2007-08-16 14:12:36||/labrsangioro/forms/recepcao_temp.php||PANIC: could
not open file "pg_xlog/00010046" (log file 0, segment 70):
Invalid argument
||Update tb_atendimento set valor_subtotal='17', alteradopor='18',
alteradoem='2007-08-16 14:12:35', ultimaacao='del_ex' where
cod_atendimento=465

2007-08-16 16:26:29||/labrsangioro/forms/resultado_temp.php||ERROR:
could not open relation 1663/1491040/2689: Invalid argument
||SELECT * FROM tb_campo_exame WHERE status_campo AND
tb_exame_cod_exame=631 AND tipo <> 2 AND status_campo order by ordem

2007-08-17 8:05:40||/labrsangioro/forms/recepcao_temp.php||ERROR: could
not open relation 1663/1491040/1555635: Invalid argument
||SELECT * FROM tb_atendimento WHERE status_atendimento AND
tb_paciente_cod_paciente=2843 AND tb_local_cod_local=1 order by
data_atendimento desc

2007-08-17 8:26:40||/labrsangioro/forms/recepcao_temp.php||ERROR: could
not open relation 1663/1491040/2601: Invalid argument
||update tb_paciente set matricula='BP0010005809', endereco='', sexo='M',
diabetico='Sim' where cod_paciente=2848

2007-08-17 8:27:00||/labrsangioro/forms/recepcao_temp.php||ERROR: could
not open relation 1663/1491040/2601: Invalid argument

I tried to reindex all the databases but the errors keep occurring in
all of them. As you can see, I'm getting the error "could not open
relation X/Y/Z: Invalid argument" in queries that should execute
normally. How can I solve this problem? Is this maybe the cause of the
Postgres crashes, or do these errors occur because Postgres crashes and
can't complete the query? What would cause Postgres to crash on my
server?

Now the second problem. I always used nextval to insert autoincrement
values, and currval to get the inserted value.

After I backed up one of the databases and restored it (just that one,
backup and restore on the same server), the currval function stopped
working and I get the following message:

2007-08-19 17:18:43||/labrsangioro/forms/cad_combo_temp.php||ERROR:
currval of sequence "cod_combo_seq" is not yet defined in this session
||SELECT * from tb_dados_combo WHERE
cod_combo=currval('public.cod_combo_seq') AND desc_opcao='ss'

Notice that it always worked, and I always call nextval before calling
currval. Here is a sample of my code:

$query="INSERT INTO tb_atendimento VALUES(nextval('public.tb_atendimento_cod_atendimento_seq'),$convenio,$local,$cod_medico,$cod_paciente,'$obs_atendimento','0','$hoje_1','$ano_mes','$valor_total','$valor_subtotal','$urgencia','0','0','1','$horario_urg','$paciente','','0','$hoje_1','0','0','$total_desconto','$tipo_desconto',$cod_tabela,$Usuario,'$hoje_1',NULL,NULL,'$acao','$domicilio','f','1','$pago',NULL,NULL,'','','','',NULL,'','$data_entrega','$observacao')";
$db->exec_query($query);

$query="SELECT currval('public.tb_atendimento_cod_atendimento_seq') as cod_atend FROM tb_atendimento";
$db->exec_query($query);
$currval_atend = $db->Regs();
$currval_atend = $currval_atend[0]['cod_atend'];

So I have backed up and restored my 5 databases; in 2 of them currval
stopped working, and in the other 3 it works normally.

I'm really lost, someone help please.

--
Re: [GENERAL] entry log
On Aug 19, 2007, at 14:04 , Robin Helgelin wrote:

> When I started with MySQL I exploited their "bug" with timestamp
> fields and always had an entered and updated field on my tables.

As I'm blissfully ignorant of MySQL's peculiarities, without a more
detailed explanation of what you're trying to do I'm not sure if this
suggestion will help, but here I go anyway: if you want created and
updated timestamps, you can do something like this:

CREATE TABLE foos (
    foo text PRIMARY KEY
  , created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
  , updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO foos (foo) VALUES ('a foo');

SELECT * FROM foos;
  foo  |          created_at           |          updated_at
-------+-------------------------------+-------------------------------
 a foo | 2007-08-19 15:18:27.271103-05 | 2007-08-19 15:18:27.271103-05
(1 row)

UPDATE foos
SET updated_at = DEFAULT
  , foo = 'foo b'
WHERE foo = 'a foo';

SELECT * FROM foos;
  foo  |          created_at           |          updated_at
-------+-------------------------------+-------------------------------
 foo b | 2007-08-19 15:18:27.271103-05 | 2007-08-19 15:18:35.575783-05
(1 row)

> My question, is this interesting information enough to save on the
> table itself? If so, I guess this could easily be solved with a
> trigger, however, should one instead create a log table and log
> changes, etc?

As you mention, you could use a trigger instead of explicitly setting
updated_at to DEFAULT, which might be more convenient because you don't
need to remember to set the updated_at column explicitly on update.

Whether or not this information is *interesting* is really up to the
specifics of your application, rather than answerable in a general
sense.

Hope that helps.

Michael Glaesemann
grzm seespotcode net
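[Editor's sketch of the trigger variant Michael mentions, so updated_at is maintained automatically; the function and trigger names are illustrative:

```sql
-- Stamp updated_at on every UPDATE so callers no longer need to
-- set it explicitly.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := current_timestamp;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foos_set_updated_at
    BEFORE UPDATE ON foos
    FOR EACH ROW EXECUTE PROCEDURE set_updated_at();
```

After this, the earlier UPDATE works without the "SET updated_at = DEFAULT" clause.]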
Re: [GENERAL] Searching for Duplicates and Hosed the System
Tom, here's the "explain" results. Does this help explain what went
wrong? (And yes, I think there will be a *lot* of groups.)

explain select count(*) as count, fips_st_cd, fips_cnty_cd, farm_nbr,
tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
from compliance_2006
group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,
crop_cd, crop_status_cd, practice_cd, seq_nbr
order by 1 desc;

                              QUERY PLAN
----------------------------------------------------------------------
 Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
   Sort Key: count(*)
   ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
         ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
               Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
                         field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
               ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18
                   rows=18048318 width=160)
(6 rows)

On Sun, Aug 19, 2007 at 01:19:51PM -0400, Tom Lane wrote:
> Bill Thoen <[EMAIL PROTECTED]> writes:
> > I knew this would take some time, but what I didn't expect was that
> > about an hour into the select, my mouse and keyboard locked up and
> > also I couldn't log in from another computer via SSH. This is a
> > Linux machine running Fedora Core 6 and PostgreSQL is 8.1.4. There's
> > about 50GB free on the disc too.
>
> > I finally had to shut the power off and reboot to regain control of
> > my computer (that wasn't a good idea, either, but eventually I got
> > everything working again.)
>
> I've seen Fedora go nuts like that when it ran out of memory. Once it
> starts to swap heavily, performance goes into the tank; and once the
> kernel realizes it's in memory trouble, it starts to kill processes
> more or less at random. That might explain why ssh stopped working.
>
> One thing to do to make it more robust is to disable memory
> overcommit. I suspect also that configuring it with lots of swap space
> is counterproductive, because that just encourages the kernel to allow
> lots of swapping. I haven't actually experimented with that part
> though.
>
> As for why PG ran the system out of memory, I suspect that the planner
> drastically underestimated the number of groups to be created by your
> GROUP BY, and thought it could get away with a hash aggregation. We
> don't currently have any provision for spilling hash aggregation to
> disk, so if there's a very large number of groups the table just gets
> very big :-(. The planner is not supposed to choose hash agg if the
> estimated table size exceeds work_mem ... but if it had out-of-date
> statistics to work with it might have gotten the wrong answer. Have
> you ANALYZEd this table recently? What does EXPLAIN show as the
> estimated number of result rows?
>
> 			regards, tom lane
Re: [GENERAL] Transactional DDL
Ron Mayer schrieb:
> Scott Marlowe wrote:
>> On 8/14/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>> I read a few lines about SP compilation in postgres
>>>
>>> http://searchoracle.techtarget.com/originalContent/0,289142,sid41_gci1179016,00.html
>>>
>>> 1. stored procedure compilation is transactional.
>>> "You can recompile a stored procedure on a live system, and only
>>> transactions starting after that compilation will see the changes,"
>>> he said. "Transactions in process can complete with the old version.
>>> Oracle just blocks on the busy procedure."
>
> Really?
>
> When I tried it [1] - changing a function definition during the
> middle of a long-running query that used the function gave
> me the surprising result that some rows were processed using
> the old definition of the function and some with the new one.
>
> The explanation from Tom [2] was that there was some good reason
> function lookups used SnapshotNow.

Yes - if you want to see transactional DDL, put your function change in
a transaction. If you do that, you will see your long-running other
transaction is seeing the old definition the whole time (or shortly
after you commit the function-changing transaction). This is basically
"read committed".

Regards
Tino

> [1] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00163.php
>     I have a long query something like
>         select slow_function(col) from large_table;
>     and half way through the query, in a separate connection, I
>         CREATE OR REPLACE slow_function
>     I was surprised to see that some of the rows in my select
>     were processed by the old definition and some by the new.
>
> [2] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00179.php
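[Editor's sketch of Tino's suggestion; the function signature and body are illustrative only:

```sql
-- Replace the function inside an explicit transaction; other sessions
-- keep seeing the old definition until the COMMIT makes the new one
-- visible to transactions that start afterwards.
BEGIN;

CREATE OR REPLACE FUNCTION slow_function(integer) RETURNS integer AS $$
    SELECT $1 * 2;  -- hypothetical new body
$$ LANGUAGE SQL;

COMMIT;
```

Note that, per Tom's SnapshotNow explanation above, this does not stop a query already in flight from picking up the new definition partway through once the commit lands.]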
[GENERAL] entry log
Hi,

When I started with MySQL I exploited their "bug" with timestamp fields
and always had an entered and updated field on my tables.

My question: is this interesting information enough to save on the
table itself? If so, I guess this could easily be solved with a
trigger; however, should one instead create a log table and log
changes, etc.?

Hints and tips are appreciated :)

--
regards,
Robin
Re: [GENERAL] Transactional DDL
So is there really any version control mechanism for functions in
PostgreSQL or not?

~Harpreet

On 8/18/07, Ron Mayer <[EMAIL PROTECTED]> wrote:
>
> Scott Marlowe wrote:
> > On 8/14/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >> I read a few lines about SP compilation in postgres
> >>
> >> http://searchoracle.techtarget.com/originalContent/0,289142,sid41_gci1179016,00.html
> >>
> >> 1. stored procedure compilation is transactional.
> >> "You can recompile a stored procedure on a live system, and only
> >> transactions starting after that compilation will see the changes,"
> >> he said. "Transactions in process can complete with the old version.
> >> Oracle just blocks on the busy procedure."
>
> Really?
>
> When I tried it [1] - changing a function definition during the
> middle of a long-running query that used the function gave
> me the surprising result that some rows were processed using
> the old definition of the function and some with the new one.
>
> The explanation from Tom [2] was that there was some good
> reason function lookups used SnapshotNow.
>
> Ron
>
> [1] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00163.php
>     I have a long query something like
>         select slow_function(col) from large_table;
>     and half way through the query, in a separate connection, I
>         CREATE OR REPLACE slow_function
>     I was surprised to see that some of the rows in my select
>     were processed by the old definition and some by the new.
>
> [2] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00179.php
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen <[EMAIL PROTECTED]> writes:
> I knew this would take some time, but what I didn't expect was that
> about an hour into the select, my mouse and keyboard locked up and
> also I couldn't log in from another computer via SSH. This is a Linux
> machine running Fedora Core 6 and PostgreSQL is 8.1.4. There's about
> 50GB free on the disc too.

> I finally had to shut the power off and reboot to regain control of my
> computer (that wasn't a good idea, either, but eventually I got
> everything working again.)

I've seen Fedora go nuts like that when it ran out of memory. Once it
starts to swap heavily, performance goes into the tank; and once the
kernel realizes it's in memory trouble, it starts to kill processes
more or less at random. That might explain why ssh stopped working.

One thing to do to make it more robust is to disable memory overcommit.
I suspect also that configuring it with lots of swap space is
counterproductive, because that just encourages the kernel to allow
lots of swapping. I haven't actually experimented with that part though.

As for why PG ran the system out of memory, I suspect that the planner
drastically underestimated the number of groups to be created by your
GROUP BY, and thought it could get away with a hash aggregation. We
don't currently have any provision for spilling hash aggregation to
disk, so if there's a very large number of groups the table just gets
very big :-(. The planner is not supposed to choose hash agg if the
estimated table size exceeds work_mem ... but if it had out-of-date
statistics to work with it might have gotten the wrong answer. Have
you ANALYZEd this table recently? What does EXPLAIN show as the
estimated number of result rows?

			regards, tom lane
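[Editor's note: Tom's two questions can be checked roughly like this, using the table name from the thread; the work_mem value shown is only an example:

```sql
-- Refresh planner statistics, then re-check the estimated row count
-- and whether the plan still chooses HashAggregate.
ANALYZE compliance_2006;

EXPLAIN
SELECT count(*), fips_st_cd, fips_cnty_cd
FROM compliance_2006
GROUP BY fips_st_cd, fips_cnty_cd;

-- work_mem is the ceiling the planner compares the estimated hash
-- table size against (8.1 takes the value in KB):
SET work_mem = 65536;   -- example value only
```
]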
Re: [GENERAL] Postgresql performance in production environment
> The command you gave only shows currently
> active users, not the historic peak of connections for instance. I'll
> keep digging the manual but would love any nudges in the right
> direction, thanks!

Can you set up a snapshot in a cronjob? It would still only be a sample
of a sample, but?

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
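[Editor's sketch of the cron-driven snapshot idea; the table name and schedule are hypothetical:

```sql
-- Hypothetical history table, filled by a cron job every few minutes, e.g.:
--   psql -d mydb -c "INSERT INTO connection_snapshots (connections)
--                    SELECT count(*) FROM pg_stat_activity;"
CREATE TABLE connection_snapshots (
    taken_at    timestamptz NOT NULL DEFAULT now()
  , connections integer     NOT NULL
);

-- Later, an approximate historic peak for sizing max_connections:
SELECT max(connections) FROM connection_snapshots;
```

As noted, this only samples; a short spike between snapshots will be missed.]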
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen <[EMAIL PROTECTED]> wrote:
> I'm new to PostgreSQL and I ran into a problem I don't want to repeat.
> I have a database with a little more than 18 million records that
> takes up about 3GB. I need to check to see if there are duplicate
> records, so I tried a command like this:
>
> SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable
> GROUP BY fld1, fld2, fld3, fld4
> ORDER BY 1 DESC;
>
> I knew this would take some time, but what I didn't expect was that
> about an hour into the select, my mouse and keyboard locked up and
> also I couldn't log in from another computer via SSH. This is a Linux
> machine running Fedora Core 6 and PostgreSQL is 8.1.4. There's about
> 50GB free on the disc too.
>
> I finally had to shut the power off and reboot to regain control of my
> computer (that wasn't a good idea, either, but eventually I got
> everything working again.)
>
> Is this normal behavior by PG with large databases?

No. Something is wrong.

> Did I misconfigure
> something? Does anyone know what might be wrong?

Possibly, but I would be more inclined to guess that your hardware is
faulty and you encountered a RAM error, or the CPU overheated, or
something along those lines. I'm not familiar with Linux systems
hard-locking like that unless there is a hardware issue.

--
Bill Moran
http://www.potentialtech.com
[GENERAL] Searching for Duplicates and Hosed the System
I'm new to PostgreSQL and I ran into a problem I don't want to repeat.
I have a database with a little more than 18 million records that takes
up about 3GB. I need to check to see if there are duplicate records, so
I tried a command like this:

SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable
GROUP BY fld1, fld2, fld3, fld4
ORDER BY 1 DESC;

I knew this would take some time, but what I didn't expect was that
about an hour into the select, my mouse and keyboard locked up and also
I couldn't log in from another computer via SSH. This is a Linux
machine running Fedora Core 6 and PostgreSQL is 8.1.4. There's about
50GB free on the disc too.

I finally had to shut the power off and reboot to regain control of my
computer (that wasn't a good idea, either, but eventually I got
everything working again.)

Is this normal behavior by PG with large databases? Did I misconfigure
something? Does anyone know what might be wrong?

- Bill Thoen

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
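[Editor's note on the query itself: if the goal is only to find duplicates, a HAVING clause keeps the result to the duplicated groups instead of one row per distinct combination. A sketch against the table from the post:

```sql
-- Report only groups that occur more than once, i.e. actual duplicates.
SELECT count(*) AS count, fld1, fld2, fld3, fld4
FROM MyTable
GROUP BY fld1, fld2, fld3, fld4
HAVING count(*) > 1
ORDER BY 1 DESC;
```

The grouping work is the same, but the output (and any subsequent sort spill) is typically far smaller.]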
Re: [GENERAL] WAITING in PG_STATS_ACTIVITY
[EMAIL PROTECTED] writes:
> when is WAITING in PG_STAT_ACTIVITY set to TRUE ?
> When this connection is waiting on a lock, or are there any other
> reasons, waiting on another resource ?

Only when waiting on a lock.

			regards, tom lane
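[Editor's sketch of how one might list the waiting backends Tom refers to, joined to the locks they are blocked on; column names assume the 8.x catalogs (procpid, current_query) and are worth double-checking against your version:

```sql
-- Backends currently waiting on a lock, with the ungranted lock request.
SELECT a.procpid, a.current_query, l.relation::regclass, l.mode
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.procpid
WHERE a.waiting
  AND NOT l.granted;
```
]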
Re: [GENERAL] posgres tunning
We use PHP, but think of it as a universal PgSQL proxy. If you connect
to a connection you set up in pgBouncer via psql, it looks like a
normal database. Nothing is different in your code but where you
connect (for us, it's the same as our core DB server, on a different
server). Let me know if that answers your question; I'd be happy to
elaborate further if needed.

On 8/19/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> On Jul 23, 5:18 am, [EMAIL PROTECTED] ("Gavin M. Roy") wrote:
> > You might want to look at pgBouncer to pool your drupal pgsql needs.
> > I've found with 2000 needed connections, I can pool out to only 30
> > backends and still push 8k transactions per second.
>
> How do you use pgBouncer -- through an application developed in PHP or
> Perl? It would be lovely if you can share some info about this
> seemingly useful app which comes with so little documentation on how
> to actually get using. The Skype site mentions the install bits, but
> not how to actually query the pgBouncer stuff instead of the database.
> Very hard to come by some actual code. Could you please share some?
> Many thanks!

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] posgres tunning
On 19/08/07, Gavin M. Roy <[EMAIL PROTECTED]> wrote:
> We use PHP, but think of it as a universal PgSQL proxy. If you connect
> to a connection you set up in pgBouncer via psql, it looks like a
> normal database. Nothing is different in your code but where you
> connect (for us, it's the same as our core DB server on a different
> server). Let me know if that answers your question, would be happy to
> elaborate further if needed.

Thanks much Gavin. That's useful info. Is it easy to set up without
breaking a live, production pgsql environment? I don't mind a quick
restart but significant downtime may not be an option.

I guess this may be too much, but would you have some PHP code you
could share? I like the idea of connecting to pgbouncer as though it
were a db, but it's unclear how that would get me to my tables in my
real db?

TIA!
Re: [GENERAL] Postgresql performance in production environment
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:
> No need to match. If you have 150 relations, 200 is a reasonable
> value. But once you get the proper output from the vacuum command, it
> tells you that as well (74 in my example above)

Found it! You get those words if you do a generic "vacuum verbose", not
a specific "vacuum verbose MYTABLE". In hindsight, the conf variable is
for the entire database, so it makes sense to do a generic one! Here is
my output:

[--
INFO:  free space map contains 76059 pages in 32 relations
DETAIL:  A total of 136688 page slots are in use (including overhead).
136688 page slots are required to track all free space.
Current limits are:  150000 page slots, 200 relations, using 893 kB.
VACUUM
Time: 202065.807 ms
--]

Now, will this value of "136688" keep incrementing, or will autovacuum
keep it in check? I have increased my max_fsm_pages to 150,000 as you
can see.

Thanks for the "Monitoring Stats" link. Looks like pgsql is a bit more
involved. I was looking for information that would allow me to set my
"max_connections" well. The command you gave only shows currently
active users, not the historic peak of connections for instance. I'll
keep digging the manual but would love any nudges in the right
direction, thanks!
Re: [GENERAL] Postgresql performance in production environment
Phoenix Kiula wrote: > On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: >> Phoenix Kiula wrote: > > ...snip > >> There should be a line like this at the end of a "VACUUM VERBOSE" command: >> INFO: free space map contains 33 pages in 74 relations >> DETAIL: A total of 1184 page slots are in use (including overhead). >> 1184 page slots are required to track all free space. >> Current limits are: 153600 page slots, 1000 relations, using 965 kB. >> VACUUM >> > > > I ran the vacuum analyze verbose again, and did not see anything like > that. Should I run a vacuum alone? It shows up in both variants for me. Oh, hang on. I think it only shows up if you're logged in as a superuser - table owner is not enough. Check that. > In any case, in your example, which number would I take note of, and > derive the max_fsm_pages from? The 1184 number (the one for "page slots are required to track") > I do notice this in my own output: > > There were 2959498 unused item pointers. > 133616 pages contain useful free space. > 0 pages are entirely empty. > > Does this mean I should have over 133,616 in my max_fsm_pages. Should > I set it up at 150,000 for example? Probably not enough - that's for a single table, no? > Secondly, the max_fsm_relations -- if I have about 150 "relations" in > my database (relations as per PGSQL lingo) then can this figure be, > say, 200? Or does this have to match max_fsm_pages? No need to match. If you have 150 relations, 200 is a reasonable value. But once you get the proper output from the vacuum command, it tells you that as well (74 in my example above) As for your other question, how to view connections: use "SELECT * FROM pg_stat_activity". See http://www.postgresql.org/docs/8.2/static/monitoring-stats.html. //Magnus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
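To make the pg_stat_activity suggestion above concrete, a small sketch; column names are as of 8.2, and the cron idea is an assumption of this sketch, since PostgreSQL itself does not record a historic connection peak.

```sql
-- Total current connections, and a per-database breakdown (8.2 column names)
SELECT count(*) FROM pg_stat_activity;
SELECT datname, count(*) AS conns FROM pg_stat_activity GROUP BY datname;

-- There is no built-in "historic peak"; one workaround (an assumption of
-- this sketch) is to sample the count from cron and keep the log:
--   psql -t -c "SELECT now(), count(*) FROM pg_stat_activity" >> conn.log
```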
Re: [GENERAL] Postgresql performance in production environment
Btw, related to one of my earlier questions: where can I see how many connections are being made to the DB, what was the maximum number attempted at any given time, and so on? The connections-related info. Thanks! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Postgresql performance in production environment
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: > Phoenix Kiula wrote: ...snip > There should be a line like this at the end of a "VACUUM VERBOSE" command: > INFO: free space map contains 33 pages in 74 relations > DETAIL: A total of 1184 page slots are in use (including overhead). > 1184 page slots are required to track all free space. > Current limits are: 153600 page slots, 1000 relations, using 965 kB. > VACUUM > I ran the vacuum analyze verbose again, and did not see anything like that. Should I run a vacuum alone? In any case, in your example, which number would I take note of, and derive the max_fsm_pages from? I do notice this in my own output: There were 2959498 unused item pointers. 133616 pages contain useful free space. 0 pages are entirely empty. Does this mean I should have over 133,616 in my max_fsm_pages. Should I set it up at 150,000 for example? Secondly, the max_fsm_relations -- if I have about 150 "relations" in my database (relations as per PGSQL lingo) then can this figure be, say, 200? Or does this have to match max_fsm_pages? Many thanks ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] query large amount of data in c++ using libpq
19 aug 2007 kl. 06:34 skrev Felix Ji: Hi all, i am using PQexecParams() to "SELECT" about 3 million records in C++, and it takes several minutes to complete, with used memory dramatically increased (about 200MB). it seems that when i use PQexecParams(), i can't use the query result before all the data is sent to the client. is there something like a server side cursor in libpq? or is there any other way to do this in c++? some articles say that querying large amounts of data is automatically handled well by pg, but i can't see how. thanks libpqxx is what you need. then you can use cursors in c++. cheers, henke ---(end of broadcast)--- TIP 6: explain analyze is your friend
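A note on the cursor suggestion: plain libpq can also do this without libpqxx, because cursors are ordinary SQL. Inside a transaction you declare a cursor and fetch it in batches, so only one batch sits in client memory at a time. A sketch (table and cursor names are placeholders):

```sql
BEGIN;
DECLARE big_cur NO SCROLL CURSOR FOR
    SELECT * FROM big_table;          -- big_table is a placeholder name
FETCH FORWARD 10000 FROM big_cur;     -- issue repeatedly via PQexec()
-- ...keep fetching until a batch comes back with zero rows...
CLOSE big_cur;
COMMIT;
```

Each FETCH result is a normal PGresult, so memory use is bounded by the batch size rather than by the full 3 million rows.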
Re: [GENERAL] Postgresql performance in production environment
Phoenix Kiula wrote: > On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: >> Phoenix Kiula wrote: >>> On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: > > >>> should we do one (VACUUM FULL) now given that we've overrun our >>> max_fsm_pages? >> Yes, but not until you've fixed it. And only once. >> > > > > FIxed what - the max_fsm_pages? That was my question: how to know what > value to set for this. If the "vacuum verbose" won't give me the info > you suggested because it is likely overlapping with autovacuum, should > I temporarily turn autovacuum off and then run vacuum verbose? Also, > while running vacuum full, any precautions to take? Yeah, you can do that - or you can just trawl back through the logs to find that information - it's there somewhere. grep would be helpful to find it. vacuum full will take out blocking locks on your database, so run it during a maintenance window or at least during a low-traffic time. //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Seeking datacenter PITR backup procedures [RESENDING]
"Joey K." <[EMAIL PROTECTED]> wrote: > > Greetings, I'm guessing you're in a hurry or in a pinch that you need to repost after one day on a weekend. I was waiting to let someone more knowledgeable answer, but I've had some experience with this, so I'll answer to the best of my ability. > We have several web applications with Pg 8.2.x running on isolated servers > (~25). The database size on each machines (du -h pgdata) is ~2 GB. We have > been using nightly filesystem backup (stop pg, tar backup to ftp, start pg) > and it worked well. Any reason why you haven't been using pg_dump? There are a LOT of drawbacks to doing filesystem level backups. For example, you can't restore to disparate hardware (a filesystem backup made from PG on an i386 system won't work on an amd64 system, for example) > We would like to move to PITR backups since the database size will increase > moving forward and our current backup method might increase server > downtimes. How much do you expect it to increase? 2G is _tiny_ by modern standards. Even if you expect it to increase an order of magnitude, it's still a reasonable size for pg_dump. Some huge advantages to pg_dump: *) architecture-neutral dumps *) No need to stop the database *) Extremely simple procedure for backup and restore *) Human parseable backups (you may not even need to restore, just look through the data to see what was there in some cases) *) Can restore a database without shutting down a server, thus you can move a database from one server to another (for example) without affecting work occurring on the second server. > We have a central ftp backup server (yes, ftp :-) which we would like to use > for weekly full and daily incremental PITR backups. > > After reading the docs, PITR is still fuzzy. 
Our ideas for backup are (do > not worry about the syntax), > > ** START ** > > tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal files > before ftp > Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f" > > Day 1: > % psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata > % psql pg_stop_backup() > % ftp put pgdata.tar ftpserver:/server1/day1/pgdata > % ftp put $tmpwal/* ftpserver:/server1/day1/wal > % rm -f $tmpwal/* pgdata.tar > > Day 2: > % ftp put $tmpwal/* ftpserver:/server1/day2/wal > % rm -f $tmpwal/* > > Day 3: > ... > ... > > Day 7: > % rm -f $tmpwal/* > Start over > > Recovery on server1 (skeleton commands), > % rm -f $tmpwal/* Why are you removing this day's WAL logs before recovery? If the disaster occurs close to your backup time, this will result in the loss of an entire day's data changes. > % mv pgdata pgdata.hosed > % ftp get ftpbackup:/server1/day1/pgdata.tar . > % tar -xvf pgdata.tar > % ftp get ftpbackup:/server1/day1/wal/* $tmpwal > % ftp get ftpbackup:/server1/day2/wal/* $tmpwal > . > . > % cp -r pgdata.hosed/pg_xlog pgdata/ > % echo "cp $tmpwal/%f %p" > pgdata/recovery.conf > % start pg (recovery begins) > > ** END ** Again, how much WAL traffic are you generating? Make sure you have enough free space on the recovery system to hold all of the WAL logs in the event you need to recover. Note that this procedure does not do a good job of protecting you from catastrophic hardware failure. In the event that your RAID system goes insane, you can lose as much as an entire day's worth of updates, and there's no reason to. Is there a reason you can't use (for example) NFS to transfer each WAL log to the backup system on the fly? This would better protect you from IO system failures, and reduce the amount of disk space you need on each server, since you don't need to keep WAL logs there. > Assumptions: > a. 
After pg_stop_backup(), Pg immediately recycles log files and hence wal > logs can be copied to backup. This is a clean start. I don't believe so. AFAIK, all pg_stop_backup() does is remove the marker that pg_start_backup() put in place to tell the recovery process when the filesystem backup started. By not backing up pg_xlog, you are going to be behind by however many transactions are in the most recent transaction log that has not yet been archived. Depending on how often your databases are updated, this is likely acceptable. If you need anything more timely than that, you'll probably want to implement Slony or some other replication system. > b. New wal files since (a) are incremental backups Yes. > We are not sure if WAL log filenames are unique and possibly overwrite older > wal files during recovery. They are unique. Eventually they will recycle, but if you look at the names and the number of possible names, you'll see that it will take a long time before the names recycle. > I'm seeking suggestions from others with experience performing PostgreSQL > PITR backups from multiple servers to a central backup server. Again, I recommend the use of NFS (or similar) for real-time backup to protect you from hardware failure. I _highly_ recommend y
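To make the NFS suggestion concrete: if the backup server exports a directory that each database host mounts, archive_command can write WAL segments there as they are completed. The overwrite guard is the standard pattern from the PostgreSQL docs; the mount point is an assumption of this sketch.

```
# postgresql.conf -- /mnt/backup/wal is assumed to be an NFS mount
# from the central backup host; the "test" prevents overwriting a segment
archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'
```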
Re: [GENERAL] Postgresql performance in production environment
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: > Phoenix Kiula wrote: > > On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: > > should we do one (VACUUM FULL) now given that we've overrun our > > max_fsm_pages? > > Yes, but not until you've fixed it. And only once. > Fixed what - the max_fsm_pages? That was my question: how to know what value to set for this. If the "vacuum verbose" won't give me the info you suggested because it is likely overlapping with autovacuum, should I temporarily turn autovacuum off and then run vacuum verbose? Also, while running vacuum full, any precautions to take? ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] SUBSTRING performance for large BYTEA
On Sat, Aug 18, 2007 at 10:54:11PM -0700, Vance Maverick wrote: > This is my experience with a Java client too. Writing the data with > PreparedStatement.setBinaryStream works great for long strings, but > reading it with the complementary method ResultSet.getBinaryStream runs > into the memory problem, killing the Java VM. Again, I am observing this under Python with a libpq-based driver running on Windows and during retrieval only. Are we seeing a pattern ? Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] posgres tunning
On Jul 23, 5:18 am, [EMAIL PROTECTED] ("Gavin M. Roy") wrote: > You might want to look at pgBouncer to pool your drupal pgsql needs. I've > found with 2000 needed connections, I can pool out to only 30 backends and > still push 8k transactions per second. > How do you use pgBouncer -- through an application developed in PHP or Perl? It would be lovely if you could share some info about this seemingly useful app, which comes with so little documentation on how to actually get started. The Skype site mentions the install bits, but not how to actually query through pgBouncer instead of the database directly. Actual code is very hard to come by. Could you please share some? Many thanks! ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Seeking datacenter PITR backup procedures [RESENDING]
Joey K. wrote: > After reading the docs, PITR is still fuzzy. Our ideas for backup are (do > not worry about the syntax), Maybe consider using LVM snapshots in conjunction with rdiff-backup as an option as well. Simple to set up, simple to manage, no downtime, very space efficient. -- Best regards, Hannes Dorbath ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Postgresql performance in production environment
Phoenix Kiula wrote: > On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: >> Phoenix Kiula wrote: > >> There are ways to do this, but if you can't just use timeouts to expire >> from the cache, things can become pretty complicated pretty fast. But >> perhaps you can isolate some kinds of queries that can be cached for >> minutes, and keep the rest without caching? > > > Thanks. In fact we need caching on a very specific part of our > application, for only three queries which hit the DB hard with > thousands of simultaneous SELECTs. > > Do pgmemcache or pgbouncer allow for very specific usage? Both look > way too complex. I don't mind the initial headachy setup and config, > but then I would like the system to hum on its own, and the querying > should be simple and intuitive. > > I need a simple mechanism to query the cache, and invalidate a > specific query in the cache when the underlying table is UPDATED so > that the query gets cached afresh when issued later. (And a way to use > this mechanism through PHP or Perl would be splendid). > > TIA for any tips! You can use LISTEN and NOTIFY to clear the cache, if you have many clients that can cause cache invalidations. If you only have a single app that can update the database, you can invalidate the cache from that application's code directly (such as using asp.net output caching if you were doing it in .net). I've implemented the former a couple of times, but it does get a bit complex. The second part would be easier, but I don't have any direct pointers on that since it depends on the app development framework you're using. //Magnus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
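A sketch of the LISTEN/NOTIFY approach Magnus describes, using the "links" table from earlier in the thread as an illustrative example. Note that on 8.2 NOTIFY carries no payload (payloads arrived in 9.0), so the channel name itself has to identify which cached queries to drop:

```sql
-- Fire a notification whenever the table changes (names are illustrative)
CREATE OR REPLACE FUNCTION links_cache_invalidate() RETURNS trigger AS $$
BEGIN
    NOTIFY links_changed;
    RETURN NULL;  -- return value is ignored for AFTER ... STATEMENT triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_links_cache_invalidate
    AFTER INSERT OR UPDATE OR DELETE ON links
    FOR EACH STATEMENT EXECUTE PROCEDURE links_cache_invalidate();

-- Each caching client (PHP/Perl) runs "LISTEN links_changed;" on its
-- connection and evicts its cached results when the notification arrives.
```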
Re: [GENERAL] Postgresql performance in production environment
Phoenix Kiula wrote: > On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: >> Phoenix Kiula wrote: > > .snipped > >>> I can merrily increase the "max_fsm_pages" directive, but the manual >>> also caveats that with "this can use more system V memory than >>> available on your system". My full verbose vacuum info below includes >>> the line: >> Do you actually run VACUUM FULL, or do you just mean you run VACUUM over >> the full database? If you run VACUUM FULL, you need to stop doing that >> :-) However, you will need to run it at least once over the whole >> database once you've fixed your max_fsm_pages setting. > > > > No we only do a "vacuum analyze" when we do something manually. > Otherwise, it's all "autovacuum". Never done a "vacuum full" -- Ok. That's good. > should we do one now given that we've overrun our max_fsm_pages? Yes, but not until you've fixed it. And only once. >>> [- >>> INFO: "traders": scanned 3000 of 199396 pages, containing 40775 live >>> rows and 0 dead rows; 3000 rows in sample, 2710124 >>> estimated total rows >>> -] >> There should be a line like this at the end of a "VACUUM VERBOSE" command: >> INFO: free space map contains 33 pages in 74 relations >> DETAIL: A total of 1184 page slots are in use (including overhead). >> 1184 page slots are required to track all free space. >> Current limits are: 153600 page slots, 1000 relations, using 965 kB. >> VACUUM >> > > > Nope, there's no line that includes such useful info. The entire > verbose output was included in my note. I did not see the words > "Current limits are". Do I need to enable something in the conf file > to get more verbose output? "debug2", "debug3" -- kind of stuff? Strange. It comes out at level INFO, and you do see other stuff at INFO level. Any chance this just got mixed up with an autovacuum run and that input it somewhere in the middle of your output? (that this "traders" info is from autovac) //Magnus ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Postgresql performance in production environment
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: > Phoenix Kiula wrote: > There are ways to do this, but if you can't just use timeouts to expire > from the cache, things can become pretty complicated pretty fast. But > perhaps you can isolate some kinds of queries that can be cached for > minutes, and keep the rest without caching? Thanks. In fact we need caching on a very specific part of our application, for only three queries which hit the DB hard with thousands of simultaneous SELECTs. Do pgmemcache or pgbouncer allow for very specific usage? Both look way too complex. I don't mind the initial headachy setup and config, but then I would like the system to hum on its own, and the querying should be simple and intuitive. I need a simple mechanism to query the cache, and invalidate a specific query in the cache when the underlying table is UPDATED so that the query gets cached afresh when issued later. (And a way to use this mechanism through PHP or Perl would be splendid). TIA for any tips! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] WAITING in PG_STATS_ACTIVITY
Hi, little question: when is WAITING in PG_STATS_ACTIVITY set to TRUE? When this connection is waiting on a lock, or are there any other reasons, such as waiting on another resource? TIA ruediger ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
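As far as I know, waiting in pg_stat_activity is set only while the backend is blocked acquiring a heavyweight lock; it stays false for other waits such as I/O. The blocked lock requests themselves can be seen in pg_locks, e.g.:

```sql
-- Lock requests that have not been granted; these correspond to
-- backends whose pg_stat_activity.waiting is true
SELECT locktype, relation, transactionid, pid, mode
FROM pg_locks
WHERE NOT granted;
```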
Re: [GENERAL] Postgresql performance in production environment
On 19/08/07, Magnus Hagander <[EMAIL PROTECTED]> wrote: > Phoenix Kiula wrote: .snipped > > I can merrily increase the "max_fsm_pages" directive, but the manual > > also caveats that with "this can use more system V memory than > > available on your system". My full verbose vacuum info below includes > > the line: > > Do you actually run VACUUM FULL, or do you just mean you run VACUUM over > the full database? If you run VACUUM FULL, you need to stop doing that > :-) However, you will need to run it at least once over the whole > database once you've fixed your max_fsm_pages setting. No we only do a "vacuum analyze" when we do something manually. Otherwise, it's all "autovacuum". Never done a "vacuum full" -- should we do one now given that we've overrun our max_fsm_pages? > > [- > > INFO: "traders": scanned 3000 of 199396 pages, containing 40775 live > > rows and 0 dead rows; 3000 rows in sample, 2710124 > > estimated total rows > > -] > > There should be a line like this at the end of a "VACUUM VERBOSE" command: > INFO: free space map contains 33 pages in 74 relations > DETAIL: A total of 1184 page slots are in use (including overhead). > 1184 page slots are required to track all free space. > Current limits are: 153600 page slots, 1000 relations, using 965 kB. > VACUUM > Nope, there's no line that includes such useful info. The entire verbose output was included in my note. I did not see the words "Current limits are". Do I need to enable something in the conf file to get more verbose output? "debug2", "debug3" -- kind of stuff? TIA ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] server closed the connection unexpectedly
Muhyiddin A.M Hayat wrote: > Dear all, > > i'm unable to connect postgres server with error : > > C:\Program Files\PostgreSQL\8.2\bin>psql -U postgres siakad > Password for user postgres: > psql: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > > pg_log : > > 2007-08-19 03:00:50 LOG: database system was shut down at 2007-08-19 > 02:58:26 Malay Peninsula Standard Time > 2007-08-19 03:00:50 LOG: checkpoint record is at 0/75A808 > 2007-08-19 03:00:50 LOG: redo record is at 0/75A808; undo record is at > 0/0; shutdown TRUE > 2007-08-19 03:00:50 LOG: next transaction ID: 0/1931; next OID: 16737 > 2007-08-19 03:00:50 LOG: next MultiXactId: 1; next MultiXactOffset: 0 > 2007-08-19 03:00:51 LOG: database system is ready > 2007-08-19 03:01:43 LOG: could not receive data from client: An > operation was attempted on something that is not a socket. > > > 2007-08-19 03:01:43 LOG: incomplete startup packet > > somebody help me please This error is most likely some antivirus or firewall software that's misbehaving. Try uninstalling any such software from the server (you can try disabling it first, but often a complete uninstall is needed) and try again. //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Postgresql performance in production environment
Phoenix Kiula wrote: > On 19/08/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote: [... ] > Well based on some past posts, I looked into my pg_log stuff and found > a number of these lines: > > > [ > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > ] > > > Which suggests that our guess of running out of connections is the right one. > > So, we have three options (to begin with) -- > > 1. Increase the number of max_connections. This seems to be a voodoo > art and a complex calculation of database size (which in our case is > difficult to predict; it grows very fast), hardware, and such. I > cannot risk other apps running on this same machine. this error is a sign that the OS(!) is running out of resources (or at least won't allow pg to fork another process) - either you hit an ulimit for the user postgresql runs under or you need to flip some kernel setting to increase the number of processes. increasing max_connections will NOT help because you are not even hitting the current one yet ... > > 2. Use connection pooling. I've found pgpool2 and pgbouncer from the > Skype group. Does anyone have experience using either? The latter > looks good, although we're usually skeptical about connection pooling > in general (or is that just the mysqli_pconnect() hangover?) pgbouncer works quite fine here. Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Postgresql performance in production environment
Phoenix Kiula wrote: > [Sorry for the length of this post. It stretched as I provided as much > info as possible..] > > So the rubber meets the road. We've put postgresql in a production > environment with some heavy simultaneous usage. It works well in > general, but often PG doesn't respond. How should I test what is going > wrong? All tables are vacuumed, analyzed and such. INSERTS and UPDATEs > are happening but not much, it's mostly SELECTs. Is PGSQL running out > of connections? We can temporarily fix this by restarting pgsql but > I'd like a more tenable solution. > > Speculating that it could be some conf variable somewhere > (max_fsm_pages in particular) I am including three things at the > bottom of this post: > > 1. Our PS output (for "postgres") > 2. *Verbose* vacuum info for a table that shows max_fsm warning > 3. Our postgresql.conf settings > > My question 1 -- how should we test and tweak our production > installation? Where should we look. In MySQL we could do a "show > status" at the console and it would give a mountain of information. > Then there was that handy little "tuning-primer" script that made it > all come alive. I suppose this stuff is also available in pg_catalog > but is there any website that goes in depth into HOW to tune, what > different values mean, and such? > > My question 2 -- in production, we're constantly seeing this message > while vacuuming one table with less than 3 million rows, but one that > we expect to keep growing: > > [- > WARNING: relation "public.links" contains more than "max_fsm_pages" > pages with useful free space > HINT: Consider compacting this relation or increasing the > configuration parameter "max_fsm_pages". > VACUUM > -] > > I can merrily increase the "max_fsm_pages" directive, but the manual > also caveats that with "this can use more system V memory than > available on your system". 
My full verbose vacuum info below includes > the line: Do you actually run VACUUM FULL, or do you just mean you run VACUUM over the full database? If you run VACUUM FULL, you need to stop doing that :-) However, you will need to run it at least once over the whole database once you've fixed your max_fsm_pages setting. > [- > INFO: "traders": scanned 3000 of 199396 pages, containing 40775 live > rows and 0 dead rows; 3000 rows in sample, 2710124 > estimated total rows > -] There should be a line like this at the end of a "VACUUM VERBOSE" command: INFO: free space map contains 33 pages in 74 relations DETAIL: A total of 1184 page slots are in use (including overhead). 1184 page slots are required to track all free space. Current limits are: 153600 page slots, 1000 relations, using 965 kB. VACUUM (note that my numbers are for a more or less empty database. Yours will be much higher) If your database size is reasonably stable, pick a good value a bit above the numbers suggested. If you expect it to grow a lot, add some more overhead, but monitor this value. > Does this mean my table needs nearly 200,000 pages, and that should be > the setting of max_fsm_pages? This server is on a fairly common setup No. It means that the table "traders" is using 199396 pages - most of them aren't free, so they are not tracked in the FSM. //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
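To see which relations account for page counts like the 199396 mentioned above, pg_class keeps a per-relation page count (refreshed by VACUUM and ANALYZE, so treat it as approximate):

```sql
-- Largest relations by on-disk pages; relpages is only as fresh
-- as the last VACUUM or ANALYZE of each relation
SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC
LIMIT 10;
```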
Re: [GENERAL] Postgresql performance in production environment
Phoenix Kiula wrote: > > > Well based on some past posts, I looked into my pg_log stuff and found > a number of these lines: > > > [ > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > LOG: could not fork new process for connection: Resource temporarily > unavailable > ] > > > Which suggests that our guess of running out of connections is the right one. No, you're not running out of "connections". You are, however, running over some kernel limit. > So, we have three options (to begin with) -- > > 1. Increase the number of max_connections. This seems to be a voodoo > art and a complex calculation of database size (which in our case is > difficult to predict; it grows very fast), hardware, and such. I > cannot risk other apps running on this same machine. No. You are not yet reaching max_connections, that would give you an error message that actually says so. This message indicates that you have an ulimit for the account that postgresql runs under that limits some resources - in this case most likely the number of processes. And this limit is not "compatible" with your settings for max_connections. You need to find this ulimit, and at least change it, or even remove it. > 2. Use connection pooling. I've found pgpool2 and pgbouncer from the > Skype group. Does anyone have experience using either? The latter > looks good, although we're usually skeptical about connection pooling > in general (or is that just the mysqli_pconnect() hangover?) Connection pooling "in general", is something that pretty much *every* larger app will always use. 
It may be implemented in the app (something which has often been troublesome in PHP solutions, but it's certainly the norm for Java or .Net apps) or in middleware like pgpool or pgbouncer. There should be no need to be sceptical about it in general ;-) Can't speak for either of those apps specifically, as I haven't used them in production. > 3. Use caching of queries. Memcache comes recommended, but there's a > discussion as recently as Jan 2007 on this list about race conditions > and such (most of which I don't quite understand) which cautions > against its use. We do expect plenty of transactions and if something > that has been updated is not very correctly and promptly invalidated > in the cache, it has huge business repercussions for us. There are ways to do this, but if you can't just use timeouts to expire from the cache, things can become pretty complicated pretty fast. But perhaps you can isolate some kinds of queries that can be cached for minutes, and keep the rest without caching? //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
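A quick way to inspect the per-account limits Magnus mentions, assuming a Linux-style shell; run these while logged in as the account the postmaster starts under (commonly "postgres"):

```shell
# Per-user resource limits for the current shell's account
ulimit -u    # max user processes -- the usual culprit for fork failures
ulimit -n    # max open file descriptors
ulimit -a    # full listing
```

On Linux with PAM, a typical place to raise the process limit is /etc/security/limits.conf, e.g. a line like `postgres soft nproc 4096` (value and location are assumptions; check your distribution).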
Re: [GENERAL] Postgresql performance in production environment
On Sun, Aug 19, 2007 at 05:15:34PM +0800, Phoenix Kiula wrote: > Well based on some past posts, I looked into my pg_log stuff and found > a number of these lines: > [ > LOG: could not fork new process for connection: Resource temporarily > unavailable Usually this message means that the *kernel* refused to let you fork a new process. Some resource was unavailable. So this means (usually) one of two things: - You have a maximum on the number of processes on the system and you're exceeding it - You have a limit on the amount of memory. If you have overcommit disabled this may be causing the issue. I couldn't quickly find details of your setup, but if you're running linux with overcommit disabled, make sure you have at least twice as much space allocated for swap as you have real memory. Make sure you don't have lots of idle postgres processes lying around. You'll have to provide more detail about your system before getting any better recommendations. Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate.
[GENERAL] Postgresql performance in production environment
[Sorry for the length of this post. It stretched as I provided as much info as possible.]

So the rubber meets the road. We've put postgresql in a production environment with some heavy simultaneous usage. It works well in general, but often PG doesn't respond. How should I test what is going wrong? All tables are vacuumed, analyzed and such. INSERTs and UPDATEs are happening, but not much; it's mostly SELECTs. Is PGSQL running out of connections? We can temporarily fix this by restarting pgsql, but I'd like a more tenable solution.

Speculating that it could be some conf variable somewhere (max_fsm_pages in particular), I am including three things at the bottom of this post:

1. Our PS output (for "postgres")
2. *Verbose* vacuum info for a table that shows the max_fsm warning
3. Our postgresql.conf settings

My question 1 -- how should we test and tweak our production installation? Where should we look? In MySQL we could do a "show status" at the console and it would give a mountain of information. Then there was that handy little "tuning-primer" script that made it all come alive. I suppose this stuff is also available in pg_catalog, but is there any website that goes in depth into HOW to tune, what different values mean, and such?

My question 2 -- in production, we're constantly seeing this message while vacuuming one table with fewer than 3 million rows, but one that we expect to keep growing:

[-
WARNING: relation "public.links" contains more than "max_fsm_pages"
pages with useful free space
HINT: Consider compacting this relation or increasing the
configuration parameter "max_fsm_pages".
VACUUM
-]

I can merrily increase the "max_fsm_pages" directive, but the manual also caveats that with "this can use more System V memory than available on your system".
My full verbose vacuum info below includes the line:

[-
INFO: "traders": scanned 3000 of 199396 pages, containing 40775 live
rows and 0 dead rows; 3000 rows in sample, 2710124 estimated total rows
-]

Does this mean my table needs nearly 200,000 pages, and that should be the setting of max_fsm_pages? This server is on a fairly common setup these days: dual AMD Opterons, 4GB memory, SATA RAID 1, 250GB each. I don't mind letting postgres use up to 1GB of the memory for itself, but the rest is needed for others.

From http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html , it seems "max_fsm_relations" is about how many tables and indexes can be tracked in the free space map. Does this mean the number of actual tables and indexes in postgres databases, or instances of these tables? For example, if I only run 5 databases, each of which has about 10 tables and 20 indexes, then I have only 150 (5 * 30) actual "relations" in postgresql lingo. So my max_fsm_relations setting can be 150? (Which seems a little low compared to what I see in several posts online.)

Assuming 150 is ok, that manual page, and other tweaking material such as http://www.revsys.com/writings/postgresql-performance.html , suggest that "max_fsm_pages" is even more critical. The manual says this should be at least 16 times max_fsm_relations, so in my example it should be at least 150 * 16, which is about 2400. This seems abysmally low! If I up this figure to, say, 24000 instead, I still keep seeing the kinds of errors posted above.

My question no. 3 -- for a SELECT-heavy database, is there any tried-and-tested caching tool that could be of use? I'd like to skip connection pooling if possible, but would be very interested in good caching products or contribs. I noticed pgmemcache ( http://pgfoundry.org/projects/pgmemcache/ ) but it's in beta 1.2. Would love some thoughts from people who have used it... is it worth the effort?

More of my info below.
TIA for your thoughts and advice!

-/Phoenix


==EXHIBIT 1: PS OUTPUT ==

> ps auxnm | grep postgres
26  20665  0.0  0.0   11760     612  ?      -  Aug18  0:00  postgres: logger process
26  20670  0.0  1.1  188684   48312  ?      -  Aug18  0:00  postgres: writer process
26  20671  0.0  0.0   12032     804  ?      -  Aug18  0:28  postgres: stats collector process
26  14497  0.0  4.1  452108  172656  ?      -  02:05  0:02  postgres: traders_traders traders 127.0.0.1(56204) VACUUM
 0   9444  0.0  0.0    5008     656  pts/0  -  02:53  0:00  grep postgres


==EXHIBIT 2: POSTGRES.CONF ==

listen_addresses = 'localhost,*'

max_connections = 250
shared_buffers = 21000        # Not much more: http://snipr.com/pgperf
effective_cache_size = 32000
max_fsm_relations = 500
max_fsm_pages = 6
sort_mem = 4096               # Low when not needed: http://snipr.com/pgperf
work_mem = 4096
temp_buffers = 4096
authentication_timeout = 10s
ssl = off

#VACUUM SETTINGS
autovacuum = on
vacuum_cost_delay = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 300
autovacuum_analyze_threshold = 100

#FOR BACKG
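On the FSM sizing question above, the raw counts can be pulled from the standard system catalogs. A sketch (8.2-era catalogs; relpages is the planner's estimate, refreshed by VACUUM/ANALYZE, so this is an approximation rather than the exact FSM demand):

```sql
-- How many relations the FSM must track: tables, indexes, toast tables.
-- This is the floor for max_fsm_relations (count real objects, not "instances").
SELECT count(*) AS relations
FROM pg_class
WHERE relkind IN ('r', 'i', 't');

-- Total heap pages: a rough upper bound for max_fsm_pages, since only
-- pages with reclaimable free space need slots.
SELECT sum(relpages) AS total_pages
FROM pg_class
WHERE relkind = 'r';
```

With "traders" alone at ~199,396 pages, a max_fsm_pages of 24,000 cannot suffice; a database-wide VACUUM VERBOSE prints a closing summary of how many page slots are actually needed.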
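For the "is PGSQL running out of connections?" question, a quick check is possible from psql, assuming the stats collector is on (as in the conf above):

```sql
-- Live backends vs. the configured ceiling; when these converge,
-- new connections start being refused.
SELECT (SELECT count(*) FROM pg_stat_activity) AS backends,
       current_setting('max_connections')     AS max_connections;
```

Sampling this during a stall would distinguish connection exhaustion from a query-level problem.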