Re: [PERFORM] update 600000 rows
[EMAIL PROTECTED] wrote:

Hello, I have a Python script that updates 600,000 rows in one table of my PostgreSQL database from a CSV file, and the transaction takes 5 hours. I'm on Debian etch with an 8.1 Postgres server on a 64-bit quad dual-core Opteron machine. I have deactivated all indexes except the primary key, which is not updated since it is also the reference column of the update. When I run this script, the server is not used by any other user.

First, when I run htop I see that the memory used never exceeds 150 MB. Given that, I don't understand why setting the shmall and shmmax kernel parameters to 16 GB of memory (the server has 32 GB) speeds up the transaction a lot compared to setting shmall and shmmax to (only) 2 GB.

The script runs in a single transaction and pauses from time to time to give Postgres a chance to write data to disk. Would performance be better if the data were written only at the end of the transaction? In production I want data written to disk regularly to prevent loss of data, but is there any point in writing temporary data to disk in the middle of a transaction? I'm a complete noob at Postgres and database configuration, and any help is welcome. Thanks.

You will get a huge improvement in time if you use batch updates instead of updating a row at a time. See:

http://www.postgresql.org/docs/8.2/interactive/populate.html
http://www.postgresql.org/docs/8.2/interactive/sql-begin.html

You will also get a big improvement if you can turn fsync off during the update. See:

http://www.postgresql.org/docs/8.2/interactive/runtime-config-wal.html

You also need to vacuum the table after doing that many updates: since PG does a delete and insert on each update, there will be a lot of holes.

Cheers HH
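For illustration, a minimal sketch of the batch approach: load the CSV into a temporary table with COPY, then do one set-based UPDATE. The table and column names here are hypothetical, since the post doesn't show the schema; COPY ... WITH CSV and UPDATE ... FROM are both available in 8.1.

    BEGIN;
    CREATE TEMP TABLE staging (ref integer, val numeric);

    -- COPY parses the whole CSV in one command, far cheaper per row
    -- than 600,000 individually parsed UPDATE statements
    COPY staging FROM '/path/to/data.csv' WITH CSV;

    -- One set-based UPDATE instead of one statement per row
    UPDATE target
    SET    val = staging.val
    FROM   staging
    WHERE  target.ref = staging.ref;
    COMMIT;

The join lets the planner match rows in bulk; combined with the already-disabled indexes this is typically orders of magnitude faster than row-at-a-time updates.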
Re: [PERFORM] libgcc double-free, backend won't die
Gregory Stark wrote:

1) No Postgres function is guaranteed to be thread-safe, so you had better protect against concurrent calls to Postgres API functions. Also, Postgres functions use longjmp, which can restore the stack pointer to a value that may have been set earlier, possibly by another thread, which wouldn't work.

That's a whole different thing from saying that you can't use a threaded subsystem under a Postgres process.

2) Some OSes have bugs (notably glibc for a specific narrow set of versions) and don't expect to have standard library functions called before pthread_init() and then called again after pthread_init(). If they expect the system to be either threaded or not threaded, they may be surprised to see that state change.

Is there any particular reason not to ensure that any low-level threading support in libc is enabled right from the get-go, as a build-time option? Does it do anything that's not well defined in a threaded process? Signal handling and atfork (and POSIX exec) are typical problem areas, I guess.

While this can potentially make malloc slower, Postgres already wraps malloc, so using a caching thread-aware malloc substitute such as nedmalloc should be no problem. I don't see any issue with the setjmp usage, so long as only one thread uses any internal API; that can be checked rather easily at runtime, at low cost, in a debug build.

That just means you have to use a non-buggy version of your OS. Unfortunately, tracking down bugs in your OS to figure out what's causing them and whether it's a particular known bug can be kind of tricky.

Is that really much of an issue on the current version of any major OS, though? It's reasonable to limit the use of a threaded library (in particular, the runtimes for most embeddable languages, or libraries for RPC runtimes, etc.) to 'modern' platforms that support threads effectively. On many such platforms these will already implicitly link libpthread anyway.

James
Re: [PERFORM] libgcc double-free, backend won't die
James Mansion [EMAIL PROTECTED] writes: Is there any particular reason not to ensure that any low-level threading support in libc is enabled right from the get-go, as a build-time option?

Yes:
1) It's of no value to us.
2) On many platforms there is a nonzero performance penalty.

regards, tom lane
Re: [PERFORM] libgcc double-free, backend won't die
Tom Lane [EMAIL PROTECTED] writes: James Mansion [EMAIL PROTECTED] writes: Is there any particular reason not to ensure that any low-level threading support in libc is enabled right from the get-go, as a build-time option? Yes. 1) It's of no value to us 2) On many platforms there is a nonzero performance penalty

And the only reason to do that would be to work around one bug in one small range of glibc versions. If you're going to use a multi-threaded library (which isn't very common, since it's hard to do safely for all those other reasons), surely using a version of your OS without any thread-related bugs is a better idea.

-- Gregory Stark, EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
[PERFORM] SELECT * FROM table is too slow
Hi pgsql-performance, I have a problem with SELECT * on a small table. See below:

    x7=# EXPLAIN ANALYZE select * from megjelenesek;
                                  QUERY PLAN
    ---------------------------------------------------------------------
     Seq Scan on megjelenesek  (cost=0.00..15633.07 rows=207 width=52)
       (actual time=103.258..18802.530 rows=162 loops=1)
     Total runtime: 18815.362 ms
    (2 rows)

    x7=# \d megjelenesek;
                         Table "public.megjelenesek"
       Column    |            Type             |         Modifiers
    -------------+-----------------------------+----------------------------
     idn         | integer                     | not null default nextval('megjelenesek_idn_seq'::regclass)
     tag_id      | integer                     |
     tag_wlap_id | integer                     |
     get_date    | timestamp without time zone | default now()
     megjelent   | numeric                     | default 0
    Indexes:
        "megjelenesek_pkey" PRIMARY KEY, btree (idn)
        "megjelenesek_tag_id" hash (tag_id)
        "megjelenesek_tag_wlap_id" hash (tag_wlap_id)

    x7=# SELECT count(idn) from megjelenesek;
     count
    -------
       162
    (1 row)

Why does it take about 18-20 sec to get the results? Too many indexes?

-- Adam PAPAI
D i g i t a l Influence
http://www.wooh.hu
E-mail: [EMAIL PROTECTED]
Phone: +36 30 33-55-735 (Hungary)
Re: [PERFORM] SELECT * FROM table is too slow
Adam PAPAI wrote: Hi pgsql-performance, I've a problem with the select * on a small table. See below: [...] Why does it take about 18-20 sec to get the results? Too many indexes?

You likely have a huge number of dead rows. Try dumping and restoring the table, and remember to run VACUUM (or autovacuum) often.

Sincerely, Joshua D. Drake
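For illustration, a quick way to test the dead-row theory on the poster's table; on these 8.x releases VACUUM VERBOSE reports how many dead row versions it found and removed:

    -- Reports (and reclaims) dead row versions
    VACUUM VERBOSE megjelenesek;

    -- If the table is badly bloated, compact it; note that VACUUM FULL
    -- takes an exclusive lock on the table while it runs
    VACUUM FULL megjelenesek;
    ANALYZE megjelenesek;

If VACUUM reports hundreds of thousands of removable row versions for a 162-row table, bloat fully explains the 18-second sequential scan.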
Re: [PERFORM] update 600000 rows
On Dec 16, 2007 12:21 AM, [EMAIL PROTECTED] wrote:

Loïc Marteau [EMAIL PROTECTED] wrote: Steve Crawford wrote: If this is correct, I'd first investigate simply loading the csv data into a temporary table, creating appropriate indexes, and running a single query to update your other table. My experience is that this is MUCH faster. My predecessor in my current position was doing an update from a csv file line by line with perl. That is one reason he is my predecessor. Performance did not justify continuing his contract.

I can try this. The problem is that I have to do an insert when the update doesn't affect any rows (the rows don't exist yet). The number of rows affected by the insert is small compared to the number of updated rows (and was 0 when I tested my script). With a temporary table I could update all the possible rows and then insert the rows that are in the temporary table but not in the production table, using a 'not in' statement. Is this a correct way?

That's what I did at first, but later I found better performance with a TRIGGER on the permanent table that deletes the target of an UPDATE, if any, before the UPDATE. That's what PG does anyway, and now I can do the entire UPDATE in one command.

that's very clever, and probably the fastest/best way to do it. you can even add the trigger temporarily, inside a transaction... I am going to try this out in a couple of things (I currently do these types of things in two statements) and see how it turns out.

merlin
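For illustration, a sketch of the delete-before-insert trigger described above. The table and column names are hypothetical (a target table prod with primary key id), since the thread doesn't show the schema:

    -- Hypothetical target table: prod(id integer PRIMARY KEY, val numeric)
    CREATE OR REPLACE FUNCTION delete_before_insert() RETURNS trigger AS $$
    BEGIN
        -- Remove any existing row with the same key, mimicking what
        -- PG's UPDATE does internally (delete old version, insert new)
        DELETE FROM prod WHERE id = NEW.id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER prod_delete_before_insert
        BEFORE INSERT ON prod
        FOR EACH ROW EXECUTE PROCEDURE delete_before_insert();

    -- The whole CSV load then becomes a single INSERT:
    BEGIN;
    CREATE TEMP TABLE staging (LIKE prod);
    COPY staging FROM '/path/to/data.csv' WITH CSV;
    INSERT INTO prod SELECT * FROM staging;
    COMMIT;

Note that without a lock, a concurrent writer could still collide on the primary key between the DELETE and the INSERT, so this suits the poster's single-writer batch load better than general-purpose use.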
Re: [PERFORM] libgcc double-free, backend won't die
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: James Mansion [EMAIL PROTECTED] writes: Is there any particular reason not to ensure that any low-level threading support in libc is enabled right from the get-go, as a build-time option? Yes. 1) It's of no value to us

Who is "us"? Some of us would like to use the system for advanced scientific work, and scientific libraries are usually written in C++.

2) On many platforms there is a nonzero performance penalty

I'm surprised you say this, given that you're usually the voice of reason when it comes to rejecting hypothetical statements in favor of tested facts. If building Postgres with thread-safe technology is really a performance burden, that could be easily verified. A nonzero performance penalty: what does that mean, a 0.0001% slowdown? I find it hard to believe that the performance penalty of a thread-safe version would even be measurable. If nobody has the time to do such a test, or other priorities take precedence, that's understandable. But the results aren't in yet.

And the only reason to do that would be to work around one bug in one small range of glibc versions. If you're going to use a multi-threaded library (which isn't very common since it's hard to do safely for all those other reasons) surely using a version of your OS without any thread related bugs is a better idea.

You're jumping ahead. This problem has not been accurately diagnosed yet. It could be that the pthreads issue is completely misleading everyone, and in fact there is genuine memory corruption going on here. Or not. We don't know yet. I have made zero progress fixing this problem.

The "one small range of glibc versions" is a giveaway. I've seen this problem on FC3, 5, and 6 (I went through this series of upgrades all in one week trying to fix this problem). With each version, I recompiled Postgres and OpenBabel from scratch. I'm going to try FC7 next, since it's now the only officially supported version, but I don't believe glibc is the problem.

Andrew Dalke, a regular contributor to the OpenBabel forum, suggests another cause: it could be a result of linking the wrong libraries together. The gcc/ld system has a byzantine set of rules and hacks that (if I understand Andrew's posting) select different versions of the same library depending on what it thinks you might need. It's possible that the wrong version of some system library is getting linked in.

Craig
Re: [PERFORM] SELECT * FROM table is too slow
On Sun, Dec 16, 2007 at 07:34:45PM +0100, Adam PAPAI wrote: Why does it take about 18-20 sec to get the results? Too many indexes?

You cannot possibly have VACUUMed in a long time. Try a VACUUM FULL, and then schedule regular VACUUMs (or use autovacuum).

/* Steinar */
-- Homepage: http://www.sesse.net/
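For reference, a sketch of the postgresql.conf knobs that autovacuum depends on in the 8.1-era releases discussed in these threads (the exact settings are version-dependent):

    # postgresql.conf (8.1): autovacuum needs the row-level stats collector
    stats_start_collector = on
    stats_row_level = on
    autovacuum = on

In later releases (8.3 onward) autovacuum is enabled by default and the stats settings changed, so this applies only to the older servers in this thread.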
Re: [PERFORM] libgcc double-free, backend won't die
Craig James [EMAIL PROTECTED] writes: Gregory Stark wrote: And the only reason to do that would be to work around one bug in one small range of glibc versions. If you're going to use a multi-threaded library (which isn't very common since it's hard to do safely for all those other reasons) surely using a version of your OS without any thread related bugs is a better idea. You're jumping ahead. This problem has not been accurately diagnosed yet. It could be that the pthreads issue is completely misleading everyone, and in fact there is a genuine memory corruption going on here. Or not. We don't know yet. I have made zero progress fixing this problem.

Well, no, that would be you jumping ahead then... You proposed that Postgres change the way it handles threaded libraries based on Tom's suggestion that your problem was something like the glibc problem previously found. My comment was based on the known glibc problem. From what you're saying, it's far from certain that the problem would be fixed by changing Postgres's behaviour in the way you proposed.

-- Gregory Stark, EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [PERFORM] Large Objects and Toast
On Dec 14, 2007 10:11 PM, Campbell, Lance [EMAIL PROTECTED] wrote: PostgreSQL: 8.2. My understanding is that when one creates a large object there is no way to link the large object to a field in a table so that a cascading delete can occur. Is this correct? My understanding is that you have to delete the large object manually.

Yes, but you can set up a trigger to do that for you, I think.

I also read something about the OID being limited in size. What is the size limit of the OID type? I am sure that it is bigger than the number of files that I would be uploading into my db; I just want to get an idea of the range.

Oid is an unsigned 32-bit integer in Postgres, IIRC.

-- Usama Munir Dar http://linkedin.com/in/usamadar
Consultant Architect
Cell: +92 321 5020666 Skype: usamadar
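For illustration, a sketch of such a cleanup trigger, assuming a hypothetical table docs(id integer, blob oid). lo_unlink() is the server-side function that removes a large object; the contrib/lo module also ships a ready-made lo_manage trigger for the same job.

    CREATE OR REPLACE FUNCTION drop_lo() RETURNS trigger AS $$
    BEGIN
        -- Unlink the large object referenced by the removed row
        PERFORM lo_unlink(OLD.blob);
        RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER docs_drop_lo
        AFTER DELETE ON docs
        FOR EACH ROW EXECUTE PROCEDURE drop_lo();

If rows can be UPDATEd to point at a different large object, an UPDATE trigger along the same lines is needed as well.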
Re: [PERFORM] update 600000 rows
I wrote: That's what I did at first, but later I found better performance with a TRIGGER on the permanent table that deletes the target of an UPDATE [INSERT!], if any, before the UPDATE [INSERT]. That's what PG does anyway, and now I can do the entire UPDATE [INSERT] in one command.

It's probably obvious, but what I should have said is that now I do the INSERT in one command; I have no need of an UPDATE. So I do no UPDATEs, only INSERTs.
Re: [PERFORM] libgcc double-free, backend won't die
Tom Lane wrote: Yes. 1) It's of no value to us 2) On many platforms there is a nonzero performance penalty

I think you have your head in the ground, but it's your prerogative. *You* might not care, but anyone wanting to use thread-aware libraries (and I'm *not* talking about threading in any Postgres code) will certainly value it if they can do so with some stability. There's a clear benefit to being able to use such code. I suggested a build option, but you reject it out of hand. And in doing so, you also lock out the benefits that you *could* have in the future. It seems religious, which is unfortunate.

Are you suggesting that the performance penalty, apart from the malloc performance (which is easily dealt with), is *material*? Will an extra indirection in access to errno hurt so much? Non-zero I can accept, but clinging to 'non-zero' religiously isn't smart, especially if it's a build-time choice.

We'll clearly move to multiple cores, and clock speed improvements will slow (at best). In many cases, the number of available cores will exceed the number of instantaneously active connections. Don't you want to be able to use all the horsepower? Certainly on the sort of systems I work on in my day job (big derivatives trading systems), it's the norm that the cache hit rate on Sybase is well over 99%, and such systems are typically CPU-bound. Parallelism matters, and will matter more and more in the future.

So the ability to start incrementally adding parallel operation of some actions (whether scanning, updating indices, or pushing data to the peer) is valuable, as is the ability to use threaded libraries. And the (potential?) ability to use embedded languages and more advanced libraries in Postgres procs is one of the advantages of the system itself. (I'd discount running a runtime in a separate process: the latency is a problem for row triggers and functions.)

James
Re: [PERFORM] SELECT * FROM table is too slow
Adam PAPAI [EMAIL PROTECTED] wrote: Hi pgsql-performance, I've a problem with the select * on a small table.

I can think of two possibilities for such incredibly slow performance.

One: your table has not been VACUUMed for a long time and is full of dead tuples. Try VACUUM FULL on it, or CLUSTER on the most frequently used index.

Two: did you accidentally put the database on your floppy drive?