Re: [PERFORM] update 600000 rows

2007-12-16 Thread H. Hall


[EMAIL PROTECTED] wrote:

Hello

I have a Python script that updates 600,000 rows in one table of my
Postgres database from a CSV file, and the transaction takes 5 hours to
complete.

I'm on Debian Etch, running a PostgreSQL 8.1 server on a 64-bit quad
bi-Opteron box.

I have deactivated all indexes except the primary key, which is never
modified since it is also the reference column of the update.

No other user is on the server when I run this script.

First, when I run htop I see that memory usage never exceeds 150 MB.
Given that, I don't understand why setting the kernel's shmall and
shmmax parameters to 16 GB (the server has 32 GB) speeds up the
transaction so much compared to setting them to only 2 GB.

The script runs as a single transaction and pauses from time to time to
give Postgres time to write data to disk.

Would performance be better if the data were only written at the end of
the transaction? In production I want data written to disk regularly to
prevent data loss, but is there any benefit to writing temporary data to
disk in the middle of a transaction?

I'm a complete novice at Postgres and database configuration, so any
help is welcome.

Thanks


You will get a huge improvement in run time if you use batch updates
instead of updating one row at a time. See:


   http://www.postgresql.org/docs/8.2/interactive/populate.html
   and
   http://www.postgresql.org/docs/8.2/interactive/sql-begin.html
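
As a rough sketch (table and column names are made up; adjust to your
schema), the idea is to load the CSV into a temporary table with COPY
and then do the whole update as one set-based statement inside a single
transaction:

    BEGIN;
    -- staging table matching the CSV layout
    CREATE TEMP TABLE staging (id integer, val numeric);

    -- server-side COPY needs superuser; from psql you can use \copy instead
    COPY staging FROM '/path/to/file.csv' WITH CSV;

    -- one set-based UPDATE instead of 600000 single-row UPDATEs
    UPDATE target t
       SET val = s.val
      FROM staging s
     WHERE t.id = s.id;
    COMMIT;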

You will also get a big improvement if you can turn fsync off during the 
update. See:

   http://www.postgresql.org/docs/8.2/interactive/runtime-config-wal.html
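
That is a one-line change in postgresql.conf followed by a reload. Only
do this for a load you can redo from scratch: a crash with fsync off can
corrupt the database.

    # postgresql.conf -- for the bulk load only; re-enable afterwards
    fsync = off

    # then reload the server, e.g.:
    #   pg_ctl reload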

You also need to vacuum the table after doing that many updates: since
PG does a delete and an insert on each update, there will be a lot of
holes.
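
For example, once the load has finished (substitute your own table name):

    -- reclaim the dead row versions left behind by the mass update
    VACUUM ANALYZE mytable;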


Cheers
HH






Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread James Mansion

Gregory Stark wrote:

1) No Postgres function is guaranteed to be thread-safe so you better protect
   against concurrent calls to Postgres API functions. Also Postgres functions
   use longjmp which can restore the stack pointer to a value which may have
   been set earlier, possibly by another thread which wouldn't work.

  
That's a whole different thing from saying that you can't use a threaded
subsystem under a Postgres process.


2) Some OSes have bugs (notably glibc for a specific narrow set of versions)
   and don't expect to have standard library functions called before
   pthread_init() then called again after pthread_init(). If they expect the
   system to be either threaded or not threaded then they may be surprised
   to see that state change.

  
Is there any particular reason not to ensure that any low-level
threading support in libc is enabled right from the get-go, as a
build-time option? Does it do anything that's not well defined in a
threaded process? Signal handling and atfork (and the posix_* exec
functions) are typical areas, I guess. While this can potentially make
malloc slower, Postgres already wraps malloc, so using a caching
thread-aware malloc substitute such as nedmalloc should be no problem.

I don't see any issue with the setjmp usage, so long as only one thread
uses any internal API; that can be checked rather easily at runtime,
with low cost, in a debug build.

That just means you have to use a non-buggy version of your OS. Unfortunately
tracking down bugs in your OS to figure out what's causing them and whether
it's a particular known bug can be kind of tricky.
  
Is that really much of an issue on the current version of any major OS,
though? It's reasonable to limit the use of a threaded library (in
particular, the runtimes for most embeddable languages, or libraries for
RPC runtimes, etc.) to 'modern' platforms that support threads
effectively. On many such platforms these will already implicitly link
libpthread anyway.

James





Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread Tom Lane
James Mansion [EMAIL PROTECTED] writes:
 Is there any particular reason not to ensure that any low-level 
 threading support in libc is enabled right
 from the get-go, as a build-time option?

Yes.

1) It's of no value to us

2) On many platforms there is a nonzero performance penalty

regards, tom lane



Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 James Mansion [EMAIL PROTECTED] writes:
 Is there any particular reason not to ensure that any low-level 
 threading support in libc is enabled right
 from the get-go, as a build-time option?

 Yes.
 1) It's of no value to us
 2) On many platforms there is a nonzero performance penalty

And the only reason to do that would be to work around one bug in one small
range of glibc versions. If you're going to use a multi-threaded library
(which isn't very common since it's hard to do safely for all those other
reasons) surely using a version of your OS without any thread related bugs is
a better idea.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



[PERFORM] SELECT * FROM table is too slow

2007-12-16 Thread Adam PAPAI

Hi pgsql-performance,

I've a problem with the select * on a small table.

See below:


x7=# EXPLAIN ANALYZE select * from megjelenesek;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on megjelenesek  (cost=0.00..15633.07 rows=207 width=52) (actual time=103.258..18802.530 rows=162 loops=1)
 Total runtime: 18815.362 ms
(2 rows)

x7=# \d megjelenesek;
                                Table "public.megjelenesek"
    Column    |            Type             |                         Modifiers
--------------+-----------------------------+------------------------------------------------------------
 idn          | integer                     | not null default nextval('megjelenesek_idn_seq'::regclass)
 tag_id       | integer                     |
 tag_wlap_id  | integer                     |
 get_date     | timestamp without time zone | default now()
 megjelent    | numeric                     | default 0
Indexes:
    "megjelenesek_pkey" PRIMARY KEY, btree (idn)
    "megjelenesek_tag_id" hash (tag_id)
    "megjelenesek_tag_wlap_id" hash (tag_wlap_id)

x7=# SELECT count(idn) from megjelenesek;
 count
-------
   162
(1 row)

Why does it take about 18-20 seconds to get the results?
Too many indexes?

--
Adam PAPAI
D i g i t a l Influence
http://www.wooh.hu
E-mail: [EMAIL PROTECTED]
Phone: +36 30 33-55-735 (Hungary)




Re: [PERFORM] SELECT * FROM table is too slow

2007-12-16 Thread Joshua D. Drake

Adam PAPAI wrote:

Hi pgsql-performance,

I've a problem with the select * on a small table.

See below:


x7=# SELECT count(idn) from megjelenesek;
 count
-------
   162
(1 row)

Why does it take about 18-20 seconds to get the results?
Too many indexes?


You likely have a huge number of dead rows. Try dumping and restoring 
the table, and remember to run VACUUM (or autovacuum) often.
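
Something like this, as a sketch (note that VACUUM FULL takes an
exclusive lock, so run it in a maintenance window):

    VACUUM FULL VERBOSE megjelenesek;  -- compact the table, report dead rows
    REINDEX TABLE megjelenesek;        -- the indexes are likely bloated too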


Sincerely,


Joshua D. Drake





Re: [PERFORM] update 600000 rows

2007-12-16 Thread Merlin Moncure
On Dec 16, 2007 12:21 AM,  [EMAIL PROTECTED] wrote:
 Loïc Marteau [EMAIL PROTECTED] wrote ..
  Steve Crawford wrote:
   If this
   is correct, I'd first investigate simply loading the csv data into a
   temporary table, creating appropriate indexes, and running a single
   query to update your other table.

 My experience is that this is MUCH faster. My predecessor in my current 
 position was doing an update from a csv file line by line with perl. That is 
 one reason he is my predecessor. Performance did not justify continuing his 
 contract.

  I can try this. The problem is that I have to do an insert when the
  update doesn't affect any row (the row doesn't exist yet). The number
  of rows affected by inserts is small compared to the number of updated
  rows (and was 0 when I tested my script). I could do it with a
  temporary table: update all the possible rows, then insert the rows
  that are in the temporary table but not in the production table with a
  'NOT IN' statement. Is this a correct way?

 That's what I did at first, but later I found better performance with a 
 TRIGGER on the permanent table that deletes the target of an UPDATE, if any, 
 before the UPDATE. That's what PG does anyway, and now I can do the entire 
 UPDATE in one command.

that's very clever, and probably the fastest/best way to do it. you can
even temporarily add the trigger inside a transaction... I am going to
try this out in a couple of places (I currently do these types of things
in two statements, sketched below) and see how it turns out.
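
The two-statement version looks roughly like this (hypothetical names,
assuming the CSV is already loaded into a temp table "staging"):

    UPDATE target t
       SET val = s.val
      FROM staging s
     WHERE t.id = s.id;

    INSERT INTO target (id, val)
    SELECT s.id, s.val
      FROM staging s
     WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);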

merlin



Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread Craig James

Gregory Stark wrote:

Tom Lane [EMAIL PROTECTED] writes:


James Mansion [EMAIL PROTECTED] writes:
Is there any particular reason not to ensure that any low-level 
threading support in libc is enabled right

from the get-go, as a build-time option?

Yes.
1) It's of no value to us


Who is "us"?  Some of us would like to use the system for advanced scientific 
work, and scientific libraries are usually written in C++.


2) On many platforms there is a nonzero performance penalty


I'm surprised you say this, given that you're usually the voice of reason
when it comes to rejecting hypothetical statements in favor of tested facts.
If building Postgres using thread-safe technology is really a performance
burden, that could be easily verified. A "nonzero performance penalty":
what does that mean, a 0.0001% slowdown? I find it hard to believe that the
performance penalty of a thread-safe version would even be measurable.

If nobody has the time to do such a test, or other priorities take precedence, 
that's understandable.  But the results aren't in yet.


And the only reason to do that would be to work around one bug in one small
range of glibc versions. If you're going to use a multi-threaded library
(which isn't very common since it's hard to do safely for all those other
reasons) surely using a version of your OS without any thread related bugs is
a better idea.


You're jumping ahead.  This problem has not been accurately diagnosed yet.  It 
could be that the pthreads issue is completely misleading everyone, and in fact 
there is a genuine memory corruption going on here.  Or not.  We don't know 
yet.  I have made zero progress fixing this problem.

The "one small range of glibc versions" is a giveaway. I've seen this
problem on FC3, 5, and 6 (I went through this series of upgrades all in one
week trying to fix it). With each version, I recompiled Postgres and
OpenBabel from scratch. I'm going to try FC7 next, since it's now the only
officially supported version, but I don't believe glibc is the problem.

Andrew Dalke, a regular contributor to the OpenBabel forum, suggests another
possibility: it could be a result of linking the wrong libraries together.
The gcc/ld system has a byzantine set of rules and hacks that (if I
understand Andrew's posting) select different versions of the same library
depending on what it thinks you might need. It's possible that the wrong
version of some system library is getting linked in.

Craig



Re: [PERFORM] SELECT * FROM table is too slow

2007-12-16 Thread Steinar H. Gunderson
On Sun, Dec 16, 2007 at 07:34:45PM +0100, Adam PAPAI wrote:
 Why does it take about 18-20 seconds to get the results?
 Too many indexes?

You cannot possibly have VACUUMed in a long time. Try a VACUUM FULL, and then
schedule regular VACUUMs (or use autovacuum).
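
Note that on 8.1/8.2 autovacuum is off by default and needs the stats
collector; in postgresql.conf:

    # prerequisites for autovacuum on 8.1/8.2
    stats_start_collector = on
    stats_row_level = on
    autovacuum = on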

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread Gregory Stark
Craig James [EMAIL PROTECTED] writes:

 Gregory Stark wrote:

 And the only reason to do that would be to work around one bug in one small
 range of glibc versions. If you're going to use a multi-threaded library
 (which isn't very common since it's hard to do safely for all those other
 reasons) surely using a version of your OS without any thread related bugs is
 a better idea.

 You're jumping ahead. This problem has not been accurately diagnosed yet. It
 could be that the pthreads issue is completely misleading everyone, and in
 fact there is a genuine memory corruption going on here. Or not. We don't
 know yet. I have made zero progress fixing this problem.

Well, no, that would be you jumping ahead then... You proposed that Postgres
change the way it handles threaded libraries based on Tom's suggestion that
your problem was something like the glibc problem previously found. My
comment was based on the known glibc problem. From what you're saying, it's
far from certain that the problem would be fixed by changing Postgres's
behaviour in the way you proposed.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [PERFORM] Large Objects and Toast

2007-12-16 Thread Usama Dar
On Dec 14, 2007 10:11 PM, Campbell, Lance [EMAIL PROTECTED] wrote:

  PostgreSQL: 8.2



 My understanding is that when one creates a large object there is no way
 to link the large object to a field in a table so that cascading delete can
 occur.  Is this correct?  My understanding is that you have to manually
 delete the large object.


Yes, but you can set up a trigger to do that for you, I think.
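
A sketch of such a trigger, assuming a hypothetical table "docs" with an
oid column "file_oid" pointing at the large object:

    CREATE FUNCTION unlink_lo() RETURNS trigger AS $$
    BEGIN
        IF OLD.file_oid IS NOT NULL THEN
            PERFORM lo_unlink(OLD.file_oid);  -- drop the orphaned large object
        END IF;
        RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER docs_unlink_lo
        AFTER DELETE ON docs
        FOR EACH ROW EXECUTE PROCEDURE unlink_lo();

(The contrib/lo module also ships a ready-made lo_manage trigger for
this purpose.)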



 I also read something about the OID ID being limited in size.  What is the
 size limit of this OID type?  I am sure that it is bigger than the number of
 files that I would be uploaded into my db; but I just want to get an idea of
 the range.


Oid is an unsigned 32-bit integer in Postgres, IIRC, so the ceiling is
around 4 billion.



-- 
Usama Munir Dar http://linkedin.com/in/usamadar
Consultant Architect
Cell:+92 321 5020666
Skype: usamadar


Re: [PERFORM] update 600000 rows

2007-12-16 Thread andrew
I wrote
 That's what I did at first, but later I found better performance with
 a TRIGGER on the permanent table that deletes the target of an UPDATE,
 if any, before the UPDATE. [INSERT!] That's what PG does anyway, and now I 
 can do
 the entire UPDATE [INSERT] in one command.

It's probably obvious, but what I should have said is that now I do the INSERT 
in one command; I have no need of an UPDATE. So I do no UPDATEs, only INSERTs.
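
Roughly, with hypothetical names:

    CREATE FUNCTION replace_row() RETURNS trigger AS $$
    BEGIN
        -- do the "delete" half of what an UPDATE would do,
        -- then let the INSERT proceed normally
        DELETE FROM permanent WHERE id = NEW.id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER permanent_replace
        BEFORE INSERT ON permanent
        FOR EACH ROW EXECUTE PROCEDURE replace_row();

    -- the whole load is now a single INSERT:
    INSERT INTO permanent SELECT * FROM staging;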



Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread James Mansion

Tom Lane wrote:

Yes.

1) It's of no value to us

2) On many platforms there is a nonzero performance penalty

  

I think you have your head in the sand, but it's your prerogative. *You*
might not care, but anyone wanting to use thread-aware libraries (and I'm
*not* talking about threading in any Postgres code) will certainly value
it if they can do so with some stability.

There's a clear benefit to being able to use such code. I suggested a
build option, but you reject it out of hand. And in doing so, you also
lock out the benefits that you *could* have in future. It seems
religious, which is unfortunate.

Are you suggesting that the performance penalty, apart from the malloc
performance (which is easily dealt with), is *material*? Will an extra
indirection in access to errno hurt so much? Non-zero I can accept, but
clinging to 'non-zero' religiously isn't smart, especially if it's a
build-time choice.

We'll clearly move to multiple cores, and clock speed improvements will
slow (at best). In many cases, the number of available cores will exceed
the number of instantaneously active connections. Don't you want to be
able to use all the horsepower?

Certainly on the sort of systems I work on in my day job (big derivative
trading systems), it's the norm that the cache hit rate on Sybase is well
over 99%, and such systems are typically CPU bound. Parallelism matters,
and will matter more and more in future.

So the ability to start incrementally adding parallel operation of some
actions (whether scanning, updating indices, or pushing data to the peer)
is valuable, as is the ability to use threaded libraries; the (potential?)
ability to use embedded languages and more advanced libraries in Postgres
procs is one of the advantages of the system itself. (I'd like to discount
the use of a runtime in a separate process: the latency is a problem for
row triggers and functions.)

James




Re: [PERFORM] SELECT * FROM table is too slow

2007-12-16 Thread andrew
Adam PAPAI [EMAIL PROTECTED] wrote ..
 Hi pgsql-performance,
 
 I've a problem with the select * on a small table.
 

I can think of two possibilities for such incredibly slow performance.

One: your table has not been VACUUMed for a long time and is full of dead 
tuples. Try VACUUM FULL on it, or CLUSTER on the most frequently used index 
(sketched below).

Two: did you accidentally put the database on your floppy drive?
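
For option one, using the primary key from the original post (pre-8.3
CLUSTER syntax):

    -- rewrite the table in pkey order, discarding the dead tuples
    CLUSTER megjelenesek_pkey ON megjelenesek;
    ANALYZE megjelenesek;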
