[GENERAL] Performance problem with query

2006-07-13 Thread Christian Rengstl
Good morning list,

The following query takes about 15 to 20 minutes for around 2 million lines in
the file myfile.txt, but with 8 million lines it takes around 5 hours, and I
just don't understand why there is such a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from 
public.temp_table LIMIT 1), '_', 2) as int2)
   FROM public.temp_table; 

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from 
public.temp_table LIMIT 1), '_', 2) as int2) 
FROM public.temp_table; 

I came up with this query because I wanted to use the COPY command to load
huge files into the db, but I don't want to load all the columns contained in
the file into only one table; rather, I copy some of them into one table and
some into a second table. As I found out with logging, the data is loaded into
temp_table within 15 minutes, but transferring it from temp_table to
public.master took around 5 hours, while transferring it from temp_table to
public.values took again only something like 10 minutes. Can it be that the
cast takes up so much more time than reading and transferring 2 million lines?

Thanks for any advice!

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230




Re: [GENERAL] Performance problem with query

2006-07-13 Thread Merlin Moncure

On 7/13/06, Christian Rengstl
<[EMAIL PROTECTED]> wrote:

Good morning list,

The following query takes about 15 to 20 minutes for around 2 million lines in
the file myfile.txt, but with 8 million lines it takes around 5 hours, and I
just don't understand why there is such a huge discrepancy in performance.



On the surface it doesn't make sense. Can you post an EXPLAIN ANALYZE?

merlin



Re: [GENERAL] Performance problem with query

2006-07-14 Thread Merlin Moncure

On 7/14/06, Christian Rengstl
<[EMAIL PROTECTED]> wrote:

Hi,

Somehow my reply yesterday got lost, but nevertheless here comes the
explain analyze again. It's the explain of the operation that causes
this huge performance discrepancy. Unfortunately I had to perform the
explain analyze with an empty temp_table, because after the whole
operation is done, I delete the data again to save some space.


That's not much help. Remember that EXPLAIN ANALYZE actually performs
your query, so next time you run it, do EXPLAIN ANALYZE and post the
results here. Just a quick tip: run ANALYZE before you run your big
query.
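
For illustration, a minimal sketch of that workflow against the tables from the
original post (assuming the same column names as in the first message):

    -- refresh the planner statistics on the freshly loaded table first
    ANALYZE public.temp_table;

    -- EXPLAIN ANALYZE executes the statement and reports actual timings per plan node
    EXPLAIN ANALYZE
    INSERT INTO public.master(pid, smid, val1, val2, chr)
    SELECT pid, smid, val1, val12,
           CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
    FROM public.temp_table;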

merlin



Re: [GENERAL] Performance problem with query

2006-07-18 Thread Christian Rengstl
Now, finally, after a long time I have the query plan for the whole filled
table. I hope somebody can explain to me why it takes so much longer...

                                   QUERY PLAN
--------------------------------------------------------------------------------
Result  (cost=0.09..254643.09 rows=8044000 width=39) (actual time=0.043..0.043 rows=0 loops=1)
  One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TRUE))
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.006 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (never executed)
Total runtime: 0.238 ms

Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms

Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=1.245..1.245 rows=0 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.013..0.014 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.009..0.009 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (never executed)
Total runtime: 22.270 ms
(31 rows)



>>> "Christian Rengstl" <[EMAIL PROTECTED]> 13.07.06 8.37 Uhr >>>
Good morning list,

the following query takes about 15 to 20 minutes for around 2 million lines in 
the file myfile.txt, but with 8 million lines it takes around 5 hours and i 
just don't understand why there is such a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from 
public.temp_table LIMIT 1), '_', 2) as int2)
   FROM public.temp_table; 

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from 
public.temp_table LIMIT 1), '_', 2) as int2) 
FROM public.temp_table; 

I came up with this query, because i wanted to use the COPY command to load 
huge files into the db, but i don't want to load all the columns contained in 
the file in only one table but copy some of them into one table and some in a 
second table. As i found out with logging, the data is loaded into temp_table 
within 15 minutes, but to transfer it from the temp_table toagain only 
something like 10 minutes. Can it be that the cast takes up so much more time 
than when reading and transferring 2 million lines?

Thanks for any advice!

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230




Re: [GENERAL] Performance problem with query

2006-07-18 Thread Merlin Moncure

On 7/18/06, Christian Rengstl
<[EMAIL PROTECTED]> wrote:

Now, finally, after a long time I have the query plan for the whole filled
table. I hope somebody can explain to me why it takes so much longer...

                                   QUERY PLAN
--------------------------------------------------------------------------------
Result  (cost=0.09..254643.09 rows=8044000 width=39) (actual time=0.043..0.043 rows=0 loops=1)
  One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TRUE))
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.006 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (never executed)
Total runtime: 0.238 ms

Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms

Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=1.245..1.245 rows=0 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.013..0.014 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.009..0.009 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (never executed)
Total runtime: 22.270 ms
(31 rows)



>>> "Christian Rengstl" <[EMAIL PROTECTED]> 13.07.06 8.37 Uhr >>>
Good morning list,

the following query takes about 15 to 20 minutes for around 2 million lines in 
the file myfile.txt, but with 8 million lines it takes around 5 hours and i 
just don't understand why there is such a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from 
public.temp_table LIMIT 1), '_', 2) as int2)
   FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from 
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;



What is this phrase doing exactly?
CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2)

It looks fishy.
merlin



Re: [GENERAL] Performance problem with query

2006-07-19 Thread Christian Rengstl
The thing is that in the text file there is a column that contains something
like xyz_12, and in the table I just need the integer part of it; that's what
the query is used for. The problem, though, is not really in the SELECT part,
because running the SELECT part on 8 million lines takes about 3 minutes; I
just don't know why the INSERT is taking so long.
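
For illustration, a minimal sketch of what that expression extracts, assuming a
value like 'xyz_12' as described above:

    -- split_part returns the 2nd underscore-separated field, here the text '12';
    -- the CAST then turns it into a smallint
    SELECT CAST(split_part('xyz_12', '_', 2) AS int2);  -- yields 12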

"Merlin Moncure" <[EMAIL PROTECTED]> wrote on 07/18/06 9:39 pm:
> On 7/18/06, Christian Rengstl
> <[EMAIL PROTECTED]> wrote:
>> Now, finally, after a long time I have the query plan for the whole filled
>> table. I hope somebody can explain to me why it takes so much longer...
>> 
>>                                   QUERY PLAN
>> --------------------------------------------------------------------------------
>> Result  (cost=0.09..254643.09 rows=8044000 width=39) (actual time=0.043..0.043 rows=0 loops=1)
>>   One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TRUE))
>>   InitPlan
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.006 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (never executed)
>> Total runtime: 0.238 ms
>>
>> Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
>>   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
>>   InitPlan
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
>> Total runtime: 62259544.896 ms
>>
>> Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=1.245..1.245 rows=0 loops=1)
>>   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
>>   InitPlan
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.013..0.014 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.009..0.009 rows=1 loops=1)
>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (never executed)
>> Total runtime: 22.270 ms
>> (31 rows)
>>
>>
>>
>> >>> "Christian Rengstl" <[EMAIL PROTECTED]> 13.07.06 8:37 am >>>
>> Good morning list,
>>
>> The following query takes about 15 to 20 minutes for around 2 million lines
>> in the file myfile.txt, but with 8 million lines it takes around 5 hours and
>> I just don't understand why there is such a huge discrepancy in performance.
>>
>> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';
>>
>> INSERT INTO public.master(pid,smid, val1, val2, chr)
>> SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from 
>> public.temp_table LIMIT 1), '_', 2) as int2)
>>FROM public.temp_table;
>>
>> INSERT INTO public.values(smid, pos, chr)
>> SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from 
>> public.temp_table LIMIT 1), '_', 2) as int2)
>> FROM public.temp_table;
>>
> 
> What is this phrase doing exactly?
> CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2)
> 
> It looks fishy.
> merlin
> 


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

--

Re: [GENERAL] Performance problem with query

2006-07-19 Thread Christian Rengstl
The EXPLAIN ANALYZE is from the exact query, and I dropped the indexes before
the insert as well, without improvement. The target table is also completely
empty, and the insert is supposed to write, in this case, more or less 8
million lines into the table. There is a rule though, because I have an
inheritance table structure with one master table and around 20 child tables.

Q <[EMAIL PROTECTED]> wrote on 07/19/06 4:37 am:
> 
> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
> 
>> Now, finally, after a long time I have the query plan for the whole
>> filled table. I hope somebody can explain to me why it takes so much
>> longer...
> 
> 
> These explain analyze results don't appear to be from the queries you  
> posted previously. For these results to mean anything you need to  
> include the EXACT queries you used to generate them.
> 
>> Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
>>   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
>>   InitPlan
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
>> Total runtime: 62259544.896 ms
> 
> 
> This is the query you want to be interested in, the others took no  
> time at all.
> 
> As a guess I would say the query is an INSERT INTO ... FROM  
> SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.  
> The majority of the time appears to be taken up on the actual INSERT  
> and not the SELECT part.
> 
> How many rows are in the target table and what indexes does it have?  
> Does it have any triggers, check constraints, or rules applied to it?  
> All these things can make the insert take longer as the number of  
> rows you have already in the table increases.
> 
> More than likely you have a target table with a LOT of rows and a  
> bunch of indexes on it and your disks are being thrashed because the  
> indexes are not able to stay cached in RAM. At this point you should  
> ensure your machine is not swapping to disk, and at the very least
> you should go through one of the many tuning guidelines available and  
> ensure you have allocated the appropriate amount of memory to  
> postgresql for your needs.
> 
> You may also want to consider dropping the indexes before you do the  
> INSERT and recreate them afterwards.
> 
> 
> 
> "Christian Rengstl" <[EMAIL PROTECTED]>  
> 13.07.06 8.37 Uhr >>>
>> Good morning list,
>>
>> the following query takes about 15 to 20 minutes for around 2  
>> million lines in the file myfile.txt, but with 8 million lines it  
>> takes around 5 hours and i just don't understand why there is such  
>> a huge discrepancy in performance.
>>
>> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';
>>
>> INSERT INTO public.master(pid,smid, val1, val2, chr)
>> SELECT pid, smid, val1, val12, CAST(split_part((SELECT  
>> chr from public.temp_table LIMIT 1), '_', 2) as int2)
>>FROM public.temp_table;
>>
>> INSERT INTO public.values(smid, pos, chr)
>> SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from  
>> public.temp_table LIMIT 1), '_', 2) as int2)
>> FROM public.temp_table;
>>
>> I came up with this query, because i wanted to use the COPY command  
>> to load huge files into the db, but i don't want to load all the  
>> columns contained in the file in only one table but copy some of  
>> them into one table and some in a second table. As i found out with  
>> logging, the data is loaded into temp_table within 15 minutes, but  
>> to transfer it from the temp_table toagain only something like 10  
>> minutes. Can it be that the cast takes up so much more time than  
>> when reading and transferring 2 million lines?
> 
> 
> -- 
> Seeya...Q
> 
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> 
>   Quinton Dolan - [EMAIL PROTECTED]
>   Gold Coast, QLD, Australia
>   Ph: +61 419 729 806
> 
> 
> 
> 


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

Re: [GENERAL] Performance problem with query

2006-07-19 Thread Q

On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:

The EXPLAIN ANALYZE is from the exact query, and I dropped the indexes
before the insert as well, without improvement. The target table is
also completely empty, and the insert is supposed to write, in
this case, more or less 8 million lines into the table. There is a
rule though, because I have an inheritance table structure with one
master table and around 20 child tables.


I would say the problem is in the rule.  Try doing the insert into a  
duplicate table with no rules or inheritance and see how long it takes.
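
For illustration, a minimal sketch of such a test; the scratch table name is
hypothetical, and the column list follows the INSERT from the original post.
CREATE TABLE ... (LIKE ...) copies only the column definitions, so no rules,
inheritance, or indexes come along:

    -- hypothetical scratch copy of the target table, with no rules attached
    CREATE TABLE public.master_plain (LIKE public.master);

    -- time this against the original INSERT to isolate the rule/inheritance overhead
    INSERT INTO public.master_plain(pid, smid, val1, val2, chr)
    SELECT pid, smid, val1, val12,
           CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
    FROM public.temp_table;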


Perhaps you should provide the actual schema of tables and rules that  
are involved in the query in question.



Q <[EMAIL PROTECTED]> wrote on 07/19/06 4:37 am:


On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:


Now, finally, after a long time I have the query plan for the whole
filled table. I hope somebody can explain to me why it takes so much
longer...



These explain analyze results don't appear to be from the queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.


Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms



This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it have?
Does it have any triggers, check constraints, or rules applied to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because the
indexes are not able to stay cached in RAM. At this point you should
ensure your machine is not swapping to disk, and at the very least
you should go through one of the many tuning guidelines available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do the
INSERT and recreate them afterwards.




"Christian Rengstl" <[EMAIL PROTECTED]>
13.07.06 8.37 Uhr >>>

Good morning list,

the following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and i just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
   FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query because I wanted to use the COPY command
to load huge files into the db, but I don't want to load all the
columns contained in the file into only one table; rather, I copy
some of them into one table and some into a second table. As I found
out with logging, the data is loaded into temp_table within 15
minutes, but transferring it from temp_table to public.master took
around 5 hours, while transferring it from temp_table to
public.values took again only something like 10 minutes. Can it be
that the cast takes up so much more time than reading and
transferring 2 million lines?



--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

   Quinton Dolan - [EMAIL PROTECTED]
   Gold Coast, QLD, Australia
   Ph: +61 419 729 806







--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

Re: [GENERAL] Performance problem with query

2006-07-19 Thread Christian Rengstl
Well, I'll try that, but honestly I don't think it's the rule, as the rule is
really simple: it's just one simple integer comparison...

Q <[EMAIL PROTECTED]> wrote on 07/19/06 11:54 am:
> On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:
> 
>> The EXPLAIN ANALYZE is from the exact query, and I dropped the indexes
>> before the insert as well, without improvement. The target table is
>> also completely empty, and the insert is supposed to write, in
>> this case, more or less 8 million lines into the table. There is a
>> rule though, because I have an inheritance table structure with one
>> master table and around 20 child tables.
> 
> I would say the problem is in the rule.  Try doing the insert into a  
> duplicate table with no rules or inheritance and see how long it takes.
> 
> Perhaps you should provide the actual schema of tables and rules that  
> are involved in the query in question.
> 
>> Q <[EMAIL PROTECTED]> wrote on 07/19/06 4:37 am:
>>>
>>> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
>>>
 Now, finally, after a long time I have the query plan for the whole
 filled table. I hope somebody can explain to me why it takes so much
 longer...
>>>
>>>
>>> These explain analyze results don't appear to be from the queries you
>>> posted previously. For these results to mean anything you need to
>>> include the EXACT queries you used to generate them.
>>>
 Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
   InitPlan
     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
 Total runtime: 62259544.896 ms
>>>
>>>
>>> This is the query you want to be interested in, the others took no
>>> time at all.
>>>
>>> As a guess I would say the query is an INSERT INTO ... FROM
>>> SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
>>> The majority of the time appears to be taken up on the actual INSERT
>>> and not the SELECT part.
>>>
>>> How many rows are in the target table and what indexes does it have?
>>> Does it have any triggers, check constraints, or rules applied to it?
>>> All these things can make the insert take longer as the number of
>>> rows you have already in the table increases.
>>>
>>> More than likely you have a target table with a LOT of rows and a
>>> bunch of indexes on it and your disks are being thrashed because the
>>> indexes are not able to stay cached in RAM. At this point you should
>>> ensure your machine is not swapping to disk, and at the very least
>>> you should go through one of the many tuning guidelines available and
>>> ensure you have allocated the appropriate amount of memory to
>>> postgresql for your needs.
>>>
>>> You may also want to consider dropping the indexes before you do the
>>> INSERT and recreate them afterwards.
>>>
>>>
>>>
>>> "Christian Rengstl" <[EMAIL PROTECTED]>
>>> 13.07.06 8.37 Uhr >>>
 Good morning list,

 the following query takes about 15 to 20 minutes for around 2
 million lines in the file myfile.txt, but with 8 million lines it
 takes around 5 hours and i just don't understand why there is such
 a huge discrepancy in performance.

 COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

 INSERT INTO public.master(pid,smid, val1, val2, chr)
 SELECT pid, smid, val1, val12, CAST(split_part((SELECT
 chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

 INSERT INTO public.values(smid, pos, chr)
 SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
 public.temp_table LIMIT 1), '_', 2) as int2)
 FROM public.temp_table;

 I came up with this query because I wanted to use the COPY command
 to load huge files into the db, but I don't want to load all the
 columns contained in the file into only one table; rather, I copy
 some of them into one table and some into a second table. As I found
 out with logging, the data is loaded into temp_table within 15
 minutes, but transferring it from temp_table to public.master took
 around 5 hours, while transferring it from temp_table to
 public.values took again only something like 10 minutes. Can it be
 that the cast takes up so much more time than reading and
 transferring 2 million lines?
>>>
>>>
>>> -- 
>>> Seeya...Q
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=

Re: [GENERAL] Performance problem with query

2006-07-19 Thread Christian Rengstl
Obviously it had something to do with the rule, because now everything finished
within 20 minutes. The problem is just that I don't really want to give up the
inheritance design. Is there a way to maintain the inheritance that doesn't
cause this huge performance problem?

Q <[EMAIL PROTECTED]> wrote on 07/19/06 11:54 am:
> On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:
> 
>> The EXPLAIN ANALYZE is from the exact query, and I dropped the indexes
>> before the insert as well, without improvement. The target table is
>> also completely empty, and the insert is supposed to write, in
>> this case, more or less 8 million lines into the table. There is a
>> rule though, because I have an inheritance table structure with one
>> master table and around 20 child tables.
> 
> I would say the problem is in the rule.  Try doing the insert into a  
> duplicate table with no rules or inheritance and see how long it takes.
> 
> Perhaps you should provide the actual schema of tables and rules that  
> are involved in the query in question.
> 
>> Q <[EMAIL PROTECTED]> wrote on 07/19/06 4:37 am:
>>>
>>> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
>>>
 Now, finally, after a long time I have the query plan for the whole
 filled table. I hope somebody can explain to me why it takes so much
 longer...
>>>
>>>
>>> These explain analyze results don't appear to be from the queries you
>>> posted previously. For these results to mean anything you need to
>>> include the EXACT queries you used to generate them.
>>>
 Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
   InitPlan
     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
 Total runtime: 62259544.896 ms
>>>
>>>
>>> This is the query you want to be interested in, the others took no
>>> time at all.
>>>
>>> As a guess I would say the query is an INSERT INTO ... FROM
>>> SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
>>> The majority of the time appears to be taken up on the actual INSERT
>>> and not the SELECT part.
>>>
>>> How many rows are in the target table and what indexes does it have?
>>> Does it have any triggers, check constraints, or rules applied to it?
>>> All these things can make the insert take longer as the number of
>>> rows you have already in the table increases.
>>>
>>> More than likely you have a target table with a LOT of rows and a
>>> bunch of indexes on it and your disks are being thrashed because the
>>> indexes are not able to stay cached in RAM. At this point you should
>>> ensure your machine is not swapping to disk, and at the very least
>>> you should go through one of the many tuning guidelines available and
>>> ensure you have allocated the appropriate amount of memory to
>>> postgresql for your needs.
>>>
>>> You may also want to consider dropping the indexes before you do the
>>> INSERT and recreate them afterwards.
>>>
>>>
>>>
>>> "Christian Rengstl" <[EMAIL PROTECTED]>
>>> 13.07.06 8.37 Uhr >>>
 Good morning list,

 the following query takes about 15 to 20 minutes for around 2
 million lines in the file myfile.txt, but with 8 million lines it
 takes around 5 hours and i just don't understand why there is such
 a huge discrepancy in performance.

 COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

 INSERT INTO public.master(pid,smid, val1, val2, chr)
 SELECT pid, smid, val1, val12, CAST(split_part((SELECT
 chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

 INSERT INTO public.values(smid, pos, chr)
 SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
 public.temp_table LIMIT 1), '_', 2) as int2)
 FROM public.temp_table;

 I came up with this query because I wanted to use the COPY command
 to load huge files into the db, but I don't want to load all the
 columns contained in the file into only one table; rather, I copy
 some of them into one table and some into a second table. As I found
 out with logging, the data is loaded into temp_table within 15
 minutes, but transferring it from temp_table to public.master took
 around 5 hours, while transferring it from temp_table to
 public.values took again only something like 10 minutes. Can it be
 that the cast takes up so much more time than reading and
 transferring 2 million lines?

Re: [GENERAL] Performance problem with query

2006-07-19 Thread Q

On 19/07/2006, at 8:49 PM, Christian Rengstl wrote:

Obviously it had something to do with the rule, because now
everything finished within 20 minutes. The problem is just that I
don't really want to give up the inheritance design. Is there a way
to maintain the inheritance that doesn't cause this huge
performance problem?


That is hard to say unless you post the rule and table schema you are  
currently using.



Q <[EMAIL PROTECTED]> wrote on 07/19/06 11:54 am:

On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:


The EXPLAIN ANALYZE is from the exact query, and I dropped the indexes
before the insert as well, without improvement. The target table is
also completely empty, and the insert is supposed to write, in
this case, more or less 8 million lines into the table. There is a
rule though, because I have an inheritance table structure with one
master table and around 20 child tables.


I would say the problem is in the rule.  Try doing the insert into a
duplicate table with no rules or inheritance and see how long it  
takes.


Perhaps you should provide the actual schema of tables and rules that
are involved in the query in question.


Q <[EMAIL PROTECTED]> wrote on 07/19/06 4:37 am:


On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:


Now, finally, after a long time I have the query plan for the whole
filled table. I hope somebody can explain to me why it takes so much
longer...



These explain analyze results don't appear to be from the queries you
posted previously. For these results to mean anything you need to
include the EXACT queries you used to generate them.


Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms



This is the query you want to be interested in, the others took no
time at all.

As a guess I would say the query is an INSERT INTO ... FROM
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
The majority of the time appears to be taken up on the actual INSERT
and not the SELECT part.

How many rows are in the target table and what indexes does it have?
Does it have any triggers, check constraints, or rules applied to it?
All these things can make the insert take longer as the number of
rows you have already in the table increases.

More than likely you have a target table with a LOT of rows and a
bunch of indexes on it and your disks are being thrashed because the
indexes are not able to stay cached in RAM. At this point you should
ensure your machine is not swapping to disk, and at the very least
you should go through one of the many tuning guidelines available and
ensure you have allocated the appropriate amount of memory to
postgresql for your needs.

You may also want to consider dropping the indexes before you do the
INSERT and recreate them afterwards.



"Christian Rengstl" <[EMAIL PROTECTED] 
regensburg.de>

13.07.06 8.37 Uhr >>>

Good morning list,

The following query takes about 15 to 20 minutes for around 2
million lines in the file myfile.txt, but with 8 million lines it
takes around 5 hours and I just don't understand why there is such
a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
   FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query because I wanted to use the COPY command
to load huge files into the db, but I don't want to load all the
columns contained in the file into only one table; rather, I copy
some of them into one table and some into a second table. As I found
out with logging, the data is loaded into temp_table within 15
minutes, but transferring it from temp_table to public.master took
around 5 hours, while transferring it from temp_table to
public.values took again only something like 10 minutes. Can it be
that the cast takes up so much more time than reading and
transferring 2 million lines?



--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

   Quinton Dolan - [EMAIL PROTECTED]
   Gold Coast, QLD, Australia
   Ph: +61 419 729 806

Re: [GENERAL] Performance problem with query

2006-07-19 Thread Christian Rengstl
So here's the master table including the rules:

CREATE TABLE public.master (
  entry_no int8 NOT NULL DEFAULT nextval('public.master_seq'::regclass),
  pid varchar(15) NOT NULL,
  val_1 varchar(1),
  val_2 varchar(1),
  chr int2 NOT NULL,
  aendat timestamp DEFAULT now(),
  aennam varchar(8),
  CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
  CONSTRAINT "UNIQUE_MASTER" UNIQUE (pid, entry_no)
);

CREATE OR REPLACE RULE "INSERT_INTO_1" AS
ON INSERT TO public.master
WHERE new.chr = 1 DO INSTEAD
  INSERT INTO public.table_1 (entry_no, pid, val_1, val_2, chr, aendat, aennam)
  VALUES (new.entry_no, new.pid, new.val_1, new.val_2, new.chr, new.aendat, new.aennam);

Like this I have around 20 rules so far, but there might be more later on. The
child tables are so far exactly like the master table.
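
For comparison, a minimal sketch of the same redirect done with a single
trigger function instead of ~20 rules; the function and trigger names are
hypothetical, and whether it would be faster here would need the same timing
test. A conditional DO INSTEAD rule rewrites the whole INSERT ... SELECT once
per rule, whereas a trigger makes the routing decision once per row:

    -- hypothetical trigger-based routing, sketched against the schema above
    CREATE OR REPLACE FUNCTION master_insert_redirect() RETURNS trigger AS $$
    BEGIN
        IF NEW.chr = 1 THEN
            INSERT INTO public.table_1 (entry_no, pid, val_1, val_2, chr, aendat, aennam)
            VALUES (NEW.entry_no, NEW.pid, NEW.val_1, NEW.val_2, NEW.chr, NEW.aendat, NEW.aennam);
        -- ... one ELSIF branch per remaining child table ...
        END IF;
        RETURN NULL;  -- suppress the row in the master table itself
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER master_insert_redirect
        BEFORE INSERT ON public.master
        FOR EACH ROW EXECUTE PROCEDURE master_insert_redirect();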

Q <[EMAIL PROTECTED]> wrote on 07/19/06 1:52 pm:
> On 19/07/2006, at 8:49 PM, Christian Rengstl wrote:
> 
>> Obviously it had something to do with the rule, because now
>> everything finished within 20 minutes. The problem is just that I
>> don't really want to give up the inheritance design. Is there a way
>> to maintain the inheritance that doesn't cause this huge
>> performance problem?
> 
> That is hard to say unless you post the rule and table schema you are  
> currently using.
> 
>> Q <[EMAIL PROTECTED]> wrote on 07/19/06 11:54 am:
>>> On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:
>>>
 The EXPLAIN ANALYZE is from the exact query, and I dropped the indexes
 before the insert as well, without improvement. The target table is
 also completely empty, and the insert is supposed to write, in
 this case, more or less 8 million lines into the table. There is a
 rule though, because I have an inheritance table structure with one
 master table and around 20 child tables.
>>>
>>> I would say the problem is in the rule.  Try doing the insert into a
>>> duplicate table with no rules or inheritance and see how long it  
>>> takes.
>>>
>>> Perhaps you should provide the actual schema of tables and rules that
>>> are involved in the query in question.
>>>
 Q <[EMAIL PROTECTED]> wrote on 07/19/06 4:37 am:
>
> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
>
>> Now, finally, after a long time I have the query plan for the whole
>> filled table. I hope somebody can explain to me why it takes so much
>> longer...
>
>
> These explain analyze results don't appear to be from the queries you
> posted previously. For these results to mean anything you need to
> include the EXACT queries you used to generate them.
>
>> Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
>>   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
>>   InitPlan
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
>> Total runtime: 62259544.896 ms
>
>
> This is the query you want to be interested in, the others took no
> time at all.
>
> As a guess I would say the query is an INSERT INTO ... FROM
> SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
> The majority of the time appears to be taken up on the actual INSERT
> and not the SELECT part.
>
> How many rows are in the target table and what indexes does it have?
> Does it have any triggers, check constraints, or rules applied to it?
> All these things can make the insert take longer as the number of
> rows you have already in the table increases.
>
> More than likely you have a target table with a LOT of rows and a
> bunch of indexes on it and your disks are being thrashed because the
> indexes are not able to stay cached in RAM. At this point you should
> ensure your machine is not swapping to disk, and at the very least
> you should go through one of the many tuning guidelines available and
> ensure you have allocated the appropriate amount of memory to
> postgresql for your needs.
>
> You may also want to consider dropping the indexes before you do the
> INSERT and recreate them afterwards.
>
>
>
> "Christian Rengstl" <[EMAIL PROTECTED] 
> regensburg.de>
> 13.07.06 8.37 Uhr >>>
>> Good morning list,
>>
>> the fo

Re: [GENERAL] Performance problem with query

2006-07-19 Thread Q

On 19/07/2006, at 10:03 PM, Christian Rengstl wrote:


So here's the master table including the rules:

CREATE TABLE public.master (
  entry_no int8 NOT NULL DEFAULT nextval('public.master_seq'::regclass),
  pid varchar(15) NOT NULL,
  val_1 varchar(1),
  val_2 varchar(1),
  chr int2 NOT NULL,
  aendat timestamp DEFAULT now(),
  aennam varchar(8),
  CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
  CONSTRAINT "UNIQUE_MASTER" UNIQUE (pid, entry_no)
);

CREATE OR REPLACE RULE "INSERT_INTO_1" AS
ON INSERT TO public.master
WHERE new.chr = 1 DO INSTEAD
  INSERT INTO public.table_1 (entry_no, pid, val_1, val_2, chr, aendat, aennam)
  VALUES (new.entry_no, new.pid, new.val_1, new.val_2, new.chr, new.aendat, new.aennam);


Like this I have around 20 rules so far, but there might be more
later on. The child tables are so far exactly like the master table.


What about the children? Do they have the same indexes?

You could try adding an 'ORDER BY chr' to your long running INSERT  
INTO ... SELECT ... query.
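
For illustration, a sketch of that suggestion applied to the INSERT from the
original post, so that rows arrive grouped by the chr value:

    INSERT INTO public.master(pid, smid, val1, val2, chr)
    SELECT pid, smid, val1, val12,
           CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
    FROM public.temp_table
    ORDER BY chr;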



Obviously it had something to do with the rule, because now
everything finished within 20 minutes. The problem is just that I
don't really want to give up the inheritance design. Is there a way
to maintain the inheritance that doesn't cause this huge
performance problem?


When you say "now everything finished within 20 minutes", what did  
you actually do to achieve this?



--
Seeya...Q

   -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

  Quinton Dolan - [EMAIL PROTECTED]
  Gold Coast, QLD, Australia
  Ph: +61 419 729 806






Re: [GENERAL] Performance problem with query

2006-07-14 Thread Christian Rengstl
Hi,

Somehow my reply yesterday got lost, but nevertheless here comes the
explain analyze again. It's the explain of the operation that causes
this huge performance discrepancy. Unfortunately I had to perform the
explain analyze with an empty temp_table, because after the whole
operation is done, I delete the data again to save some space.


                                   QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=0.12..16.95 rows=390 width=108) (actual time=0.025..0.025 rows=0 loops=1)
   One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TRUE))
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=28) (never executed)
           ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=28) (never executed)
     ->  Limit  (cost=0.00..0.04 rows=1 width=28) (actual time=0.005..0.005 rows=0 loops=1)
           ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=28) (actual time=0.001..0.001 rows=0 loops=1)
     ->  Limit  (cost=0.00..0.04 rows=1 width=28) (actual time=0.002..0.002 rows=0 loops=1)
           ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=28) (actual time=0.000..0.000 rows=0 loops=1)
   ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=108) (actual time=0.000..0.000 rows=0 loops=1)
 Total runtime: 0.424 ms

 Result  (cost=0.08..16.90 rows=390 width=108) (actual time=0.005..0.005 rows=0 loops=1)
   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=28) (never executed)
           ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=28) (never executed)
     ->  Limit  (cost=0.00..0.04 rows=1 width=28) (actual time=0.002..0.002 rows=0 loops=1)
           ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=28) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=108) (never executed)
 Total runtime: 0.267 ms

 Result  (cost=0.08..16.90 rows=390 width=108) (actual time=0.005..0.005 rows=0 loops=1)
   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=28) (never executed)
           ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=28) (never executed)
     ->  Limit  (cost=0.00..0.04 rows=1 width=28) (actual time=0.002..0.002 rows=0 loops=1)
           ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=28) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Seq Scan on temp_table  (cost=0.00..13.90 rows=390 width=108) (never executed)
 Total runtime: 0.189 ms
(31 rows)






Re: [GENERAL] Performance problem with query

2006-07-18 Thread Christian Rengstl
Now, finally, after a long time I have the query plan for the whole filled
table. I hope somebody can explain to me why it takes so much longer...

                                   QUERY PLAN
--------------------------------------------------------------------------------
Result  (cost=0.09..254643.09 rows=8044000 width=39) (actual time=0.043..0.043 rows=0 loops=1)
  One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TRUE))
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.006 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (never executed)
Total runtime: 0.238 ms

Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms

Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=1.245..1.245 rows=0 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (never executed)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (never executed)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.013..0.014 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.009..0.009 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (never executed)
Total runtime: 22.270 ms
(31 rows)



>>> "Merlin Moncure" <[EMAIL PROTECTED]> 13.07.06 15.20 Uhr >>>
On 7/13/06, Christian Rengstl
<[EMAIL PROTECTED]> wrote:
> Good morning list,
>
> The following query takes about 15 to 20 minutes for around 2 million
> lines in the file myfile.txt, but with 8 million lines it takes around 5
> hours and I just don't understand why there is such a huge discrepancy
> in performance.


On the surface it doesn't make sense. Can you post an EXPLAIN ANALYZE?

merlin



Re: [GENERAL] Performance problem with query

2006-07-18 Thread Q


On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:

Now, finally, after a long time I have the query plan for the whole
filled table. I hope somebody can explain to me why it takes so much
longer...



These explain analyze results don't appear to be from the queries you  
posted previously. For these results to mean anything you need to  
include the EXACT queries you used to generate them.


Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
  One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
  InitPlan
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
    ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
          ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
  ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
Total runtime: 62259544.896 ms



This is the query you want to be interested in, the others took no  
time at all.


As a guess I would say the query is an INSERT INTO ... FROM  
SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.  
The majority of the time appears to be taken up on the actual INSERT  
and not the SELECT part.


How many rows are in the target table and what indexes does it have?  
Does it have any triggers, check constraints, or rules applied to it?  
All these things can make the insert take longer as the number of  
rows you have already in the table increases.


More than likely you have a target table with a LOT of rows and a  
bunch of indexes on it and your disks are being thrashed because the  
indexes are not able to stay cached in RAM. At this point you should  
ensure your machine is not swapping to disk, and at the very least  
you should go through one of the many tuning guidelines available and  
ensure you have allocated the appropriate amount of memory to  
postgresql for your needs.


You may also want to consider dropping the indexes before you do the  
INSERT and recreate them afterwards.
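
For illustration, a minimal sketch of that pattern; the index name and
definition are hypothetical, since the actual indexes were not posted:

    -- hypothetical index, dropped so the bulk load doesn't maintain it row by row
    DROP INDEX master_pid_idx;

    INSERT INTO public.master(pid, smid, val1, val2, chr)
    SELECT pid, smid, val1, val12,
           CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
    FROM public.temp_table;

    -- rebuild in one pass over the finished table
    CREATE INDEX master_pid_idx ON public.master (pid);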




"Christian Rengstl" <[EMAIL PROTECTED]>  
13.07.06 8.37 Uhr >>>

Good morning list,

the following query takes about 15 to 20 minutes for around 2  
million lines in the file myfile.txt, but with 8 million lines it  
takes around 5 hours and i just don't understand why there is such  
a huge discrepancy in performance.


COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid, smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT
chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query because I wanted to use the COPY command
to load huge files into the db, but I don't want to load all the
columns contained in the file into only one table; rather, I copy
some of them into one table and some into a second table. As I found
out with logging, the data is loaded into temp_table within 15
minutes, but transferring it from temp_table to public.master took
around 5 hours, while transferring it from temp_table to
public.values took again only something like 10 minutes. Can it be
that the cast takes up so much more time than reading and
transferring 2 million lines?



--
Seeya...Q

   -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

  Quinton Dolan - [EMAIL PROTECTED]
  Gold Coast, QLD, Australia
  Ph: +61 419 729 806



