Re: [PERFORM] Very slow update statement on 40mio rows

2013-02-26 Thread Florian Schröck
Hello Merlin,
thanks for the feedback. I forwarded this to my developer; this is an
interesting approach.


-- 

Best regards

Florian Schröck

On 02/19/2013 09:04 PM, Merlin Moncure wrote:
> On Fri, Feb 15, 2013 at 9:32 AM, Florian Schröck  wrote:
>> Hello Kevin,
>> not updating the rows that don't need the update solved the problem!
>> Your query took only 1 minute. :)
>>
>> Thank you so much for the fast response; have a great weekend!
>>
>> PS: When you switch to "TEXT" on the explain URL you can see the final
>> runtime, which was 66 minutes with the original statement.
>>
>> Best regards,
>> Florian
>>
>> On 02/15/2013 03:59 PM, Kevin Grittner wrote:
>>> Florian Schröck  wrote:
>>>
>>>> UPDATE BackupFiles
>>>>   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>>>   cStatus='NEW'::StatusT, bOnSetBlue=false,
>>>>   bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>>>   WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
>>>>
>>>> Explain analyze: http://explain.depesz.com/s/8y5
>>>> The statement takes 60-90 minutes.
>>> The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41
>>> seconds.
>>>
>>>> I tried to optimize the settings but so far without success.
>>>>
>>>> Can we optimize this update statement somehow? Do you have any
>>>> other ideas?
>>> Are there any rows which would already have the values that you are
>>> setting?  If so, it would be faster to skip those by using this
>>> query:
>>>
>>> UPDATE BackupFiles
>>>   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>>   cStatus='NEW'::StatusT, bOnSetBlue=false,
>>>   bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>>   WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
>>> AND (nTapeNr <> 0 OR nAFIOCounter <> 0 OR nBlockCounter <> 0
>>>  OR cStatus <> 'NEW'::StatusT
>>>  OR bOnSetBlue IS DISTINCT FROM false
>>>  OR bOnSetYellow IS DISTINCT FROM false
>>>  OR nLastBackupTS <> '0001-01-01 00:00:00');
>>>
>>> Another way to accomplish this is with the
>>> suppress_redundant_updates_trigger() trigger function:
>>>
>>> http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html
> If the number of rows you actually update is small relative to the
> size of the table, just for fun, try this:
>
>
> CREATE OR REPLACE FUNCTION BakupFilesCandidateReset(BackupFiles)
> RETURNS BOOL AS
> $$
>   SELECT ($1).cStatus IN('NEW', 'WRITING', 'ONTAPE')
>   AND
> (($1).nTapeNr, ($1).nAFIOCounter, ($1).nBlockCounter,
> ($1).cStatus, ($1).bOnSetBlue, ($1).bOnSetYellow, ($1).nLastBackupTS)
> IS DISTINCT FROM /* simple != will suffice if values are never null */
>    (0, 0, 0, 'NEW'::StatusT, false, false, '0001-01-01 00:00:00');
> $$ LANGUAGE SQL IMMUTABLE;
>
> CREATE INDEX ON BackupFiles(BakupFilesCandidateReset(BackupFiles))
>   WHERE BakupFilesCandidateReset(BackupFiles);
>
>
> SELECT * FROM BackupFiles WHERE BakupFilesCandidateReset(BackupFiles);
> UPDATE BackupFiles SET ... WHERE BakupFilesCandidateReset(BackupFiles);
> etc
>
> The idea here is to maintain a partial boolean index representing the
> candidate records to update.  Plus it's nifty.  This is the basic
> mechanism that can be used as a foundation for very fast push/pull queues.
>
> merlin
>



Re: [PERFORM] Very slow update statement on 40mio rows

2013-02-19 Thread Merlin Moncure
On Fri, Feb 15, 2013 at 9:32 AM, Florian Schröck  wrote:
> Hello Kevin,
> not updating the rows that don't need the update solved the problem!
> Your query took only 1 minute. :)
>
> Thank you so much for the fast response; have a great weekend!
>
> PS: When you switch to "TEXT" on the explain URL you can see the final
> runtime, which was 66 minutes with the original statement.
>
> Best regards,
> Florian
>
> On 02/15/2013 03:59 PM, Kevin Grittner wrote:
>> Florian Schröck  wrote:
>>
>>> UPDATE BackupFiles
>>>SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>>cStatus='NEW'::StatusT, bOnSetBlue=false,
>>>bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>>WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
>>>
>>> Explain analyze: http://explain.depesz.com/s/8y5
>>> The statement takes 60-90 minutes.
>> The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41
>> seconds.
>>
>>> I tried to optimize the settings but so far without success.
>>>
>>> Can we optimize this update statement somehow? Do you have any
>>> other ideas?
>> Are there any rows which would already have the values that you are
>> setting?  If so, it would be faster to skip those by using this
>> query:
>>
>> UPDATE BackupFiles
>>   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>   cStatus='NEW'::StatusT, bOnSetBlue=false,
>>   bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>   WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
>> AND (nTapeNr <> 0 OR nAFIOCounter <> 0 OR nBlockCounter <> 0
>>  OR cStatus <> 'NEW'::StatusT
>>  OR bOnSetBlue IS DISTINCT FROM false
>>  OR bOnSetYellow IS DISTINCT FROM false
>>  OR nLastBackupTS <> '0001-01-01 00:00:00');
>>
>> Another way to accomplish this is with the
>> suppress_redundant_updates_trigger() trigger function:
>>
>> http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html

If the number of rows you actually update is small relative to the
size of the table, just for fun, try this:


CREATE OR REPLACE FUNCTION BakupFilesCandidateReset(BackupFiles)
RETURNS BOOL AS
$$
  SELECT ($1).cStatus IN('NEW', 'WRITING', 'ONTAPE')
  AND
(($1).nTapeNr, ($1).nAFIOCounter, ($1).nBlockCounter,
($1).cStatus, ($1).bOnSetBlue, ($1).bOnSetYellow, ($1).nLastBackupTS)
IS DISTINCT FROM /* simple != will suffice if values are never null */
   (0, 0, 0, 'NEW'::StatusT, false, false, '0001-01-01 00:00:00');
$$ LANGUAGE SQL IMMUTABLE;

CREATE INDEX ON BackupFiles(BakupFilesCandidateReset(BackupFiles))
  WHERE BakupFilesCandidateReset(BackupFiles);


SELECT * FROM BackupFiles WHERE BakupFilesCandidateReset(BackupFiles);
UPDATE BackupFiles SET ... WHERE BakupFilesCandidateReset(BackupFiles);
etc

The idea here is to maintain a partial boolean index representing the
candidate records to update.  Plus it's nifty.  This is the basic
mechanism that can be used as a foundation for very fast push/pull queues.
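
A quick sanity check (same names as above) is to confirm that the
planner actually uses the partial index; assuming the candidate set
stays small, the plan should show the index rather than a seq scan
over the full 40 million rows:

EXPLAIN ANALYZE
SELECT * FROM BackupFiles WHERE BakupFilesCandidateReset(BackupFiles);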

merlin



Re: [PERFORM] Very slow update statement on 40mio rows

2013-02-15 Thread Florian Schröck
Hello Kevin,
not updating the rows that don't need the update solved the problem!
Your query took only 1 minute. :)

Thank you so much for the fast response; have a great weekend!

PS: When you switch to "TEXT" on the explain URL you can see the final
runtime, which was 66 minutes with the original statement.

Best regards,
Florian

On 02/15/2013 03:59 PM, Kevin Grittner wrote:
> Florian Schröck  wrote:
>
>> UPDATE BackupFiles
>>SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>cStatus='NEW'::StatusT, bOnSetBlue=false,
>>bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
>>
>> Explain analyze: http://explain.depesz.com/s/8y5
>> The statement takes 60-90 minutes.
> The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41
> seconds.
>
>> I tried to optimize the settings but so far without success.
>>
>> Can we optimize this update statement somehow? Do you have any
>> other ideas?
> Are there any rows which would already have the values that you are
> setting?  If so, it would be faster to skip those by using this
> query:
>
> UPDATE BackupFiles
>   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>   cStatus='NEW'::StatusT, bOnSetBlue=false,
>   bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>   WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
> AND (nTapeNr <> 0 OR nAFIOCounter <> 0 OR nBlockCounter <> 0
>  OR cStatus <> 'NEW'::StatusT
>  OR bOnSetBlue IS DISTINCT FROM false
>  OR bOnSetYellow IS DISTINCT FROM false
>  OR nLastBackupTS <> '0001-01-01 00:00:00');
>
> Another way to accomplish this is with the
> suppress_redundant_updates_trigger() trigger function:
>
> http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html
>



Re: [PERFORM] Very slow update statement on 40mio rows

2013-02-15 Thread Kevin Grittner
Florian Schröck  wrote:

> UPDATE BackupFiles
>   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>   cStatus='NEW'::StatusT, bOnSetBlue=false,
>   bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>   WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
>
> Explain analyze: http://explain.depesz.com/s/8y5
> The statement takes 60-90 minutes.

The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41
seconds.

> I tried to optimize the settings but so far without success.
>
> Can we optimize this update statement somehow? Do you have any
> other ideas?

Are there any rows which would already have the values that you are
setting?  If so, it would be faster to skip those by using this
query:

UPDATE BackupFiles
  SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
  cStatus='NEW'::StatusT, bOnSetBlue=false,
  bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
  WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
    AND (nTapeNr <> 0 OR nAFIOCounter <> 0 OR nBlockCounter <> 0
 OR cStatus <> 'NEW'::StatusT
 OR bOnSetBlue IS DISTINCT FROM false
 OR bOnSetYellow IS DISTINCT FROM false
 OR nLastBackupTS <> '0001-01-01 00:00:00');
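
The IS DISTINCT FROM tests matter for bOnSetBlue and bOnSetYellow
because those two columns are nullable; with a plain <>, a NULL flag
yields NULL instead of true, so such rows would never qualify for the
update.  For example:

SELECT NULL <> false;                -- NULL, row would not match
SELECT NULL IS DISTINCT FROM false;  -- true, row matches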

Another way to accomplish this is with the
suppress_redundant_updates_trigger() trigger function:

http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html
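
A minimal sketch of that approach (the trigger name here is
illustrative; the function itself ships with PostgreSQL 8.4 and later):

CREATE TRIGGER zzz_suppress_redundant
  BEFORE UPDATE ON BackupFiles
  FOR EACH ROW
  EXECUTE PROCEDURE suppress_redundant_updates_trigger();

When OLD and NEW are identical the trigger skips the row write
entirely.  Since triggers fire in name order, it is usually named so
that it sorts last, after any other BEFORE UPDATE triggers.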

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[PERFORM] Very slow update statement on 40mio rows

2013-02-15 Thread Florian Schröck

Hi!
I'm new to this mailing list and new to Postgres as well. This is
about our own backup software (Java); we switched the DB from MySQL
to Postgres and need some help.
The backup database holds all files from the server. On my testing
platform there are about 40 million rows and the DB size grew to
about 70 GB (the same DB was about 11 GB with MySQL, but that is
another issue).

Before each backup job, there is a reset to get a consistent state
of all files:
UPDATE BackupFiles
  SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
      cStatus='NEW'::StatusT, bOnSetBlue=false,
      bOnSetYellow=false, nLastBackupTS='0001-01-01 00:00:00'
  WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';

Explain analyze: http://explain.depesz.com/s/8y5
The statement takes 60-90 minutes. In MySQL the same statement takes
about 2 minutes on the same HW.
I tried to optimize the settings but so far without success.

Can we optimize this update statement somehow? Do you have any other
ideas?

Any help is appreciated! Thank you!

My current config:
shared_buffers = 2GB
work_mem = 16MB 
wal_buffers = 16MB
checkpoint_segments = 45
random_page_cost = 2.0
effective_cache_size = 6GB

HW:
2x Intel E5405 @ 2.00GHz
8 GB RAM
3ware 9650SE-16ML RAID Controller, all caches enabled
DB is on a RAID6 with 14x 1TB (since the DB partition on the RAID1
was too small)

SW:
Debian Squeeze 6.0 with Kernel 3.5.4
Postgres 8.4.13 (standard Debian package)

Table:
+---------------+-----------------------------+----------------------------------------------------------------------+
| Column        | Type                        | Modifiers                                                            |
+---------------+-----------------------------+----------------------------------------------------------------------+
| _rowid        | bigint                      | not null default nextval('backupfiles__rowid_seq'::regclass)         |
| cfilename     | bytea                       | not null                                                             |
| nfilesize     | bigint                      | not null default 0::bigint                                           |
| nfilectimets  | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone  |
| ntapenr       | integer                     | not null default 0                                                   |
| nafiocounter  | bigint                      | not null default 0::bigint                                           |
| nblockcounter | bigint                      | not null default 0::bigint                                           |
| cstatus       | statust                     | not null default 'NEW'::statust                                      |
| bonsetblue    | boolean                     | default false                                                        |
| bonsetyellow  | boolean                     | default false                                                        |
| nlastbackupts | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone  |
+---------------+-----------------------------+----------------------------------------------------------------------+
Indexes:
    "backupfiles_pkey" PRIMARY KEY, btree (_rowid)
    "cfilename_index" btree (cfilename)
    "cstatus_index" btree (cstatus)
    "nfilectimets_index" btree (nfilectimets)
    "ntapenr_index" btree (ntapenr)


Example row:
+--------+-----------------------------+-----------+---------------------+---------+--------------+---------------+---------+------------+--------------+---------------------+
| _rowid | cfilename                   | nfilesize | nfilectimets        | ntapenr | nafiocounter | nblockcounter | cstatus | bonsetblue | bonsetyellow | nlastbackupts       |
+--------+-----------------------------+-----------+---------------------+---------+--------------+---------------+---------+------------+--------------+---------------------+
|      1 | /dicom/log/datadir_init.log |      1790 | 2013-01-30 14:02:48 |       0 |            0 |             0 | NEW     | f          | f            | 0001-01-01 00:00:00 |
+--------+-----------------------------+-----------+---------------------+---------+--------------+---------------+---------+------------+--------------+---------------------+


-- 
Best regards

Florian Schröck
IT Services

aycan Digitalsysteme GmbH
Innere Aumühlstr. 5