Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2013-11-08 Thread Jan Lentfer
 

On 07.11.2013 12:42, Dilip kumar wrote:

> This patch implements the following TODO item:
>
> Allow parallel cores to be used by vacuumdb
>
> http://www.postgresql.org/message-id/4f10a728.7090...@agliodbs.com
>
> Like parallel pg_dump, vacuumdb is provided with the option to run the
> vacuum of multiple tables in parallel. [ VACUUMDB -J ]
>
> 1. One new option is provided with vacuumdb to give the number of
> workers.
>
> 2. All workers will be started at the beginning and will be waiting
> for vacuum instructions from the master.
>
> 3. Now, if a table list is provided to the vacuumdb command using -t,
> it will send the vacuum of one table to one IDLE worker, the next
> table to the next IDLE worker, and so on.
>
> 4. If vacuum is given for one DB, it will execute a select on pg_class
> to get the table list, fetch the table names one by one, and assign
> the vacuum responsibility to IDLE workers.
>
> [...]
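
Just to make sure I read the dispatch scheme in steps 2-4 correctly,
here is a minimal sketch of the idea in Python (psycopg2 instead of the
patch's C code; the connection string and worker count are made up):

import queue
import threading

import psycopg2

DSN = "dbname=postgres"   # made-up connection string
N_WORKERS = 4             # the new vacuumdb option would set this

def worker(dsn, tables):
    # Each worker gets its own connection.  VACUUM cannot run inside a
    # transaction block, so autocommit is needed.
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    cur = conn.cursor()
    while True:
        try:
            schema, table = tables.get_nowait()  # idle worker fetches next job
        except queue.Empty:
            break
        cur.execute('VACUUM "%s"."%s"' % (schema, table))
    conn.close()

def parallel_vacuum(dsn, table_list, n_workers=N_WORKERS):
    tables = queue.Queue()
    for item in table_list:           # (schema, table) tuples
        tables.put(item)
    workers = [threading.Thread(target=worker, args=(dsn, tables))
               for _ in range(n_workers)]
    for w in workers:                 # all workers started up front (step 2)
        w.start()
    for w in workers:
        w.join()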

For this use case, would it make sense to queue the work (tables) in
order of their size, starting with the largest one?

Where tables vary in size, this would reduce the overall processing
time, as it prevents large (read: long processing time) tables from
being processed in the last step. Processing the large tables first and
filling up free "processing slots/jobs" with the smaller tables one
after the other would save overall execution time.
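
Building that size-ordered list would only need something like the
following on top of the sketch above (pg_total_relation_size() for the
ordering; again just an illustration, not the patch):

import psycopg2

LIST_TABLES_BY_SIZE = """
    SELECT n.nspname, c.relname
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE c.relkind = 'r'
       AND n.nspname NOT IN ('pg_catalog', 'information_schema')
     ORDER BY pg_total_relation_size(c.oid) DESC
"""

def tables_largest_first(dsn):
    # Largest tables go into the queue first, so they are picked up by
    # the first workers to become idle instead of being left for last.
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute(LIST_TABLES_BY_SIZE)
    rows = cur.fetchall()
    conn.close()
    return rows

parallel_vacuum(DSN, tables_largest_first(DSN)) would then behave like
the patch, just with the big tables scheduled first.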

Regards


Jan 

-- 
professional: http://www.oscar-consult.de





Re: [HACKERS] Changing pg_dump default file format

2013-11-07 Thread Jan Lentfer
On 07.11.2013 19:08, Joshua D. Drake wrote:
> On 11/07/2013 10:00 AM, Josh Berkus wrote:
>> If we wanted to change the defaults, I think it would be easier to
>> create a separate bin name (e.g. pg_backup) than to change the existing
>> parameters for pg_dump.
> 
> I am not opposed to that. Allow pg_dump to be what it is, and create a
> pg_backup?
> 
> JD


I would definitely agree with having "one" backup utility and making -Fc
the default for SQL dumps. One could even argue whether the functionality
of pg_basebackup should be part of that too. But I would also be fine
with having two distinct utilities (one for file-level backups and one
for logical/SQL-level backups).

Btw, how hard would it be to have pg_restore, and now also pg_dump when
run with the -j option, do some ordering of the work by size of e.g. the
tables? E.g. if you run with -j4 it would make sense to start working on
the largest tables (and their indexes) first and continue in descending
size, to keep all available "slots" filled as well as possible. Just a
thought.
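
To put a number on the "keep the slots filled" argument, a tiny
simulation of greedy list scheduling on 4 slots (the durations are made
up; they just stand for one large, a few medium and several small
tables):

import heapq

def makespan(durations, slots=4):
    # Greedy list scheduling: whichever slot frees up first takes the
    # next job from the list; returns when the last job finishes.
    finish = [0] * slots
    heapq.heapify(finish)
    for d in durations:
        t = heapq.heappop(finish)
        heapq.heappush(finish, t + d)
    return max(finish)

jobs = [25, 25, 25, 25, 100, 5, 5, 5, 5]   # made-up per-table durations

print(makespan(jobs))                          # catalog order -> 125
print(makespan(sorted(jobs, reverse=True)))    # largest first -> 100

Same total work, but when the largest table is started last the whole
run is dominated by it.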

Jan





Re: [HACKERS] swapcache-style cache?

2012-02-27 Thread Jan Lentfer

On 23.02.2012 21:57, Greg Smith wrote:

> On 02/22/2012 05:31 PM, james wrote:
>
>> Has anyone considered managing a system like the DragonFly swapcache
>> for a DBMS like PostgreSQL?
>>
>> ie where the admin can assign drives with good random read behaviour
>> (but perhaps also-ran random write) such as SSDs to provide a cache
>> for blocks that were dirtied, with async write that hopefully writes
>> them out before they are forcibly discarded.
>
> We know that battery-backed write caches are extremely effective for
> PostgreSQL writes. I see most of these tiered storage ideas as acting
> like a big one of those, which seems to hold in things like SAN storage
> that have adopted this sort of technique already. A SSD is quite large
> relative to a typical BBWC.
>
> [...]
>
> Ultimately all this data needs to make it out to real disk. The funny
> thing about caches is that no matter how big they are, you can easily
> fill them up if doing something faster than the underlying storage can
> handle.
>
> [...]
>
> I don't think the idea of a swapcache is without merit; there's surely
> some applications that will benefit from it. It's got a lot of potential
> as a way to absorb short-term bursts of write activity. And there are
> some applications that could benefit from having a second tier of read
> cache, not as fast as RAM but larger and faster than real disk seeks. In
> all of those potential win cases, though, I don't see why the OS
> couldn't just manage the whole thing for us.


First off, thanks very much for mentioning DragonFly's swapcache on
this mailing list, which takes the burden of self-advertising this
feature off me/us :)


But swapcache is clearly not meant or designed to speed up any write 
activity by caching writes and delaying the write to the "target 
storage" to a later point in time. Swapcache does not affect writes in 
any way, actually.
Swapcache does its writing when a clean VM page hits the inactive VM 
page queue. VM pages related to filesystem writes are dirty, the write 
occurs normally, then they become clean.  But they still have to cycle 
into the VM page inactive queue before swapcache will touch them (write 
them out to swap).


So, basically it is designed to speed up metadata reads and, if
configured to do so, data reads.


So it can take some of the read load off the disk subsystem and free it
up for more write activity, but that would be just a side effect, not a
design goal.


And, yes, it does affect pgsql performance on read loads significantly.

See BSD Mag 5/2011
http://bsdmag.org/magazine/1691-embedded-bsd-freebsd-alix

and
http://www.shiningsilence.com/dbsdlog/2011/04/12/7586.html

Jan





Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #

2012-01-13 Thread Jan Lentfer

On 13.01.2012 22:50, Josh Berkus wrote:

> It occurs to me that I would find it quite personally useful if the
> vacuumdb utility was multiprocess capable.
>
> For example, just today I needed to manually analyze a database with
> over 500 tables, on a server with 24 cores. And I needed to know when
> the analyze was done, because it was part of a downtime. I had to
> resort to a python script.
>
> I'm picturing doing this in the simplest way possible: get the list of
> tables and indexes, divide them by the number of processes, and give
> each child process its own list.
>
> Any reason not to hack on this for 9.3?


I don't see any reason not to do it, but plenty of reasons to do it.
Right now I have systems hosting many databases that I need to vacuum
full from time to time. I have wrapped vacuumdb in a shell script to
actually use all the capacity that is available. A vacuumdb -faz just
isn't that useful on large machines anymore.
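
For what it's worth, a stripped-down sketch of that kind of wrapper (not
my actual script; it just runs one vacuumdb -f -z per database, a
bounded number of them at a time):

import subprocess
from concurrent.futures import ThreadPoolExecutor

N_JOBS = 8   # concurrent vacuumdb processes, tune to the machine

def list_databases():
    # Ask the server for all connectable databases (names are assumed
    # to contain no whitespace here).
    out = subprocess.check_output(
        ["psql", "-At", "-d", "postgres", "-c",
         "SELECT datname FROM pg_database WHERE datallowconn"])
    return out.decode().split()

def vacuum_full(dbname):
    # Equivalent of `vacuumdb -f -z` for a single database.
    subprocess.check_call(["vacuumdb", "-f", "-z", "-d", dbname])

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=N_JOBS) as pool:
        list(pool.map(vacuum_full, list_databases()))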


Jan





Re: [HACKERS] psql expanded auto

2011-11-01 Thread Jan Lentfer
I have not tried the patch (yet), but Informix's dbaccess would do about
the same - and it's something I have really missed.


Jan
--
This message was sent from my Android mobile phone with K-9 Mail.

