Re: [PERFORM] Backup taking long time !!!
Hello,

>>> Increments in pgbackrest are done on file level, which is not really
>>> efficient. We have implemented parallelism, compression and page-level
>>> increments (9.3+) in a barman fork [1], but unfortunately the guys
>>> from 2ndquadrant-it are in no hurry to work on it.
>>
>> We're looking at page-level incremental backup in pgbackrest as well.
>> For larger systems, though, we haven't heard much complaining about it
>> being file-based, which is why it hasn't been a priority. Of course,
>> the OP is on 9.1 too, so.
>
> Well, we forked barman and implemented all of the above precisely
> because we needed ~2 PB of disk space to store the backups for our
> ~300 TB of data (our recovery window is 7 days), and on a 5 TB database
> it took a lot of time to make/restore a backup.

I have only around 11 TB, but I have switched to ZFS-based backups
exclusively. I use snapshots, which gives me some flexibility: I can
roll them back, I can clone one and work with the full copy as a
different cluster (only the differences are stored), and I can send
them incrementally to other servers. This works very well for my use
cases, but of course it doesn't fit everything.

Greetings,
Torsten

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL on ZFS: performance tuning
On 29.07.2016 08:30, Tomas Vondra wrote:
> On 07/29/2016 08:04 AM, trafdev wrote:
>> Hi. I have an OLAP-oriented DB (light occasional bulk writes and heavy
>> aggregated selects over large periods of data) based on Postgres 9.5.3.
>> The server is a FreeBSD 10.3 with 64 GB of RAM and 2x 500 GB SSD (root
>> on ZFS, mirror). The largest table is 13 GB (with a 4 GB index on it);
>> the other tables are 4 GB, 2 GB and less than 1 GB. After reading a
>> lot of articles and howtos I've collected the following set of tweaks
>> and hints:
>>
>> ZFS pool creation:
>>   zfs create zroot/ara/sqldb
>>   zfs create -o recordsize=8k -o primarycache=all zroot/ara/sqldb/pgsql
>>
>>   zfs get primarycache,recordsize,logbias,compression zroot/ara/sqldb/pgsql
>>   NAME                   PROPERTY      VALUE    SOURCE
>>   zroot/ara/sqldb/pgsql  primarycache  all      local
>>   zroot/ara/sqldb/pgsql  recordsize    8K       local
>>   zroot/ara/sqldb/pgsql  logbias       latency  local
>>   zroot/ara/sqldb/pgsql  compression   lz4      inherited from zroot
>>
>> L2ARC is disabled, VDEV cache is disabled.
>>
>>   pgsql -c "mkdir /ara/sqldb/pgsql/data_ix"
>>   pgsql -c "initdb --locale=en_US.UTF-8 -E UTF-8 -D /ara/sqldb/pgsql/data"
>>
>> /etc/sysctl.conf:
>>   vfs.zfs.metaslab.lba_weighting_enabled=0
>>
>> postgresql.conf:
>>   listen_addresses = '*'
>>   max_connections = 100
>>   shared_buffers = 16GB
>>   effective_cache_size = 48GB
>
> It may not be a problem for your workload, but this effective_cache_size
> value is far too high.

May I ask why? By default ZFS caches up to the size of RAM minus 1 GB.
Subtracting the shared buffers from the 64 GB of RAM, I would assume
47 GB would be a better value, but that is not "far too high". So
please, can you explain this?

Greetings, Torsten
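Since effective_cache_size is only a planner hint and not an allocation, the dispute above can be settled empirically. A sketch of how to adjust it online and re-test (the 47GB figure is Torsten's estimate from this thread, not a tested recommendation):

```sql
-- Sketch only: change the hint, reload, then re-run EXPLAIN ANALYZE on
-- representative queries to see whether plans actually change.
ALTER SYSTEM SET effective_cache_size = '47GB';  -- 64 GB RAM - 16 GB shared_buffers - 1 GB ZFS reserve
SELECT pg_reload_conf();
```

ALTER SYSTEM is available from 9.4 on, so it applies to the 9.5.3 server discussed here.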
Re: [PERFORM] Postgresql 8.4 optimize for heavy select load
On 19.09.2016 10:23, Mark Kirkwood wrote:
> On 19/09/16 19:40, Job wrote:
>> Hello, I would like some suggestions for optimizing Postgres 8.4 for a
>> very heavy load of select (with join) queries. The queries read data;
>> they very rarely write.
>
> We probably need to see schema and query examples (with EXPLAIN ANALYZE
> output) to help you. Also - err, 8.4 - I (and probably others) will
> recommend you upgrade to a more recent (and, for that matter, supported)
> version - currently 9.5/9.6 - there are lots of performance improvements
> you are missing out on!

Especially since 8.4 has been out of support for two years:
https://www.postgresql.org/support/versioning/

Greetings, Torsten
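For reference, the output Mark is asking for comes from EXPLAIN ANALYZE run on one of the slow joins. The tables below are placeholders, not Job's actual schema. Note that the parenthesized option syntax, e.g. EXPLAIN (ANALYZE, BUFFERS), only exists from 9.0 on, so on 8.4 it has to be plain EXPLAIN ANALYZE:

```sql
-- Hypothetical example; substitute one of your own slow joins.
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= '2016-01-01';
```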
Re: [PERFORM] less than 2 sec for response - possible?
On 06.07.2016 17:06, trafdev wrote:
> Wondering what are your CPU\RAM characteristics?

Intel Core i7-2600 quad core, 32 GB DDR3 RAM, 2x 3 TB SATA III HDD.

The HDDs are:
  Model Family:     Seagate Barracuda XT
  Device Model:     ST33000651AS
  Firmware Version: CC45
  User Capacity:    3,000,592,982,016 bytes [3.00 TB]
  Sector Size:      512 bytes logical/physical
  Rotation Rate:    7200 rpm
  Form Factor:      3.5 inches
  Device is:        In smartctl database [for details use: -P show]
  ATA Version is:   ATA8-ACS T13/1699-D revision 4
  SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

The RAM is, for example:
  Handle 0x002D, DMI type 17, 28 bytes
  Memory Device
    Array Handle: 0x002A
    Error Information Handle: No Error
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM0
    Bank Locator: BANK0
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1333 MHz
    Manufacturer: Undefined
    Serial Number: 4430793
    Asset Tag: AssetTagNum0
    Part Number: CT102464BA160B.C16
    Rank: 2

The OS is FreeBSD 10.3. Do you need more information?

Greetings, Torsten
Re: [PERFORM] less than 2 sec for response - possible?
On 05.07.2016 17:35, trafdev wrote:
> [..]
> Without TIMESTAMP cast:
>
> HashAggregate (cost=1405666.90..1416585.93 rows=335970 width=86)
>     (actual time=4797.272..4924.015 rows=126533 loops=1)
>   Group Key: subid, sid
>   Buffers: shared hit=1486949
>   -> Index Scan using ix_feed_sub_aid_date on feed_sub
>       (cost=0.44..1313275.32 rows=3359694 width=86)
>       (actual time=0.019..1783.104 rows=3588376 loops=1)
>     Index Cond: ((aid = 3) AND (date >= '2016-06-01'::date) AND
>         (date <= '2016-06-30'::date))
>     Buffers: shared hit=1486949
> Planning time: 0.164 ms
> Execution time: 4941.259 ms
>
> I need to be sure it's a physical limitation of PostgreSQL (when all
> data is in memory, and fetching/joining 1.5 million rows can't be done
> in less than 2-3 seconds) and that there is no way to improve it.

It could be a physical limitation of your hardware. I just did a short
test on one of my databases:

Aggregate (cost=532018.95..532018.96 rows=1 width=0)
    (actual time=3396.689..3396.689 rows=1 loops=1)
  Buffers: shared hit=155711
  -> Index Only Scan using requests_request_time_idx on requests
      (cost=0.43..493109.90 rows=15563620 width=0)
      (actual time=0.021..2174.614 rows=16443288 loops=1)
    Index Cond: ((request_time >= '2016-07-01 00:00:00+00'::timestamp
        with time zone) AND (request_time <= '2017-07-06
        00:00:00+00'::timestamp with time zone))
    Heap Fetches: 31254
    Buffers: shared hit=155711
Planning time: 0.143 ms
Execution time: 3396.715 ms
(8 rows)

As you can see, I can fetch 16.4 million rows from cache within 3.4
seconds (2.2 s in the scan itself). Your index scan fetches 3.5 million
rows in 1.8 seconds, which is well under half the per-row throughput of
my database.

Greetings, Torsten
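The plan above stays fast partly because it is an index-only scan with few heap fetches. A sketch of the setup behind it, using the names shown in the plan (the column type and the explicit VACUUM step are assumptions, not stated in the thread):

```sql
-- Index matching the "Index Only Scan" in the plan above.
CREATE INDEX requests_request_time_idx ON requests (request_time);

-- An index-only scan needs a reasonably fresh visibility map, which
-- VACUUM maintains; otherwise "Heap Fetches" grows and the scan
-- degrades toward a plain index scan.
VACUUM ANALYZE requests;

SELECT count(*)
FROM requests
WHERE request_time >= '2016-07-01'
  AND request_time <= '2017-07-06';
```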
Re: [PERFORM] less than 2 sec for response - possible?
On 02.07.2016 02:54, trafdev wrote:
> Hi. I'm trying to build an OLAP-oriented DB based on PostgreSQL. The
> user works with a paginated report in the web browser. The interface
> allows fetching data for a custom date-range selection, displaying
> individual rows (20-50 per page) and totals (for the entire selection,
> even rows not visible on the current page), with sorting by any column.
>
> The main goal is to deliver the results of the basic SELECT queries to
> the end user in less than 2 seconds. I was able to achieve that except
> for one table (the biggest one). It consists of multiple dimensions
> (date, gran, aid, pid, sid, fid, subid) and metrics (see below). The
> user can filter by any dimension and sort by any metric. Here is the
> CREATE script for this table:
>
> CREATE TABLE stats.feed_sub
> (
>   date date NOT NULL,
>   gran interval NOT NULL,
>   aid smallint NOT NULL,
>   pid smallint NOT NULL,
>   sid smallint NOT NULL,
>   fid smallint NOT NULL,
>   subid text NOT NULL,
>   rev_est_pub real NOT NULL,
>   rev_est_feed real NOT NULL,
>   rev_raw real NOT NULL,
>   c_total bigint NOT NULL,
>   c_passed bigint NOT NULL,
>   q_total bigint NOT NULL,
>   q_passed bigint NOT NULL,
>   q_filt_geo bigint NOT NULL,
>   q_filt_browser bigint NOT NULL,
>   q_filt_os bigint NOT NULL,
>   q_filt_ip bigint NOT NULL,
>   q_filt_subid bigint NOT NULL,
>   q_filt_pause bigint NOT NULL,
>   q_filt_click_cap_ip bigint NOT NULL,
>   q_filt_query_cap bigint NOT NULL,
>   q_filt_click_cap bigint NOT NULL,
>   q_filt_rev_cap bigint NOT NULL,
>   q_filt_erpm_floor bigint NOT NULL,
>   c_filt_click_cap_ip bigint NOT NULL,
>   c_filt_doubleclick bigint NOT NULL,
>   c_filt_url_expired bigint NOT NULL,
>   c_filt_fast_click bigint NOT NULL,
>   c_filt_delay_clicks bigint NOT NULL,
>   c_filt_ip_mismatch bigint NOT NULL,
>   c_filt_ref_mismatch bigint NOT NULL,
>   c_filt_lng_mismatch bigint NOT NULL,
>   c_filt_ua_mismatch bigint NOT NULL,
>   res_impr bigint NOT NULL,
>   rev_ver_pub real,
>   rev_ver_feed real,
>   c_ver bigint,
>   q_filt_ref bigint NOT NULL
> )
> WITH (OIDS=FALSE);
>
> CREATE INDEX ix_feed_sub_date ON stats.feed_sub USING brin (date);
> CREATE UNIQUE INDEX ixu_feed_sub ON stats.feed_sub USING btree
>   (date, gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");
>
> Here is some sizing info (https://wiki.postgresql.org/wiki/Disk_Usage):
>   relation,size
>   stats.feed_sub,5644 MB
>   stats.ixu_feed_sub,1594 MB
>   row_estimate 15865627
>
> Here is the typical query (for the totals):
>
> SELECT
>   sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
>   sum(stats.feed_sub.c_filt_doubleclick)  AS clicks_on_target,
>   sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
>   sum(stats.feed_sub.c_filt_fast_click)   AS fast_click,
>   sum(stats.feed_sub.c_filt_ip_mismatch)  AS ip_mismatch,
>   sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
>   sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
>   sum(stats.feed_sub.c_filt_ua_mismatch)  AS ua_mismatch,
>   sum(stats.feed_sub.c_filt_url_expired)  AS url_expired,
>   stats.feed_sub.subid AS stats_feed_sub_subid,
>   stats.feed_sub.sid   AS stats_feed_sub_sid
> FROM stats.feed_sub
> WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
>       stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
>       stats.feed_sub.gran = '1 day' AND
>       stats.feed_sub.aid = 3
> GROUP BY stats.feed_sub.subid, stats.feed_sub.sid;

You cast every date to a timestamp. Why? You could adjust the index to
match the cast:

CREATE UNIQUE INDEX ixu_feed_sub ON stats.feed_sub USING btree
  ((date::timestamp), gran, aid, pid, sid, fid, subid
   COLLATE pg_catalog."default");

But since I see no need for the cast at all (maybe I missed it), try the
query without it first!

Greetings, Torsten
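Torsten's suggestion in its simplest form: keep the comparison in the column's own date type, so both the existing btree and the BRIN index on date remain directly usable. A sketch of the rewritten predicate (abbreviated select list), not a tested rewrite:

```sql
SELECT sum(c_filt_click_cap_ip) AS clicks_from_ip,
       -- ... the other sums as in the original query ...
       subid, sid
FROM stats.feed_sub
WHERE date >= DATE '2016-06-01'
  AND date <= DATE '2016-06-30'
  AND gran = interval '1 day'
  AND aid = 3
GROUP BY subid, sid;
```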
Re: [PERFORM] Indexes for hashes
Hello Ivan,

> I have an application which stores a large amount of hex-encoded hash
> strings (nearly 100 GB of them), which means:
>
> * The number of distinct characters (the alphabet) is limited to 16.
> * Each string is of the same length, 64 characters.
> * The strings are essentially random.
>
> Creating a B-tree index on this results in the index being larger than
> the table itself, and there are disk space constraints. I've found the
> SP-GiST radix tree index and thought it could be a good match for the
> data because of the above constraints. An attempt to create it (as in
> CREATE INDEX ON t USING spgist(field_name)) apparently takes more than
> 12 hours (while a similar B-tree index takes a few hours at most), so
> I interrupted it because "it probably is not going to finish in a
> reasonable time". Some slides I found on the SP-GiST index suggest that
> both build time and size are not really suitable for this purpose. My
> question is: what would be the most size-efficient index for this
> situation?

It depends on what you want to query. What about a BRIN index:
https://www.postgresql.org/docs/9.5/static/brin-intro.html
This will result in a very small index but, depending on what you want
to query, it may or may not fit your needs.

Greetings, Torsten
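A hedged sketch of Torsten's suggestion, using the t/field_name names from the thread. One caveat worth spelling out: BRIN stores only a min/max summary per block range, so with essentially random values every range spans most of the key space, and an equality probe can still end up reading most of the table unless the rows happen to be physically ordered by the hash:

```sql
-- Tiny on disk: one summary tuple per range of heap blocks.
CREATE INDEX t_field_name_brin ON t USING brin (field_name);

-- Equality lookup; selective only if the heap is (roughly) ordered by
-- field_name, e.g. because rows were loaded in sorted batches.
SELECT *
FROM t
WHERE field_name = 'ab12...';  -- 64-char hex string elided
```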
Re: [PERFORM] Excessive memory used for INSERT
Hello Alessandro,

> 2014-12-15 17:54:07 GMT DETAIL: Failed process was running:
> WITH upsert AS (
>   UPDATE MSG
>   SET (slot, MSG, HRV, VIS006, VIS008, IR_016, IR_039, WV_062, WV_073,
>        IR_087, IR_097, IR_108, IR_120, IR_134, PRO, EPI, CLM, TAPE) =
>       (to_timestamp('201212032145', 'MMDDHH24MI'), 2, '\xff', '\xff',
>        '\xff', '\xff', '\xff', '\xff', '\xff', '\xff', '\xff', '\xff',
>        '\xff', '\xff', '\x01', '\x01', '\x7f', 'LTO5_020')
>   WHERE slot = to_timestamp('201212032145', 'MMDDHH24MI') AND MSG = 2
>   RETURNING *
> )
> INSERT INTO MSG (slot, MSG, HRV, VIS006, VIS008, IR_016, IR_039,
>                  WV_062, WV_073, IR_087, IR_097, IR_108, IR_120,
>                  IR_134, PRO, EPI, CLM, TAPE)
> SELECT to_timestamp('201212032145', 'MMDDHH24MI'), 2, '\xff', '\xff',
>        '\xff', '\xff', '\xff', '\xff', '\xff', '\xff', '\xff', '\xff',
>        '\xff', '\xff', '\x01', '\x01', '\x7f', 'LTO5_020'
> WHERE NOT EXISTS (SELECT * FROM upsert);

How many rows does (SELECT * FROM upsert) return? Without knowing more,
my guess is that the result set is very big, and that could be the
reason for the memory usage. I would add a WHERE clause to reduce the
result set (a suitable index can speed up this method even more).

Greetings, Torsten
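For illustration, a simplified sketch of the same writable-CTE upsert pattern with a unique index, so the UPDATE matches at most one row and the CTE result set stays small. The lower-case table name, the reduced column list, and the full 'YYYYMMDDHH24MI' format string are assumptions for the sketch, not taken from Alessandro's log:

```sql
-- Hypothetical, simplified version of the statement in the log above.
CREATE UNIQUE INDEX msg_slot_msg_ux ON msg (slot, msg);

WITH upsert AS (
    UPDATE msg
    SET tape = 'LTO5_020'
    WHERE slot = to_timestamp('201212032145', 'YYYYMMDDHH24MI')
      AND msg = 2
    RETURNING *                    -- at most one row thanks to the index
)
INSERT INTO msg (slot, msg, tape)
SELECT to_timestamp('201212032145', 'YYYYMMDDHH24MI'), 2, 'LTO5_020'
WHERE NOT EXISTS (SELECT 1 FROM upsert);
```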
Re: [PERFORM] ZFS vs. UFS
On 24.07.2012 14:51, Laszlo Nagy wrote:
>>> * UFS is not journaled.
>>
>> There is journal support for UFS as far as I know. Please have a look
>> at the gjournal manpage.
>
> Yes, but gjournal works for disk devices.

That isn't completely correct! gjournal works with all GEOM devices,
which can be not only disk devices but also (remote) disk devices,
(remote) files, (remote) software RAIDs, etc. This makes it very easy
to mirror a *complete* disk from one *server* to another; I use this
technique for customers who need cheap backups of their complete
servers. A RAID card will be much faster than this, though. I just
wanted to make that clear.

Greetings, Torsten
Re: [PERFORM] ZFS vs. UFS
On 24.07.2012 14:51, Laszlo Nagy wrote:
> * UFS is not journaled.

There is journal support for UFS as far as I know. Please have a look
at the gjournal manpage.

Greetings, Torsten