Re: [PERFORM] Backup taking long time !!!

2017-01-23 Thread Torsten Zuehlsdorff

Hello,


Increments in pgbackrest are done at the file level, which is not really
efficient. We have implemented parallelism, compression and page-level
increments (9.3+) in a barman fork [1], but unfortunately the guys from
2ndquadrant-it are in no hurry to work on it.


We're looking at page-level incremental backup in pgbackrest as well. For
larger systems we haven't heard much complaining about it being
file-based, though, which is why it hasn't been a priority. Of course,
the OP is on 9.1 too, so page-level increments (9.3+) wouldn't apply
there anyway.


Well, we forked barman and implemented all of the above precisely because
we would otherwise have needed ~2 PB of disk space to store backups for
our ~300 TB of data. (Our recovery window is 7 days, and 7 near-full
copies of 300 TB come to roughly 2.1 PB.) And on a 5 TB database it took
a long time to take or restore a backup.


I have just around 11 TB, but I switched to ZFS-based backups exclusively.
I'm using snapshots for this, which gives me some flexibility: I can roll
them back, I can clone one and work with a full copy as a separate
cluster (only the differences are stored), and I can send them
incrementally to other servers. This works very well for my use cases,
but of course it doesn't fit every situation.
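
A minimal sketch of those operations (dataset, snapshot and host names are
made up; assuming data and WAL live on the same dataset, a snapshot of a
running cluster is crash-consistent, so PostgreSQL replays WAL on startup):

zfs snapshot zroot/db/pgsql@2017-01-22                # point-in-time, crash-consistent snapshot
zfs rollback zroot/db/pgsql@2017-01-22                # roll the dataset back to that state
zfs clone zroot/db/pgsql@2017-01-22 zroot/db/pgsql2   # writable copy; only differences are stored
zfs send -i @2017-01-21 zroot/db/pgsql@2017-01-22 | \
  ssh backuphost zfs receive backup/pgsql             # incremental send to another server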


Greetings,
Torsten




Re: [PERFORM] PostgreSQL on ZFS: performance tuning

2016-09-27 Thread Torsten Zuehlsdorff



On 29.07.2016 08:30, Tomas Vondra wrote:



On 07/29/2016 08:04 AM, trafdev wrote:

Hi.

I have an OLAP-oriented DB (light occasional bulk writes and heavy
aggregated selects over large periods of data) based on Postgres 9.5.3.

Server is a FreeBSD 10.3 with 64GB of RAM and 2x500GB SSD (root on ZFS,
mirror).

The largest table is 13GB (with a 4GB index on it), other tables are 4,
2 and less than 1GB.

After reading a lot of articles and how-tos I've collected the following
set of tweaks and hints:


ZFS dataset creation:
zfs create zroot/ara/sqldb
zfs create -o recordsize=8k -o primarycache=all zroot/ara/sqldb/pgsql


zfs get primarycache,recordsize,logbias,compression zroot/ara/sqldb/pgsql
NAME                   PROPERTY      VALUE     SOURCE
zroot/ara/sqldb/pgsql  primarycache  all       local
zroot/ara/sqldb/pgsql  recordsize    8K        local
zroot/ara/sqldb/pgsql  logbias       latency   local
zroot/ara/sqldb/pgsql  compression   lz4       inherited from zroot

L2ARC is disabled
VDEV cache is disabled


pgsql -c "mkdir /ara/sqldb/pgsql/data_ix"
pgsql -c "initdb --locale=en_US.UTF-8 -E UTF-8 -D /ara/sqldb/pgsql/data"


/etc/sysctl.conf
vfs.zfs.metaslab.lba_weighting_enabled=0


postgresql.conf:
listen_addresses = '*'
max_connections = 100
shared_buffers = 16GB
effective_cache_size = 48GB


It may not be a problem for your workload, but this effective_cache_size
value is far too high.


May I ask why? By default ZFS caches up to the size of RAM minus 1 GB.
Subtracting the 16 GB of shared buffers from the 64 GB of RAM, I would
assume 47 GB would be a better value. But even that would not be "far
too high". So could you please explain this?
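
For clarity, the arithmetic behind the 47 GB (assuming the default ARC
limit just described):

  total RAM                        64 GB
  default ARC maximum (RAM - 1)    63 GB
  shared_buffers                 - 16 GB
  --------------------------------------
  cache left for Postgres data   ~ 47 GB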


Greetings,
Torsten




Re: [PERFORM] Postgresql 8.4 optimize for heavy select load

2016-09-19 Thread Torsten Zuehlsdorff

On 19.09.2016 10:23, Mark Kirkwood wrote:

On 19/09/16 19:40, Job wrote:


Hello,

I would like some suggestions for optimizing Postgres 8.4 for a very
heavy load of SELECT queries (with joins).
The queries read data; they very rarely write.



We probably need to see schema and query examples to help you (with
EXPLAIN ANALYZE output). Also - err, 8.4 - I (and probably others) will
recommend you upgrade to a more recent (and, for that matter, supported)
version - currently 9.5/9.6 - there are lots of performance improvements
you are missing out on!


Especially since 8.4 has been out of support for 2 years:
https://www.postgresql.org/support/versioning/
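
If an in-place upgrade is an option, pg_upgrade can migrate directly from
8.4 to 9.5. A sketch with placeholder paths (adjust to the actual install
locations):

# --check (-c) first: reports incompatibilities without changing anything
pg_upgrade -c \
  -b /usr/local/pgsql-8.4/bin  -B /usr/local/pgsql-9.5/bin \
  -d /usr/local/pgsql-8.4/data -D /usr/local/pgsql-9.5/data

# then run the same command without -c to perform the upgrade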

Greetings,
Torsten






Re: [PERFORM] less than 2 sec for response - possible?

2016-07-06 Thread Torsten Zuehlsdorff


On 06.07.2016 17:06, trafdev wrote:

Wondering what your CPU/RAM characteristics are?


Intel Core i7-2600 Quad Core
32 GB DDR3 RAM
2x 3 TB SATA III HDD

HDD is:
Model Family:     Seagate Barracuda XT
Device Model:     ST33000651AS
Firmware Version: CC45
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

One of the RAM modules, for example:

Handle 0x002D, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM0
Bank Locator: BANK0
Type: DDR3
Type Detail: Synchronous
Speed: 1333 MHz
Manufacturer: Undefined
Serial Number: 4430793
Asset Tag: AssetTagNum0
Part Number: CT102464BA160B.C16
Rank: 2

OS is FreeBSD 10.3. Do you need more information?

Greetings,
Torsten




Re: [PERFORM] less than 2 sec for response - possible?

2016-07-06 Thread Torsten Zuehlsdorff

On 05.07.2016 17:35, trafdev wrote:
> [..]

Without TIMESTAMP cast:

QUERY PLAN
HashAggregate  (cost=1405666.90..1416585.93 rows=335970 width=86) (actual time=4797.272..4924.015 rows=126533 loops=1)
  Group Key: subid, sid
  Buffers: shared hit=1486949
  ->  Index Scan using ix_feed_sub_aid_date on feed_sub  (cost=0.44..1313275.32 rows=3359694 width=86) (actual time=0.019..1783.104 rows=3588376 loops=1)
        Index Cond: ((aid = 3) AND (date >= '2016-06-01'::date) AND (date <= '2016-06-30'::date))
        Buffers: shared hit=1486949
Planning time: 0.164 ms
Execution time: 4941.259 ms

I need to be sure that it's a physical limitation of PostgreSQL (that
when all data is in memory, fetching/joining 1.5 million rows can't be
done in less than 2-3 seconds) and that there is no way to improve it.


It could be a physical limitation of your hardware. I just did a short 
test on one of my databases:


Aggregate  (cost=532018.95..532018.96 rows=1 width=0) (actual time=3396.689..3396.689 rows=1 loops=1)
  Buffers: shared hit=155711
  ->  Index Only Scan using requests_request_time_idx on requests  (cost=0.43..493109.90 rows=15563620 width=0) (actual time=0.021..2174.614 rows=16443288 loops=1)
        Index Cond: ((request_time >= '2016-07-01 00:00:00+00'::timestamp with time zone) AND (request_time <= '2017-07-06 00:00:00+00'::timestamp with time zone))
        Heap Fetches: 31254
        Buffers: shared hit=155711
Planning time: 0.143 ms
Execution time: 3396.715 ms
(8 rows)

As you can see, I can get 16.4 million rows from cache within 3.4
seconds. Your index scan fetches 3.5 million rows in 1.7 seconds, which
is less than half the throughput of my database.
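
A query of roughly this shape produces such a plan (a sketch only; the
table and column names are taken from the plan above):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM requests
WHERE request_time >= '2016-07-01 00:00:00+00'
  AND request_time <= '2017-07-06 00:00:00+00';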


Greetings,
Torsten




Re: [PERFORM] less than 2 sec for response - possible?

2016-07-05 Thread Torsten Zuehlsdorff



On 02.07.2016 02:54, trafdev wrote:

Hi.

I'm trying to build an OLAP-oriented DB based on PostgreSQL.

A user works with a paginated report in the web browser. The interface
allows fetching data for a custom date-range selection, displaying
individual rows (20-50 per page) and totals (for the entire selection,
including rows not visible on the current page), and sorting by any column.

The main goal is to deliver results of the basic SELECT queries to the
end-user in less than 2 seconds.

I was able to achieve that except for one table (the biggest one).

It consists of multiple dimensions (date, gran, aid, pid, sid, fid,
subid) and metrics (see below).
Users can filter by any dimension and sort by any metric.

Here is a CREATE script for this table:

CREATE TABLE stats.feed_sub
(
  date date NOT NULL,
  gran interval NOT NULL,
  aid smallint NOT NULL,
  pid smallint NOT NULL,
  sid smallint NOT NULL,
  fid smallint NOT NULL,
  subid text NOT NULL,
  rev_est_pub real NOT NULL,
  rev_est_feed real NOT NULL,
  rev_raw real NOT NULL,
  c_total bigint NOT NULL,
  c_passed bigint NOT NULL,
  q_total bigint NOT NULL,
  q_passed bigint NOT NULL,
  q_filt_geo bigint NOT NULL,
  q_filt_browser bigint NOT NULL,
  q_filt_os bigint NOT NULL,
  q_filt_ip bigint NOT NULL,
  q_filt_subid bigint NOT NULL,
  q_filt_pause bigint NOT NULL,
  q_filt_click_cap_ip bigint NOT NULL,
  q_filt_query_cap bigint NOT NULL,
  q_filt_click_cap bigint NOT NULL,
  q_filt_rev_cap bigint NOT NULL,
  q_filt_erpm_floor bigint NOT NULL,
  c_filt_click_cap_ip bigint NOT NULL,
  c_filt_doubleclick bigint NOT NULL,
  c_filt_url_expired bigint NOT NULL,
  c_filt_fast_click bigint NOT NULL,
  c_filt_delay_clicks bigint NOT NULL,
  c_filt_ip_mismatch bigint NOT NULL,
  c_filt_ref_mismatch bigint NOT NULL,
  c_filt_lng_mismatch bigint NOT NULL,
  c_filt_ua_mismatch bigint NOT NULL,
  res_impr bigint NOT NULL,
  rev_ver_pub real,
  rev_ver_feed real,
  c_ver bigint,
  q_filt_ref bigint NOT NULL
)
WITH (
  OIDS=FALSE
);

CREATE INDEX ix_feed_sub_date
  ON stats.feed_sub
  USING brin
  (date);

CREATE UNIQUE INDEX ixu_feed_sub
  ON stats.feed_sub
  USING btree
  (date, gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");

Here is some sizing info (https://wiki.postgresql.org/wiki/Disk_Usage):

relation            size
stats.feed_sub      5644 MB
stats.ixu_feed_sub  1594 MB

row_estimate
15865627

Here is the typical query (for the totals page):
SELECT
  sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
  sum(stats.feed_sub.c_filt_doubleclick)  AS clicks_on_target,
  sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
  sum(stats.feed_sub.c_filt_fast_click)   AS fast_click,
  sum(stats.feed_sub.c_filt_ip_mismatch)  AS ip_mismatch,
  sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
  sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
  sum(stats.feed_sub.c_filt_ua_mismatch)  AS ua_mismatch,
  sum(stats.feed_sub.c_filt_url_expired)  AS url_expired,
  stats.feed_sub.subidAS stats_feed_sub_subid,
  stats.feed_sub.sid  AS stats_feed_sub_sid
FROM stats.feed_sub
WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
  stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
  stats.feed_sub.gran = '1 day'
  AND stats.feed_sub.aid = 3
GROUP BY
  stats.feed_sub.subid, stats.feed_sub.sid;


You cast every date to a timestamp. Why? You could adjust the index to
match (note the extra parentheses an expression column needs):

CREATE UNIQUE INDEX ixu_feed_sub
ON stats.feed_sub
USING btree
((date::timestamp), gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");


But since I see no need for the cast at all (maybe I missed something),
try it without!
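
For example (a sketch; the literals match the original query, only the
casts are gone, and the select list and GROUP BY stay unchanged):

WHERE stats.feed_sub.date >= DATE '2016-06-01'    -- plain date literals,
  AND stats.feed_sub.date <= DATE '2016-06-30'    -- no TIMESTAMP casts
  AND stats.feed_sub.gran = INTERVAL '1 day'
  AND stats.feed_sub.aid  = 3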


Greetings,
Torsten




Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Torsten Zuehlsdorff

Hello Ivan,


I have an application which stores a large amount of hex-encoded hash
strings (nearly 100 GB of them), which means:

  * The number of distinct characters (alphabet) is limited to 16
  * Each string is of the same length, 64 characters
  * The strings are essentially random

Creating a B-Tree index on this results in the index size being larger
than the table itself, and there are disk space constraints.

I've found the SP-GIST radix tree index, and thought it could be a good
match for the data because of the above constraints. An attempt to
create it (as in CREATE INDEX ON t USING spgist(field_name)) apparently
takes more than 12 hours (while a similar B-tree index takes a few hours
at most), so I've interrupted it because "it probably is not going to
finish in a reasonable time". Some slides I found on the SP-GiST index
suggest that neither its build time nor its size is really suitable for
this purpose.

My question is: what would be the most size-efficient index for this
situation?


It depends on what you want to query. What about a BRIN index:
https://www.postgresql.org/docs/9.5/static/brin-intro.html

This will result in a very small index but, depending on what you want
to query, it may or may not fit your needs.
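
For example (a minimal sketch; table and column names as in your CREATE
INDEX attempt above):

CREATE INDEX ix_t_field_brin ON t USING brin (field_name);

One caveat: BRIN stores only min/max summaries per block range, so it pays
off when values correlate with physical row order. On essentially random
hashes every range will span nearly the whole key space, so equality
lookups may degenerate to scanning most of the table.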


Greetings,
Torsten




Re: [PERFORM] Excessive memory used for INSERT

2014-12-17 Thread Torsten Zuehlsdorff

Hello Alessandro,


2014-12-15 17:54:07 GMT DETAIL: Failed process was running:

WITH upsert AS (
    UPDATE MSG
       SET (slot, MSG, HRV, VIS006, VIS008, IR_016, IR_039, WV_062, WV_073,
            IR_087, IR_097, IR_108, IR_120, IR_134, PRO, EPI, CLM, TAPE)
         = (to_timestamp('201212032145', 'YYYYMMDDHH24MI'), 2, '\xff', '\xff',
            '\xff', '\xff', '\xff', '\xff', '\xff', '\xff', '\xff', '\xff',
            '\xff', '\xff', '\x01', '\x01', '\x7f', 'LTO5_020')
     WHERE slot = to_timestamp('201212032145', 'YYYYMMDDHH24MI') AND MSG = 2
 RETURNING *
)
INSERT INTO MSG (slot, MSG, HRV, VIS006, VIS008, IR_016, IR_039, WV_062,
                 WV_073, IR_087, IR_097, IR_108, IR_120, IR_134, PRO, EPI,
                 CLM, TAPE)
SELECT to_timestamp('201212032145', 'YYYYMMDDHH24MI'), 2, '\xff', '\xff',
       '\xff', '\xff', '\xff', '\xff', '\xff', '\xff', '\xff', '\xff',
       '\xff', '\xff', '\x01', '\x01', '\x7f', 'LTO5_020'
WHERE NOT EXISTS (SELECT * FROM upsert);


How many rows does (SELECT * FROM upsert) return? Without knowing more,
I would guess that the result set is very big, and that could be the
reason for the memory usage.


I would add a WHERE clause to reduce the result set (a suitable index
can speed this method up even more).
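
One way to read that suggestion, sketched with an abbreviated column list
(the full lists from the log would stay as they are); returning a constant
instead of all columns also keeps the CTE's result narrow:

WITH upsert AS (
    UPDATE MSG
       SET TAPE = 'LTO5_020'   -- remaining columns as in the original
     WHERE slot = to_timestamp('201212032145', 'YYYYMMDDHH24MI') AND MSG = 2
 RETURNING 1                   -- a constant instead of RETURNING *
)
INSERT INTO MSG (slot, MSG, TAPE)  -- abbreviated column list
SELECT to_timestamp('201212032145', 'YYYYMMDDHH24MI'), 2, 'LTO5_020'
WHERE NOT EXISTS (SELECT 1 FROM upsert);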


Greetings,
Torsten




Re: [PERFORM] ZFS vs. UFS

2012-07-25 Thread Torsten Zuehlsdorff

On 24.07.2012 14:51, Laszlo Nagy wrote:


  * UFS is not journaled.


There is journal support for UFS as far as I know. Please have a look
at the gjournal manpage.



Yes, but gjournal works for disk devices.


That isn't completely correct: gjournal works with all GEOM devices,
which can be not only local disk devices but also (remote) disk devices,
(remote) files, (remote) software RAIDs, etc.


This makes it very easy to mirror a *complete* disk from one *server*
to another. I use this technique for customers who need cheap backups
of their complete server.
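
To sketch the technique (host and device names are made up, and this is an
outline from the ggated/ggatec/gmirror manpages rather than a tested recipe):

# on the backup server: export the disk (clients listed in /etc/gg.exports)
ggated

# on the primary: attach the remote disk, then mirror the local disk onto it
ggatec create -o rw backuphost /dev/ada1    # creates e.g. /dev/ggate0
gmirror label -v remotemirror /dev/ada0 /dev/ggate0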


But a RAID card will be much faster than this. I just wanted to make 
this clear.


Greetings,
Torsten



Re: [PERFORM] ZFS vs. UFS

2012-07-24 Thread Torsten Zuehlsdorff

On 24.07.2012 14:51, Laszlo Nagy wrote:


  * UFS is not journaled.


There is journal support for UFS as far as I know. Please have a look at
the gjournal manpage.
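
A minimal sketch following the gjournal manpage's example (the device name
is just an example):

gjournal load                             # load the geom_journal module
gjournal label da0s1d                     # journal and data on one provider
newfs -J /dev/da0s1d.journal              # gjournal-aware UFS filesystem
mount -o async /dev/da0s1d.journal /mnt   # async is safe, the journal keeps consistency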


Greetings,
Torsten
