On Tue, Sep 1, 2015 at 5:30 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
> Hello Amit,
>
>>> About the disks: what kind of HDD (RAID? speed?)? HDD write cache?
>>
>> Speed of Reads -
>> Timing cached reads: 27790 MB in 1.98 seconds = 14001.86 MB/sec
>> Timing buffered disk reads: 3830 MB in 3.00 seconds = 1276.55 MB/sec
>
> Woops.... 14 GB/s and 1.2 GB/s?! Is this a *hard* disk??

Yes, there is no SSD in the system; I have confirmed this. These are RAID
spinning drives.

>> Copy speed -
>>
>> dd if=/dev/zero of=/tmp/output.img bs=8k count=256k
>> 262144+0 records in
>> 262144+0 records out
>> 2147483648 bytes (2.1 GB) copied, 1.30993 s, 1.6 GB/s
>
> Woops, 1.6 GB/s write... same questions, "rotating plates"??

One thing to notice is that if I don't remove the output file (output.img),
the speed is much slower; see the output below. Presumably the first run
mostly fills the OS page cache, while the later runs have to wait for the
dirty data to be written back (removing the file discards its dirty pages,
which is why that case stays fast). So I think in our case the disks will
actually give us ~320 MB/s.

dd if=/dev/zero of=/data/akapila/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.28086 s, 1.7 GB/s

dd if=/dev/zero of=/data/akapila/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 6.72301 s, 319 MB/s

dd if=/dev/zero of=/data/akapila/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 6.73963 s, 319 MB/s

If I remove the file each time:

dd if=/dev/zero of=/data/akapila/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.2855 s, 1.7 GB/s

rm /data/akapila/output.img

dd if=/dev/zero of=/data/akapila/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.27725 s, 1.7 GB/s

rm /data/akapila/output.img

dd if=/dev/zero of=/data/akapila/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.27417 s, 1.7 GB/s

rm /data/akapila/output.img

> Looks more like several SSD... Or the file is kept in memory and not
> committed to disk yet? Try a "sync" afterwards??
>
> If these are SSD, or if there is some SSD cache on top of the HDD, I would
> not expect the patch to do much, because the SSD random I/O writes are
> pretty comparable to sequential I/O writes.
>
> I would be curious whether flushing helps, though.

Yes, me too. I think we should try to reach a consensus on the exact
scenarios and configuration where this patch (or patches) can give a
benefit, or where we want to verify whether there is any regression, as I
have access to this machine for a very limited time; it might get formatted
soon for some other purpose.

>>>> max_wal_size=5GB
>>>
>>> Hmmm... Maybe quite small given the average performance?
>>
>> We can check with a larger value, but do you expect different results,
>> and why?
>
> Because checkpoints are xlog triggered (which depends on max_wal_size) or
> time triggered (which depends on checkpoint_timeout). Given the large tps,
> I expect that the WAL is filled very quickly and hence may trigger
> checkpoints every ... that is the question.

>>>> checkpoint_timeout=2min
>>>
>>> This seems rather small. Are the checkpoints xlog or time triggered?
>>
>> I wanted to test by triggering more checkpoints, but I can test with a
>> larger checkpoint interval as well, like 5 or 10 minutes. Any suggestions?
>
> For a +2 hours test, I would suggest 10 or 15 minutes.

Okay, let's keep it at 10 minutes.
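Incidentally, to confirm whether the checkpoints end up being time or xlog
triggered during a run, one way (just a sketch) is to compare the cumulative
counters in pg_stat_bgwriter before and after the run, e.g.:

select checkpoints_timed, checkpoints_req from pg_stat_bgwriter;

Setting log_checkpoints = on should also make the server log the trigger
reason for each checkpoint.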
>> I don't think increasing shared_buffers would have any impact, because
>> 8GB is sufficient for 300 scale factor data,
>
> It fits at the beginning, but when updates and inserts are performed
> postgres adds new pages (update = delete + insert), and the deleted space
> is eventually reclaimed by vacuum later on.
>
> Now if space is available in the page it is reused, so what really happens
> is not that simple...
>
> At 8500 tps the disk space extension for tables may be up to 3 MB/s at the
> beginning, and would evolve but should be at least about 0.6 MB/s (insert
> in history, assuming updates are performed in page), on average.
>
> So whether the database fits in the 8 GB shared buffers during the 2 hours
> of the pgbench run is an open question.

With this kind of configuration, I have noticed that more than 80% of the
updates are HOT updates and there is not much bloat, so I think the data
won't cross the 8GB limit, but I can still keep shared_buffers at 32GB if
you have any doubts.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
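P.S. In case it is useful, the HOT update percentage can be checked with a
query along these lines (the relname filter assumes the default pgbench
table names):

select relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) as hot_pct
from pg_stat_user_tables
where relname like 'pgbench%';  -- 'pgbench%' assumes default table names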