Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-08-06 Thread Yoshimi Ichiyanagi
I'm sorry for the delay in replying to your mail.

<91411837-8c65-bf7d-7ca3-d69bdcb49...@iki.fi>
Thu, 1 Mar 2018 18:40:05 +0800, Heikki Linnakangas wrote:
>Interesting. How does this compare with using good old mmap()?

libpmem's pmem_map_file() supports 2M/1G (huge page) alignment,
which reduces the number of page faults.
In addition, libpmem's pmem_memcpy_nodrain() copies data using
single instruction, multiple data (SIMD) instructions and
non-temporal store instructions (MOVNT).
As a result, using these APIs is faster than plain old mmap()/memcpy().

Please see the PGCon2018 presentation[1] for the details.

[1] 
https://www.pgcon.org/2018/schedule/attachments/507_PGCon2018_Introducing_PMDK_into_PostgreSQL.pdf


<83eafbfd-d9c5-6623-2423-7cab1be38...@iki.fi>
Fri, 20 Jul 2018 23:18:05 +0300, Heikki Linnakangas wrote:
>I think the way forward with this patch would be to map WAL segments 
>with plain old mmap(), and use msync(). If that's faster than the status 
>quo, great. If not, it would still be a good stepping stone for actually 
>using PMDK. 

I think so too.

This patch only replaces the read/write syscalls with libpmem's
APIs. I believe that PMDK can make the current PostgreSQL faster.


> If nothing else, it would provide a way to test most of the 
>code paths, without actually having a persistent memory device, or 
>libpmem. The examples at http://pmem.io/pmdk/libpmem/ actually suggest 
>doing exactly that: use libpmem to map a file to memory, and check if it 
>lives on persistent memory using libpmem's pmem_is_pmem() function. If 
>it returns yes, use pmem_drain(), if it returns false, fall back to using 
>msync().

When the environment variable PMEM_IS_PMEM_FORCE[2] is set to 1,
pmem_is_pmem() returns true regardless of the underlying storage,
so the pmem_drain() code path can be tested without a PMEM device.

Linux 4.15 and later support the MAP_SYNC and MAP_SHARED_VALIDATE
mmap() flags, which can be used to check whether the mapped file is
stored on PMEM. An application that uses both flags in its mmap() call
can be sure that MAP_SYNC is actually supported by both the kernel and
the filesystem that the mapped file is stored on[3].
But pmem_is_pmem() doesn't support this mechanism yet.

[2] http://pmem.io/pmdk/manpages/linux/v1.4/libpmem/libpmem.7.html
[3] https://lwn.net/Articles/758594/ 

--
Yoshimi Ichiyanagi
NTT Software Innovation Center
e-mail : ichiyanagi.yosh...@lab.ntt.co.jp




Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-03-01 Thread Yoshimi Ichiyanagi
<20180301103641.tudam4mavba3g...@alap3.anarazel.de>
Thu, 1 Mar 2018 02:36:41 -0800, Andres Freund wrote:

>On 2018-02-05 09:59:25 +0900, Yoshimi Ichiyanagi wrote:
>> I added my patches to the CommitFest 2018-3.
>> https://commitfest.postgresql.org/17/1485/
>
>Unfortunately this is the last CF for the v11 development cycle. This is
>a major project submitted late for v11, there's been no code level
>review, the goals aren't agreed upon yet, etc. So I'd unfortunately like
>to move this to the next CF?

Understood. I changed the status to "move to next CF".

-- 
Yoshimi Ichiyanagi
NTT laboratories




Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-02-04 Thread Yoshimi Ichiyanagi
>On Tue, Jan 30, 2018 at 3:37 AM, Yoshimi Ichiyanagi
> wrote:
>> Oracle and Microsoft SQL Server support PMEM [1][2].
>> I think it is not too early for PostgreSQL to support PMEM.
>
>I agree; it's good to have the option available for those who have
>access to the hardware.
>
>If you haven't added your patch to the next CommitFest, please do so.

Thank you for your time.

I added my patches to the CommitFest 2018-3.
https://commitfest.postgresql.org/17/1485/

Oh, by the way, we submitted a proposal ("Introducing PMDK into
PostgreSQL") to PGCon 2018.
If our proposal is accepted and you have time, please come and listen
to our presentation.

-- 
Yoshimi Ichiyanagi
Mailto : ichiyanagi.yosh...@lab.ntt.co.jp




Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-30 Thread Yoshimi Ichiyanagi

Fri, 19 Jan 2018 09:42:25 -0500, Robert Haas wrote:
>
>I think that you really need to include the checkpoints in the tests.
>I would suggest setting max_wal_size and/or checkpoint_timeout so that
>you reliably complete 2 checkpoints in a 30-minute test, and then do a
>comparison on that basis.

Experimental setup:
-
Server: HP ProLiant DL360 Gen9
CPU:    Xeon E5-2667 v4 (3.20GHz); 2 processors (without HT)
DRAM:   DDR4-2400; 32 GiB/processor
        (8 GiB/socket x 4 sockets/processor) x 2 processors
NVDIMM: DDR4-2133; 32 GiB/processor
        (node 0: 8 GiB/socket x 2 sockets/processor,
         node 1: 8 GiB/socket x 6 sockets/processor)
HDD:    Seagate Constellation2 2.5-inch SATA 3.0 6Gb/s 1TB 7200rpm x 1
SATA-SSD: Crucial_CT500MX200SSD1 (SATA 3.2, SATA 6Gb/s)
OS:       Ubuntu 16.04, linux-4.12
DAX FS:   ext4
PMDK:     master(at)Aug 30, 2017
PostgreSQL: master
Note: I bound the postgres processes to one NUMA node,
      and the benchmarks to the other NUMA node.
-

postgresql.conf
-
# - Settings -
wal_level = replica
fsync = on
synchronous_commit = on
wal_sync_method = pmem_drain/fdatasync/open_datasync
full_page_writes = on
wal_compression = off

# - Checkpoints -
checkpoint_timeout = 12min
max_wal_size = 20GB
min_wal_size = 20GB
-

Executed commands:

# numactl -N 1 pg_ctl start -D [PG_DIR] -l [LOG_FILE]
# numactl -N 0 pgbench -s 200 -i [DB_NAME]
# numactl -N 0 pgbench -c 32 -j 32 -T 1800 -r [DB_NAME] -M prepared


The results:

A) Applied the patches to PG src, and compiled PG with libpmem
B) Applied the patches to PG src, and compiled PG without libpmem
C) Original PG

The averages of running pgbench three times on *PMEM* are:
A)
wal_sync_method = pmem_drain  tps = 41660.42524
wal_sync_method = open_datasync   tps = 39913.49897
wal_sync_method = fdatasync   tps = 39900.83396

C)
wal_sync_method = open_datasync   tps = 40335.50178
wal_sync_method = fdatasync   tps = 40649.57772


The averages of running pgbench three times on *SATA-SSD* are:
B)
wal_sync_method = open_datasync   tps = 7224.07146
wal_sync_method = fdatasync   tps = 7222.19177

C)
wal_sync_method = open_datasync   tps = 7258.79093
wal_sync_method = fdatasync   tps = 7263.19878


From the above results, wal_sync_method=pmem_drain was roughly 2-4%
faster than wal_sync_method=open_datasync/fdatasync.
When pgbench ran on the SATA-SSD, wal_sync_method=fdatasync was about
as fast as wal_sync_method=open_datasync.


>> Do you know any good WAL I/O intensive benchmarks? DBT2?
>
>pgbench is quite a WAL-intensive benchmark; it is much more
>write-heavy than what most systems experience in real life, at least
>in my experience.  Your comparison of DAX FS to DAX FS + PMDK is very
>interesting, but in real life the bandwidth of DAX FS is already so
>high -- and the latency so low -- that I think most real-world
>workloads won't gain very much.  At least, that is my impression based
>on internal testing EnterpriseDB did a few months back.  (Thanks to
>Mithun and Kuntal for that work.)

In the near future, many physical devices will send sensing data
(IoT deployments might consume tens of gigabits of network bandwidth),
and the amount of data inserted into databases will increase
significantly. I think that PMEM will be needed for use cases like IoT.



Thu, 25 Jan 2018 09:30:45 -0500, Robert Haas wrote:
>Well, some day persistent memory may be a common enough storage
>technology that such a change makes sense, but these days most people
>have either SSD or spinning disks, where the change would probably be
>a net negative.  It seems more like something we might think about
>changing in PG 20 or PG 30.
>

Oracle and Microsoft SQL Server support PMEM [1][2].
I think it is not too early for PostgreSQL to support PMEM.

[1] http://dbheartbeat.blogspot.jp/2017/11/doag-2017-oracle-18c-dbim-oracle.htm
[2] 
https://www.snia.org/sites/default/files/PM-Summit/2018/presentations/06_PM_Summit_2018_Talpey-Final_Post-CORRECTED.pdf

-- 
Yoshimi Ichiyanagi




Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-19 Thread Yoshimi Ichiyanagi
Thank you for your reply.


Wed, 17 Jan 2018 15:29:11 -0500, Robert Haas wrote:
>> Using pgbench, a general PostgreSQL benchmark, the postgres server
>> with the patches applied is about 5% faster than the original server.
>> And using my insert benchmark, it is up to 90% faster than the
>> original one. I will describe the details later.
>
>Interesting.  But your insert benchmark looks highly artificial... in
>real life, you would not insert the same long static string 160
>million times.  Or if you did, you would use COPY or INSERT .. SELECT.

I made this benchmark in order to put a very heavy WAL I/O load on PMEM.

PMEM is very fast. I ran an fio-like micro-benchmark on PMEM.
This workload performed synchronous sequential writes in 8 KB blocks,
with a total write size of 40 GB.

The micro-benchmark results were the following.
Using DAX FS (like fdatasync):5,559 MB/sec
Using DAX FS and PMDK (like pmem_drain): 13,177 MB/sec

Using pgbench, the postgres server to which my patches were applied was
only 5% faster than the original server.
>> The averages of running pgbench three times are:
>> wal_sync_method=fdatasync:   tps = 43,179
>> wal_sync_method=pmem_drain:  tps = 45,254

While this pgbench run was in progress, the utilization of the 8 CPU
cores (on which the postgres server was running) was about 800%, and
the WAL I/O throughput was about 10 MB/sec. I thought that pgbench was
not enough to put a heavy WAL I/O load on PMEM, so I made and ran the
WAL I/O intensive test.

Do you know any good WAL I/O intensive benchmarks? DBT2?


Wed, 17 Jan 2018 15:40:25 -0500, Robert Haas wrote:
>> C-5. Running the 2 benchmarks(1. pgbench, 2. my insert benchmark)
>> C-5-1. pgbench
>> # numactl -N 1 pgbench -c 32 -j 8 -T 120 -M prepared [DB_NAME]
>>
>> The averages of running pgbench three times are:
>> wal_sync_method=fdatasync:   tps = 43,179
>> wal_sync_method=pmem_drain:  tps = 45,254
>
>What scale factor was used for this test?
The scale factor was 200.

# numactl -N 0 pgbench -s 200 -i [DB_NAME]


>Was the only non-default configuration setting wal_sync_method?  i.e.
>synchronous_commit=on?  No change to max_wal_size?
No; I also used the following settings in postgresql.conf to prevent
checkpoints from occurring while the tests were running.

# - Settings -
wal_level = replica
fsync = on
synchronous_commit = on
wal_sync_method = pmem_drain
full_page_writes = on
wal_compression = off

# - Checkpoints -
checkpoint_timeout = 1d
max_wal_size = 20GB
min_wal_size = 20GB

>This seems like an exceedingly short test -- normally, for write
>tests, I recommend the median of 3 30-minute runs.  It also seems
>likely to be client-bound, because of the fact that jobs = clients/4.
>Normally I use jobs = clients or at least jobs = clients/2.
>

Thank you for the suggestion. I reran the tests that way.

# numactl -N 0 pgbench -s 200 -i [DB_NAME]
# numactl -N 1 pgbench -c 32 -j 32 -T 1800 -M prepared [DB_NAME]

The averages of running pgbench three times are:
wal_sync_method=fdatasync:   tps = 39,966
wal_sync_method=pmem_drain:  tps = 41,365


--
Yoshimi Ichiyanagi




[HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-16 Thread Yoshimi Ichiyanagi
B_NAME]

The averages of running pgbench three times are:
wal_sync_method=fdatasync:   tps = 43,179
wal_sync_method=pmem_drain:  tps = 45,254

C-5-2. pclient_thread: my insert benchmark
Preparation
CREATE TABLE [TABLE_NAME] (id int8, value text);
ALTER TABLE [TABLE_NAME] ALTER value SET STORAGE external;
PREPARE insert_sql (int8) AS INSERT INTO %s (id, value) values ($1, '
[1K_data]');

Execution
BEGIN; EXECUTE insert_sql(%lld); COMMIT;
Note: I ran this query 5M times with 32 threads.

# ./pclient_thread
Invalid Arguments:
Usage: ./pclient_thread [the number of threads] [the number of tuples to insert] [data size(KB)]
# numactl -N 1 ./pclient_thread 32 5242880 1


The averages of running this benchmark three times are:
wal_sync_method=fdatasync:   tps =  67,780
wal_sync_method=pmem_drain:  tps = 131,962

--
Yoshimi Ichiyanagi

0001-Add-configure-option-for-PMDK.patch
Description: Binary data


0002-Read-write-WAL-files-using-PMDK.patch
Description: Binary data


0003-Walreceiver-WAL-IO-using-PMDK.patch
Description: Binary data