Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
Oskari Saarenmaawrites: > 17.08.2016, 22:11, Tom Lane kirjoitti: >> I'd be more excited about this if the claimed improvement were more than >> 1.5%, but you know as well as I do that that's barely above the noise >> floor for most performance measurements. I'm left wondering why bother, >> and why take any risk of de-optimizing on some platforms. > I think it makes sense to try to optimize for the platforms that people > actually use for performance critical workloads, especially if it also > allows us to simplify the code and remove more lines than we add. It's > nice if the software still works on legacy platforms, but I don't think > we should be concerned about a hypothetical performance impact on > platforms no one uses in production anymore. Well, my point remains that I see little value in messing with long-established code if you can't demonstrate a benefit that's clearly above the noise level. We don't really know whether this change might have adverse impacts somewhere --- either performance-wise or bug-wise --- and for that amount of benefit I don't want to take the trouble to find out. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
17.08.2016, 22:11, Tom Lane kirjoitti: Robert Haaswrites: I don't understand why you think this would create non-trivial portability issues. The patch as submitted breaks entirely on platforms without pread/pwrite. Yes, we can add a configure test and some shim functions to fix that, but the argument that it makes the code shorter will get a lot weaker once we do. I posted an updated patch which just calls lseek + read/write, the code's still a lot shorter. I agree that adding such functions is pretty trivial, but there are reasons to think there are other hazards that are less trivial: First, a self-contained shim function will necessarily do an lseek every time, which means performance will get *worse* not better on non-pread platforms. And yes, the existing logic to avoid lseeks fires often enough to be worthwhile, particularly in seqscans. This will only regress on platforms without pread. The only relevant such platform appears to be Windows which has equivalent APIs. FWIW, I ran the same pgbench benchmarks on my Linux system where I always used lseek() + read/write instead of pread and pwrite - they ran slightly faster than the previous code which saved seek positions, but I suppose a workload with lots of seqscans could be slower. Unfortunately I didn't save the actual numbers anywhere, but I can rerun the benchmarks if you're interested. The numbers were pretty stable across multiple runs. Second, I wonder whether this will break any kernel's readahead detection. I wouldn't be too surprised if successive reads (not preads) without intervening lseeks are needed to trigger readahead on at least some platforms. So there's a potential, both on platforms with pread and those without, for this to completely destroy seqscan performance, with penalties very far exceeding what we might save by avoiding some kernel calls. At least Linux and FreeBSD don't seem to care how and why you read pages, they'll do readahead regardless of the way you read files and extend the readahead once you access previously readahead pages. They disable readahead only if fadvise(POSIX_FADV_RANDOM) has been used. I'd expect any kernel that implements mmap to also implement readahead based on page usage rather than than the seek position. Do you know of a kernel that would actually use the seek position for readahead? I'd be more excited about this if the claimed improvement were more than 1.5%, but you know as well as I do that that's barely above the noise floor for most performance measurements. I'm left wondering why bother, and why take any risk of de-optimizing on some platforms. I think it makes sense to try to optimize for the platforms that people actually use for performance critical workloads, especially if it also allows us to simplify the code and remove more lines than we add. It's nice if the software still works on legacy platforms, but I don't think we should be concerned about a hypothetical performance impact on platforms no one uses in production anymore. / Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
Robert Haaswrites: > Well, I think you're pointing out some things that need to be figured > out, but I hardly think that's a good enough reason to pour cold water > on the whole approach. If somebody feels like doing the legwork to find out if those performance hazards are real (which I freely concede they may not be), fine. I'm just saying this isn't a slam-dunk commit-it-and-move-on win. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
On Wed, Aug 17, 2016 at 3:11 PM, Tom Lanewrote: > Robert Haas writes: >> I don't understand why you think this would create non-trivial >> portability issues. > > The patch as submitted breaks entirely on platforms without pread/pwrite. > Yes, we can add a configure test and some shim functions to fix that, > but the argument that it makes the code shorter will get a lot weaker > once we do. > > I agree that adding such functions is pretty trivial, but there are > reasons to think there are other hazards that are less trivial: > > First, a self-contained shim function will necessarily do an lseek every > time, which means performance will get *worse* not better on non-pread > platforms. And yes, the existing logic to avoid lseeks fires often enough > to be worthwhile, particularly in seqscans. > > Second, I wonder whether this will break any kernel's readahead detection. > I wouldn't be too surprised if successive reads (not preads) without > intervening lseeks are needed to trigger readahead on at least some > platforms. So there's a potential, both on platforms with pread and those > without, for this to completely destroy seqscan performance, with > penalties very far exceeding what we might save by avoiding some kernel > calls. > > I'd be more excited about this if the claimed improvement were more than > 1.5%, but you know as well as I do that that's barely above the noise > floor for most performance measurements. I'm left wondering why bother, > and why take any risk of de-optimizing on some platforms. Well, I think you're pointing out some things that need to be figured out, but I hardly think that's a good enough reason to pour cold water on the whole approach. The number of lseeks we issue on many workloads is absolutely appalling, and I don't think there's any reason at all to assume that a 1.5% gain is as good as it gets. Even if it is, a 1% speedup on a benchmark where the noise is 5-10% is just as much of a speedup as a 1% speedup on a benchmark on a benchmark where the noise is 0.1%. Faster is faster, and 1% improvements are not so numerous that we can afford to ignore them when they pop up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
Robert Haaswrites: > I don't understand why you think this would create non-trivial > portability issues. The patch as submitted breaks entirely on platforms without pread/pwrite. Yes, we can add a configure test and some shim functions to fix that, but the argument that it makes the code shorter will get a lot weaker once we do. I agree that adding such functions is pretty trivial, but there are reasons to think there are other hazards that are less trivial: First, a self-contained shim function will necessarily do an lseek every time, which means performance will get *worse* not better on non-pread platforms. And yes, the existing logic to avoid lseeks fires often enough to be worthwhile, particularly in seqscans. Second, I wonder whether this will break any kernel's readahead detection. I wouldn't be too surprised if successive reads (not preads) without intervening lseeks are needed to trigger readahead on at least some platforms. So there's a potential, both on platforms with pread and those without, for this to completely destroy seqscan performance, with penalties very far exceeding what we might save by avoiding some kernel calls. I'd be more excited about this if the claimed improvement were more than 1.5%, but you know as well as I do that that's barely above the noise floor for most performance measurements. I'm left wondering why bother, and why take any risk of de-optimizing on some platforms. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
On Wed, Aug 17, 2016 at 10:40 AM, Tom Lanewrote: > Oskari Saarenmaa writes: >> On my laptop a simple pgbench run (scale 100, 15 minutes) shows a 1.5% >> performance improvement. > > I would have hoped for a lot better result before anyone would propose > that we should deal with all the portability issues this'll create. > >> A 1.5% performance improvement is small but >> measurable - and IMV more importantly it allows us to drop more than 100 >> lines of backwards (compatible?) code; maybe we could start targeting >> more recent platforms in v10? > > That's basically nonsense: we'll end up adding way more than that to > deal with platforms that haven't got these APIs. Perhaps a better target would then be to try and make use of readv and writev, which should both be useful for sequential access of various kinds and network I/O. For instance, when reading 10 sequential buffers, a readv could fill 10 buffers at a time. I remember a project where we got a linear improvement in thoughput by using them for network I/O, because we were limited by packet thoughput rather than byte thoughput, and using the iovec utilities reduced the overhead considerably. But all this is anecdotal evidence in any case, YMMV. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
On Wed, Aug 17, 2016 at 9:40 AM, Tom Lanewrote: > Oskari Saarenmaa writes: >> On my laptop a simple pgbench run (scale 100, 15 minutes) shows a 1.5% >> performance improvement. > > I would have hoped for a lot better result before anyone would propose > that we should deal with all the portability issues this'll create. > >> A 1.5% performance improvement is small but >> measurable - and IMV more importantly it allows us to drop more than 100 >> lines of backwards (compatible?) code; maybe we could start targeting >> more recent platforms in v10? > > That's basically nonsense: we'll end up adding way more than that to > deal with platforms that haven't got these APIs. I don't understand why you think this would create non-trivial portability issues. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
17.08.2016, 16:40, Tom Lane kirjoitti: Oskari Saarenmaawrites: On my laptop a simple pgbench run (scale 100, 15 minutes) shows a 1.5% performance improvement. I would have hoped for a lot better result before anyone would propose that we should deal with all the portability issues this'll create. AFAICT pread and pwrite are available on pretty much all operating systems released in 2000s; it was added to Linux in 1997. Windows and HP-UX 10.20 don't have it, but we can just simulate it using lseek + read/write there without adding too much code. A 1.5% performance improvement is small but measurable - and IMV more importantly it allows us to drop more than 100 lines of backwards (compatible?) code; maybe we could start targeting more recent platforms in v10? That's basically nonsense: we'll end up adding way more than that to deal with platforms that haven't got these APIs. Attached an updated patch that adds a configure check and uses lseek+read/write instead pread/pwrite when the latter aren't available. The previous code ended up seeking anyway in most of the cases and pgbench shows no performance regression on my Linux box. 8 files changed, 54 insertions(+), 168 deletions(-) / Oskari diff --git configure configure index 45c8eef..4d50b52 100755 --- configure +++ configure @@ -12473,7 +12473,7 @@ fi LIBS_including_readline="$LIBS" LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` -for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l +for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pread pstat pthread_is_threaded_np pwrite readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l do : as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" diff --git configure.in configure.in index c878b4e..5eddaca 100644 --- configure.in +++ configure.in @@ -1446,7 +1446,7 @@ PGAC_FUNC_WCSTOMBS_L LIBS_including_readline="$LIBS" LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` -AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l]) +AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pread pstat pthread_is_threaded_np pwrite readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l]) AC_REPLACE_FUNCS(fseeko) case $host_os in diff --git src/backend/access/heap/rewriteheap.c src/backend/access/heap/rewriteheap.c index f9ce986..6dc5df3 100644 --- src/backend/access/heap/rewriteheap.c +++ src/backend/access/heap/rewriteheap.c @@ -918,7 +918,7 @@ logical_heap_rewrite_flush_mappings(RewriteState state) * Note that we deviate from the usual WAL coding practices here, * check the above "Logical rewrite support" comment for reasoning. */ - written = FileWrite(src->vfd, waldata_start, len); + written = FileWriteAt(src->vfd, waldata_start, len, src->off); if (written != len) ereport(ERROR, (errcode_for_file_access(), diff --git src/backend/storage/file/buffile.c src/backend/storage/file/buffile.c index 042be79..39482fa 100644 --- src/backend/storage/file/buffile.c +++ src/backend/storage/file/buffile.c @@ -60,12 +60,6 @@ struct BufFile int numFiles; /* number of physical files in set */ /* all files except the last have length exactly MAX_PHYSICAL_FILESIZE */ File *files; /* palloc'd array with numFiles entries */ - off_t *offsets; /* palloc'd array with numFiles entries */ - - /* - * offsets[i] is the current seek position of files[i]. We use this to - * avoid making redundant FileSeek calls. - */ bool isTemp; /* can only add files if this is TRUE */ bool isInterXact; /* keep open over transactions? */ @@ -108,8 +102,6 @@ makeBufFile(File firstfile) file->numFiles = 1; file->files = (File *) palloc(sizeof(File)); file->files[0] = firstfile; - file->offsets = (off_t *) palloc(sizeof(off_t)); - file->offsets[0] = 0L; file->isTemp = false; file->isInterXact = false; file->dirty = false; @@ -143,10 +135,7 @@ extendBufFile(BufFile *file) file->files = (File *) repalloc(file->files, (file->numFiles + 1) * sizeof(File)); - file->offsets = (off_t *) repalloc(file->offsets, - (file->numFiles + 1) * sizeof(off_t)); file->files[file->numFiles] = pfile; - file->offsets[file->numFiles] = 0L; file->numFiles++; } @@ -210,7 +199,6 @@ BufFileClose(BufFile *file) FileClose(file->files[i]); /* release the buffer space */ pfree(file->files); - pfree(file->offsets); pfree(file); }
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
Oskari Saarenmaawrites: > On my laptop a simple pgbench run (scale 100, 15 minutes) shows a 1.5% > performance improvement. I would have hoped for a lot better result before anyone would propose that we should deal with all the portability issues this'll create. > A 1.5% performance improvement is small but > measurable - and IMV more importantly it allows us to drop more than 100 > lines of backwards (compatible?) code; maybe we could start targeting > more recent platforms in v10? That's basically nonsense: we'll end up adding way more than that to deal with platforms that haven't got these APIs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
On Wed, 17 Aug 2016 12:17:35 +0200 Magnus Haganderwrote: > On Wed, Aug 17, 2016 at 11:34 AM, Victor Wagner > wrote: > > I don't think that all platforms, supported by PostgreSQL support > > this API. Especially, I cannot find any mention of pread/pwrite in > > the Win32 except this thread on stackoverflow: > > > > > Yeah, Windows does not have those API calls, but it shouldn't be > rocket science to write a wrapper for it. The standard windows APIs > can do the same thing -- but they'll need access to the HANDLE for > the file and not the posix file descriptor. There is _get_osfhandle function, which allows to find out Windows HANDLE associated with posix file descriptor. Really my question was - someone should write these wrappers into src/port and add corresponding test to the configure and/or CMakefile for this patch to be complete. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
On Wed, Aug 17, 2016 at 12:41 PM, Dagfinn Ilmari Mannsåker < ilm...@ilmari.org> wrote: > Magnus Haganderwrites: > > [pread/pwrite] > > > Yeah, Windows does not have those API calls, but it shouldn't be rocket > > science to write a wrapper for it. The standard windows APIs can do the > > same thing -- but they'll need access to the HANDLE for the file and not > > the posix file descriptor. > Just to be clear, I'm referring to the standard ReadFile() and WriteFile() APIs here. > > > > It also has things like ReadFileScatter() ( > > https://msdn.microsoft.com/en-us/library/windows/desktop/ > aa365469(v=vs.85).aspx) > > which is not the same, but might also be interesting as a future > > improvement. > > That looks a lot like POSIX readv() > (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readv.html), > and as far as I can tell it has the same issue as it in that it doesn't > take an offset argument, but requires you to seek first. > Ah yeah, for some reason I keep getting readv() and pread(). Which solve a different problem (see below about that function not having the same issues on windows -- but it's still not the problem we're trying to solve here) > Linux and modern BSDs however have preadv() > (http://manpages.ubuntu.com/manpages/xenial/en/man2/preadv.2.html), > which takes an offset and an iovec array. I don't know if Windows and > other platforms have anything similar. > ReadFileScatter() can take the offset from OVERLAPPED (same as regular ReadFile) and an array of FILE_SEGMENT_ELEMENT, same as preadv(). But the APIs start looking more different the further down the rabbithole you go, I think. But the capability is definitely there (and has been for ages so is in all supported version). -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
Magnus Haganderwrites: [pread/pwrite] > Yeah, Windows does not have those API calls, but it shouldn't be rocket > science to write a wrapper for it. The standard windows APIs can do the > same thing -- but they'll need access to the HANDLE for the file and not > the posix file descriptor. > > It also has things like ReadFileScatter() ( > https://msdn.microsoft.com/en-us/library/windows/desktop/aa365469(v=vs.85).aspx) > which is not the same, but might also be interesting as a future > improvement. That looks a lot like POSIX readv() (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readv.html), and as far as I can tell it has the same issue as it in that it doesn't take an offset argument, but requires you to seek first. Linux and modern BSDs however have preadv() (http://manpages.ubuntu.com/manpages/xenial/en/man2/preadv.2.html), which takes an offset and an iovec array. I don't know if Windows and other platforms have anything similar. -- - Twitter seems more influential [than blogs] in the 'gets reported in the mainstream press' sense at least. - Matt McLeod - That'd be because the content of a tweet is easier to condense down to a mainstream media article. - Calle Dybedahl -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
On Wed, Aug 17, 2016 at 11:34 AM, Victor Wagnerwrote: > On Wed, 17 Aug 2016 10:58:09 +0300 > Oskari Saarenmaa wrote: > > > > > > > The attached patch replaces FileWrite and FileRead with FileWriteAt > > and FileReadAt and removes most FileSeek calls. FileSeek is still > > around so we can find the end of a file, but it's not used for > > anything else. > > It seems that configure test for availability of pread/pwrite functions > and corresponding #define is needed. > > I don't think that all platforms, supported by PostgreSQL support this > API. Especially, I cannot find any mention of pread/pwrite in the Win32 > except this thread on stackoverflow: > > Yeah, Windows does not have those API calls, but it shouldn't be rocket science to write a wrapper for it. The standard windows APIs can do the same thing -- but they'll need access to the HANDLE for the file and not the posix file descriptor. It also has things like ReadFileScatter() ( https://msdn.microsoft.com/en-us/library/windows/desktop/aa365469(v=vs.85).aspx) which is not the same, but might also be interesting as a future improvement. That's clearly something different though, and out of scope for this one. But IIRC that functionality was actually added for the sake of SQLServer back in the days. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Use pread and pwrite instead of lseek + write and read
On Wed, 17 Aug 2016 10:58:09 +0300 Oskari Saarenmaawrote: > > The attached patch replaces FileWrite and FileRead with FileWriteAt > and FileReadAt and removes most FileSeek calls. FileSeek is still > around so we can find the end of a file, but it's not used for > anything else. It seems that configure test for availability of pread/pwrite functions and corresponding #define is needed. I don't think that all platforms, supported by PostgreSQL support this API. Especially, I cannot find any mention of pread/pwrite in the Win32 except this thread on stackoverflow: http://stackoverflow.com/questions/766477/are-there-equivalents-to-pread-on-different-platforms -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers