On 2015-08-11 17:15:22 +0200, Fabien COELHO wrote:
> +void
> +PerformFileFlush(FileFlushContext * context)
> +{
> +    if (context->ncalls != 0)
> +    {
> +        int rc;
> +
> +#if defined(HAVE_SYNC_FILE_RANGE)
> +
> +        /* Linux: tell the memory manager to move these blocks to io so
> +         * that they are considered for being actually written to disk.
> +         */
> +        rc = sync_file_range(context->fd, context->offset, context->nbytes,
> +                             SYNC_FILE_RANGE_WRITE);
> +
> +#elif defined(HAVE_POSIX_FADVISE)
> +
> +        /* Others: say that data should not be kept in memory...
> +         * This is not exactly what we want to say, because we want to write
> +         * the data for durability but we may need it later nevertheless.
> +         * It seems that Linux would free the memory *if* the data has
> +         * already been written do disk, else the "dontneed" call is ignored.
> +         * For FreeBSD this may have the desired effect of moving the
> +         * data to the io layer, although the system does not seem to
> +         * take into account the provided offset & size, so it is rather
> +         * rough...
> +         */
> +        rc = posix_fadvise(context->fd, context->offset, context->nbytes,
> +                           POSIX_FADV_DONTNEED);
> +
> +#endif
> +
> +        if (rc < 0)
> +            ereport(ERROR,
> +                    (errcode_for_file_access(),
> +                     errmsg("could not flush block " INT64_FORMAT
> +                            " on " INT64_FORMAT " blocks in file \"%s\": %m",
> +                            context->offset / BLCKSZ,
> +                            context->nbytes / BLCKSZ,
> +                            context->filename)));
> +    }
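For comparison, here is a minimal standalone sketch of the two hints the quoted function chooses between. The helper name hint_writeback() and the use of __linux__ in place of the patch's HAVE_SYNC_FILE_RANGE configure symbol are assumptions for illustration only, and the fallback branch assumes a platform that provides posix_fadvise(). It also shows one detail worth checking in the patch: sync_file_range() returns -1 and sets errno, whereas posix_fadvise() returns the error number directly and does not set errno, so a single "rc < 0" test with %m only catches failures of the first call.

```c
#define _GNU_SOURCE             /* sync_file_range() is a glibc/Linux extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of the patch: ask the kernel to start
 * writeback of [offset, offset + nbytes) in fd.  Returns 0 on success or
 * an errno value on failure, normalizing the two calls' conventions.
 */
static int
hint_writeback(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
#if defined(__linux__)
    /* sync_file_range(): -1 with errno set on failure. */
    if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) != 0)
        return errno;
    return 0;
#else
    /* posix_fadvise(): returns the error number itself, errno untouched. */
    return posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
#endif
}
```

A caller would then report the returned code explicitly (e.g. via strerror()) rather than relying on %m.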
I'm a bit wary that this might cause significant regressions, for workloads that are bigger than shared_buffers, on platforms that don't support sync_file_range() but do support posix_fadvise(). Consider what happens if the workload does *not* fit into shared_buffers but *does* fit into the OS's buffer cache. POSIX_FADV_DONTNEED tells the kernel it may discard the just-written pages, so when those blocks are later evicted from shared_buffers and read back, they can no longer be served from the OS cache. Suddenly reads will go to disk again, no?

Greetings,

Andres Freund
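One way to check this concern on a given platform is to compare page-cache residency before and after the hint. The following is a minimal Linux-oriented sketch under stated assumptions: it relies on Linux mincore() semantics (an unsigned char vector whose low bit marks a resident page), the helper names resident_pages() and write_back_and_drop() are hypothetical, and the fsync() is added only so the pages are clean, since the patch's own comment says Linux ignores DONTNEED for data not yet written. Whether pages are actually dropped depends on the kernel and filesystem.

```c
#define _GNU_SOURCE                 /* mincore() and posix_fadvise() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Pages of fd's first len bytes currently in the page cache, or -1 on error. */
static long
resident_pages(int fd, size_t len)
{
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    size_t npages = (len + pagesz - 1) / pagesz;
    unsigned char *vec;
    void *map;
    long count = 0;

    assert(len > 0);
    /* Mapping does not fault pages in; mincore() only inspects the cache. */
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;
    vec = malloc(npages);
    if (vec == NULL || mincore(map, len, vec) != 0)
    {
        count = -1;
    }
    else
    {
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;    /* low bit set: page is resident */
    }
    free(vec);
    munmap(map, len);
    return count;
}

/*
 * Hypothetical stand-in for the patch's posix_fadvise() branch, preceded
 * by fsync() so the pages are clean.  Returns 0 or an errno value.
 */
static int
write_back_and_drop(int fd, off_t len)
{
    if (fsync(fd) != 0)
        return errno;
    return posix_fadvise(fd, 0, len, POSIX_FADV_DONTNEED);
}
```

On a filesystem that honours the hint, the residency count after write_back_and_drop() would be expected to fall toward zero, meaning a subsequent read of those blocks has to go to disk, which is the regression described above for workloads that fit in the OS cache but not in shared_buffers.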