On 2015-08-11 17:15:22 +0200, Fabien COELHO wrote:
> +void
> +PerformFileFlush(FileFlushContext * context)
> +{
> +     if (context->ncalls != 0)
> +     {
> +             int rc;
> +
> +#if defined(HAVE_SYNC_FILE_RANGE)
> +
> +             /* Linux: tell the memory manager to move these blocks to io so
> +              * that they are considered for being actually written to disk.
> +              */
> +             rc = sync_file_range(context->fd, context->offset, context->nbytes,
> +                                                      SYNC_FILE_RANGE_WRITE);
> +
> +#elif defined(HAVE_POSIX_FADVISE)
> +
> +             /* Others: say that data should not be kept in memory...
> +              * This is not exactly what we want to say, because we want to write
> +              * the data for durability but we may need it later nevertheless.
> +              * It seems that Linux would free the memory *if* the data has
> +              * already been written do disk, else the "dontneed" call is ignored.
> +              * For FreeBSD this may have the desired effect of moving the
> +              * data to the io layer, although the system does not seem to
> +              * take into account the provided offset & size, so it is rather
> +              * rough...
> +              */
> +             rc = posix_fadvise(context->fd, context->offset, context->nbytes,
> +                                                POSIX_FADV_DONTNEED);
> +
> +#endif
> +
> +             if (rc < 0)
> +                     ereport(ERROR,
> +                                     (errcode_for_file_access(),
> +                                      errmsg("could not flush block " INT64_FORMAT
> +                                                     " on " INT64_FORMAT " blocks in file \"%s\": %m",
> +                                                     context->offset / BLCKSZ,
> +                                                     context->nbytes / BLCKSZ,
> +                                                     context->filename)));
> +     }

I'm a bit wary that this might cause significant regressions on
platforms that do not support sync_file_range() but do support
posix_fadvise(), for workloads that are bigger than shared_buffers.
Consider what happens if the workload does *not* fit into shared_buffers
but *does* fit into the OS's buffer cache: POSIX_FADV_DONTNEED tells the
kernel it may drop those pages, so reads that were previously served from
the OS cache will suddenly go to disk again, no?
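
To make the concern concrete, here is a minimal sketch of a more
conservative fallback. It is not taken from the patch: the helper name
hint_writeback() and its allow_dontneed parameter are hypothetical, and
it tests __linux__ directly where the patch relies on configure's
HAVE_SYNC_FILE_RANGE / HAVE_POSIX_FADVISE. The idea is only that the
POSIX_FADV_DONTNEED path, which may evict pages, would be opt-in rather
than the automatic fallback.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() on glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of the patch: hint that a file range
 * should be written back. Returns 0 on success (or when no hint is
 * issued), otherwise an errno value.
 */
int
hint_writeback(int fd, off_t offset, off_t nbytes, int allow_dontneed)
{
#if defined(__linux__)
    (void) allow_dontneed;      /* not needed on this path */

    /* Starts writeback of dirty pages; the pages stay in the OS cache. */
    if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) < 0)
        return errno;
    return 0;
#elif defined(POSIX_FADV_DONTNEED)
    /*
     * DONTNEED may also drop the pages from the OS cache, so later reads
     * can hit the disk. Only issue it when the caller opts in.
     * Note that posix_fadvise() returns the error number directly
     * instead of returning -1 and setting errno.
     */
    if (allow_dontneed)
        return posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
    return 0;
#else
    (void) fd; (void) offset; (void) nbytes; (void) allow_dontneed;
    return 0;                   /* no usable hint on this platform */
#endif
}
```

A caller would pass allow_dontneed = 0 by default, so that platforms
without sync_file_range() simply skip the hint instead of risking cache
eviction. Whether skipping is actually better than DONTNEED there would
need benchmarking.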

Greetings,

Andres Freund

