On Thu, 2013-01-17 at 14:53 -0800, Jeff Davis wrote:
> Test plan:
> 
>   1. Take current patch (without "skip VM check for small tables"
> optimization mentioned above).
>   2. Create 500 tables each about 1MB.
>   3. VACUUM them all.
>   4. Start 500 connections (one for each table)
>   5. Time the running of a loop that executes a COUNT(*) on that
> connection's table 100 times.

Done, with a few extra variables. Again, thanks to Nathan Boley for
lending me the 64-core box. Test program attached.

I did both 1MB tables and 1-tuple tables, but I ended up throwing out
the 1-tuple results. First, as I said, that's a pretty easy problem to
solve, so it's not really what I want to test. Second, I had to do so
many iterations that I don't think I was measuring anything useful. I
did see what might have been a couple of differences, but I would need
to explore them in more detail and I don't think it's worth it, so I'm
only reporting on the 1MB tables.

For each test, each of 500 connections runs 10 iterations of a COUNT(*)
on its own 1MB table (which has been vacuumed and has its VM bit set).
The query is prepared once per connection. Each table has a single int
column.
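
For reference, here is a minimal sketch of the setup, in the same style
as the attached test program (the ~30,000-row figure for roughly 1MB of
single-int heap is an estimate, and this is not the exact script used):

#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>

#define NTABLES 500
#define QSIZE	256

static void
run(PGconn *conn, const char *sql)
{
  PGresult *result = PQexec(conn, sql);

  if (PQresultStatus(result) != PGRES_COMMAND_OK)
    {
      fprintf(stderr, "\"%s\" failed: %s", sql, PQerrorMessage(conn));
      exit(1);
    }
  PQclear(result);
}

int
main(void)
{
  char	 sql[QSIZE];
  int	 i;
  PGconn *conn = PQconnectdb("host=/tmp dbname=postgres");

  if (PQstatus(conn) != CONNECTION_OK)
    {
      fprintf(stderr, "connection failed!\n");
      exit(1);
    }

  for (i = 0; i < NTABLES; i++)
    {
      snprintf(sql, QSIZE, "CREATE TABLE mb_%d (i int);", i);
      run(conn, sql);
      /* ~30,000 single-int rows is roughly 1MB of heap. */
      snprintf(sql, QSIZE,
	       "INSERT INTO mb_%d SELECT g FROM generate_series(1, 30000) g;", i);
      run(conn, sql);
      /* VACUUM so the visibility map bits are set before the test runs. */
      snprintf(sql, QSIZE, "VACUUM mb_%d;", i);
      run(conn, sql);
    }

  PQfinish(conn);
  return 0;
}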

The variable is shared_buffers, going from 32MB (near exhaustion for 500
connections) to 2048MB (everything fits).
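Since 500 tables at roughly 1MB each is about 500MB of heap, the
transition should fall right around the 512MB setting.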

The second column is the range of runtimes in seconds. I included the
range this time because there was more variance between runs, but I
still think the results are good.

master:
    32MB: 16.4 - 18.9
    64MB: 16.9 - 17.3
   128MB: 17.5 - 17.9
   256MB: 14.7 - 15.8
   384MB:  8.1 -  9.3
   448MB:  4.3 -  9.2
   512MB:  1.7 -  2.2
   576MB:  0.6 -  0.6
  1024MB:  0.6 -  0.6
  2048MB:  0.6 -  0.6

patch:
    32MB: 16.8 - 17.6
    64MB: 17.1 - 17.5
   128MB: 17.2 - 18.0
   256MB: 14.8 - 16.2
   384MB:  8.0 - 10.1
   448MB:  4.6 -  7.2
   512MB:  2.0 -  2.6
   576MB:  0.6 -  0.6
  1024MB:  0.6 -  0.6
  2048MB:  0.6 -  0.6

Conclusion:

I see about what I expect: a precipitous drop in runtime once
everything fits in shared_buffers (with 500 1MB tables, an inflection
point around 512MB makes a lot of sense). There does seem to be a
measurable difference right around that inflection point, but it's not
much. Considering that this is the worst case I could devise, I am not
too concerned about it.

However, it is interesting to see that there really is a lot of
maintenance work being done when we need to move pages in and out of
shared buffers. I'm not sure that it's related to the freelists though.

For the extra pins to really be a problem, I think a much higher
percentage of the buffers would need to be pinned. The case we are
worried about involves sequential scans (an index scan would already be
using more than one pin per scan), so the only way to get a high
percentage of pinned buffers is with very small tables. But we don't
really need to use the VM when scanning very small tables (the overhead
would be elsewhere), so I think we're OK.

So, I attached a new version of the patch that doesn't look at the VM
for tables with fewer than 32 pages. That's the only change.
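
The shape of that check, as a standalone illustration (the constant and
function names below are made up for the example; the actual change is
in the attached patch):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustration only: below this many heap pages, read every page and check
 * tuple visibility directly rather than consulting the visibility map. */
#define VM_CHECK_MIN_PAGES 32

static bool
scan_should_use_vm(uint32_t table_pages)
{
  return table_pages >= VM_CHECK_MIN_PAGES;
}

int
main(void)
{
  printf("16 pages: %d, 128 pages: %d\n",
	 (int) scan_should_use_vm(16), (int) scan_should_use_vm(128));
  return 0;
}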

Regards,
        Jeff Davis
#include <libpq-fe.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

#define QSIZE 256

/*
 * Body of each child process: connect, prepare the given COUNT(*) query
 * once, then execute it niter times.  (procnum is currently unused.)
 */
void
test(char *query, int procnum, int niter)
{
  PGconn	*conn;
  PGresult	*result;
  int		 i;
  
  conn = PQconnectdb("host=/tmp dbname=postgres");
  if (PQstatus(conn) != CONNECTION_OK)
    {
      fprintf(stderr, "connection failed!\n");
      exit(1);
    }

  result = PQprepare(conn, "q", query, 0, NULL);
  if (PQresultStatus(result) != PGRES_COMMAND_OK)
    {
      fprintf(stderr, "PREPARE failed: %s", PQerrorMessage(conn));
      PQclear(result);
      exit(1);
    }
  PQclear(result);

  /* Execute the prepared query niter times, ignoring the returned rows. */
  for (i = 0; i < niter; i++)
    {
      result = PQexecPrepared(conn, "q", 0, NULL, NULL, NULL, 0);
      if (PQresultStatus(result) != PGRES_TUPLES_OK)
	{
	  fprintf(stderr, "EXECUTE PREPARED failed: %s\n", PQerrorMessage(conn));
	  PQclear(result);
	  exit(1);
	}
      PQclear(result);
    }

  PQfinish(conn);
}

int
main(int argc, char *argv[])
{
  int	 niter;
  int	 nprocs;
  char	 query[QSIZE];
  int	 i;
  pid_t *procs;
  struct timeval tv1, tv2;

  if (argc != 3)
    {
      fprintf(stderr, "usage: %s nprocs niter\n", argv[0]);
      exit(1);
    }

  nprocs = atoi(argv[1]);
  niter = atoi(argv[2]);

  procs = malloc(sizeof(pid_t) * nprocs);

  gettimeofday(&tv1, NULL);

  /* Fork one child per table; child i runs the COUNT(*) loop against mb_<i>. */
  for (i = 0; i < nprocs; i++)
    {
      pid_t pid = fork();

      if (pid < 0)
	{
	  fprintf(stderr, "fork failed!\n");
	  exit(1);
	}
      if (pid == 0)
	{
	  snprintf(query, QSIZE, "SELECT COUNT(*) FROM mb_%d;", i);
	  test(query, i, niter);
	  exit(0);
	}
      else
	{
	  procs[i] = pid;
	}
    }

  /* Wait for each child and make sure it exited cleanly. */
  for (i = 0; i < nprocs; i++)
    {
      int status;

      waitpid(procs[i], &status, 0);
      if (!WIFEXITED(status))
	{
	  fprintf(stderr, "child did not exit normally!\n");
	  exit(1);
	}
      if (WEXITSTATUS(status) != 0)
	{
	  fprintf(stderr, "child exited with status %d\n", WEXITSTATUS(status));
	  exit(1);
	}
    }

  gettimeofday(&tv2, NULL);

  free(procs);

  /* Borrow a second if needed so the microsecond difference is non-negative. */
  if (tv2.tv_usec < tv1.tv_usec)
    {
      tv2.tv_usec += 1000000;
      tv2.tv_sec--;
    }

  printf("%03d.%06d\n",
	 (int) (tv2.tv_sec - tv1.tv_sec),
	 (int) (tv2.tv_usec - tv1.tv_usec));

  return 0;
}
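
(To build the test program against libpq, something like
 gcc -o vm_test vm_test.c -I`pg_config --includedir` -L`pg_config --libdir` -lpq
should work; the runs above would then be "./vm_test 500 10". The file
name vm_test.c is just a placeholder.)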

Attachment: rm-pd-all-visible-20130118.patch.gz