Why is pq_begintypsend so slow?

Jeff Janes Sat, 11 Jan 2020 11:05:48 -0800

I was using COPY recently and was wondering why BINARY format is not much
(if any) faster than the default format.  Once I switched from mostly
exporting ints to mostly exporting double precisions (7e6 rows of 100
columns, randomly generated), it was faster, but not by as much as I
intuitively thought it should be.


Running 'perf top' to profile a "COPY BINARY .. TO '/dev/null'" on a AWS
m5.large machine running Ubuntu 18.04, with self compiled PostgreSQL:

PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit

I saw that the hotspot was pq_begintypsend at 20%, which was twice the
percentage as the next place winner (AllocSetAlloc).  If I drill down into
teh function, I see something like the below.  I don't really speak
assembly, but usually when I see an assembly instruction being especially
hot and not being the inner most instruction in a loop, I blame it on CPU
cache misses.  But everything being touched here should already be well
cached, since  initStringInfo has just got done setting it up. And if not
for that, then the by the 2nd invocation of appendStringInfoCharMacro it
certainly should be in the cache, yet that one is even slower than the 1st
appendStringInfoCharMacro.

Why is this such a bottleneck?

pq_begintypsend  /usr/local/pgsql/bin/postgres

 0.15 |    push  %rbx
 0.09 |    mov  %rdi,%rbx
   |       initStringInfo(buf);
 3.03 |    callq initStringInfo
   |       /* Reserve four bytes for the bytea length word */
   |       appendStringInfoCharMacro(buf, '\0');
   |    movslq 0x8(%rbx),%rax
 1.05 |    lea  0x1(%rax),%edx
 0.72 |    cmp  0xc(%rbx),%edx
   |    jge  b0
 2.92 |    mov  (%rbx),%rdx
   |    movb  $0x0,(%rdx,%rax,1)
13.76 |    mov  0x8(%rbx),%eax
 0.81 |    mov  (%rbx),%rdx
 0.52 |    add  $0x1,%eax
 0.12 |    mov  %eax,0x8(%rbx)
 2.85 |    cltq
 0.01 |    movb  $0x0,(%rdx,%rax,1)
   |       appendStringInfoCharMacro(buf, '\0');
10.65 |    movslq 0x8(%rbx),%rax
   |    lea  0x1(%rax),%edx
 0.90 |    cmp  0xc(%rbx),%edx
   |    jge  ca
 0.54 | 42:  mov  (%rbx),%rdx
 1.84 |    movb  $0x0,(%rdx,%rax,1)
13.88 |    mov  0x8(%rbx),%eax
 0.03 |    mov  (%rbx),%rdx
   |    add  $0x1,%eax
 0.33 |    mov  %eax,0x8(%rbx)
 2.60 |    cltq
 0.06 |    movb  $0x0,(%rdx,%rax,1)
   |       appendStringInfoCharMacro(buf, '\0');
 3.21 |    movslq 0x8(%rbx),%rax
 0.23 |    lea  0x1(%rax),%edx
 1.74 |    cmp  0xc(%rbx),%edx
   |    jge  e0
 0.21 | 67:  mov  (%rbx),%rdx
 1.18 |    movb  $0x0,(%rdx,%rax,1)
 9.29 |    mov  0x8(%rbx),%eax
 0.18 |    mov  (%rbx),%rdx
   |    add  $0x1,%eax
 0.19 |    mov  %eax,0x8(%rbx)
 3.14 |    cltq
 0.12 |    movb  $0x0,(%rdx,%rax,1)
   |       appendStringInfoCharMacro(buf, '\0');
 5.29 |    movslq 0x8(%rbx),%rax
 0.03 |    lea  0x1(%rax),%edx
 1.45 |    cmp  0xc(%rbx),%edx
   |    jge  f6
 0.41 | 8c:  mov  (%rbx),%rdx

Cheers,

Jeff

Why is pq_begintypsend so slow?

Reply via email to