Re: [HACKERS] Inefficient bytea escaping?

Andreas Pflug Thu, 25 May 2006 11:38:13 -0700

Tom Lane wrote:

Andreas Pflug <[EMAIL PROTECTED]> writes:
When dumping the table with psql \copy (non-binary), the resulting file
would be 6.6GB of size, taking about 5.5 minutes. Using psql \copy WITH
BINARY (modified psql as posted to -patches), the time was cut down to
21-22 seconds (filesize 1.4GB as expected), which is near the physical
throughput of the target disk. If server based COPY to file is used, the
same factor 12 can be observed, CPU is up to 100 % (single P4 3GHz 2MB
Cache HT disabled, 1GB main mem).
This is with an 8.0.x server, right?


I've tested both 8.0.5 and 8.1.4, no difference observed.

Testing a similar case with CVS HEAD, I see about a 5x speed difference,
which is right in line with the difference in the physical amount of
data written.


That's what I would have expected; apparently the data is near worst case.

(I was testing a case where all the bytes were emitted as
'\nnn', so it's the worst case.)  oprofile says the time is being spent
in CopyAttributeOutText() and fwrite().  So I don't think there's
anything to be optimized here, as far as bytea goes: its binary
representation is just inherently a lot smaller.
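
As a rough illustration of why the text form comes out so much larger,
here is a minimal sketch in C that estimates the escaped size of a byte
string. It is not the server's code. The function name escaped_text_len
is hypothetical, and the expansion rules are assumptions: a non-printable
byte becomes \nnn in bytea's escape output and COPY's text format then
doubles the backslash (5 characters in total), a literal backslash ends
up as 4 characters, and printable bytes pass through unchanged.

```c
#include <stddef.h>

/*
 * Estimate how many characters a bytea value occupies in COPY text output.
 *
 * Assumptions (not taken from the server source):
 *   - non-printable byte -> \nnn, then COPY doubles the backslash: 5 chars
 *   - literal backslash  -> \\,   then COPY doubles each one:      4 chars
 *   - any other printable ASCII byte is copied unchanged:          1 char
 */
static size_t
escaped_text_len(const unsigned char *data, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = data[i];

        if (c == '\\')
            out += 4;
        else if (c < 0x20 || c > 0x7e)
            out += 5;
        else
            out += 1;
    }
    return out;
}
```

Under these assumptions, data consisting only of non-printable bytes
grows by a factor of five, which matches the roughly 5x difference
reported for the worst case above.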

Unfortunately, binary isn't the cure for all, since copying normal data
with the binary option might bloat it by a factor of two or so. I wish
there was a third option that's fine for both kinds of data. That's not
only a question of dump file sizes, but also of network throughput (an
online compression in the line protocol would be desirable for this).

Looking at CopySendData, I wonder whether any traction could be gained
by trying not to call fwrite() once per character.  I'm not sure how
much per-call overhead there is in that function.  We've done a lot of
work trying to optimize the COPY IN path since 8.0, but nothing much
on COPY OUT ...
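
The idea of avoiding one fwrite() call per character can be sketched as
follows. This is only an illustration under stated assumptions, not the
CopySendData code: the OutBuf type, the outbuf_* functions and the 8192
byte buffer size are all hypothetical. Characters are collected in a
local buffer and handed to fwrite() in one call when the buffer fills or
the caller flushes.

```c
#include <stdio.h>
#include <stddef.h>

#define OUT_BUF_SIZE 8192       /* arbitrary size chosen for this sketch */

/* Hypothetical output buffer that batches characters before fwrite(). */
typedef struct OutBuf
{
    FILE   *fp;                 /* destination stream */
    size_t  used;               /* number of bytes currently buffered */
    char    data[OUT_BUF_SIZE];
} OutBuf;

static void
outbuf_init(OutBuf *b, FILE *fp)
{
    b->fp = fp;
    b->used = 0;
}

/* Write all buffered bytes with a single fwrite(); returns 0 on success. */
static int
outbuf_flush(OutBuf *b)
{
    if (b->used > 0 &&
        fwrite(b->data, 1, b->used, b->fp) != b->used)
        return -1;
    b->used = 0;
    return 0;
}

/* Append one character, flushing first only if the buffer is full. */
static int
outbuf_putc(OutBuf *b, char c)
{
    if (b->used == OUT_BUF_SIZE && outbuf_flush(b) != 0)
        return -1;
    b->data[b->used++] = c;
    return 0;
}
```

A caller would use outbuf_putc() for each output character and call
outbuf_flush() once at the end of a row or of the whole COPY. Whether
this gains anything depends on the per-call overhead of fwrite(), which
is left unmeasured here.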

Hm, I'll see whether I can manage to check CVS HEAD too and see what's
happening; it's not a production alternative, though.


Regards,
Andreas

