Tom Lane wrote:
Andreas Pflug <[EMAIL PROTECTED]> writes:

When dumping the table with psql \copy (non-binary), the resulting file was 6.6GB in size and took about 5.5 minutes to write. Using psql \copy WITH BINARY (modified psql as posted to -patches), the time was cut down to 21-22 seconds (file size 1.4GB, as expected), which is near the physical throughput of the target disk. If server-based COPY to file is used, the same factor of 12 can be observed; CPU is up to 100% (single P4 3GHz, 2MB cache, HT disabled, 1GB main memory).


This is with an 8.0.x server, right?

I've tested both 8.0.5 and 8.1.4, no difference observed.

Testing a similar case with CVS HEAD, I see about a 5x speed difference,
which is right in line with the difference in the physical amount of
data written.

That's what I would have expected; apparently my data is near the worst case.

  (I was testing a case where all the bytes were emitted as
'\nnn', so it's the worst case.)  oprofile says the time is being spent
in CopyAttributeOutText() and fwrite().  So I don't think there's
anything to be optimized here, as far as bytea goes: its binary
representation is just inherently a lot smaller.
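
To illustrate the size argument above, here is a minimal sketch that estimates how many bytes one bytea value occupies in text-format COPY output. It is not PostgreSQL code. The function names are hypothetical, and the per-byte costs are assumptions based on the escape-style bytea output: a non-printable octet is written as a backslash plus three octal digits, and COPY then doubles that backslash, giving 5 output bytes per input byte.

```python
def copy_text_size(data: bytes) -> int:
    """Estimate the text-format COPY size of one bytea value (assumed rules)."""
    total = 0
    for b in data:
        if b == 0x5C:
            # Assumption: a literal backslash is doubled by bytea output
            # and doubled again by COPY, so it costs 4 bytes.
            total += 4
        elif b < 0x20 or b > 0x7E:
            # Assumption: non-printable octet -> backslash + 3 octal digits,
            # with the backslash doubled by COPY: 5 bytes.
            total += 5
        else:
            # Printable ASCII passes through unchanged.
            total += 1
    return total


def expansion_factor(data: bytes) -> float:
    """Ratio of estimated text size to raw size."""
    return copy_text_size(data) / len(data)


if __name__ == "__main__":
    worst = bytes([0x00, 0x80, 0xFF]) * 1000   # every byte needs an octal escape
    plain = b"hello world" * 1000              # printable ASCII only
    print(expansion_factor(worst), expansion_factor(plain))
```

Under these assumptions, data made up entirely of non-printable bytes expands by a factor of 5, which matches the roughly 5x difference in data written; the 6.6GB versus 1.4GB files reported earlier in the thread are close to that ratio as well.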

Unfortunately, binary isn't a cure-all, since copying normal data with the binary option might bloat it by a factor of two or so. I wish there were a third option that works well for both kinds of data. That's not only a question of dump file size but also of network throughput (online compression in the line protocol would be desirable for this).


Looking at CopySendData, I wonder whether any traction could be gained
by trying not to call fwrite() once per character.  I'm not sure how
much per-call overhead there is in that function.  We've done a lot of
work trying to optimize the COPY IN path since 8.0, but nothing much
on COPY OUT ...
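
The idea of not calling the output function once per character can be sketched as follows. This is a Python analogue only, not the CopySendData code: `CountingSink`, `send_per_char`, `send_buffered`, and the 8192-byte buffer size are all hypothetical names and values chosen for illustration. The sketch shows only that batching reduces the number of calls; whether the per-call overhead of fwrite() is large enough to matter is the open question raised above.

```python
class CountingSink:
    """Hypothetical output sink that records how often write() is called."""

    def __init__(self):
        self.calls = 0
        self.chunks = []

    def write(self, chunk):
        self.calls += 1
        self.chunks.append(bytes(chunk))  # copy, since the caller reuses its buffer

    def data(self) -> bytes:
        return b"".join(self.chunks)


def send_per_char(sink, payload: bytes) -> None:
    """One write() call per output byte."""
    for i in range(len(payload)):
        sink.write(payload[i:i + 1])


def send_buffered(sink, payload: bytes, bufsize: int = 8192) -> None:
    """Collect bytes in a local buffer and flush it when full."""
    buf = bytearray()
    for b in payload:
        buf.append(b)
        if len(buf) >= bufsize:
            sink.write(buf)
            buf.clear()
    if buf:
        sink.write(buf)  # flush the remainder


if __name__ == "__main__":
    payload = b"x" * 20000
    slow, fast = CountingSink(), CountingSink()
    send_per_char(slow, payload)
    send_buffered(fast, payload)
    print(slow.calls, fast.calls)
```

Both variants produce identical output; only the number of calls into the sink differs.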

Hm, I'll see whether I can manage to check CVS HEAD too and see what's happening; it's not a production alternative, though.

Regards,
Andreas
