Re: [HACKERS] Speed dblink using alternate libpq tuple storage

Marko Kreen Mon, 30 Jan 2012 10:17:33 -0800

On Mon, Jan 30, 2012 at 06:06:57PM +0900, Kyotaro HORIGUCHI wrote:
> The gain of performance is more than expected. Measure script now
> does query via dblink ten times for stability of measuring, so
> the figures become about ten times longer than the previous ones.
> 
>                        sec    % to Original
> Original             : 31.5     100.0%
> RowProcessor patch   : 31.3      99.4%
> dblink patch         : 24.6      78.1%
> 
> RowProcessor patch alone makes no loss or very-little gain, and
> full patch gives us 22% gain for the benchmark(*1).


Excellent!

> - The callers of RowProcessor() no more set null_field to
>   PGrowValue.value. Plus, the PGrowValue[] which RowProcessor()
>   receives has nfields + 1 elements to be able to make rough
>   estimate by cols->value[nfields].value - cols->value[0].value -
>   something.  The somthing here is 4 * nfields for protocol3 and
>   4 * (non-null fields) for protocol2. I fear that this applies
>   only for textual transfer usage...

Excact estimate is not important here.  And (nfields + 1) elem
feels bit too much magic, considering that most users probably
do not need it.  Without it, the logic would be:

 total = last.value - first.value + ((last.len > 0) ? last.len : 0)

which isn't too complex.  So I think we can remove it.


= Problems =

* Remove the dubious memcpy() in pqAddRow()

* I think the dynamic arrays in getAnotherTuple() are not portable enough,
  please do proper allocation for array.  I guess in PQsetResultAttrs()?


= Minor notes =

These can be argued either way, if you don't like some
suggestion, you can drop it.

* Move PQregisterRowProcessor() into fe-exec.c, then we can make
  pqAddRow static.

* Should PQclear() free RowProcessor error msg?  It seems
  it should not get outside from getAnotherTuple(), but
  thats not certain.  Perhaps it would be clearer to free
  it here too.

* Remove the part of comment in getAnotherTuple():
   * Buffer content may be shifted on reloading additional
   * data. So we must set all pointers on every scan.

  It's confusing why it needs to clarify that, as there
  is nobody expecting it.

* PGrowValue documentation should mention that ->value pointer
  is always valid.

* dblink: Perhaps some of those mallocs() could be replaced
  with pallocs() or even StringInfo, which already does
  the realloc dance?  I'm not familiar with dblink, and
  various struct lifetimes there so I don't know it that
  actually makes sense or not.


It seems this patch is getting ReadyForCommitter soon...

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speed dblink using alternate libpq tuple storage

Reply via email to