Re: [HACKERS] Caching index AM working data across aminsert calls

2017-02-07 Thread Robert Haas
On Tue, Feb 7, 2017 at 6:04 PM, Tom Lane  wrote:
> It's always been possible for index AMs to cache data across successive
> amgettuple calls within a single SQL command: the IndexScanDesc.opaque
> field is meant for precisely that.  However, no comparable facility
> exists for amortizing setup work across successive aminsert calls.
> The attached proposed patch adds such a feature and teaches gin,
> gist, and brin to use it.  (The other standard index AMs keep everything
> they need in the relcache, so there's little to improve there.)
>
> The improvement I see from this is fairly modest in a normal build.
> In an example similar to the gin regression test's main insert query,
>
> insert into gin_test_tbl select array[1, 2, g] from generate_series(1, 
> 100) g;
>
> the overall insertion speed increases perhaps 10%, which is nice but
> not great.  gist and brin are less, maybe 5% or so.

I think that's more than nice.  I think it's great.  It's not that
easy to squeeze 5-10% out of common operations.

(I have not reviewed the patch.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Caching index AM working data across aminsert calls

2017-02-07 Thread Tom Lane
It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that.  However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
The attached proposed patch adds such a feature and teaches gin,
gist, and brin to use it.  (The other standard index AMs keep everything
they need in the relcache, so there's little to improve there.)

The improvement I see from this is fairly modest in a normal build.
In an example similar to the gin regression test's main insert query,

insert into gin_test_tbl select array[1, 2, g] from generate_series(1, 100) 
g;

the overall insertion speed increases perhaps 10%, which is nice but
not great.  gist and brin are less, maybe 5% or so.  However, because
most of what happens in the saved work is catalog lookups, the savings
in a CLOBBER_CACHE_ALWAYS test build are pretty substantial: the runtime
of the gin regression script, on my workstation, goes from 40 minutes
to 4 seconds.  (Yes, really.)  The gist and brin test scripts are less
insert-heavy but still lose several minutes apiece.  Since the core
regression tests are run multiple times (twice serially and once in
parallel) in the standard buildfarm cycle, I estimate that this patch
would cut over 1.5 hours from the cycle time for a CLOBBER_CACHE_ALWAYS
critter running on hardware similar to mine.  I think that alone makes it
worth doing.

The reason this has been hard up to now is that the aminsert function
is not passed any useful place to cache per-statement data.  What I
chose to do in the attached is to add suitable fields to struct IndexInfo
and pass that to aminsert.  That's not widening the index AM API very
much because IndexInfo is already within the ken of ambuild.  I figured
that this might be a particularly useful way to do it because it means
that aminsert also has access to the other data in the IndexInfo struct,
which might save it having to recompute some things.  And it makes the
DDL info available to ambuild and aminsert more similar, which seems good
on general principles.

I also looked into the idea of using the index relcache entry's
rd_amcache field for this purpose, but that fails immediately in a
CLOBBER_CACHE_ALWAYS build, because gininsert (at least, probably the
other ones too) is not robust against its GinState disappearing from under
it mid-insert.  Since rd_amcache goes away on a cache flush even if the
index is open, that doesn't work.  We could maybe fix that by introducing
some way for AMs to control the lifetime of rd_amcache, but it would
result in a substantially more complex and invasive patch than this one,
and I'm unconvinced that it'd be worth the trouble.

Thoughts, objections?

regards, tom lane

diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index 29bc5ce..913f1f8 100644
*** a/contrib/bloom/blinsert.c
--- b/contrib/bloom/blinsert.c
*** blbuildempty(Relation index)
*** 190,196 
   */
  bool
  blinsert(Relation index, Datum *values, bool *isnull,
! 		 ItemPointer ht_ctid, Relation heapRel, IndexUniqueCheck checkUnique)
  {
  	BloomState	blstate;
  	BloomTuple *itup;
--- 190,198 
   */
  bool
  blinsert(Relation index, Datum *values, bool *isnull,
! 		 ItemPointer ht_ctid, Relation heapRel,
! 		 IndexUniqueCheck checkUnique,
! 		 IndexInfo *indexInfo)
  {
  	BloomState	blstate;
  	BloomTuple *itup;
diff --git a/contrib/bloom/bloom.h b/contrib/bloom/bloom.h
index a75a235..39d8d05 100644
*** a/contrib/bloom/bloom.h
--- b/contrib/bloom/bloom.h
*** extern bool blvalidate(Oid opclassoid);
*** 189,195 
  /* index access method interface functions */
  extern bool blinsert(Relation index, Datum *values, bool *isnull,
  		 ItemPointer ht_ctid, Relation heapRel,
! 		 IndexUniqueCheck checkUnique);
  extern IndexScanDesc blbeginscan(Relation r, int nkeys, int norderbys);
  extern int64 blgetbitmap(IndexScanDesc scan, TIDBitmap *tbm);
  extern void blrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
--- 189,196 
  /* index access method interface functions */
  extern bool blinsert(Relation index, Datum *values, bool *isnull,
  		 ItemPointer ht_ctid, Relation heapRel,
! 		 IndexUniqueCheck checkUnique,
! 		 struct IndexInfo *indexInfo);
  extern IndexScanDesc blbeginscan(Relation r, int nkeys, int norderbys);
  extern int64 blgetbitmap(IndexScanDesc scan, TIDBitmap *tbm);
  extern void blrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 5d8e557..9afd7f6 100644
*** a/doc/src/sgml/indexam.sgml
--- b/doc/src/sgml/indexam.sgml
*** aminsert (Relation indexRelation,
*** 259,265 
bool *isnull,
ItemPointer heap_tid,
Relation heapRelation,
!   IndexUniqueCheck checkUnique);
  
 Insert a new tuple into an existing index.