Re: [HACKERS] Index-only scans with btree_gist

2015-03-30 Thread Heikki Linnakangas

On 03/29/2015 04:30 AM, Andreas Karlsson wrote:

On 03/29/2015 03:25 AM, Andreas Karlsson wrote:

On 03/28/2015 09:36 PM, Andreas Karlsson wrote:

Thanks! Do you know if it is possible to add index-only scan support to
range indexes? I have never looked at those and do not know if they are
lossy.


Seems like range types are not compressed at all so implementing
index-only scans was trivial. A patch is attached.


Noticed a couple of typos, so to avoid having them sneak into the code
here is a version without them.


Thanks, pushed.

- Heikki




Re: [HACKERS] Index-only scans with btree_gist

2015-03-28 Thread Andreas Karlsson

On 03/29/2015 03:25 AM, Andreas Karlsson wrote:

On 03/28/2015 09:36 PM, Andreas Karlsson wrote:

Thanks! Do you know if it is possible to add index-only scan support to
range indexes? I have never looked at those and do not know if they are
lossy.


Seems like range types are not compressed at all so implementing
index-only scans was trivial. A patch is attached.


Noticed a couple of typos, so to avoid having them sneak into the code 
here is a version without them.


--
Andreas Karlsson
diff --git a/src/backend/utils/adt/rangetypes_gist.c b/src/backend/utils/adt/rangetypes_gist.c
index c1ff471..ef84121 100644
--- a/src/backend/utils/adt/rangetypes_gist.c
+++ b/src/backend/utils/adt/rangetypes_gist.c
@@ -216,7 +216,7 @@ range_gist_union(PG_FUNCTION_ARGS)
 	PG_RETURN_RANGE(result_range);
 }
 
-/* compress, decompress are no-ops */
+/* compress, decompress, fetch are no-ops */
 Datum
 range_gist_compress(PG_FUNCTION_ARGS)
 {
@@ -233,6 +233,14 @@ range_gist_decompress(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(entry);
 }
 
+Datum
+range_gist_fetch(PG_FUNCTION_ARGS)
+{
+	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+
+	PG_RETURN_POINTER(entry);
+}
+
 /*
  * GiST page split penalty function.
  *
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 037684c..8a43f64 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -235,6 +235,7 @@ DATA(insert (	3919   3831 3831 4 3878 ));
 DATA(insert (	3919   3831 3831 5 3879 ));
 DATA(insert (	3919   3831 3831 6 3880 ));
 DATA(insert (	3919   3831 3831 7 3881 ));
+DATA(insert (	3919   3831 3831 9 3996 ));
 
 
 /* gin */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index a96d369..3cd7851 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4951,6 +4951,8 @@ DATA(insert OID = 3877 (  range_gist_compress	PGNSP PGUID 12 1 0 0 0 f f f f t f
 DESCR("GiST support");
 DATA(insert OID = 3878 (  range_gist_decompress PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ range_gist_decompress _null_ _null_ _null_ ));
 DESCR("GiST support");
+DATA(insert OID = 3996 (  range_gist_fetch		PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ range_gist_fetch _null_ _null_ _null_ ));
+DESCR("GiST support");
 DATA(insert OID = 3879 (  range_gist_penalty	PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ range_gist_penalty _null_ _null_ _null_ ));
 DESCR("GiST support");
 DATA(insert OID = 3880 (  range_gist_picksplit	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ range_gist_picksplit _null_ _null_ _null_ ));
diff --git a/src/include/utils/rangetypes.h b/src/include/utils/rangetypes.h
index 173bf74..43c80f4 100644
--- a/src/include/utils/rangetypes.h
+++ b/src/include/utils/rangetypes.h
@@ -209,6 +209,7 @@ extern RangeType *make_empty_range(TypeCacheEntry *typcache);
 extern Datum range_gist_consistent(PG_FUNCTION_ARGS);
 extern Datum range_gist_compress(PG_FUNCTION_ARGS);
 extern Datum range_gist_decompress(PG_FUNCTION_ARGS);
+extern Datum range_gist_fetch(PG_FUNCTION_ARGS);
 extern Datum range_gist_union(PG_FUNCTION_ARGS);
 extern Datum range_gist_penalty(PG_FUNCTION_ARGS);
 extern Datum range_gist_picksplit(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index 35d0dd3..8654e03 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -1072,6 +1072,25 @@ select count(*) from test_range_spgist where ir -|- int4range(100,500);
  5
 (1 row)
 
+-- test index-only scans
+explain (costs off)
+select ir from test_range_spgist where ir -|- int4range(10,20) order by ir;
+   QUERY PLAN   
+
+ Sort
+   Sort Key: ir
+   ->  Index Only Scan using test_range_spgist_idx on test_range_spgist
+ Index Cond: (ir -|- '[10,20)'::int4range)
+(4 rows)
+
+select ir from test_range_spgist where ir -|- int4range(10,20) order by ir;
+ ir 
+
+ [20,30)
+ [20,30)
+ [20,10020)
+(3 rows)
+
 RESET enable_seqscan;
 RESET enable_indexscan;
 RESET enable_bitmapscan;
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index aa026ca..af13352 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -286,6 +286,11 @@ select count(*) from test_range_spgist where ir &< int4range(100,500);
 select count(*) from test_range_spgist where ir &> int4range(100,500);
 select count(*) from test_range_spgist where ir -|- int4range(100,500);
 
+-- test index-only scans
+explain (costs off)
+select ir from test_range_spgist where ir -|- int4range(10,20) order by ir;
+select ir from test_range_spgist where ir -|- int4range(10,20) order by ir;

Re: [HACKERS] Index-only scans with btree_gist

2015-03-28 Thread Andreas Karlsson

On 03/28/2015 09:36 PM, Andreas Karlsson wrote:

Thanks! Do you know if it is possible to add index-only scan support to
range indexes? I have never looked at those and do not know if they are
lossy.


Seems like range types are not compressed at all so implementing 
index-only scans was trivial. A patch is attached.


--
Andreas Karlsson
diff --git a/src/backend/utils/adt/rangetypes_gist.c b/src/backend/utils/adt/rangetypes_gist.c
index c1ff471..dea8f04 100644
--- a/src/backend/utils/adt/rangetypes_gist.c
+++ b/src/backend/utils/adt/rangetypes_gist.c
@@ -216,7 +216,7 @@ range_gist_union(PG_FUNCTION_ARGS)
 	PG_RETURN_RANGE(result_range);
 }
 
-/* compress, decompress are no-ops */
+/* compress, decompress, fecth are no-ops */
 Datum
 range_gist_compress(PG_FUNCTION_ARGS)
 {
@@ -233,6 +233,14 @@ range_gist_decompress(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(entry);
 }
 
+Datum
+range_gist_fetch(PG_FUNCTION_ARGS)
+{
+	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+
+	PG_RETURN_POINTER(entry);
+}
+
 /*
  * GiST page split penalty function.
  *
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 037684c..e80c897 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -235,6 +235,7 @@ DATA(insert (	3919   3831 3831 4 3878 ));
 DATA(insert (	3919   3831 3831 5 3879 ));
 DATA(insert (	3919   3831 3831 6 3880 ));
 DATA(insert (	3919   3831 3831 7 3881 ));
+DATA(insert (   3919   3831 3831 9 3996 ));
 
 
 /* gin */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index a96d369..3cd7851 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4951,6 +4951,8 @@ DATA(insert OID = 3877 (  range_gist_compress	PGNSP PGUID 12 1 0 0 0 f f f f t f
 DESCR("GiST support");
 DATA(insert OID = 3878 (  range_gist_decompress PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ range_gist_decompress _null_ _null_ _null_ ));
 DESCR("GiST support");
+DATA(insert OID = 3996 (  range_gist_fetch		PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ range_gist_fetch _null_ _null_ _null_ ));
+DESCR("GiST support");
 DATA(insert OID = 3879 (  range_gist_penalty	PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ range_gist_penalty _null_ _null_ _null_ ));
 DESCR("GiST support");
 DATA(insert OID = 3880 (  range_gist_picksplit	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ range_gist_picksplit _null_ _null_ _null_ ));
diff --git a/src/include/utils/rangetypes.h b/src/include/utils/rangetypes.h
index 173bf74..43c80f4 100644
--- a/src/include/utils/rangetypes.h
+++ b/src/include/utils/rangetypes.h
@@ -209,6 +209,7 @@ extern RangeType *make_empty_range(TypeCacheEntry *typcache);
 extern Datum range_gist_consistent(PG_FUNCTION_ARGS);
 extern Datum range_gist_compress(PG_FUNCTION_ARGS);
 extern Datum range_gist_decompress(PG_FUNCTION_ARGS);
+extern Datum range_gist_fetch(PG_FUNCTION_ARGS);
 extern Datum range_gist_union(PG_FUNCTION_ARGS);
 extern Datum range_gist_penalty(PG_FUNCTION_ARGS);
 extern Datum range_gist_picksplit(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index 35d0dd3..8654e03 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -1072,6 +1072,25 @@ select count(*) from test_range_spgist where ir -|- int4range(100,500);
  5
 (1 row)
 
+-- test index-only scans
+explain (costs off)
+select ir from test_range_spgist where ir -|- int4range(10,20) order by ir;
+   QUERY PLAN   
+
+ Sort
+   Sort Key: ir
+   ->  Index Only Scan using test_range_spgist_idx on test_range_spgist
+ Index Cond: (ir -|- '[10,20)'::int4range)
+(4 rows)
+
+select ir from test_range_spgist where ir -|- int4range(10,20) order by ir;
+ ir 
+
+ [20,30)
+ [20,30)
+ [20,10020)
+(3 rows)
+
 RESET enable_seqscan;
 RESET enable_indexscan;
 RESET enable_bitmapscan;
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index aa026ca..af13352 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -286,6 +286,11 @@ select count(*) from test_range_spgist where ir &< int4range(100,500);
 select count(*) from test_range_spgist where ir &> int4range(100,500);
 select count(*) from test_range_spgist where ir -|- int4range(100,500);
 
+-- test index-only scans
+explain (costs off)
+select ir from test_range_spgist where ir -|- int4range(10,20) order by ir;
+select ir from test_range_spgist where ir -|- int4range(10,20) order by ir;
+
 RESET enable_seqscan;
 RESET enable_indexscan;
 RESET enable_bitmapscan;


Re: [HACKERS] Index-only scans with btree_gist

2015-03-28 Thread Andreas Karlsson

On 03/28/2015 02:12 PM, Heikki Linnakangas wrote:

Looks good to me. Committed, thanks.


Thanks! Do you know if it is possible to add index-only scan support to 
range indexes? I have never looked at those and do not know if they are 
lossy.


Andreas




Re: [HACKERS] Index-only scans with btree_gist

2015-03-28 Thread Heikki Linnakangas

On 03/28/2015 01:14 AM, Andreas Karlsson wrote:

On 03/26/2015 10:31 PM, Heikki Linnakangas wrote:

I've pushed Anastasia's patch to support index-only scans with GiST, and
it's time to add opclass support for all the opclasses that are not
lossy. I think at least all the btree_gist opclasses need to be
supported, it would be pretty surprising if they didn't support
index-only scans, while some more complex opclasses did.

Attached is a WIP patch for that. It covers all the varlen types that
btree_gist supports, and int2, int4 and oid. The rest of the fixed-width
types should be just a matter of copy-pasting. I'll continue adding
those, but wanted to let people know I'm working on this.


Would it also be worth doing the same for the inet_ops class for
inet/cidr? I have hacked a quick WIP patch which I believe should work,
but have not looked into the index only scan code enough to be sure.


Looks good to me. Committed, thanks.

- Heikki




Re: [HACKERS] Index-only scans with btree_gist

2015-03-27 Thread Andreas Karlsson

On 03/26/2015 10:31 PM, Heikki Linnakangas wrote:

I've pushed Anastasia's patch to support index-only scans with GiST, and
it's time to add opclass support for all the opclasses that are not
lossy. I think at least all the btree_gist opclasses need to be
supported, it would be pretty surprising if they didn't support
index-only scans, while some more complex opclasses did.

Attached is a WIP patch for that. It covers all the varlen types that
btree_gist supports, and int2, int4 and oid. The rest of the fixed-width
types should be just a matter of copy-pasting. I'll continue adding
those, but wanted to let people know I'm working on this.


Would it also be worth doing the same for the inet_ops class for 
inet/cidr? I have hacked a quick WIP patch which I believe should work, 
but have not looked into the index only scan code enough to be sure.


--
Andreas Karlsson
diff --git a/src/backend/utils/adt/network_gist.c b/src/backend/utils/adt/network_gist.c
index 14dd62b..0133032 100644
--- a/src/backend/utils/adt/network_gist.c
+++ b/src/backend/utils/adt/network_gist.c
@@ -105,6 +105,27 @@ typedef struct GistInetKey
 #define SET_GK_VARSIZE(dst) \
 	SET_VARSIZE_SHORT(dst, offsetof(GistInetKey, ipaddr) + gk_ip_addrsize(dst))
 
+Datum
+inet_gist_fetch(PG_FUNCTION_ARGS)
+{
+	GISTENTRY	*entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+	GistInetKey	*key = DatumGetInetKeyP(entry->key);
+	GISTENTRY	*retval;
+	inet		*dst;
+
+	dst = (inet *) palloc0(sizeof(inet));
+
+	ip_family(dst) = gk_ip_family(key);
+	ip_bits(dst) = gk_ip_minbits(key);
+	memcpy(ip_addr(dst), gk_ip_addr(key), ip_addrsize(dst));
+	SET_INET_VARSIZE(dst);
+
+	retval = palloc(sizeof(GISTENTRY));
+	gistentryinit(*retval, InetPGetDatum(dst), entry->rel, entry->page,
+  entry->offset, FALSE);
+
+	PG_RETURN_POINTER(retval);
+}
 
 /*
  * The GiST query consistency check
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 612a9d2..3b87f90 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -411,6 +411,7 @@ DATA(insert (	3550   869	869  4 3556 ));
 DATA(insert (	3550   869	869  5 3557 ));
 DATA(insert (	3550   869	869  6 3558 ));
 DATA(insert (	3550   869	869  7 3559 ));
+DATA(insert (   3550   869  869  9 3573 ));
 
 /* sp-gist */
 DATA(insert (	3474   3831 3831 1 3469 ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 77b7717..a96d369 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2240,6 +2240,8 @@ DATA(insert OID = 3555 (  inet_gist_compress	PGNSP PGUID 12 1 0 0 0 f f f f t f
 DESCR("GiST support");
 DATA(insert OID = 3556 (  inet_gist_decompress	PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ inet_gist_decompress _null_ _null_ _null_ ));
 DESCR("GiST support");
+DATA(insert OID = 3573 (  inet_gist_fetch		PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ inet_gist_fetch _null_ _null_ _null_ ));
+DESCR("GiST support");
 DATA(insert OID = 3557 (  inet_gist_penalty		PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ inet_gist_penalty _null_ _null_ _null_ ));
 DESCR("GiST support");
 DATA(insert OID = 3558 (  inet_gist_picksplit	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ inet_gist_picksplit _null_ _null_ _null_ ));
diff --git a/src/include/utils/inet.h b/src/include/utils/inet.h
index 6694688..2d2bae4 100644
--- a/src/include/utils/inet.h
+++ b/src/include/utils/inet.h
@@ -123,6 +123,7 @@ extern int	bitncommon(const unsigned char *l, const unsigned char *r, int n);
 /*
  * GiST support functions in network_gist.c
  */
+extern Datum inet_gist_fetch(PG_FUNCTION_ARGS);
 extern Datum inet_gist_consistent(PG_FUNCTION_ARGS);
 extern Datum inet_gist_union(PG_FUNCTION_ARGS);
 extern Datum inet_gist_compress(PG_FUNCTION_ARGS);



Re: [HACKERS] Index-only scans for GiST.

2015-03-26 Thread Heikki Linnakangas

On 03/02/2015 12:43 AM, Heikki Linnakangas wrote:

On 02/27/2015 04:19 PM, Anastasia Lubennikova wrote:

I added a MemoryContext, listCxt, to avoid the memory leak. listCxt is created
once in gistrescan (only for index-only scan plans) and reset when the scan of
the leaf page is finished.

I am not sure whether the problem is completely solved, so I will wait for feedback.


Yeah, I think that solves it.


On further testing, there was actually still a leak with kNN-searches. 
Fixed.



I spent a little time cleaning this up. I reverted that pageData change
so that it's an array again, put back the gist_indexonly.sql and
expected output files that were missing from your latest version,
removed a couple of unused local variables that gcc complained about. I
refactored gistFetchTuple a bit, because it was doing IMHO quite bogus
things with NULLs. It was passing NULLs to the opclass' fetch function,
but it didn't pass the isNull flag correctly. I changed it so that the
fetch function is not called at all for NULLs.

I think this is pretty close to being committable. I'll make a round of
editorializing over the docs, and the code comments as well.

The opr_sanity regression test is failing, there's apparently something
wrong with the pg_proc entries of the *canreturn functions. I haven't
looked into that yet; could you fix that?


I have pushed this, after fixing the opr_sanity failure, some bug fixes, 
and documentation and comment cleanup.


Thanks for the patch!

- Heikki





Re: [HACKERS] Index-only scans for GiST.

2015-03-01 Thread Heikki Linnakangas

On 02/27/2015 04:19 PM, Anastasia Lubennikova wrote:

I added a MemoryContext, listCxt, to avoid the memory leak. listCxt is created
once in gistrescan (only for index-only scan plans) and reset when the scan of
the leaf page is finished.

I am not sure whether the problem is completely solved, so I will wait for feedback.


Yeah, I think that solves it.


* What's the reason for turning GISTScanOpaqueData.pageData from an array
to a List?

This array is a field of the GISTScanOpaqueData structure. Memory for that
structure is allocated in gistbeginscan(). The array has a fixed size, so
it's declared only once in the structure:
GISTSearchHeapItem  pageData [BLCKSZ/sizeof(IndexTupleData)]

But how could we know the size of the array if we don't know what data will be
returned? I mean the type and amount.


You're only adding a pointer to the IndexTuple to GISTSearchHeapItem. 
The GISTSearchHeapItem struct itself is still of constant size.
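
To make that concrete, here is a small standalone sketch (with hypothetical
stand-in struct definitions, not the real gist_private.h declarations): adding
a pointer member keeps the struct size a compile-time constant, so a
fixed-length pageData[] array of these items works no matter how large the
fetched tuples are.

#include <stdio.h>

/* Hypothetical stand-ins for the real PostgreSQL types, for illustration only. */
typedef struct ItemPointerData { unsigned short blkhi, blklo, offset; } ItemPointerData;
typedef struct IndexTupleData IndexTupleData;	/* opaque here; its size is irrelevant */

typedef struct GISTSearchHeapItem
{
	ItemPointerData heapPtr;	/* TID of the heap tuple */
	int				recheck;	/* T if quals must be rechecked */
	IndexTupleData *ftup;		/* added: pointer to the fetched index tuple */
} GISTSearchHeapItem;

int
main(void)
{
	/* Constant at compile time, regardless of what ftup later points to. */
	printf("sizeof(GISTSearchHeapItem) = %zu\n", sizeof(GISTSearchHeapItem));
	return 0;
}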


I spent a little time cleaning this up. I reverted that pageData change 
so that it's an array again, put back the gist_indexonly.sql and 
expected output files that were missing from your latest version, 
removed a couple of unused local variables that gcc complained about. I 
refactored gistFetchTuple a bit, because it was doing IMHO quite bogus 
things with NULLs. It was passing NULLs to the opclass' fetch function, 
but it didn't pass the isNull flag correctly. I changed it so that the 
fetch function is not called at all for NULLs.


I think this is pretty close to being committable. I'll make a round of 
editorializing over the docs, and the code comments as well.


The opr_sanity regression test is failing, there's apparently something 
wrong with the pg_proc entries of the *canreturn functions. I haven't 
looked into that yet; could you fix that?


- Heikki

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index db2a452..53750da 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1404,6 +1404,14 @@ initGISTstate(Relation index)
 		else
 			giststate->distanceFn[i].fn_oid = InvalidOid;
 
+		/* opclasses are not required to provide a Fetch method */
+		if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
+			fmgr_info_copy(&(giststate->fetchFn[i]),
+		 index_getprocinfo(index, i + 1, GIST_FETCH_PROC),
+		   scanCxt);
+		else
+			giststate->fetchFn[i].fn_oid = InvalidOid;
+
 		/*
 		 * If the index column has a specified collation, we should honor that
 		 * while doing comparisons.  However, we may have a collatable storage
@@ -1426,6 +1434,22 @@ initGISTstate(Relation index)
 	return giststate;
 }
 
+/*
+ * Gistcanreturn is supposed to be true if ANY FetchFn method is defined.
+ * If FetchFn exists it would be used in index-only scan
+ * Thus the responsibility rests with the opclass developer.
+ */
+
+Datum
+gistcanreturn(PG_FUNCTION_ARGS) {
+	Relation index = (Relation) PG_GETARG_POINTER(0);
+	int i = PG_GETARG_INT32(1);
+	if (OidIsValid(index_getprocid(index, i+1, GIST_FETCH_PROC)))
+		PG_RETURN_BOOL(true);
+	else
+		PG_RETURN_BOOL(false);
+}
+
 void
 freeGISTstate(GISTSTATE *giststate)
 {
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 717cb85..80532c8 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -229,6 +229,8 @@ gistindex_keytest(IndexScanDesc scan,
  * we're doing a plain or ordered indexscan.  For a plain indexscan, heap
  * tuple TIDs are returned into so->pageData[].  For an ordered indexscan,
  * heap tuple TIDs are pushed into individual search queue items.
+ * If index-only scan is possible, heap tuples themselves are returned
+ * into so->pageData or into search queue.
  *
  * If we detect that the index page has split since we saw its downlink
  * in the parent, we push its new right sibling onto the queue so the
@@ -241,6 +243,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	Buffer		buffer;
 	Page		page;
+	GISTSTATE *giststate = so->giststate;
+	Relation r = scan->indexRelation;
 	GISTPageOpaque opaque;
 	OffsetNumber maxoff;
 	OffsetNumber i;
@@ -288,6 +292,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	}
 
 	so->nPageData = so->curPageData = 0;
+	if (so->pageDataCxt)
+		MemoryContextReset(so->pageDataCxt);
 
 	/*
 	 * check all tuples on page
@@ -326,10 +332,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		else if (scan->numberOfOrderBys == 0 && GistPageIsLeaf(page))
 		{
 			/*
-			 * Non-ordered scan, so report heap tuples in so->pageData[]
+			 * Non-ordered scan, so report tuples in so->pageData[]
 			 */
 			so->pageData[so->nPageData].heapPtr = it->t_tid;
 			so->pageData[so->nPageData].recheck = recheck;
+
+			/*
+			 * If index-only scan is possible add the data fetched from index.
+			 */
+			if (scan->xs_want_itup)
+			{
+oldcxt = MemoryCont

Re: [HACKERS] Index-only scans for GiST.

2015-02-27 Thread Anastasia Lubennikova
I added a MemoryContext, listCxt, to avoid the memory leak. listCxt is created
once in gistrescan (only for index-only scan plans) and reset when the scan of
the leaf page is finished.

I am not sure whether the problem is completely solved, so I will wait for feedback.
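
For readers unfamiliar with the backend memory-context API, here is a rough
sketch of the pattern described above (fragments only, not the patch itself;
the names listCxt, so and scan follow the patch's wording, the routines are
PostgreSQL's standard memory-context functions):

/* In gistrescan(): create the context once, only when an index-only scan
 * was requested. */
if (scan->xs_want_itup && so->listCxt == NULL)
	so->listCxt = AllocSetContextCreate(CurrentMemoryContext,
										"GiST index-only scan context",
										ALLOCSET_DEFAULT_MINSIZE,
										ALLOCSET_DEFAULT_INITSIZE,
										ALLOCSET_DEFAULT_MAXSIZE);

/* In gistScanPage(): reset the context before collecting tuples from the
 * next leaf page, so fetched tuples cannot accumulate across the scan. */
if (so->listCxt)
	MemoryContextReset(so->listCxt);

/* Per-page allocations of fetched index tuples then go into that context. */
oldcxt = MemoryContextSwitchTo(so->listCxt);
/* ... palloc() copies of the fetched tuples ... */
MemoryContextSwitchTo(oldcxt);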

* What's the reason for turning GISTScanOpaqueData.pageData from an array
to a List?

This array is a field of the GISTScanOpaqueData structure. Memory for that
structure is allocated in gistbeginscan(). The array has a fixed size, so
it's declared only once in the structure:
GISTSearchHeapItem  pageData [BLCKSZ/sizeof(IndexTupleData)]

But how could we know the size of the array if we don't know what data will be
returned? I mean the type and amount.

I asked Alexander about that, and he suggested using a List instead of an array.


indexonlyscan_gist_2.2.patch
Description: Binary data


indexonlyscan_gist_docs.patch
Description: Binary data


test_performance.sql
Description: Binary data



Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Fabrízio de Royes Mello
On Thu, Feb 12, 2015 at 3:07 PM, Thom Brown  wrote:
>
> On 12 February 2015 at 16:40, Heikki Linnakangas 
wrote:
>>
>> On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:
>>>
>>> Thanks for answer.
>>> Now it seems to be applied correctly.
>>
>>
>> * Documentation is missing.
>
>
> Anastasia provided a documentation patch in the first email.
>

Yeah, but it's good practice to send all the patches together. It will make the
life of the reviewers easier ;-)

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Thom Brown
On 12 February 2015 at 16:40, Heikki Linnakangas 
wrote:

> On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:
>
>> Thanks for answer.
>> Now it seems to be applied correctly.
>>
>
> * Documentation is missing.
>

Anastasia provided a documentation patch in the first email.

Thom


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Heikki Linnakangas

On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:

Thanks for the answer.
Now it seems to be applied correctly.


Thanks, it would be great to get this completed.

This still leaks memory with the same test query as earlier. The leak 
seems to be into a different memory context now; it used to be to the 
"GiST scan context", but now it's to the ExecutorState context. See 
attached patch which makes the leak evident.


Looking at my previous comments from August:

* What's the reason for turning GISTScanOpaqueData.pageData from an
array to a List?

* Documentation is missing.

- Heikki
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 0925e56..3768c9c 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -526,6 +526,12 @@ gistgettuple(PG_FUNCTION_ARGS)
  * It's always head of so->pageData
  */
 so->pageData = list_delete_first(so->pageData);
+{
+	static int lastreport = 0;
+	if ((lastreport++) % 1 == 0)
+		MemoryContextStats(CurrentMemoryContext);
+}
+
 PG_RETURN_BOOL(TRUE);
 			}
 



Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Thom Brown
On 12 February 2015 at 10:40, Anastasia Lubennikova  wrote:

> Thanks for answer.
> Now it seems to be applied correctly.
>

(please avoid top-posting)

Thanks for the updated patch.  I can confirm that it now cleanly applies
and compiles fine.  I've run the tests in the SQL file you provided, and
it's behaving as expected.

Thom


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Anastasia Lubennikova
Thanks for the answer.
Now it seems to be applied correctly.

2015-02-12 3:12 GMT+04:00 Thom Brown :

> On 11 February 2015 at 22:50, Anastasia Lubennikova <
> lubennikov...@gmail.com> wrote:
>
>> Finally there is a new version of patch (in attachments).
>> It provides multicolumn index-only scan for GiST indexes.
>>
>> - Memory leak is fixed.
>> - little code cleanup
>> - example of performance test in attachmens
>> - function OIDs have debugging values (*) just to avoid merge
>> conflicts while testing patch
>>
>> Wiki page of the project is
>>
>> https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
>>
>> Waiting for feedback.
>>
>
> Hi Anastasia.  Thanks for the updated patch.  I've just tried applying it
> to head and it doesn't appear to apply cleanly.
>
> $ patch -p1 < ~/Downloads/indexonlyscan_gist_2.0.patch
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/access/gist/gist.c
> Hunk #1 succeeded at 1404 (offset 9 lines).
> Hunk #2 succeeded at 1434 (offset 9 lines).
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/access/gist/gistget.c
> Hunk #1 succeeded at 227 (offset 1 line).
> Hunk #2 succeeded at 243 (offset 1 line).
> Hunk #3 succeeded at 293 (offset -4 lines).
> Hunk #4 succeeded at 330 (offset -4 lines).
> Hunk #5 succeeded at 365 (offset -5 lines).
> Hunk #6 succeeded at 444 (offset -27 lines).
> Hunk #7 succeeded at 474 (offset -27 lines).
> Hunk #8 FAILED at 518.
> Hunk #9 succeeded at 507 (offset -28 lines).
> Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines).
> Hunk #11 FAILED at 601.
> 2 out of 11 hunks FAILED -- saving rejects to file
> src/backend/access/gist/gistget.c.rej
> ...
>
> --
> Thom
>



-- 
Best regards,
Lubennikova Anastasia
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index db2a452..53750da 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1404,6 +1404,14 @@ initGISTstate(Relation index)
 		else
 			giststate->distanceFn[i].fn_oid = InvalidOid;
 
+		/* opclasses are not required to provide a Fetch method */
+		if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
+			fmgr_info_copy(&(giststate->fetchFn[i]),
+		 index_getprocinfo(index, i + 1, GIST_FETCH_PROC),
+		   scanCxt);
+		else
+			giststate->fetchFn[i].fn_oid = InvalidOid;
+
 		/*
 		 * If the index column has a specified collation, we should honor that
 		 * while doing comparisons.  However, we may have a collatable storage
@@ -1426,6 +1434,22 @@ initGISTstate(Relation index)
 	return giststate;
 }
 
+/*
+ * Gistcanreturn is supposed to be true if ANY FetchFn method is defined.
+ * If FetchFn exists it would be used in index-only scan
+ * Thus the responsibility rests with the opclass developer.
+ */
+
+Datum
+gistcanreturn(PG_FUNCTION_ARGS) {
+	Relation index = (Relation) PG_GETARG_POINTER(0);
+	int i = PG_GETARG_INT32(1);
+	if (OidIsValid(index_getprocid(index, i+1, GIST_FETCH_PROC)))
+		PG_RETURN_BOOL(true);
+	else
+		PG_RETURN_BOOL(false);
+}
+
 void
 freeGISTstate(GISTSTATE *giststate)
 {
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 717cb85..0925e56 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -227,8 +227,10 @@ gistindex_keytest(IndexScanDesc scan,
  * If tbm/ntids aren't NULL, we are doing an amgetbitmap scan, and heap
  * tuples should be reported directly into the bitmap.  If they are NULL,
  * we're doing a plain or ordered indexscan.  For a plain indexscan, heap
- * tuple TIDs are returned into so->pageData[].  For an ordered indexscan,
+ * tuple TIDs are returned into so->pageData. For an ordered indexscan,
  * heap tuple TIDs are pushed into individual search queue items.
+ * If index-only scan is possible, heap tuples themselves are returned
+ * into so->pageData or into search queue.
  *
  * If we detect that the index page has split since we saw its downlink
  * in the parent, we push its new right sibling onto the queue so the
@@ -241,6 +243,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	Buffer		buffer;
 	Page		page;
+	GISTSTATE *giststate = so->giststate;
+	Relation r = scan->indexRelation;
+	boolisnull[INDEX_MAX_KEYS];
+	GISTSearchHeapItem *tmpListItem;
 	GISTPageOpaque opaque;
 	OffsetNumber maxoff;
 	OffsetNumber i;
@@ -287,8 +293,6 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		MemoryContextSwitchTo(oldcxt);
 	}
 
-	so->nPageData = so->curPageData = 0;
-
 	/*
 	 * check all tuples on page
 	 */
@@ -326,11 +330,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		else if (scan->numberOfOrderBys == 0 && GistPageIsLeaf(page))
 		{
 			/*
-			 * Non-ordered scan, so report heap tuples in so->pageData[]
+			 * Non-ordered scan, so rep

Re: [HACKERS] Index-only scans for GiST.

2015-02-11 Thread Thom Brown
On 11 February 2015 at 22:50, Anastasia Lubennikova  wrote:

> Finally there is a new version of patch (in attachments).
> It provides multicolumn index-only scan for GiST indexes.
>
> - Memory leak is fixed.
> - little code cleanup
> - example of performance test in attachmens
> - function OIDs have debugging values (*) just to avoid merge
> conflicts while testing patch
>
> Wiki page of the project is
>
> https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
>
> Waiting for feedback.
>

Hi Anastasia.  Thanks for the updated patch.  I've just tried applying it
to head and it doesn't appear to apply cleanly.

$ patch -p1 < ~/Downloads/indexonlyscan_gist_2.0.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/access/gist/gist.c
Hunk #1 succeeded at 1404 (offset 9 lines).
Hunk #2 succeeded at 1434 (offset 9 lines).
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/access/gist/gistget.c
Hunk #1 succeeded at 227 (offset 1 line).
Hunk #2 succeeded at 243 (offset 1 line).
Hunk #3 succeeded at 293 (offset -4 lines).
Hunk #4 succeeded at 330 (offset -4 lines).
Hunk #5 succeeded at 365 (offset -5 lines).
Hunk #6 succeeded at 444 (offset -27 lines).
Hunk #7 succeeded at 474 (offset -27 lines).
Hunk #8 FAILED at 518.
Hunk #9 succeeded at 507 (offset -28 lines).
Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines).
Hunk #11 FAILED at 601.
2 out of 11 hunks FAILED -- saving rejects to file
src/backend/access/gist/gistget.c.rej
...

-- 
Thom


Re: [HACKERS] Index-only scans for GIST

2014-10-27 Thread Thom Brown
On 18 August 2014 09:05, Heikki Linnakangas  wrote:

> On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:
>
>> 2014-08-07 0:30 GMT+04:00 Heikki Linnakangas :
>>
>> * I'm getting two regression failures with this (opr_sanity and join).
>>
>>>
>>>
>> The opr_sanity failure is corrected.
>> But there remains a question with join.
>> I checked the latest version of my github repo and there's no failure in the
>> join regression test; all 145 tests passed.
>> To tell the truth, I don't understand which changes could have led to this
>> failure.
>> Could you show me the regression diffs?
>>
>
> Sure, here you go. It seems like a change in a plan. At a quick glance it
> seems harmless: the new plan is identical except that the left and right
> side of a join have been reversed. But I don't understand either why this
> patch would change that, so it needs to be investigated.
>
>  * The regression test queries that use LIMIT are not guaranteed to always
>>
>>> return the same rows, hence they're not very good regression test cases.
>>> I'd suggest using more restricting WHERE clauses, so that each query only
>>> returns a handful of rows.
>>>
>>
>> Thank you for the comment; I rewrote the wrong queries. But could you explain
>> why LIMIT queries may return different results? Does it happen because of
>> different query optimization?
>>
>
> Imagine that you have a table with two rows, A and B. If you run a query
> like "SELECT * FROM table LIMIT 1", the system can legally return either
> row A or B, because there's no ORDER BY.
>
> Now, admittedly you have a similar problem even without the LIMIT - the
> system can legally return the rows in either order - but it's less of an
> issue because at least you can quickly see from the diff that the result
> set is in fact the same, the rows are just in different order. You could
> fix that by adding an ORDER BY to all test queries, but we haven't done
> that in the regression suite because then we would not have any test
> coverage for cases where you don't have an ORDER BY. As a compromise, test
> cases are usually written without an ORDER BY, but if e.g. the buildfarm
> starts failing because of differences in the result set order across
> platforms, then we add an ORDER BY to make it stable.
>
>  * I think it's leaking memory, in GIST scan context. I tested this with a
>>
>>> variant of the regression tests:
>>>
>>> insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
>>> 0.05*i)),
>>>   point(0.05*i, 0.05*i) FROM generate_series(0,
>>> 1000) as i;
>>> CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);
>>>
>>> set enable_seqscan=off;
>>> set enable_bitmapscan=off;
>>>
>>> explain analyze  select p from gist_tbl where p <@ box(point(0,0),
>>> point(999,999)) and length(p::text) < 10;
>>>
>>> while the final query runs, 'top' shows constantly increasing memory
>>> usage.
>>>
>>
>> I don't think it's a memory leak. After some initial increase, memory usage
>> remains the same. It works similarly without using indexonlyscan.
>>
>
> No, it's definitely a leak caused by the patch. Test with the attached
> patch, which prints out to the server log the amount of memory used by the
> GiST scan memory context every 1 rows. It clearly shows increasing
> memory usage all the way to the end of the query.
>
> It's cleaned up at the end of the query, but that's not good enough
> because for a large query you might accumulate gigabytes of leaked memory
> until the query has finished. If you (manually) apply the same patch to git
> master, you'll see that the memory usage stays consistent and small.
>

Hi Anastasia,

Do you have time to address the issues brought up in Heikki's review?  It
would be good if we could get your work into PostgreSQL 9.5.

Thanks

Thom


Re: [HACKERS] Index-only scans for GIST

2014-08-18 Thread Heikki Linnakangas

On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:

2014-08-07 0:30 GMT+04:00 Heikki Linnakangas :

* I'm getting two regression failures with this (opr_sanity and join).




The opr_sanity failure is corrected.
But there remains a question with join.
I checked the latest version of my github repo and there's no failure in the
join regression test; all 145 tests passed.
To tell the truth, I don't understand which changes could have led to this
failure.
Could you show me the regression diffs?


Sure, here you go. It seems like a change in a plan. At a quick glance 
it seems harmless: the new plan is identical except that the left and 
right side of a join have been reversed. But I don't understand either 
why this patch would change that, so it needs to be investigated.



* The regression test queries that use LIMIT are not guaranteed to always

return the same rows, hence they're not very good regression test cases.
I'd suggest using more restricting WHERE clauses, so that each query only
returns a handful of rows.


Thank you for the comment; I rewrote the wrong queries. But could you explain
why LIMIT queries may return different results? Does it happen because of
different query optimization?


Imagine that you have a table with two rows, A and B. If you run a query 
like "SELECT * FROM table LIMIT 1", the system can legally return either 
row A or B, because there's no ORDER BY.


Now, admittedly you have a similar problem even without the LIMIT - the 
system can legally return the rows in either order - but it's less of an 
issue because at least you can quickly see from the diff that the result 
set is in fact the same, the rows are just in different order. You could 
fix that by adding an ORDER BY to all test queries, but we haven't done 
that in the regression suite because then we would not have any test 
coverage for cases where you don't have an ORDER BY. As a compromise, 
test cases are usually written without an ORDER BY, but if e.g. the 
buildfarm starts failing because of differences in the result set order 
across platforms, then we add an ORDER BY to make it stable.



* I think it's leaking memory, in GIST scan context. I tested this with a

variant of the regression tests:

insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
0.05*i)),
  point(0.05*i, 0.05*i) FROM generate_series(0,
1000) as i;
CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze  select p from gist_tbl where p <@ box(point(0,0),
point(999,999)) and length(p::text) < 10;

while the final query runs, 'top' shows constantly increasing memory usage.


I don't think it's a memory leak. After some initial increase, memory usage
remains the same. It works similarly without using indexonlyscan.


No, it's definitely a leak caused by the patch. Test with the attached 
patch, which prints out to the server log the amount of memory used by 
the GiST scan memory context every 1 rows. It clearly shows 
increasing memory usage all the way to the end of the query.


It's cleaned up at the end of the query, but that's not good enough 
because for a large query you might accumulate gigabytes of leaked 
memory until the query has finished. If you (manually) apply the same 
patch to git master, you'll see that the memory usage stays consistent 
and small.


- Heikki

*** /home/heikki/git-sandbox/postgresql/src/test/regress/expected/join.out  
2014-08-18 09:38:55.146171394 +0300
--- /home/heikki/git-sandbox/postgresql/src/test/regress/results/join.out   
2014-08-18 10:28:30.326491898 +0300
***
*** 2579,2590 
  ---
   Sort
 Sort Key: t1.q1, t1.q2
!->  Hash Left Join
!  Hash Cond: (t1.q2 = t2.q1)
   Filter: (1 = (SubPlan 1))
!  ->  Seq Scan on int8_tbl t1
   ->  Hash
!->  Seq Scan on int8_tbl t2
   SubPlan 1
 ->  Limit
   ->  Result
--- 2579,2590 
  ---
   Sort
 Sort Key: t1.q1, t1.q2
!->  Hash Right Join
!  Hash Cond: (t2.q1 = t1.q2)
   Filter: (1 = (SubPlan 1))
!  ->  Seq Scan on int8_tbl t2
   ->  Hash
!->  Seq Scan on int8_tbl t1
   SubPlan 1
 ->  Limit
   ->  Result
***
*** 3589,3603 
lateral (values(x.q1,y.q1,y.q2)) v(xq1,yq1,yq2);
  q1|q2 |q1|q2 
|   xq1|   yq1|yq2
  
--+---+--+---+--+--+---
-   123 |   456 |  |   
|  123 |  |  
-   123 |  4567890123456789 | 4567890123456789 | -4567

Re: [HACKERS] Index-only scans for GIST

2014-08-17 Thread Anastasia Lubennikova
Updated patch:
* Compiler, merge and regression failures checked
* Regression tests were improved
* GiST and amcanreturn docs updated
-- 
Best regards,
Lubennikova Anastasia


indexonlyscan_gist2.patch
Description: Binary data


indexonlyscan_gist_docs.patch
Description: Binary data



Re: [HACKERS] Index-only scans for GIST

2014-08-17 Thread Anastasia Lubennikova
2014-08-07 0:30 GMT+04:00 Heikki Linnakangas :

* I'm getting two regression failures with this (opr_sanity and join).
>

The opr_sanity failure is corrected.
But there remains a question with join.
I checked the latest version of my github repo and there's no failure in the
join regression test; all 145 tests passed.
To tell the truth, I don't understand which changes could have led to this
failure.
Could you show me the regression diffs?
I want to understand what's wrong with the patch.

* The regression test queries that use LIMIT are not guaranteed to always
> return the same rows, hence they're not very good regression test cases.
> I'd suggest using more restricting WHERE clauses, so that each query only
> returns a handful of rows.
>

Thank you for the comment; I rewrote the wrong queries. But could you explain
why LIMIT queries may return different results? Does it happen because of
different query optimization?

* I think it's leaking memory, in GIST scan context. I tested this with a
> variant of the regression tests:
>
> insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
> 0.05*i)),
>  point(0.05*i, 0.05*i) FROM generate_series(0,
> 1000) as i;
> CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);
>
> set enable_seqscan=off;
> set enable_bitmapscan=off;
>
> explain analyze  select p from gist_tbl where p <@ box(point(0,0),
> point(999,999)) and length(p::text) < 10;
>
> while the final query runs, 'top' shows constantly increasing memory usage.


I don't think it's a memory leak. After some initial increase, memory usage
remains the same. It works similarly without using indexonlyscan.



-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Index-only scans for GIST

2014-08-06 Thread Heikki Linnakangas

On 08/01/2014 10:58 AM, Anastasia Lubennikova wrote:

Hi, hackers!
I work on a GSoC project "Index-only scans for GIST"
https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

Repository is
https://github.com/lubennikovaav/postgres/tree/indexonlygist2
Patch is in attachments.


Thanks!

Some comments:

* I got this compiler warning:

gistget.c:556:5: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]

 ListCell *tmpPageData = so->curPageData;
 ^

* I'm getting two regression failures with this (opr_sanity and join).

* After merging with master, build fails because of duplicate OIDs.

* The regression test queries that use LIMIT are not guaranteed to 
always return the same rows, hence they're not very good regression test 
cases. I'd suggest using more restricting WHERE clauses, so that each 
query only returns a handful of rows.


* What's the reason for turning GISTScanOpaqueData.pageData from an 
array to a List?


* I think it's leaking memory, in GIST scan context. I tested this with 
a variant of the regression tests:


insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 
0.05*i)),
 point(0.05*i, 0.05*i) FROM generate_series(0, 
1000) as i;

CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze  select p from gist_tbl where p <@ box(point(0,0), 
point(999,999)) and length(p::text) < 10;


while the final query runs, 'top' shows constantly increasing memory usage.


It includes index-only scans for multicolumn GIST and a new regression test.
The Fetch() method is implemented for the box and point opclasses.


Can we have Fetch functions for all the datatypes in btree_gist contrib 
module, please? Do other contrib modules contain GiST opclasses that 
could have Fetch functions?



Documentation is not updated yet, but I'm going to do it by the end of
GSoC.

I've got one question about a query with an OR condition. It is the last query
in the regression test "gist_indexonly". It doesn't fail, but it doesn't use
index-only scans. Could someone explain to me how it works?
It seems to depend on the build_paths_for_OR function.
But I couldn't understand how.


The query is:

select * from gist_tbl
where b <@ box(point(5,5),  point(6,6))
or p <@ box(point(0,0),  point(100,100)) limit 10;

It cannot use an index(-only) scan for this, because a single index scan 
can only return rows based on one key. In this case, you need to do two 
scans, and then return the rows returned by either scan, removing 
duplicates. A bitmap scan is possible, because it can remove the 
duplicates, but the planner can't produce a plain index scan plan that 
would do the same.


A common trick when that happens in a real-world application is to 
re-write the query using UNION:


select * from gist_tbl
where b <@ box(point(5,5),  point(6,6))
UNION
select * from gist_tbl
where p <@ box(point(0,0),  point(100,100))
limit 10;

Although that doesn't seem to actually work:

ERROR:  could not identify an equality operator for type box
LINE 1: select * from gist_tbl
   ^

but that's not your patch's fault, the same happens with unpatched master.

IOW, you don't need to worry about that case.

- Heikki





Re: [HACKERS] Index-only scans for GIST

2014-08-01 Thread Anastasia Lubennikova
Thank you for the comment.
The patch has already been added to the Performance topic.


2014-08-01 20:25 GMT+04:00 Fabrízio de Royes Mello 
:

>
> On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova <
> lubennikov...@gmail.com> wrote:
> >
> > Hi, hackers!
> > I work on a GSoC project "Index-only scans for GIST"
> >
> https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
> >
> > Repository is
> > https://github.com/lubennikovaav/postgres/tree/indexonlygist2
> > Patch is in attachments.
> >
> > It includes index-only scans for multicolumn GIST and new regression
> test.
> > Fetch() method is realized for box and point opclasses.
> >
> > Documentation is not updated yet, but I'm going to do it till the end of
> GSoC.
> >
> > I've got one question about query with OR condition. It is the last
> query in regression test "gist_indexonly". It doesn't fail but it doesn't
> use index-only scans. Could someone explain to me how it works?
> > It seems to depend on build_paths_for_OR function. But I couldn't
> understand how.
> >
>
> Very nice... please add your patch to the next commit fest [1].
>
> Regards,
>
>
> [1] https://commitfest.postgresql.org/action/commitfest_view?id=23
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Timbira: http://www.timbira.com.br
> >> Blog sobre TI: http://fabriziomello.blogspot.com
> >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
>



-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Index-only scans for GIST

2014-08-01 Thread Fabrízio de Royes Mello
On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova <
lubennikov...@gmail.com> wrote:
>
> Hi, hackers!
> I work on a GSoC project "Index-only scans for GIST"
>
https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
>
> Repository is
> https://github.com/lubennikovaav/postgres/tree/indexonlygist2
> Patch is in attachments.
>
> It includes index-only scans for multicolumn GIST and new regression test.
> Fetch() method is realized for box and point opclasses.
>
> Documentation is not updated yet, but I'm going to do it till the end of
GSoC.
>
> I've got one question about query with OR condition. It is the last query
in regression test "gist_indexonly". It doesn't fail but it doesn't use
index-only scans. Could someone explain to me how it works?
> It seems to depend on build_paths_for_OR function. But I couldn't
understand how.
>

Very nice... please add your patch to the next commit fest [1].

Regards,


[1] https://commitfest.postgresql.org/action/commitfest_view?id=23

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Index-only scans for multicolumn GIST

2014-07-23 Thread Emre Hasegeli
> That seems like a nonstarter :-(.  Index-only scans don't have a license
> to return approximations.  If we drop the behavior for circles, how much
> functionality do we have left?

It should work with exact operator classes, box_ops, point_ops,
range_ops, inet_ops.




Re: [HACKERS] Index-only scans for multicolumn GIST

2014-07-22 Thread Tom Lane
Heikki Linnakangas  writes:
> For a circle, the GiST index stores a bounding box of the circle. The 
> new fetch function reverses that, calculating the radius and center of 
> the circle from the bounding box.

> Those conversions lose some precision due to rounding. Are we okay with 
> that?

That seems like a nonstarter :-(.  Index-only scans don't have a license
to return approximations.  If we drop the behavior for circles, how much
functionality do we have left?

regards, tom lane




Re: [HACKERS] Index-only scans for multicolumn GIST

2014-07-22 Thread Robert Haas
On Tue, Jul 22, 2014 at 2:55 AM, Heikki Linnakangas
 wrote:
> On 07/21/2014 10:47 PM, Anastasia Lubennikova wrote:
>>
>> Hi, hackers!
>> There are new results of my work on GSoC project "Index-only scans for
>> GIST".
>> Previous post is here:
>>
>> http://postgresql.1045698.n5.nabble.com/Index-only-scans-for-GIST-td5804892.html
>>
>> Repository is
>> https://github.com/lubennikovaav/postgres/tree/indexonlygist2
>> Patch is in attachments.
>> It includes index-only scan support for multicolumn GiST. It works correctly -
>> the results are the same as with other scan plans.
>> The Fetch() method is implemented for the box and circle opclasses.
>> The improvement for the circle opclass is not as distinct as for the box
>> opclass, because of recheck.
>
>
> For a circle, the GiST index stores a bounding box of the circle. The new
> fetch function reverses that, calculating the radius and center of the
> circle from the bounding box.
>
> Those conversions lose some precision due to rounding. Are we okay with
> that? Floating point math is always subject to rounding errors, but there's
> a good argument to be made that it's not acceptable to get a different value
> back when the user didn't explicitly invoke any math functions.
>
> As an example:
>
> create table circle_tbl (c circle);
> create index circle_tbl_idx on circle_tbl using gist (c);
> insert into circle_tbl values ('1.23456789012345,1.23456789012345,1e300');
>
> postgres=# set enable_seqscan=off; set enable_bitmapscan=off; set
> enable_indexonlyscan=on;
> SET
> SET
> SET
> postgres=# select * from circle_tbl ;
>c
> 
>  <(0,0),1e+300>
> (1 row)
>
> postgres=# set enable_indexonlyscan=off;
> SET
> postgres=# select * from circle_tbl ;
>   c
> --
>  <(1.23456789012345,1.23456789012345),1e+300>
> (1 row)

I really don't think it's ever OK for a query to produce different
answers depending on which plan is chosen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Index-only scans for multicolumn GIST

2014-07-21 Thread Heikki Linnakangas

On 07/21/2014 10:47 PM, Anastasia Lubennikova wrote:

Hi, hackers!
There are new results of my work on GSoC project "Index-only scans for
GIST".
Previous post is here:
http://postgresql.1045698.n5.nabble.com/Index-only-scans-for-GIST-td5804892.html

Repository is
https://github.com/lubennikovaav/postgres/tree/indexonlygist2
Patch is in attachments.
It includes index-only scan support for multicolumn GiST. It works correctly -
the results are the same as with other scan plans.
The Fetch() method is implemented for the box and circle opclasses.
The improvement for the circle opclass is not as distinct as for the box
opclass, because of recheck.


For a circle, the GiST index stores a bounding box of the circle. The 
new fetch function reverses that, calculating the radius and center of 
the circle from the bounding box.


Those conversions lose some precision due to rounding. Are we okay with 
that? Floating point math is always subject to rounding errors, but 
there's a good argument to be made that it's not acceptable to get a 
different value back when the user didn't explicitly invoke any math 
functions.


As an example:

create table circle_tbl (c circle);
create index circle_tbl_idx on circle_tbl using gist (c);
insert into circle_tbl values ('1.23456789012345,1.23456789012345,1e300');

postgres=# set enable_seqscan=off; set enable_bitmapscan=off; set 
enable_indexonlyscan=on;

SET
SET
SET
postgres=# select * from circle_tbl ;
   c

 <(0,0),1e+300>
(1 row)

postgres=# set enable_indexonlyscan=off;
SET
postgres=# select * from circle_tbl ;
  c
--
 <(1.23456789012345,1.23456789012345),1e+300>
(1 row)
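
To see where the rounding comes from, here is a small standalone sketch (plain
C with stand-in structs, not the actual fetch function): the stored bounding
box is [center - r, center + r] on each axis, and recovering the center as
(low + high) / 2 in double precision swallows the tiny 1.23456789012345 offset
when r is 1e300, which is why the index-only scan gives back (0,0).

#include <stdio.h>

/* Stand-ins for PostgreSQL's Point/BOX/CIRCLE structs (illustration only). */
typedef struct { double x, y; } Point;
typedef struct { Point high, low; } Box;
typedef struct { Point center; double radius; } Circle;

/* What a circle fetch function essentially has to do: rebuild the circle
 * from the bounding box stored in the index entry. */
static Circle
circle_from_bbox(Box b)
{
	Circle	c;

	c.center.x = (b.low.x + b.high.x) / 2.0;
	c.center.y = (b.low.y + b.high.y) / 2.0;
	c.radius = (b.high.x - b.low.x) / 2.0;
	return c;
}

int
main(void)
{
	double	cx = 1.23456789012345, r = 1e300;
	Box		b = {{cx + r, cx + r}, {cx - r, cx - r}};
	Circle	c = circle_from_bbox(b);

	/* prints <(0,0),1e+300>, matching the index-only scan output above */
	printf("<(%g,%g),%g>\n", c.center.x, c.center.y, c.radius);
	return 0;
}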

- Heikki




Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Ryan Johnson

On 27/06/2014 8:20 AM, Alvaro Herrera wrote:

Ryan Johnson wrote:

On 26/06/2014 11:04 PM, Alvaro Herrera wrote:

Ryan Johnson wrote:

As part of a research project, I'm trying to change Read Committed
isolation to use HeapTupleSatisfiesNow rather than acquiring a new
snapshot at every command [1].

Are you aware of this?

commit 813fb0315587d32e3b77af1051a0ef517d187763
Author: Robert Haas 
Date:   Thu Aug 1 10:46:19 2013 -0400

 Remove SnapshotNow and HeapTupleSatisfiesNow.

That would be wonderful news... if snapshots weren't so darned
expensive to create.

I take it you aren't aware of this other effort, either:
http://archives.postgresql.org/message-id/539ad153.9000...@vmware.com
That is good news, though from reading the thread it sounds like proc 
array accesses are being exchanged for accesses to an SLRU, so a lot of 
lwlock calls will remain. It will definitely help, though. SLRU will get 
ex-locked a lot less often, so the main source of contention will be for 
the actual lwlock acquire/release operations.


Regards,
Ryan




Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Andres Freund
On 2014-06-27 08:39:13 -0600, Ryan Johnson wrote:
> On 27/06/2014 3:14 AM, Andres Freund wrote:
> >On 2014-06-26 22:47:47 -0600, Ryan Johnson wrote:
> >>Hi,
> >>
> >>As part of a research project, I'm trying to change Read Committed isolation
> >>to use HeapTupleSatisfiesNow rather than acquiring a new snapshot at every
> >>command [1]. Things appear to have gone reasonably well so far, except
> >>certain queries fail with "ERROR:  non-MVCC snapshots are not supported in
> >>index-only scans."
> >You're aware that unless you employ additional locking you can simply
> >miss individual rows or see them several times because of concurrent
> >updates?
> >The reason it has worked (< 9.4) for system catalogs is that updates of
> >rows were only performed while objects were locked access exclusively -
> >that's the reason why some places in the code use inplace updates btw...

> Yes, I was aware of the need for locking. The documentation just made it
> sound that locking was already in place for non-MVCC index scans. I was
> hoping I'd missed some easy way to activate it.

Well, it is/was for the places (i.e. DDL) that actually use non-MVCC
scans.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Ryan Johnson

On 27/06/2014 3:14 AM, Andres Freund wrote:

On 2014-06-26 22:47:47 -0600, Ryan Johnson wrote:

Hi,

As part of a research project, I'm trying to change Read Committed isolation
to use HeapTupleSatisfiesNow rather than acquiring a new snapshot at every
command [1]. Things appear to have gone reasonably well so far, except
certain queries fail with "ERROR:  non-MVCC snapshots are not supported in
index-only scans."

You're aware that unless you employ additional locking you can simply
miss individual rows or see them several times because of concurrent
updates?
The reason it has worked (< 9.4) for system catalogs is that updates of
rows were only performed while objects were locked access exclusively -
that's the reason why some places in the code use inplace updates btw...
Yes, I was aware of the need for locking. The documentation just made it 
sound that locking was already in place for non-MVCC index scans. I was 
hoping I'd missed some easy way to activate it.


Regards,
Ryan





Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Alvaro Herrera
Ryan Johnson wrote:
> On 26/06/2014 11:04 PM, Alvaro Herrera wrote:
> >Ryan Johnson wrote:
> >>As part of a research project, I'm trying to change Read Committed
> >>isolation to use HeapTupleSatisfiesNow rather than acquiring a new
> >>snapshot at every command [1].
> >Are you aware of this?
> >
> >commit 813fb0315587d32e3b77af1051a0ef517d187763
> >Author: Robert Haas 
> >Date:   Thu Aug 1 10:46:19 2013 -0400
> >
> > Remove SnapshotNow and HeapTupleSatisfiesNow.
>
> That would be wonderful news... if snapshots weren't so darned
> expensive to create.

I take it you aren't aware of this other effort, either:
http://archives.postgresql.org/message-id/539ad153.9000...@vmware.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Andres Freund
On 2014-06-26 22:47:47 -0600, Ryan Johnson wrote:
> Hi,
> 
> As part of a research project, I'm trying to change Read Committed isolation
> to use HeapTupleSatisfiesNow rather than acquiring a new snapshot at every
> command [1]. Things appear to have gone reasonably well so far, except
> certain queries fail with "ERROR:  non-MVCC snapshots are not supported in
> index-only scans."

You're aware that unless you employ additional locking you can simply
miss individual rows or see them several times because of concurrent
updates?
The reason it has worked (< 9.4) for system catalogs is that updates of
rows were only performed while objects were locked access exclusively -
that's the reason why some places in the code use inplace updates btw...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-26 Thread Ryan Johnson

On 26/06/2014 11:04 PM, Alvaro Herrera wrote:

Ryan Johnson wrote:

As part of a research project, I'm trying to change Read Committed
isolation to use HeapTupleSatisfiesNow rather than acquiring a new
snapshot at every command [1].

Are you aware of this?

commit 813fb0315587d32e3b77af1051a0ef517d187763
Author: Robert Haas 
Date:   Thu Aug 1 10:46:19 2013 -0400

 Remove SnapshotNow and HeapTupleSatisfiesNow.
That would be wonderful news... if snapshots weren't so darned expensive 
to create.


I guess there's no avoiding that bottleneck now, though.

Ryan




Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-26 Thread Alvaro Herrera
Ryan Johnson wrote:

> As part of a research project, I'm trying to change Read Committed
> isolation to use HeapTupleSatisfiesNow rather than acquiring a new
> snapshot at every command [1].

Are you aware of this?

commit 813fb0315587d32e3b77af1051a0ef517d187763
Author: Robert Haas 
Date:   Thu Aug 1 10:46:19 2013 -0400

Remove SnapshotNow and HeapTupleSatisfiesNow.

We now use MVCC catalog scans, and, per discussion, have eliminated
all other remaining uses of SnapshotNow, so that we can now get rid of
it.  This will break third-party code which is still using it, which
is intentional, as we want such code to be updated to do things the
new way.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Index-only scans for GIST

2014-05-29 Thread Robert Haas
On Sun, May 25, 2014 at 6:12 AM, Anastasia Lubennikova
 wrote:
> Hi, hackers!
> There are first results of my work on GSoC project "Index-only scans for
> GIST".

Cool.

> 1. Version of my project code is in forked repository
> https://github.com/lubennikovaav/postgres/tree/indexonlygist2
> Patch is in attachments
> - This version is only for one-column indexes

That's probably a limitation that needs to be fixed before this can be
committed.

> - fetch() method is implemented only for the box opclass (because it's trivial)

That might not need to be fixed before this can be committed.

> 2. I test Index-only scans with SQL script box_test.sql
> and it works really faster. (results in box_test_out)
>
> I'll be glad to get your feedback about this feature.

Since this is a GSoC project, it would be nice if one of the people
who is knowledgeable about GIST (Heikki, Alexander, etc.) could weigh
in on this before too much time goes by, so that Anastasia can press
forward with this work.

I don't know enough to offer too many substantive comments, but I
think you should remove all of the //-style comments (most of which
are debugging leftovers) and add some more comments describing what
you're actually doing and, more importantly, why.

This comment doesn't appear to make sense:

+   /*
+    * The offset number on tuples on internal pages is unused.  For
+    * historical reasons, it is set 0x.
+    */

The reason this doesn't make sense is because the tuple in question is
not on an internal page, or indeed any page at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Index only scans wiki page

2012-11-15 Thread Peter Geoghegan
On 13 November 2012 01:25, Peter Geoghegan  wrote:
> Attached is a piece I wrote on the feature. That might form the basis
> of a new wiki page.

Since no one else moved on this, I've completely replaced the existing
page with the contents of the user-orientated document I posted. I
don't believe that any of the information that has been removed was of
general interest, since it only existed as a developer-orientated
page. If anyone thinks that I shouldn't have removed everything that
was there, or indeed who would like to change what I've added, well,
it's a wiki.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Index only scans wiki page

2012-11-13 Thread Peter Geoghegan
On 13 November 2012 16:37, Robert Haas  wrote:
> I found this an interesting read.  As one of the people who worked on
> the feature, I'm sort of curious whether people have any experience
> yet with how this actually shakes out in the field.  Are you (or is
> anyone) aware of positive/negative field experiences with this
> feature?

Unfortunately, I don't think that I have any original insight about
the problems with index-only scans in the field right now.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Index only scans wiki page

2012-11-13 Thread Robert Haas
On Mon, Nov 12, 2012 at 8:25 PM, Peter Geoghegan  wrote:
> On 13 November 2012 01:03, Jeff Janes  wrote:
>> https://wiki.postgresql.org/wiki/Index-only_scans
>>
>> This page is seriously out of date.  It suggests they are not yet
>> implemented, but only being talked about.
>
> Attached is a piece I wrote on the feature. That might form the basis
> of a new wiki page. Feel free to incorporate this material as you see
> fit.

I found this an interesting read.  As one of the people who worked on
the feature, I'm sort of curious whether people have any experience
yet with how this actually shakes out in the field.  Are you (or is
anyone) aware of positive/negative field experiences with this
feature?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Index only scans wiki page

2012-11-12 Thread Peter Geoghegan
On 13 November 2012 01:03, Jeff Janes  wrote:
> https://wiki.postgresql.org/wiki/Index-only_scans
>
> This page is seriously out of date.  It suggests they are not yet
> implemented, but only being talked about.

Attached is a piece I wrote on the feature. That might form the basis
of a new wiki page. Feel free to incorporate this material as you see
fit.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


index_only_scans.rst
Description: Binary data



Re: [HACKERS] index-only scans versus serializable transactions

2012-09-04 Thread Kevin Grittner
"Kevin Grittner"  wrote:
 
> New version attached. Will apply if no further problems are found.
 
Pushed to master and REL9_2_STABLE.
 
-Kevin




Re: [HACKERS] index-only scans versus serializable transactions

2012-09-04 Thread Kevin Grittner
Tom Lane  wrote:
> "Kevin Grittner"  writes:
>> By not visiting the heap page for tuples, index-only scans fail to
>> acquire all of the necessary predicate locks for correct behavior
>> at the serializable transaction isolation level. The tag for the
>> tuple-level predicate locks includes the xmin, to avoid possible
>> problems with tid re-use. (This was not covered in initial
>> pre-release versions of SSI, and testing actually hit the
>> problem.)  When an "index-only" scan does need to look at the heap
>> because the visibility map doesn't indicate that the tuple is
>> visible to all transactions, the tuple-level predicate lock is
>> acquired. The best we can do without visiting the heap is a page
>> level lock on the heap page, so that is what the attached patch
>> does.
> 
>> If there are no objections, I will apply to HEAD and 9.2.
> 
> This isn't right in detail: there are paths through the loop where
> "tuple" is not NULL at the beginning of the next iteration
> (specifically, consider failure of a lossy-qual recheck). I think
> that only results in wasted work, but it's still not operating as
> intended. I'd suggest moving the declaration/initialization of the
> "tuple" variable to inside the while loop, since there's no desire
> for its value to carry across loops.
 
You're right.  It looks to me like moving the declaration (and
initialization) to more local scope (just inside the loop) fixes it.
 
New version attached.  Will apply if no further problems are found.
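
For anyone skimming the thread, here is a rough sketch of the shape of the
fix being discussed. All helper and field names below are invented; the
attached patch is the authoritative version:

/*
 * Sketch: when the visibility map lets an index-only scan skip the heap
 * fetch, the best SSI lock available is a page-level predicate lock on
 * the heap page; otherwise the heap fetch takes the usual tuple lock.
 */
while (index_only_scan_next_tid(scan, &tid))		/* invented driver */
{
	HeapTuple	tuple = NULL;	/* scoped to one iteration, per Tom's point */

	if (!heap_page_all_visible(scan, &tid))			/* invented VM check */
		tuple = index_only_fetch_heap(scan, &tid);	/* invented */

	if (tuple == NULL)
		PredicateLockPage(scan->heap_rel,
						  ItemPointerGetBlockNumber(&tid),
						  scan->snapshot);
}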
 
-Kevin




index-only-serializable-v2.patch
Description: Binary data



Re: [HACKERS] index-only scans versus serializable transactions

2012-09-03 Thread Tom Lane
"Kevin Grittner"  writes:
> By not visiting the heap page for tuples, index-only scans fail to
> acquire all of the necessary predicate locks for correct behavior at
> the serializable transaction isolation level.  The tag for the
> tuple-level predicate locks includes the xmin, to avoid possible
> problems with tid re-use.  (This was not covered in initial
> pre-release versions of SSI, and testing actually hit the problem.) 
> When an "index-only" scan does need to look at the heap because the
> visibility map doesn't indicate that the tuple is visible to all
> transactions, the tuple-level predicate lock is acquired.  The best
> we can do without visiting the heap is a page level lock on the heap
> page, so that is what the attached patch does.

> If there are no objections, I will apply to HEAD and 9.2.

This isn't right in detail: there are paths through the loop where
"tuple" is not NULL at the beginning of the next iteration
(specifically, consider failure of a lossy-qual recheck).  I think
that only results in wasted work, but it's still not operating as
intended.  I'd suggest moving the declaration/initialization of the
"tuple" variable to inside the while loop, since there's no desire
for its value to carry across loops.

regards, tom lane




Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-05-04 Thread Simon Riggs
On 2 May 2012 13:41, Robert Haas  wrote:

> So on further reflection I'm thinking it may be best just to stick
> with a hard conflict for now and see what feedback we get from beta
> testers.

Which is what I was expecting y'all to conclude once you'd looked at
the task in more detail.

And I'm happy with the concept of beta being a period where we listen
to feedback, not just bugs, and decide whether further refinements are
required.

What I think is possible is to alter the conflict as it hits the
backend. If a backend has enable_indexonly = off then it wouldn't be
affected by those conflicts anyhow. So if we simply record whether we
are using an index only plan then we'll know whether to ignore it or
abort. I think we can handle that by marking the snapshot at executor
startup time. Needs a few other pushups but not that hard.

The likelihood is that SQL that uses index only won't run for long
enough to be cancelled anyway.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-05-02 Thread Robert Haas
On Thu, Apr 26, 2012 at 8:03 PM, Robert Haas  wrote:
> So, as a first step, I've committed a patch that just throws a hard
> conflict.  I think we probably want to optimize this further, and I'm
> going to work investigate that next.  But it seemed productive to get
> this much out of the way first, so I did.

I've been thinking about this some more.  What's worrying me is that a
visibility conflict, however we implement it, could be *worse* from
the user's point of view than just killing the query.  After all,
there's a reasonable likelihood that a single visibility map page
covers the whole relation (or all the blocks that the user is
interested in), so any sort of conflict is basically going to turn the
index-only scan into an index-scan plus some extra overhead.  And if
the planner had known that the user was going to get an index-only
scan rather than just a plain index scan, it might well have picked
some other plan in the first place.

Another problem is that, if we add a test for visibility conflicts
into visibilitymap_test(), I'm afraid we're going to drive up the cost
of that function very significantly.  Previous testing suggests that
that efficiency or lack thereof of that function is already a
performance problem for index-only scans, which kinda makes me not
that excited about adding another branch in there somewhere (and even
less excited about any proposed implementation that would add an
lwlock acquire/release or similar).

So on further reflection I'm thinking it may be best just to stick
with a hard conflict for now and see what feedback we get from beta
testers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-26 Thread Robert Haas
On Mon, Apr 16, 2012 at 4:13 PM, Robert Haas  wrote:
> But fundamentally we all seem to be converging on some variant of the
> "soft conflict" idea.

So, as a first step, I've committed a patch that just throws a hard
conflict.  I think we probably want to optimize this further, and I'm
going to work investigate that next.  But it seemed productive to get
this much out of the way first, so I did.
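
To make "hard conflict" concrete: replaying a record that marks a heap page
all-visible now resolves a snapshot conflict on the standby, much like the
conflicts thrown for ordinary cleanup records. Roughly (a sketch, the
committed code may differ in detail):

/*
 * Sketch only: cancel standby queries whose snapshots are older than the
 * xid horizon carried by the all-visible record, since they can no longer
 * trust the visibility map bit being set.
 */
static void
visible_record_conflict(TransactionId cutoff_xid, RelFileNode rnode)
{
	if (InHotStandby)
		ResolveRecoveryConflictWithSnapshot(cutoff_xid, rnode);
}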

In studying this, it strikes me that it would be rather nicer if we
recovery conflicts could somehow arrange to roll back the active
transaction by some means short of a FATAL error.  I think there are
some protocol issues with doing that, but I still wish we could figure
out a way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 1:58 PM, Simon Riggs  wrote:
> If we do need to do something, then introduce concept of a visibility 
> conflict.
>
> On replay:
> If feedback not set, set LSN of visibility conflict on PROCs that
> conflict, if not already set.
>
> On query:
> If feedback not set, check conflict LSN against page, if page is
> later, check visibility.

Hmm, should have read the whole thread before replying.  This similar
to what I just proposed in response to Heikki's message, but using LSN
in lieu of (or maybe you mean in addition to) XID.

I don't think we can ignore the need to throw conflicts just because
hot_standby_feedback is set; there are going to be corner cases, for
example, when it's just recently been turned on and the master has
already done cleanup; or if the master and standby have recently
gotten disconnected for even just a few seconds.

But fundamentally we all seem to be converging on some variant of the
"soft conflict" idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 3:02 AM, Noah Misch  wrote:
> Do you refer to PD_ALL_VISIBLE as "not merely a hint" due to the requirement
> to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
> positive visibility map bit?  That is to say, PD_ALL_VISIBLE is fully a hint
> in its role as a witness of tuple visibility, but it is not a hint in its role
> as a witness of the visibility map status?

Exactly.

> The large comment in SetBufferCommitInfoNeedsSave() seems incorrect.  On
> systems with weaker memory ordering, the FlushBuffer() process's clearing of
> BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
> process until some time after the former retrieved buffer contents from shared
> memory.

True.

> Memory barriers could make the comment true, but we should probably
> just update the comment to explain that a race condition may eat the update
> entirely.

Agreed.

> Incidentally, is there a good reason for MarkBufferDirty() to
> increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
> not to do so?  Looks like an oversight.

Also agreed.

>> 2. More seriously, lazy_scan_heap() releases the buffer lock before
>> setting the all-visible bit on the heap.  This looks just plain wrong
>> (and it's my fault, to be clear).  Somebody else can come along after
>> we've released the buffer lock and mark the page not-all-visible.
>> That concurrent process will check the visibility map bit, find it
>> already clear, and go on its merry way.  Then, we set the visibility
>> map bit and go on our merry way.  Now we have a not-all-visible page
>> with the all-visible bit set in the visibility map.   Oops.
>
> Hmm, indeed.  This deserves its own open item.

Also agreed.

> Good point.  We could finagle things so the standby notices end-of-recovery
> checkpoints and blocks the optimization for older snapshots.  For example,
> maintain an integer count of end-of-recovery checkpoints seen and store that
> in each Snapshot instead of takenDuringRecovery; use 0 on a master.  When the
> global value is greater than the snapshot's value, disable the optimization
> for that snapshot.  I don't know whether the optimization is powerful enough
> to justify that level of trouble, but it's an option to consider.

I suspect not.  It seems we can make index-only scans work without
doing something like this; it's only the sequential-scan optimization
we might lose.  But nobody's ever really complained about not having
that, to my knowledge.

> For a different approach, targeting the fragility, we could add assertions to
> detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
> full-page image.

The holes are narrow enough that I fear such cases would be detected
only on high-velocity production systems, which is not exactly where
you want to find out about such problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 4:26 AM, Heikki Linnakangas
 wrote:
> Can we have a "soft" hot standby conflict that doesn't kill the query, but
> disables index-only-scans?

Yeah, something like that seems possible.

For example, suppose the master includes, in each
mark-heap-page-all-visible record, the newest XID on the page.  On the
standby, we keep track of the most recent such XID ever seen in shared
memory.  After noting that a page is all-visible, the standby
cross-checks its snapshot against this XID and does the heap fetch
anyway if it's too new.  Conceivably this can be done with memory
barriers but not without locks: we must update the XID in shared
memory *before* marking the page all-visible, and we must check the
visibility map or page-level bit *before* fetching the XID from shared
memory - but the actual reads and writes of the XID are atomic.
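
In code form the check might look roughly like this; the shared-memory
accessor is hypothetical, and the essential property is only the ordering
described above:

/*
 * Hypothetical sketch of the "soft conflict" test.  The standby-wide XID
 * must be read only after the all-visible indication has been observed.
 * If that XID could still be of interest to our snapshot, distrust the
 * bit and do the heap fetch anyway.
 */
static bool
all_visible_bit_trustworthy(Snapshot snapshot)
{
	TransactionId newest_all_visible_xid;

	newest_all_visible_xid = GetNewestAllVisibleXid();	/* hypothetical */

	if (TransactionIdFollowsOrEquals(newest_all_visible_xid, snapshot->xmin))
		return false;		/* too new for us: fall back to a heap fetch */

	return true;
}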

Now, if an index-only scan suffers a very high number of these "soft
conflicts" and consequently does a lot more heap fetches than
expected, performance might suck.  But that should be rare, and could
be mitigated by turning on hot_standby_feedback.  Also, there might be
some hit for repeatedly reading that XID from memory.

Alternatively, we could try to forbid index-only scan plans from being
created in the first place *only when* the underlying snapshot is too
old.  But then what happens if a conflict happens later, after the
plan has been created but before it's fully executed?  At that point
it's too late to switch gears, so we'd still need something like the
above.  And the above might be adequate all by itself, since in
practice index-only scans are mostly going to be useful when the data
isn't changing all that fast, so most of the queries that would be
cancelled by "hard" conflicts wouldn't have actually had a problem
anyway.

> In the long run, perhaps we need to store XIDs in the visibility map instead
> of just a bit. If you we only stored one xid per 100 pages or something like
> that, the storage overhead would not be much higher than what we have at the
> moment. But that's obviously not going to happen for 9.2...

Well, that would also have the undesirable side effect of greatly
reducing the granularity of the map.  As it is, updating a tiny
fraction of the tuples in the table could result in the entire table
ceasing to be all-visible if you happen to update exactly one
tuple per page through the entire heap.  Or more generally, updating
X% of the rows in the table can cause Y% of the rows in the table to
no longer be visible for index-only-scan purposes, where Y >> X.  What
you're proposing would make that much worse.

I think we're actually going to find that the current system isn't
tight enough; my suspicion is that users are going to complain that
we're not aggressive enough about marking pages all-visible, because
autovac won't kick in until updates+deletes>20%, but (1) index-only
scans also care about inserts and (2) by the time we've got 20% dead
tuples index-only scans may well be worthless.
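
(For reference, the 20% figure is the default autovacuum trigger, which in
simplified form amounts to the following; the function name is made up, the
defaults are the shipped ones:)

/*
 * Simplified rendering of the autovacuum trigger: with the default
 * autovacuum_vacuum_scale_factor of 0.2, a table is not vacuumed (and so
 * not re-marked all-visible) until roughly 20% of it is dead tuples.
 */
static bool
needs_autovacuum(double reltuples, double n_dead_tuples)
{
	double		vac_base_thresh = 50;	/* autovacuum_vacuum_threshold */
	double		vac_scale_factor = 0.2; /* autovacuum_vacuum_scale_factor */

	return n_dead_tuples > vac_base_thresh + vac_scale_factor * reltuples;
}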

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 3:19 PM, Tom Lane  wrote:
> Heikki Linnakangas  writes:
>> Can we have a "soft" hot standby conflict that doesn't kill the query,
>> but disables index-only-scans?
>
> Well, there wouldn't be any way for the planner to know whether an
> index-only scan would be safe or not.  I think this would have to look
> like a run-time fallback.  Could it be structured as "return that the
> page's all-visible bit is not set, instead of failing?"  Or am I
> confused about where the conflict is coming from?

The starting point is that HS now offers 4 different mechanisms for
avoiding conflicts, probably the best of which is hot standby
feedback. So we only need to improve things if those techniques don't
work for people. So initially, my attitude is lets throw a conflict
and wait for feedback during beta.

If we do need to do something, then introduce concept of a visibility conflict.

On replay:
If feedback not set, set LSN of visibility conflict on PROCs that
conflict, if not already set.

On query:
If feedback not set, check conflict LSN against page, if page is
later, check visibility.
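
Spelled out slightly, with every field and function name below invented, that
is just two comparisons:

/* On replay of an all-visible record, when feedback is not set, remember
 * the record's LSN in each conflicting backend instead of cancelling it: */
if (proc->visibilityConflictLSN == InvalidXLogRecPtr)
	proc->visibilityConflictLSN = record_lsn;

/* On query, before trusting an all-visible indication for a heap page,
 * compare LSNs; if the page is later than the conflict point, fall back
 * to checking tuple visibility: */
if (PageGetLSN(page) > MyProc->visibilityConflictLSN)
	all_visible = false;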

In terms of optimisation, I wouldn't expect to have to adjust costs at
all. The difference would only show for long running queries and
typically they don't touch too much new data as a fraction of total.
The cost model for index only is pretty crude anyhow.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Tom Lane
Heikki Linnakangas  writes:
> Can we have a "soft" hot standby conflict that doesn't kill the query, 
> but disables index-only-scans?

Well, there wouldn't be any way for the planner to know whether an
index-only scan would be safe or not.  I think this would have to look
like a run-time fallback.  Could it be structured as "return that the
page's all-visible bit is not set, instead of failing?"  Or am I
confused about where the conflict is coming from?

regards, tom lane



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Heikki Linnakangas

On 16.04.2012 10:38, Simon Riggs wrote:

On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch  wrote:

On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:

In the department of query cancellations, I believe Noah argued
previously that this wasn't really going to cause a problem.  And,
indeed, if the master has a mix of inserts, updates, and deletes, then
it seems likely that any recovery conflicts generated this way would
be hitting about the same place in the XID space as the ones caused by
pruning away dead tuples.  What will be different, if we go this
route, is that an insert-only master, which right now only generates
conflicts at freezing time, will instead generate conflicts much
sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
enough to know whether this is a problem, and I'm not trying to block
this approach, but I do want to make sure that everyone agrees that
it's acceptable before we go do it, because inevitably somebody is
going to get bit.


Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC "off" would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.


I'm against adding a new GUC, especially for that fairly thin reason.


Same here.

Can we have a "soft" hot standby conflict that doesn't kill the query, 
but disables index-only-scans?


In the long run, perhaps we need to store XIDs in the visibility map 
instead of just a bit. If you we only stored one xid per 100 pages or 
something like that, the storage overhead would not be much higher than 
what we have at the moment. But that's obviously not going to happen for 
9.2...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch  wrote:
> On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
>> In the department of query cancellations, I believe Noah argued
>> previously that this wasn't really going to cause a problem.  And,
>> indeed, if the master has a mix of inserts, updates, and deletes, then
>> it seems likely that any recovery conflicts generated this way would
>> be hitting about the same place in the XID space as the ones caused by
>> pruning away dead tuples.  What will be different, if we go this
>> route, is that an insert-only master, which right now only generates
>> conflicts at freezing time, will instead generate conflicts much
>> sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
>> enough to know whether this is a problem, and I'm not trying to block
>> this approach, but I do want to make sure that everyone agrees that
>> it's acceptable before we go do it, because inevitably somebody is
>> going to get bit.
>
> Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
> A standby with the GUC "off" would ignore all-visible indicators and also
> decline to generate conflicts when flipping them on.

I'm against adding a new GUC, especially for that fairly thin reason.


>> 1. vacuum on master sets the page-level bit and the visibility map bit
>> 2. the heap page with the bit is written to disk BEFORE the WAL entry
>> generated in (1) gets flushed; this is allowable, since it's not an
>> error for the page-level bit to be set while the visibility-map bit is
>> unset, only the other way around
>> 3. crash (still before the WAL entry is flushed)
>> 4. some operation on the master emits an FPW for the page without
>> first rendering it not-all-visible
>>
>> At present, I'm not sure there's any real problem here, since normally
>> an all-visible heap page is only going to get another WAL entry if
>> somebody does an insert, update, or delete on it, in which case the
>> visibility map bit is going to get cleared anyway.  Freezing the page
>> might emit a new FPW, but that's going to generate conflicts anyway,
>> so no problem there.  So I think there's no real problem here, but it
>> doesn't seem totally future-proof - any future type of WAL record that
>> might modify the page without rendering it not-all-visible would
>> create a very subtle hazard for Hot Standby.  We could dodge the whole
>> issue by leaving the hack in heapgetpage() intact and just be happy
>> with making index-only scans work, or maybe somebody has a more clever
>> idea.
>
> Good point.  We could finagle things so the standby notices end-of-recovery
> checkpoints and blocks the optimization for older snapshots.  For example,
> maintain an integer count of end-of-recovery checkpoints seen and store that
> in each Snapshot instead of takenDuringRecovery; use 0 on a master.  When the
> global value is greater than the snapshot's value, disable the optimization
> for that snapshot.  I don't know whether the optimization is powerful enough
> to justify that level of trouble, but it's an option to consider.
>
> For a different approach, targeting the fragility, we could add assertions to
> detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
> full-page image.

We don't need a future proof solution, especially not at this stage of
the release cycle. Every time you add a WAL record, we need to
consider the possible conflict impact.

We just need a simple and clear solution. I'll work on that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Noah Misch
On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
> In the department of query cancellations, I believe Noah argued
> previously that this wasn't really going to cause a problem.  And,
> indeed, if the master has a mix of inserts, updates, and deletes, then
> it seems likely that any recovery conflicts generated this way would
> be hitting about the same place in the XID space as the ones caused by
> pruning away dead tuples.  What will be different, if we go this
> route, is that an insert-only master, which right now only generates
> conflicts at freezing time, will instead generate conflicts much
> sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
> enough to know whether this is a problem, and I'm not trying to block
> this approach, but I do want to make sure that everyone agrees that
> it's acceptable before we go do it, because inevitably somebody is
> going to get bit.

Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC "off" would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.

> As to correctness, after further review of lazy_scan_heap(), I think
> there are some residual bugs here that need to be fixed whatever we
> decide to do about the Hot Standby case:
> 
> 1. I noticed that when we do PageSetAllVisible() today we follow it up
> with SetBufferCommitInfoNeedsSave().  PD_ALL_VISIBLE is not merely a
> hint, so I think these should be changed to MarkBufferDirty(), which
> shouldn't make much difference given current code, but proposals to
> handle hint changes differently than non-hint changes abound, so it's
> probably not good to rely on those meaning the same thing forever.

Do you refer to PD_ALL_VISIBLE as "not merely a hint" due to the requirement
to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
positive visibility map bit?  That is to say, PD_ALL_VISIBLE is fully a hint
in its role as a witness of tuple visibility, but it is not a hint in its role
as a witness of the visibility map status?  In any event, I agree that those
call sites should use MarkBufferDirty().

The large comment in SetBufferCommitInfoNeedsSave() seems incorrect.  On
systems with weaker memory ordering, the FlushBuffer() process's clearing of
BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
process until some time after the former retrieved buffer contents from shared
memory.  Memory barriers could make the comment true, but we should probably
just update the comment to explain that a race condition may eat the update
entirely.  Incidentally, is there a good reason for MarkBufferDirty() to
increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
not to do so?  Looks like an oversight.

> 2. More seriously, lazy_scan_heap() releases the buffer lock before
> setting the all-visible bit on the heap.  This looks just plain wrong
> (and it's my fault, to be clear).  Somebody else can come along after
> we've released the buffer lock and mark the page not-all-visible.
> That concurrent process will check the visibility map bit, find it
> already clear, and go on its merry way.  Then, we set the visibility
> map bit and go on our merry way.  Now we have a not-all-visible page
> with the all-visible bit set in the visibility map.   Oops.

Hmm, indeed.  This deserves its own open item.

> I think
> this probably needs to be rewritten so that lazy_scan_heap() acquires
> a pin on the visibility map page before locking the buffer for cleanup
> and holds onto that pin until either we exit the range of heap pages
> covered by that visibility map page or trigger index vac due to
> maintenance_work_mem exhaustion.  With that infrastructure in place,
> we can set the visibility map bit at the same time we set the
> page-level bit without risking holding the buffer lock across a buffer
> I/O (we might have to hold the buffer lock across a WAL flush in the
> worst case, but that hazard exists in a number of other places as well
> and fixing it here is well out of scope).

Looks reasonable at a glance.

> 1. vacuum on master sets the page-level bit and the visibility map bit
> 2. the heap page with the bit is written to disk BEFORE the WAL entry
> generated in (1) gets flushed; this is allowable, since it's not an
> error for the page-level bit to be set while the visibility-map bit is
> unset, only the other way around
> 3. crash (still before the WAL entry is flushed)
> 4. some operation on the master emits an FPW for the page without
> first rendering it not-all-visible
> 
> At present, I'm not sure there's any real problem here, since normally
> an all-visible heap page is only going to get another WAL entry if
> somebody does an insert, update, or delete on it, in which case the
> visibility map bit is going to get cleared anyway.  Freezing the page
> might emit a new FPW, but that's going to generate conflicts anyway,
> so no problem there.

Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-15 Thread Simon Riggs
On Fri, Apr 13, 2012 at 5:33 PM, Robert Haas  wrote:
> Currently, we have a problem with index-only scans in Hot Standby
> mode: the xmin horizon on the standby might lag the master, and thus
> an index-only scan might mistakenly conclude that no heap fetch is
> needed when in fact it is.  I suggested that we handle this by
> suppressing generation of index-only scan plans in Hot Standby mode,
> but Simon, Noah, and Dimitri were arguing that we should instead do
> the following, which is now on the open items list:
>
> * Make XLOG_HEAP2_VISIBLE records generate recovery snapshot conflicts
> so that IndexOnlyScans work on Hot Standby
...
 very long email 

Luckily it's much simpler than all of that suggests. It'll take a few
hours for me to write a short reply, but it's Sunday today, so that will
happen later.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] index-only scans

2011-10-15 Thread Jeff Janes
On Sat, Oct 15, 2011 at 2:16 PM, Jeff Janes  wrote:
> On Fri, Oct 7, 2011 at 11:40 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Please find attached a patch implementing a basic version of
>>> index-only scans.
>>
>> I'm making some progress with this, but I notice what seems like a
>> missing feature: there needs to be a way to turn it off.  Otherwise
>> performance comparisons will be difficult to impossible.
>>
>> The most obvious solution is a planner control GUC, perhaps
>> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?
>>
>
> Currently I can't force an indexonlyscan over an indexscan, because of
> the way the enable_* variables work.

OK, scratch that.  I must have been using the wrong query (one the
index did not cover), as I can't reproduce the behavior, nor, looking
at the code, can I see how it could have occurred in the first place.

Cheers,

Jeff



Re: [HACKERS] index-only scans

2011-10-15 Thread Jeff Janes
On Fri, Oct 7, 2011 at 11:40 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Please find attached a patch implementing a basic version of
>> index-only scans.
>
> I'm making some progress with this, but I notice what seems like a
> missing feature: there needs to be a way to turn it off.  Otherwise
> performance comparisons will be difficult to impossible.
>
> The most obvious solution is a planner control GUC, perhaps
> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?
>

Currently I can't force an indexonlyscan over an indexscan, because of
the way the enable_* variables work.

Should setting enable_indexscan=off also disable index-only scans (the
current behavior) in addition to plain index scans?

By way of precedent, enable_indexscan=off does not disable Bitmap Index Scan.

Cheers,

Jeff



Re: [HACKERS] index-only scans

2011-10-12 Thread Tom Lane
I wrote:
>> I was also toying with the notion of pushing the slot fill-in into the
>> AM, so that the AM API is to return a filled TupleSlot not an
>> IndexTuple.  This would not save any cycles AFAICT --- at least in
>> btree, we still have to make a palloc'd copy of the IndexTuple so that
>> we can release lock on the index page.  The point of it would be to
>> avoid the assumption that the index's internal storage has exactly the
>> format of IndexTuple.  Don't know whether we'd ever have any actual use
>> for that flexibility, but it seems like it wouldn't cost much to
>> preserve the option.

> BTW, I concluded that that would be a bad idea, because it would involve
> the index AM in some choices that we're likely to want to change.  In
> particular it would foreclose ever doing anything with expression
> indexes, without an AM API change.  Better to just define the AM's
> responsibility as to hand back a tuple defined according to the index's
> columns.

Although this aspect of the code is now working well enough for btree,
I realized that it's going to have a problem if/when we add GiST
support.  The difficulty is that the index rowtype includes "storage"
datatypes, not underlying-heap-column datatypes, for opclasses where
those are different.  This is not going to do for cases where we need
to reconstruct a heap value from the index contents, as in Alexander's
example of gist point_ops using a box as the underlying storage.

What we actually want back from the index AM is a rowtype that matches
the list of values submitted for indexing (ie, the original output of
FormIndexDatum), and only for btree is it the case that that's
represented more or less exactly as the IndexTuple stored in the index.

So what I'm now thinking is to go back to the idea of having the index
AM fill in a TupleTableSlot.  For btree this would just amount to moving
the existing StoreIndexTuple function into the AM.  But it would give
GiST the opportunity to do some computation, and it would avoid the
problem of the index's rowtype not being a suitable intermediate format.
The objection I voiced above is misguided, because it confuses the set
of column types that's needed with the format distinction between a Slot
and an IndexTuple.
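
For btree the slot-filling step is mechanical, roughly the following (a
condensed sketch rather than the exact function), so moving it into the AM
is mostly a question of where the code lives:

static void
store_index_tuple(TupleTableSlot *slot, IndexTuple itup, TupleDesc itupdesc)
{
	int			nindexatts = itupdesc->natts;
	int			i;

	ExecClearTuple(slot);
	for (i = 0; i < nindexatts; i++)
		slot->tts_values[i] = index_getattr(itup, i + 1, itupdesc,
											&slot->tts_isnull[i]);
	ExecStoreVirtualTuple(slot);
}

A GiST opclass, by contrast, could run whatever reconstruction it needs
before storing the values, which is exactly the flexibility the slot-based
API buys.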

BTW, right at the moment I'm not that excited about actually doing
any work on GiST itself for index-only scans.  Given the current list of
available opclasses there don't seem to be very many for which
index-only scans would be possible, so the amount of work needed seems
rather out of proportion to the benefit.  But I don't mind fixing AM API
details that are needed to make this workable.  I'd rather have the API
as right as possible in the first release.

regards, tom lane



Re: [HACKERS] index-only scans

2011-10-11 Thread Tom Lane
Alexander Korotkov  writes:
> On Wed, Oct 12, 2011 at 12:35 AM, Tom Lane  wrote:
>> Maybe, instead of a simple constant amcanreturn column, we need an AM
>> API function that says whether the index can return data.

> I like idea of such AM API function. Since single multicolumn index can use
> multiple opclasses, AM API function should also say *what* data index can
> return.

I was thinking more like "amcanreturn(index, column_number) returns bool"
which says if the index can return the data for that column.  The AM
would still have to return a full IndexTuple at runtime, but it'd be
allowed to insert nulls or garbage for columns it hadn't promised to
return.
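
In other words, something with roughly this shape; the signature is sketched
from the description above, not a settled API:

/*
 * Hypothetical per-column capability check.  A btree-style AM stores every
 * column verbatim and can simply say yes; an opclass with a lossy compress
 * function would say no for that column.
 */
bool
amcanreturn_example(Relation index, AttrNumber attno)
{
	return true;
}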

BTW, if we do this, I'm rather strongly tempted to get rid of the
name-versus-cstring hack (see index_descriptor_hack() in HEAD) by
defining btree name_ops as not capable of returning data.  I don't
trust that hack much at all.

regards, tom lane



Re: [HACKERS] index-only scans

2011-10-11 Thread Dimitri Fontaine
Tom Lane  writes:
> I haven't thought as far ahead as how we might get the information
> needed for a per-opclass flag.  A syntax addition to CREATE OPERATOR
> CLASS might be the only way.

It looks to me like it's related to the RECHECK property.  Maybe it's
just too late, though.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] index-only scans

2011-10-11 Thread Alexander Korotkov
On Wed, Oct 12, 2011 at 12:35 AM, Tom Lane  wrote:
>
> Hm.  I had been supposing that lossless compress functions would just be
> no-ops.  If that's not necessarily the case then we might need something
> different from the opclass's decompress function to get back the
> original data.  However, that doesn't really solve the problem I'm
> concerned about, because the existence and use of such a function would
> be entirely internal to GiST.  There still needs to be a way for the
> planner to know which opclasses support data retrieval.  And I do *not*
> want to see us hard-wire "the presence of opclass function 8 means a
> GiST opclass can return data" into the planner.
>
> Maybe, instead of a simple constant amcanreturn column, we need an AM
> API function that says whether the index can return data.
>
I like the idea of such an AM API function. Since a single multicolumn index
can use multiple opclasses, the AM API function should also say *what* data
the index can return.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] index-only scans

2011-10-11 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Oct 11, 2011 at 5:22 PM, Tom Lane  wrote:
>> I haven't thought as far ahead as how we might get the information
>> needed for a per-opclass flag.  A syntax addition to CREATE OPERATOR
>> CLASS might be the only way.
>> 
> Shouldn't it be implemented through additional interface function? There are
> situations when restoring of original tuple requires some transformation.
> For example, in point_ops we store box in the leaf index tuple, while point
> can be easily restored from box.

Hm.  I had been supposing that lossless compress functions would just be
no-ops.  If that's not necessarily the case then we might need something
different from the opclass's decompress function to get back the
original data.  However, that doesn't really solve the problem I'm
concerned about, because the existence and use of such a function would
be entirely internal to GiST.  There still needs to be a way for the
planner to know which opclasses support data retrieval.  And I do *not*
want to see us hard-wire "the presence of opclass function 8 means a
GiST opclass can return data" into the planner.

Maybe, instead of a simple constant amcanreturn column, we need an AM
API function that says whether the index can return data.

regards, tom lane



Re: [HACKERS] index-only scans

2011-10-11 Thread PostgreSQL - Hans-Jürgen Schönig
On Oct 7, 2011, at 8:47 PM, Joshua D. Drake wrote:

> 
> On 10/07/2011 11:40 AM, Tom Lane wrote:
>> Robert Haas  writes:
>>> Please find attached a patch implementing a basic version of
>>> index-only scans.
>> 
>> I'm making some progress with this, but I notice what seems like a
>> missing feature: there needs to be a way to turn it off.  Otherwise
>> performance comparisons will be difficult to impossible.
>> 
>> The most obvious solution is a planner control GUC, perhaps
>> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?
> 
> enable_onlyindexscan
> 
> I'm kidding.
> 
> +1 on Tom's proposed name.


+1 ...
definitely an important thing to do.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] index-only scans

2011-10-11 Thread Alexander Korotkov
On Tue, Oct 11, 2011 at 5:22 PM, Tom Lane  wrote:
>
> I haven't thought as far ahead as how we might get the information
> needed for a per-opclass flag.  A syntax addition to CREATE OPERATOR
> CLASS might be the only way.
>
Shouldn't it be implemented through an additional interface function? There are
situations where restoring the original tuple requires some transformation.
For example, in point_ops we store a box in the leaf index tuple, while the
point can be easily restored from the box.
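
A sketch of what such a function would do for point_ops, where the leaf key
is a zero-area box whose corners coincide with the indexed point
(illustrative only, not proposed code):

#include "postgres.h"
#include "utils/geo_decls.h"

/*
 * Illustration: point_ops stores the point as a degenerate BOX (low == high),
 * so reconstructing the point is just reading one corner back out.
 */
static Point *
point_from_leaf_box(const BOX *key)
{
	Point	   *pt = (Point *) palloc(sizeof(Point));

	pt->x = key->low.x;
	pt->y = key->low.y;

	return pt;
}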

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] index-only scans

2011-10-11 Thread Tom Lane
Robert Haas  writes:
> Have you given any thought to what would be required to support
> index-only scans on non-btree indexes - particularly, GIST?  ISTM we
> might have had a thought that some GIST opclasses but not others would
> be able to regurgitate tuples, or maybe even that it might vary from
> index tuple to index tuple.  But that discussion was a long time ago,
> and my memory is fuzzy.

It would have to be a per-opclass property, for sure, since the basic
criterion is whether the opclass's "compress" function is lossy.
I don't think we can tolerate a situation where some of the index tuples
might be able to yield data and others not --- that would undo all the
improvements I've been working on over the past few days.

I haven't thought as far ahead as how we might get the information
needed for a per-opclass flag.  A syntax addition to CREATE OPERATOR
CLASS might be the only way.

regards, tom lane



Re: [HACKERS] index-only scans

2011-10-11 Thread Alexander Korotkov
On Tue, Oct 11, 2011 at 2:36 PM, Robert Haas  wrote:

> Have you given any thought to what would be required to support
> index-only scans on non-btree indexes - particularly, GIST?  ISTM we
> might have had a thought that some GIST opclasses but not others would
> be able to regurgitate tuples, or maybe even that it might vary from
> index tuple to index tuple.  But that discussion was a long time ago,
> and my memory is fuzzy.

In some GiST opclasses the original tuple can't be restored from the index
tuple. For example, a polygon can't be restored from its MBR. In other
opclasses, such as box_ops and point_ops, the original tuple can easily be
restored from the index tuple.
I can't remember any implementation where this possibility varies from index
tuple to index tuple. The GiST opclass for tsvector uses different leaf index
tuple representations depending on length, but at most it stores an array of
hashes of words, so it's lossy anyway.
I think the GiST interface could be extended with an optional function that
restores the original tuple. But there is an additional difficulty when some
opclasses of a composite index provide this function and others don't.
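(For reference, when such an optional "fetch" function later appeared in the
GiST API it was wired up as support procedure 9. A minimal sketch of attaching
one to an operator family follows; the family and function names here are
purely hypothetical.)

-- Sketch only: "my_point_ops" and "my_point_fetch" are hypothetical names.
-- Support procedure 9 is the slot the optional fetch function ended up in;
-- an opclass that provides it can reconstruct the indexed datum from the
-- leaf tuple, which is what makes it usable for index-only scans.
ALTER OPERATOR FAMILY my_point_ops USING gist
    ADD FUNCTION 9 (point) my_point_fetch (internal);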

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] index-only scans

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:19 AM, Tom Lane  wrote:
> I wrote:
>> I have mostly-working code for approach #3, but I haven't tried to make
>> EXPLAIN work yet.  While looking at that I realized that there's a
>> pretty good argument for adding the above-mentioned explicit TargetEntry
>> list representing the index columns to index-only plan nodes.  Namely,
>> that if we don't do it, EXPLAIN will have to go to the catalogs to find
>> out what's in that index, and this will fall down for "hypothetical"
>> indexes injected into the planner by index advisors.  We could imagine
>> adding some more hooks to let the advisor inject bogus catalog data at
>> EXPLAIN time, but on the whole it seems easier and less fragile to just
>> have the planner include a data structure it has to build anyway into
>> the finished plan.
>
>> The need for this additional node list field also sways me in a
>> direction that I'd previously been on the fence about, namely that
>> I think index-only scans need to be their own independent plan node type
>> instead of sharing a node type with regular indexscans.  It's just too
>> weird that a simple boolean indexonly property would mean completely
>> different contents/interpretation of the tlist and quals.
>
> Attached is a draft patch for this.  It needs some more review before
> committing, but it does pass regression tests now.
>
> One point worth commenting on is that I chose to rename the OUTER and
> INNER symbols for special varnos to OUTER_VAR and INNER_VAR, along with
> adding a new special varno INDEX_VAR.  It's bothered me for some time
> that those macro names were way too generic/susceptible to collision;
> and since I had to look at all the uses anyway to see if the INDEX case
> needed to be added, this seemed like a good time to rename them.

+1 for that renaming, for sure.

Have you given any thought to what would be required to support
index-only scans on non-btree indexes - particularly, GIST?  ISTM we
might have had a thought that some GIST opclasses but not others would
be able to regurgitate tuples, or maybe even that it might vary from
index tuple to index tuple.  But that discussion was a long time ago,
and my memory is fuzzy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-10 Thread Tom Lane
I wrote:
> I have mostly-working code for approach #3, but I haven't tried to make
> EXPLAIN work yet.  While looking at that I realized that there's a
> pretty good argument for adding the above-mentioned explicit TargetEntry
> list representing the index columns to index-only plan nodes.  Namely,
> that if we don't do it, EXPLAIN will have to go to the catalogs to find
> out what's in that index, and this will fall down for "hypothetical"
> indexes injected into the planner by index advisors.  We could imagine
> adding some more hooks to let the advisor inject bogus catalog data at
> EXPLAIN time, but on the whole it seems easier and less fragile to just
> have the planner include a data structure it has to build anyway into
> the finished plan.

> The need for this additional node list field also sways me in a
> direction that I'd previously been on the fence about, namely that
> I think index-only scans need to be their own independent plan node type
> instead of sharing a node type with regular indexscans.  It's just too
> weird that a simple boolean indexonly property would mean completely
> different contents/interpretation of the tlist and quals.

Attached is a draft patch for this.  It needs some more review before
committing, but it does pass regression tests now.

One point worth commenting on is that I chose to rename the OUTER and
INNER symbols for special varnos to OUTER_VAR and INNER_VAR, along with
adding a new special varno INDEX_VAR.  It's bothered me for some time
that those macro names were way too generic/susceptible to collision;
and since I had to look at all the uses anyway to see if the INDEX case
needed to be added, this seemed like a good time to rename them.

regards, tom lane



Attachment: index-only-scan-revisions.patch.gz

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-09 Thread Greg Stark
On Mon, Oct 10, 2011 at 2:47 AM, Tom Lane  wrote:
> The need for this additional node list field also sways me in a
> direction that I'd previously been on the fence about, namely that
> I think index-only scans need to be their own independent plan node type
> instead of sharing a node type with regular indexscans

At a superficial PR level it'll go over quite well to have a special
plan node be visible in the explain output. People will love to see
Fast Index Scan or Covering Index Scan or whatever you call it in
their plans.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-09 Thread Tom Lane
I wrote:
> There are at least four things we could do: ...
> 2. Set varno = OUTER (or maybe INNER).  This is safe because there's no
> other use for OUTER/INNER in a table scan node.  We would have to hack
> things so that the index tuple gets put into econtext->ecxt_outertuple
> (resp. ecxt_innertuple) at runtime, but that seems like no big problem.
> In both setrefs.c and ruleutils.c, it would be desirable to have a
> TargetEntry list somewhere representing the index columns, which setrefs
> would want so it could set up the special Var nodes with fix_upper_expr,
> and ruleutils would want so it could interpret the Vars using existing
> machinery.  I'm not sure whether to hang that list on the index-only
> plan node or expect EXPLAIN to regenerate it at need.

> 3. Invent another special varno value similar to OUTER/INNER but
> representing an index reference.  This is just about like #2 except that
> we could still put the index tuple into econtext->ecxt_scantuple, and
> ExecEvalVar would do the right thing as it stands.

I have mostly-working code for approach #3, but I haven't tried to make
EXPLAIN work yet.  While looking at that I realized that there's a
pretty good argument for adding the above-mentioned explicit TargetEntry
list representing the index columns to index-only plan nodes.  Namely,
that if we don't do it, EXPLAIN will have to go to the catalogs to find
out what's in that index, and this will fall down for "hypothetical"
indexes injected into the planner by index advisors.  We could imagine
adding some more hooks to let the advisor inject bogus catalog data at
EXPLAIN time, but on the whole it seems easier and less fragile to just
have the planner include a data structure it has to build anyway into
the finished plan.

The need for this additional node list field also sways me in a
direction that I'd previously been on the fence about, namely that
I think index-only scans need to be their own independent plan node type
instead of sharing a node type with regular indexscans.  It's just too
weird that a simple boolean indexonly property would mean completely
different contents/interpretation of the tlist and quals.

I've run into one other thing that's going to need to be hacked up
a bit: index-only scans on "name" columns fall over with this modified
code, because there's now tighter checking of the implied tuple
descriptors:

regression=# select relname from pg_class where relname = 'tenk1';
ERROR:  attribute 1 has wrong type
DETAIL:  Table has type cstring, but query expects name.

The reason for this is the hack we put in some time back to conserve
space in system catalog indexes by having "name" columns be indexed as
though they were "cstring", cf commit
5f6f840e93a3649e0d07e85bad188d163e96ec0e.  We will probably need some
compensatory hack in index-only scans, unless we can think of a less
klugy way of representing that optimization.  (Basically, the index-only
code is assuming that btrees don't have storage type distinct from input
type, and that's not the case for the name opclass.  I had kind of
expected the original patch to have some issues with that too, and I'm
still not fully convinced that there aren't corner cases where it'd be
an issue even with the currently committed code.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-09 Thread Greg Stark
On Sun, Oct 9, 2011 at 10:54 PM, Tom Lane  wrote:
> I don't think it's particularly relevant --- we would not want to use
> weird representations of the Vars outside the index scan nodes.  Above
> the scan they'd be just like any other upper-level Vars.

I can't say I fully understand the planner data structures and the
implications of the options. I guess what I was imagining was that
being able to reference the indexes as regular rangetable entries
would make it more convenient for the rest of the planner to keep
working as if nothing had changed when working with values extracted
from index tuples.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-09 Thread Tom Lane
Greg Stark  writes:
> On Sun, Oct 9, 2011 at 9:03 PM, Tom Lane  wrote:
>> At the moment I'm leaning to approach #3, but I wonder if anyone has
>> a different opinion or another idea altogether.

> Would any of these make it more realistic to talk about the crazy
> plans Heikki suggested like doing two index scans, doing the join
> between the index tuples, and only then looking up the visibility
> information and remaining columns for the tuple on the matching rows?

I don't think it's particularly relevant --- we would not want to use
weird representations of the Vars outside the index scan nodes.  Above
the scan they'd be just like any other upper-level Vars.

(FWIW, that idea isn't crazy; I remember having discussions of it back
in 2003 or so.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-09 Thread Greg Stark
On Sun, Oct 9, 2011 at 9:03 PM, Tom Lane  wrote:
> At the moment I'm leaning to approach #3, but I wonder if anyone has
> a different opinion or another idea altogether.
>

Would any of these make it more realistic to talk about the crazy
plans Heikki suggested like doing two index scans, doing the join
between the index tuples, and only then looking up the visibility
information and remaining columns for the tuple on the matching rows?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-09 Thread Tom Lane
I wrote:
> I believe that we should rejigger things so that when an index-only scan
> is selected, the executor *always* works from the data supplied by the
> index.  Even if it has to visit the heap --- it will do that but just to
> consult the tuple's visibility data, and then use what it got from the
> index anyway.  This means we'd build the plan node's filter quals and
> targetlist to reference the index tuple columns not the underlying
> table's.

I've been studying this a bit.  The key decision is how to represent
Vars that reference columns of the index.  We really have to have
varattno equal to the index column number, else ExecEvalVar will pull
the wrong column from the tuple.  However, varno is not so clear cut.
There are at least four things we could do:

1. Keep varno = table's rangetable index.  The trouble with this is that
a Var referencing index column N would look exactly like a Var
referencing table column N; so the same Var would mean something
different in an index-only scan node than it does in any other type of
scan node for the same table.  We could maybe make that work, but it
seems confusing and fragile as heck.  The executor isn't going to care
much, but inspection of the plan tree by e.g. EXPLAIN sure will.

2. Set varno = OUTER (or maybe INNER).  This is safe because there's no
other use for OUTER/INNER in a table scan node.  We would have to hack
things so that the index tuple gets put into econtext->ecxt_outertuple
(resp. ecxt_innertuple) at runtime, but that seems like no big problem.
In both setrefs.c and ruleutils.c, it would be desirable to have a
TargetEntry list somewhere representing the index columns, which setrefs
would want so it could set up the special Var nodes with fix_upper_expr,
and ruleutils would want so it could interpret the Vars using existing
machinery.  I'm not sure whether to hang that list on the index-only
plan node or expect EXPLAIN to regenerate it at need.

3. Invent another special varno value similar to OUTER/INNER but
representing an index reference.  This is just about like #2 except that
we could still put the index tuple into econtext->ecxt_scantuple, and
ExecEvalVar would do the right thing as it stands.

4. Create a rangetable entry specifically representing the index,
and set varno equal to that RTE's number.  This has some attractiveness
in terms of making the meaning of the Vars clear, but an RTE that
represents an index rather than a table seems kind of ugly otherwise.
It would likely require changes in unrelated parts of the code.


One point here is that we have historically used solution #1 to
represent the index keys in index qual expressions.  We avoid the
ambiguity issues by not asking EXPLAIN to try to interpret the indexqual
tree at all: it works from indexqualorig which contains ordinary Vars.
So one way to dodge the disadvantages of solution #1 would be to add
untransformed "targetlistorig" and "qualorig" fields to an index-only
plan node, and use those for EXPLAIN.  However, those fields would be
totally dead weight if the plan were never EXPLAINed, whereas
indexqualorig has a legitimate use for rechecking indexquals against the
heap tuple in case of a lossy index.  (BTW, if we go with any solution
other than #1, I'm strongly inclined to change the representation of
indexqual to match.  See the comments in fix_indexqual_operand.)

At the moment I'm leaning to approach #3, but I wonder if anyone has
a different opinion or another idea altogether.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-08 Thread Robert Haas
On Oct 8, 2011, at 11:52 AM, Tom Lane  wrote:
> I'm inclined to do the last part immediately, since there's a
> performance argument for it, and then work on revising the executor
> implementation.

Sounds great.  Thanks for working on this.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-08 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> Not really.  We have detected a small performance regression when both
>> heap and index fit in shared_buffers and an index-only scan is used.
>> I have a couple of ideas for improving this.  One is to store a
>> virtual tuple into the slot instead of building a regular tuple, but
>> what do we do about tuples with OIDs?

[ that's done ]

> I was also toying with the notion of pushing the slot fill-in into the
> AM, so that the AM API is to return a filled TupleSlot not an
> IndexTuple.  This would not save any cycles AFAICT --- at least in
> btree, we still have to make a palloc'd copy of the IndexTuple so that
> we can release lock on the index page.  The point of it would be to
> avoid the assumption that the index's internal storage has exactly the
> format of IndexTuple.  Don't know whether we'd ever have any actual use
> for that flexibility, but it seems like it wouldn't cost much to
> preserve the option.

BTW, I concluded that that would be a bad idea, because it would involve
the index AM in some choices that we're likely to want to change.  In
particular it would foreclose ever doing anything with expression
indexes, without an AM API change.  Better to just define the AM's
responsibility as to hand back a tuple defined according to the index's
columns.

>> Another is to avoid locking the
>> index buffer multiple times - right now it locks the index buffer to
>> get the TID, and then relocks it to extract the index tuple (after
>> checking that nothing disturbing has happened meanwhile).  It seems
>> likely that with some refactoring we could get this down to a single
>> lock/unlock cycle, but I haven't figured out exactly where the TID
>> gets copied out.

> Yeah, maybe, but let's get the patch committed before we start looking
> for second-order optimizations.

On reflection I'm starting to think that the above would be a good idea
because there are a couple of bogosities in the basic choices this patch
made.  In particular, I'm thinking about how we could use an index on
f(x) to avoid recalculating f() in something like

select f(x) from tab where f(x) < 42;

assuming that f() is expensive but immutable.  The planner side of this
is already a bit daunting, because it's not clear how to recognize that
an index on f(x) is a covering index (the existing code is going to
think that x itself needs to be available).  But the executor side is a
real problem, because it will fail to make use of the f() value fetched
from the index anytime the heap visibility test fails.
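(A self-contained SQL sketch of the scenario, with a toy stand-in for the
expensive function, just to make the coverage problem concrete:)

-- Sketch: f() stands in for an expensive immutable function.
CREATE FUNCTION f(integer) RETURNS integer
    LANGUAGE sql IMMUTABLE AS 'SELECT $1 * $1';

CREATE TABLE tab (x integer);
CREATE INDEX tab_fx_idx ON tab (f(x));

-- The index stores f(x) for every row, so in principle this query never has
-- to recompute f(); but a planner that only collects plain column references
-- sees a use of x and does not treat the index as covering.
SELECT f(x) FROM tab WHERE f(x) < 42;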

I believe that we should rejigger things so that when an index-only scan
is selected, the executor *always* works from the data supplied by the
index.  Even if it has to visit the heap --- it will do that but just to
consult the tuple's visibility data, and then use what it got from the
index anyway.  This means we'd build the plan node's filter quals and
targetlist to reference the index tuple columns not the underlying
table's.  (Which in passing gets rid of the behavior you were
complaining about that EXPLAIN VERBOSE shows a lot of columns that
aren't actually being computed.)

In order to make this work, we have to remove the API wart that says the
index AM is allowed to choose to not return the index tuple.  And that
ties into what you were saying above.  What we ought to do is have
_bt_readpage() save off copies of the whole tuples not only the TIDs
when it is extracting data from the page.  This is no more net copying
work than what happens now (assuming that all the tuples get fetched)
since we won't need the per-tuple memcpy that occurs now in
bt_getindextuple.  The tuples can go into a page-sized workspace buffer
associated with the BTScanPosData structure, and then just be referenced
in-place in that workspace with no second copy step.

I'm inclined to do the last part immediately, since there's a
performance argument for it, and then work on revising the executor
implementation.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-07 Thread Robert Haas
On Fri, Oct 7, 2011 at 8:14 PM, Tom Lane  wrote:
>> 1. The way that nodeIndexscan.c builds up the faux heap tuple is
>> perhaps susceptible to improvement.  I thought about building a
>> virtual tuple, but then what do I do with an OID column, if I have
>> one?  Or maybe this should be done some other way altogether.
>
> I switched it to use a virtual tuple for now, and just not attempt to
> use index-only scans if a system column is required.  We're likely to
> want to rethink this anyway, because as currently constituted the code
> can't do anything with an expression index, and avoiding recalculation
> of an expensive function could be a nice win.  But the approach of
> just building a faux heap tuple fundamentally doesn't work for that.

Figuring out how to fix that problem likely requires more knowledge of
the executor than I have got.

>> 2. Suppose we scan one tuple on a not-all-visible page followed by 99
>> tuples on all-visible pages.  The code as written will hold the pin on
>> the first heap page for the entire scan.  As soon as we hit the end of
>> the scan or another tuple where we have to actually visit the page,
>> the old pin will be released, but until then we hold onto it.
>
> I did not do anything about this issue --- ISTM it needs performance
> testing.

I'm actually less worried about any performance problem than I am
about the possibility of holding up VACUUM.  That can happen the old
way, too, but now the pin could stay on the same page for quite a
while even when the scan is advancing.

I think we maybe ought to think seriously about solving the problem at
the other end, though - either make VACUUM skip pages that it can't
get a cleanup lock on without blocking (except in anti-wraparound
mode) or have it just do the amount of work that can be done with an
exclusive lock (i.e. prune but not defragment, which would work even
in anti-wraparound mode).  That would solve the problems of (1)
undetected VACUUM deadlock vs. a buffer pin, (2) indefinite VACUUM
stall due to a suspended query, and (3) this issue.

>> 3. The code in create_index_path() builds up a bitmapset of heap
>> attributes that get used for any purpose anywhere in the query, and
>> hangs it on the RelOptInfo so it doesn't need to be rebuilt for every
>> index under consideration.  However, if it were somehow possible to
>> have the rel involved without using any attributes at all, we'd
>> rebuild the cache over and over, since it would never become non-NULL.
>
> I dealt with this by the expedient of getting rid of the caching ;-).
> It's not clear to me that it was worth the trouble, and in any case
> it's fundamentally wrong to suppose that every index faces the same
> set of attributes it must supply.  It should not need to supply columns
> that are only needed in indexquals or index predicate conditions.
> I'm not sure how to deal with those refinements cheaply enough, but
> the cache isn't helping.

Oh, hmm.

>> 4. There are a couple of cases that use index-only scans even though
>> the EXPLAIN output sort of makes it look like they shouldn't.  For
>> example, in the above queries, an index-only scan is chosen even
>> though the query does "SELECT *" from the table being scanned.  Even
>> though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems
>> that the target list of an EXISTS query is in fact discarded, e.g.:
>
> The reason it looks that way is that we're choosing to use a "physical
> result tuple" to avoid an ExecProject step at runtime.  There's nothing
> wrong with the logic, it's just that EXPLAIN shows something that might
> mislead people.

I wonder if we oughta do something about that.

I was also thinking we should probably make EXPLAIN ANALYZE display
the number of heap fetches, so that you can see how index-only your
index-only scan actually was.
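(That is roughly what later shipped: the Index Only Scan node reports a "Heap
Fetches" count under EXPLAIN ANALYZE. A sketch against the pgbench tables used
elsewhere in this thread:)

-- Sketch: after a VACUUM has set visibility-map bits, the Heap Fetches count
-- shows how often the scan still had to visit the heap for visibility checks.
VACUUM pgbench_accounts;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM pgbench_accounts WHERE aid BETWEEN 1000 AND 2000;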

>> 5. We haven't made any planner changes at all, not even for cost
>> estimation.  It is not clear to me what the right way to do cost
>> estimation here is.
>
> Yeah, me either.  For the moment I put in a hard-wired estimate that
> only 90% of the heap pages would actually get fetched.  This is
> conservative and only meant to ensure that the planner picks an
> index-only-capable plan over an indexscan with a non-covering index,
> all else being equal.  We'll need to do performance testing before
> we can refine that.

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-07 Thread Tom Lane
Robert Haas  writes:
> Please find attached a patch implementing a basic version of
> index-only scans.  This patch is the work of my colleague Ibrar Ahmed
> and myself, and also incorporates some code from previous patches
posted by Heikki Linnakangas.

I've committed this after some rather substantial editorialization.
There's still a lot left to do of course, but it seems to need
performance testing next, and that'll be easier if the code is in HEAD.

> 1. The way that nodeIndexscan.c builds up the faux heap tuple is
> perhaps susceptible to improvement.  I thought about building a
> virtual tuple, but then what do I do with an OID column, if I have
> one?  Or maybe this should be done some other way altogether.

I switched it to use a virtual tuple for now, and just not attempt to
use index-only scans if a system column is required.  We're likely to
want to rethink this anyway, because as currently constituted the code
can't do anything with an expression index, and avoiding recalculation
of an expensive function could be a nice win.  But the approach of
just building a faux heap tuple fundamentally doesn't work for that.

> 2. Suppose we scan one tuple on a not-all-visible page followed by 99
> tuples on all-visible pages.  The code as written will hold the pin on
> the first heap page for the entire scan.  As soon as we hit the end of
> the scan or another tuple where we have to actually visit the page,
> the old pin will be released, but until then we hold onto it.

I did not do anything about this issue --- ISTM it needs performance
testing.

> 3. The code in create_index_path() builds up a bitmapset of heap
> attributes that get used for any purpose anywhere in the query, and
> hangs it on the RelOptInfo so it doesn't need to be rebuilt for every
> index under consideration.  However, if it were somehow possible to
> have the rel involved without using any attributes at all, we'd
> rebuild the cache over and over, since it would never become non-NULL.

I dealt with this by the expedient of getting rid of the caching ;-).
It's not clear to me that it was worth the trouble, and in any case
it's fundamentally wrong to suppose that every index faces the same
set of attributes it must supply.  It should not need to supply columns
that are only needed in indexquals or index predicate conditions.
I'm not sure how to deal with those refinements cheaply enough, but
the cache isn't helping.

> 4. There are a couple of cases that use index-only scans even though
> the EXPLAIN output sort of makes it look like they shouldn't.  For
> example, in the above queries, an index-only scan is chosen even
> though the query does "SELECT *" from the table being scanned.  Even
> though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems
> that the target list of an EXISTS query is in fact discarded, e.g.:

The reason it looks that way is that we're choosing to use a "physical
result tuple" to avoid an ExecProject step at runtime.  There's nothing
wrong with the logic, it's just that EXPLAIN shows something that might
mislead people.

> 5. We haven't made any planner changes at all, not even for cost
> estimation.  It is not clear to me what the right way to do cost
> estimation here is.

Yeah, me either.  For the moment I put in a hard-wired estimate that
only 90% of the heap pages would actually get fetched.  This is
conservative and only meant to ensure that the planner picks an
index-only-capable plan over an indexscan with a non-covering index,
all else being equal.  We'll need to do performance testing before
we can refine that.
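(For context, the refinement that eventually replaced the flat 90% figure
scales the expected heap fetches by how much of the table is known all-visible.
A sketch, assuming a catalog that tracks this in pg_class.relallvisible:)

-- Sketch: fraction of heap pages marked all-visible, which a smarter cost
-- estimate can use instead of a hard-wired 90%.
SELECT relname, relpages, relallvisible,
       round(100.0 * relallvisible / nullif(relpages, 0), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'pgbench_accounts';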

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-07 Thread Tom Lane
Robert Haas  writes:
> On Fri, Oct 7, 2011 at 2:40 PM, Tom Lane  wrote:
>> I'm making some progress with this, but I notice what seems like a
>> missing feature: there needs to be a way to turn it off.  Otherwise
>> performance comparisons will be difficult to impossible.
>> 
>> The most obvious solution is a planner control GUC, perhaps
>> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?

> I was expecting you to NOT want that, or I would have put it in to
> begin with...  so go for it.

It seems unlikely to have any use except for testing, but I think we
need it for that.
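(The knob did land under that name; a sketch of the before/after comparison it
exists for, with an illustrative query:)

-- Sketch: compare the same query with and without index-only scans allowed.
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;

SET enable_indexonlyscan = off;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;
RESET enable_indexonlyscan;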

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-07 Thread Robert Haas
On Fri, Oct 7, 2011 at 2:40 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Please find attached a patch implementing a basic version of
>> index-only scans.
>
> I'm making some progress with this, but I notice what seems like a
> missing feature: there needs to be a way to turn it off.  Otherwise
> performance comparisons will be difficult to impossible.
>
> The most obvious solution is a planner control GUC, perhaps
> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?

I was expecting you to NOT want that, or I would have put it in to
begin with...  so go for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-07 Thread Joshua D. Drake


On 10/07/2011 11:40 AM, Tom Lane wrote:
> Robert Haas  writes:
>> Please find attached a patch implementing a basic version of
>> index-only scans.
>
> I'm making some progress with this, but I notice what seems like a
> missing feature: there needs to be a way to turn it off.  Otherwise
> performance comparisons will be difficult to impossible.
>
> The most obvious solution is a planner control GUC, perhaps
> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?


enable_onlyindexscan

I'm kidding.

+1 on Tom's proposed name.



regards, tom lane




--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-07 Thread Tom Lane
Robert Haas  writes:
> Please find attached a patch implementing a basic version of
> index-only scans.

I'm making some progress with this, but I notice what seems like a
missing feature: there needs to be a way to turn it off.  Otherwise
performance comparisons will be difficult to impossible.

The most obvious solution is a planner control GUC, perhaps
"enable_indexonlyscan".  Anyone object, or want to bikeshed the name?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-06 Thread Thom Brown
On 6 October 2011 21:11, Tom Lane  wrote:
> Robert Haas  writes:
>> Not really.  We have detected a small performance regression when both
>> heap and index fit in shared_buffers and an index-only scan is used.
>> I have a couple of ideas for improving this.  One is to store a
>> virtual tuple into the slot instead of building a regular tuple, but
>> what do we do about tuples with OIDs?
>
> Yeah, I was just looking at that.  I think it's going to be a clear win
> to use a virtual tuple slot instead of constructing and deconstructing
> a HeapTuple.  The obvious solution is to decree that you can't use an
> index-only scan if the query requires fetching OID (or any other system
> column).  This would be slightly annoying for catalog fetches but I'm
> not sure I believe that catalog queries are that important a use-case.
>
> I was also toying with the notion of pushing the slot fill-in into the
> AM, so that the AM API is to return a filled TupleSlot not an
> IndexTuple.  This would not save any cycles AFAICT --- at least in
> btree, we still have to make a palloc'd copy of the IndexTuple so that
> we can release lock on the index page.  The point of it would be to
> avoid the assumption that the index's internal storage has exactly the
> format of IndexTuple.  Don't know whether we'd ever have any actual use
> for that flexibility, but it seems like it wouldn't cost much to
> preserve the option.
>
>> Another is to avoid locking the
>> index buffer multiple times - right now it locks the index buffer to
>> get the TID, and then relocks it to extract the index tuple (after
>> checking that nothing disturbing has happened meanwhile).  It seems
>> likely that with some refactoring we could get this down to a single
>> lock/unlock cycle, but I haven't figured out exactly where the TID
>> gets copied out.
>
> Yeah, maybe, but let's get the patch committed before we start looking
> for second-order optimizations.

+1

Been testing this today with a few regression tests and haven't seen
anything noticeably broken.  Does what it says on the tin.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-06 Thread Tom Lane
Robert Haas  writes:
> Not really.  We have detected a small performance regression when both
> heap and index fit in shared_buffers and an index-only scan is used.
> I have a couple of ideas for improving this.  One is to store a
> virtual tuple into the slot instead of building a regular tuple, but
> what do we do about tuples with OIDs?

Yeah, I was just looking at that.  I think it's going to be a clear win
to use a virtual tuple slot instead of constructing and deconstructing
a HeapTuple.  The obvious solution is to decree that you can't use an
index-only scan if the query requires fetching OID (or any other system
column).  This would be slightly annoying for catalog fetches but I'm
not sure I believe that catalog queries are that important a use-case.

I was also toying with the notion of pushing the slot fill-in into the
AM, so that the AM API is to return a filled TupleSlot not an
IndexTuple.  This would not save any cycles AFAICT --- at least in
btree, we still have to make a palloc'd copy of the IndexTuple so that
we can release lock on the index page.  The point of it would be to
avoid the assumption that the index's internal storage has exactly the
format of IndexTuple.  Don't know whether we'd ever have any actual use
for that flexibility, but it seems like it wouldn't cost much to
preserve the option.

> Another is to avoid locking the
> index buffer multiple times - right now it locks the index buffer to
> get the TID, and then relocks it to extract the index tuple (after
> checking that nothing disturbing has happened meanwhile).  It seems
> likely that with some refactoring we could get this down to a single
> lock/unlock cycle, but I haven't figured out exactly where the TID
> gets copied out.

Yeah, maybe, but let's get the patch committed before we start looking
for second-order optimizations.

> With regard to the AM API, the basic idea is we're just adding a
> Boolean to say whether the AM is capable of returning index tuples.
> If it's not, then we don't ever try an index-only scan.  If it is,
> then we'll set the want_index_tuple flag if an index-only scan is
> possible.  This requests that the AM attempt to return the tuple; but
> the AM is also allowed to fail and not return the tuple whenever it
> wants.

It was the "allowed to fail" bit that I was annoyed to discover only by
reading some well-buried code.

> Incidentally, if you happen to feel the urge to beat this around and
> send it back rather than posting a list of requested changes, feel
> free.

You can count on that ;-).  I've already found a lot of things I didn't
care for.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-06 Thread Robert Haas
On Thu, Oct 6, 2011 at 3:15 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Please find attached a patch implementing a basic version of
>> index-only scans.  This patch is the work of my colleague Ibrar Ahmed
>> and myself, and also incorporates some code from previous patches
>> posted by Heikki Linnakangas.
>
> I'm starting to review this patch now.

Thanks!

> Has any further work been done
> since the first version was posted?  Also, was any documentation
> written?  I'm a tad annoyed by having to reverse-engineer the changes
> in the AM API specification from the code.

Not really.  We have detected a small performance regression when both
heap and index fit in shared_buffers and an index-only scan is used.
I have a couple of ideas for improving this.  One is to store a
virtual tuple into the slot instead of building a regular tuple, but
what do we do about tuples with OIDs?  Another is to avoid locking the
index buffer multiple times - right now it locks the index buffer to
get the TID, and then relocks it to extract the index tuple (after
checking that nothing disturbing has happened meanwhile).  It seems
likely that with some refactoring we could get this down to a single
lock/unlock cycle, but I haven't figured out exactly where the TID
gets copied out.

With regard to the AM API, the basic idea is we're just adding a
Boolean to say whether the AM is capable of returning index tuples.
If it's not, then we don't ever try an index-only scan.  If it is,
then we'll set the want_index_tuple flag if an index-only scan is
possible.  This requests that the AM attempt to return the tuple; but
the AM is also allowed to fail and not return the tuple whenever it
wants.  This is more or less the interface that Heikki suggested a
couple years back, but it might well be vulnerable to improvement.

Incidentally, if you happen to feel the urge to beat this around and
send it back rather than posting a list of requested changes, feel
free.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-06 Thread Tom Lane
Robert Haas  writes:
> Please find attached a patch implementing a basic version of
> index-only scans.  This patch is the work of my colleague Ibrar Ahmed
> and myself, and also incorporates some code from previous patches
> posted by Heikki Linnakangas.

I'm starting to review this patch now.  Has any further work been done
since the first version was posted?  Also, was any documentation
written?  I'm a tad annoyed by having to reverse-engineer the changes
in the AM API specification from the code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-09-25 Thread Robert Haas
On Sun, Sep 25, 2011 at 2:43 PM, Marti Raudsepp  wrote:
> On Sun, Aug 14, 2011 at 00:31, Heikki Linnakangas
>  wrote:
>> That is somewhat compensated by the fact that tuples that are accessed more
>> often are also more likely to be in cache. Fetching the heap tuple to check
>> visibility is very cheap when the tuple is in cache.
>>
>> I'm not sure how far that compensates it, though. I'm sure there's typically
>> nevertheless a fairly wide range of pages that have been modified since the
>> last vacuum, but not in cache anymore.
>
> Would it make sense to re-evaluate the visibility bit just before a
> page gets flushed out from shared buffers? On a system with no long
> transactions, it seems likely that a dirty page is already all-visible
> by the time bgwriter (or shared buffers memory pressure) gets around
> to writing it out. That way we don't have to wait for vacuum to do it
> and would make your observation hold more often.

This has been suggested before, and, sure, there might be cases where
it helps.  But you need to choose your test case fairly carefully.
For example, if you're doing a large sequential scan on a table, the
ring-buffer logic causes processes to evict their own pages, and so
the background writer doesn't get a chance to touch any of those
pages.  You need some kind of a workload where pages are being evicted
from shared buffers slowly enough that it ends up being the background
writer, rather than the individual backends, that do the work.  But if
you have that kind of workload, then we can infer that most of your
working set fits into shared buffers.  And in that case you don't
really need index-only scans in the first place.  The main point of
index only scans is to optimize the case where you have a gigantic
table and you're trying to avoid swamping the system with random I/O.
I'm not saying that such a change would be a particularly bad idea,
but I'm not personally planning to work on it any time soon because I
can't convince myself that it really helps all that much.

I think the real solution to getting visibility map bits set is to
vacuum more frequently, but have it be cheaper each time.  Our default
autovacuum settings vacuum the table when the number of updates and
deletes reaches 20% of the table size.  But those settings were put in
place under the assumption that we'll have to scan the entire heap,
dirtying every page that contains dead tuples, scan all the indexes
and remove the associated index pointers, and then scan and dirty the
heap pages that contain now-dead line pointers a second time to remove
those.  The visibility map has eroded those assumptions to some
degree, because now we probably won't have to scan the entire heap
every time we vacuum; and I'm hoping we're going to see some further
erosion.  Pavan has a pending patch which, if we can work out the
details, will eliminate the second heap scan; and we've also talked
about doing the index scan only when there are enough dead line
pointers to justify the effort.  That, it seems to me, would open the
door to lowering the scale factor, maybe by quite a bit - which, in
turn, would help us control bloat better and get visibility map bits
set sooner.
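(The scale factor is already settable per table, so lowering it selectively
looks like the sketch below; the numbers are only illustrative:)

-- Sketch: vacuum this table after roughly 2% churn instead of the default
-- 20%, so visibility-map bits get set sooner and index-only scans benefit.
ALTER TABLE pgbench_accounts
    SET (autovacuum_vacuum_scale_factor = 0.02,
         autovacuum_vacuum_threshold = 1000);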

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-09-25 Thread Marti Raudsepp
On Sun, Aug 14, 2011 at 00:31, Heikki Linnakangas
 wrote:
> That is somewhat compensated by the fact that tuples that are accessed more
> often are also more likely to be in cache. Fetching the heap tuple to check
> visibility is very cheap when the tuple is in cache.
>
> I'm not sure how far that compensates it, though. I'm sure there's typically
> nevertheless a fairly wide range of pages that have been modified since the
> last vacuum, but not in cache anymore.

Would it make sense to re-evaluate the visibility bit just before a
page gets flushed out from shared buffers? On a system with no long
transactions, it seems likely that a dirty page is already all-visible
by the time bgwriter (or shared buffers memory pressure) gets around
to writing it out. That way we don't have to wait for vacuum to do it
and would make your observation hold more often.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-09-23 Thread Greg Stark
On Tue, Aug 16, 2011 at 11:24 AM, Anssi Kääriäinen
 wrote:
> There is the question if one should be allowed to tune the *_page_costs at
> all. If I am not missing something, it is possible to detect the correct
> values programmatically and they do not change if you do not change the
> hardware. Cache hit ratio is the real reason why they are currently so
> important for tuning.

Unfortunately things are a tad more complex than this picture. There are
multiple levels of cache involved here. There's the Postgres buffer
cache, the filesystem buffer cache, and then the raid controller or
drives often have cache as well.

Also the difference between seq_page_cost and random_page_cost is
hiding another cache effect. The reason sequential reads are faster is
twofold: there's no seek, but there's also an increased chance that the
buffer is already in the filesystem cache due to having been
prefetched. Actually it's hardly even probabilistic -- only every nth
page needs to do i/o when doing sequential reads.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-09-23 Thread Cédric Villemain
2011/8/16 Anssi Kääriäinen :
> On 08/14/2011 12:31 AM, Heikki Linnakangas wrote:
>>>
>>> The same idea could of course be used to calculate the effective cache
>>> hit ratio for each table. Cache hit ratio would have the problem of feedback
>>> loops, though.
>>
>> Yeah, I'm not excited about making the planner and statistics more
>> dynamic. Feedback loops and plan instability are not fun.
>
> I might be a little out of my league here... But I was thinking about the
> cache hit ratio and feedback loops. I understand automatic tuning would be
> hard. But making automatic tuning easier (by using pg_tune for example)
> would be a big plus for most use cases.
>
> To make it easier to tune the page read costs automatically, it would be
> nice if there would be four variables instead of the current two:
>  - random_page_cost is the cost of reading a random page from storage.
> Currently it is not, it is the cost of accessing a random page, taking in
> account it might be in memory.
>  - seq_page_cost is the cost of reading pages sequentially from storage
>  - memory_page_cost is the cost of reading a page in memory
>  - cache_hit_ratio is the expected cache hit ratio
>
> memory_page_cost would be server global, random and seq page costs
> tablespace specific, and cache_hit_ratio relation specific. You would get
> the current behavior by tuning *_page_costs realistically, and setting
> cache_hit_ratio globally so that the expected random_page_cost /
> seq_page_cost stays the same as now.
>
> The biggest advantage of this would be that the correct values are much
> easier to detect automatically compared to current situation. This can be
> done using pg_statio_* views and IO speed testing. They should not be tuned
> automatically by PostgreSQL, at least not the cache_hit_ratio, as that leads
> to the possibility of feedback loops and plan instability. The variables
> would also be much easier to understand.
>
> There is the question if one should be allowed to tune the *_page_costs at
> all. If I am not missing something, it is possible to detect the correct
> values programmatically and they do not change if you do not change the
> hardware. Cache hit ratio is the real reason why they are currently so
> important for tuning.
>
> An example why the current random_page_cost and seq_page_cost tuning is not
> adequate is that you can only set random_page_cost per tablespace. That
> makes perfect sense if random_page_cost would be the cost of accessing a
> page in storage. But it is not, it is a combination of that and caching
> effects, so that it actually varies per relation (and over time). How do you
> set it correctly for a query where one relation is fully cached and another
> one not?
>
> Another problem is that if you use random_page_cost == seq_page_cost, you
> are effectively saying that everything is in cache. But if everything is in
> cache, the cost of page access relative to cpu_*_costs is way off. The more
> random_page_cost and seq_page_cost are different, the more they mean the
> storage access costs. When they are the same, they mean the memory page
> cost. There can be an order of magnitude in difference of a storage page
> cost and a memory page cost. So it is hard to tune the cpu_*_costs
> realistically for cases where sometimes data is in cache and sometimes not.
>
> Ok, enough hand waving for one post :) Sorry if this all is obvious /
> discussed before. My googling didn't turn up anything directly related,
> although these have some similarity:
>  - Per-table random_page_cost for tables that we know are always cached
> [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01503.php]
>  - Script to compute random page cost
> [http://archives.postgresql.org/pgsql-hackers/2002-09/msg00503.php]
> -  The science of optimization in practical terms?
> [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php], getting
> really interesting starting from here:
> [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00787.php]

late reply.
You can add this link to your list:
http://archives.postgresql.org/pgsql-hackers/2011-06/msg01140.php


-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-08-16 Thread Anssi Kääriäinen

On 08/14/2011 12:31 AM, Heikki Linnakangas wrote:

>> The same idea could of course be used to calculate the effective cache hit
>> ratio for each table. Cache hit ratio would have the problem of feedback
>> loops, though.
>
> Yeah, I'm not excited about making the planner and statistics more
> dynamic. Feedback loops and plan instability are not fun.

I might be a little out of my league here... But I was thinking about 
the cache hit ratio and feedback loops. I understand automatic tuning 
would be hard. But making automatic tuning easier (by using pg_tune for 
example) would be a big plus for most use cases.


To make it easier to tune the page read costs automatically, it would be 
nice if there would be four variables instead of the current two:
  - random_page_cost is the cost of reading a random page from storage. 
Currently it is not, it is the cost of accessing a random page, taking 
in account it might be in memory.

  - seq_page_cost is the cost of reading pages sequentially from storage
  - memory_page_cost is the cost of reading a page in memory
  - cache_hit_ratio is the expected cache hit ratio

memory_page_cost would be server global, random and seq page costs 
tablespace specific, and cache_hit_ratio relation specific. You would 
get the current behavior by tuning *_page_costs realistically, and 
setting cache_hit_ratio globally so that the expected random_page_cost / 
seq_page_cost stays the same as now.


The biggest advantage of this would be that the correct values are much 
easier to detect automatically compared to current situation. This can 
be done using pg_statio_* views and IO speed testing. They should not be 
tuned automatically by PostgreSQL, at least not the cache_hit_ratio, as 
that leads to the possibility of feedback loops and plan instability. 
The variables would also be much easier to understand.


There is the question if one should be allowed to tune the *_page_costs 
at all. If I am not missing something, it is possible to detect the 
correct values programmatically and they do not change if you do not 
change the hardware. Cache hit ratio is the real reason why they are 
currently so important for tuning.


An example why the current random_page_cost and seq_page_cost tuning is 
not adequate is that you can only set random_page_cost per tablespace. 
That makes perfect sense if random_page_cost would be the cost of 
accessing a page in storage. But it is not, it is a combination of that 
and caching effects, so that it actually varies per relation (and over 
time). How do you set it correctly for a query where one relation is 
fully cached and another one not?
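(For reference, a sketch of what is tunable today, which is exactly the
granularity being criticized; the tablespace name and values are illustrative:)

-- Sketch: storage costs can be overridden per tablespace...
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1, seq_page_cost = 1.0);

-- ...or per session, but not per relation, so a query joining a fully cached
-- table with an uncached one gets a single compromise value for both.
SET random_page_cost = 2.0;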


Another problem is that if you use random_page_cost == seq_page_cost, 
you are effectively saying that everything is in cache. But if 
everything is in cache, the cost of page access relative to cpu_*_costs 
is way off. The more random_page_cost and seq_page_cost are different, 
the more they mean the storage access costs. When they are the same, 
they mean the memory page cost. There can be an order of magnitude in 
difference of a storage page cost and a memory page cost. So it is hard 
to tune the cpu_*_costs realistically for cases where sometimes data is 
in cache and sometimes not.


Ok, enough hand waving for one post :) Sorry if this all is obvious / 
discussed before. My googling didn't turn up anything directly related, 
although these have some similarity:
 - Per-table random_page_cost for tables that we know are always cached 
[http://archives.postgresql.org/pgsql-hackers/2008-04/msg01503.php]

 - Script to compute random page cost
[http://archives.postgresql.org/pgsql-hackers/2002-09/msg00503.php]
-  The science of optimization in practical terms?
[http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php], 
getting really interesting starting from here:

[http://archives.postgresql.org/pgsql-hackers/2009-02/msg00787.php]

 - Anssi


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-08-15 Thread Robert Haas
On Mon, Aug 15, 2011 at 7:37 PM, Greg Smith  wrote:
> That's 5.4X as fast; not too shabby!

Awesome!

> And with the large difference in response time, things appear to be working
> as hoped even in this situation.  If you try this on your laptop, where
> drive cache size and random I/O are likely to be even slower, you might see
> an ever larger difference.

Hah!  Just in case a 5.4X performance improvement isn't good enough?  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-08-15 Thread Greg Smith

On 08/11/2011 04:06 PM, Robert Haas wrote:
> On my laptop, the first query executes in about 555 ms, while the
> second one takes about 1125 ms...I expect that you could get
> an even larger benefit from this type of query if you could avoid
> actual disk I/O, rather than just buffer cache thrashing, but I
> haven't come up with a suitable test case for that yet (volunteers?).


That part I can help with, using a Linux test that kills almost every 
cache. I get somewhat faster times on my desktop here running the cached 
version like you were doing (albeit with debugging options on, so I 
wouldn't read too much into this set of numbers):


select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);
 sum
--
 250279412983
(1 row)

Time: 472.778 ms

                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=133325.00..133325.01 rows=1 width=4)
   ->  Nested Loop Semi Join  (cost=0.00..133075.00 rows=10 width=4)
         ->  Seq Scan on sample_data a1  (cost=0.00..15286.00 rows=10 width=4)
         ->  Index Only Scan using pgbench_accounts_pkey on pgbench_accounts a  (cost=0.00..1.17 rows=1 width=4)
               Index Cond: (aid = a1.aid)
               Filter: (aid <> 1234567)

select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);
 sum
--
 250279412983

Time: 677.902 ms
explain select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Aggregate  (cost=133325.00..133325.01 rows=1 width=4)
   ->  Nested Loop Semi Join  (cost=0.00..133075.00 rows=10 width=4)
         ->  Seq Scan on sample_data a1  (cost=0.00..15286.00 rows=10 width=4)
         ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts a  (cost=0.00..1.17 rows=1 width=4)
               Index Cond: (aid = a1.aid)
               Filter: (bid <> 1234567)

If I set up my gsmith account to be able to start and stop the server 
with pg_ctl and a valid PGDATA, and drop these two scripts in that home 
directory:


== index-only-1.sql ==

\timing
select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);

explain select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);

== index-only-2.sql ==

\timing
select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);

explain select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);

I can then run this script as root:

#!/bin/bash
# Stop the server, drop the OS page cache, restart, and run each test query
# so that it starts from completely cold caches.
ME="gsmith"
su - $ME -c "pg_ctl stop -w"
echo 3 > /proc/sys/vm/drop_caches    # flush pagecache, dentries and inodes
su - $ME -c "pg_ctl start -w"
su - $ME -c "psql -ef index-only-1.sql"
su - $ME -c "pg_ctl stop -w"
echo 3 > /proc/sys/vm/drop_caches
su - $ME -c "pg_ctl start -w"
su - $ME -c "psql -ef index-only-2.sql"

And get results that start with zero information cached in RAM, showing 
a much more dramatic difference.  Including some snippets of interesting 
vmstat too, the index-only one gets faster as it runs while the regular 
one is pretty flat:


select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);
Time: 31677.683 ms

$ vmstat 5
procs ---memory-- ---swap-- -io -system-- cpu
 0  1  0 15807288   4388 12644000  4681   118 1407 2432  1  1 89 10
 1  1  0 15779388   4396 15444800  358717 1135 2058  1  0 86 13
 0  1  0 15739956   4396 19367200  5800 0 1195 2056  1  0 87 12
 0  1  0 15691844   4396 24183200  7053 3 1299 2044  1  0 86 13
 0  1  0 15629736   4396 30409600  799537 1391 2053  1  0 87 12
 0  1  0 15519244   4400 41426800 1163914 1448 2189  1  0 87 12


select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);
Time: 172381.235 ms

$ vmstat 5
procs ---memory-- ---swap-- -io -system-- cpu
 0  1  0 15736500   4848 19644400  314222 1092 1989  1  0 86 13
 0  1  0 15711948   4852 22107200  3411 1 1039 1943  0  0 88 12
 0  1  0 15685412   4852 24749600  361834  1997  0  0 86 13

[this is the middle part, rate doesn't vary too much]

That's 5.4X as fast; not too shabby!  Kind of interesting how much 
different the I/O pattern is on the 

Re: [HACKERS] index-only scans

2011-08-15 Thread Jim Nasby
On Aug 13, 2011, at 4:31 PM, Heikki Linnakangas wrote:
>> The example is much more realistic if the query is a fetch of N latest rows 
>> from a table. Very common use case, and the whole relation's visibility 
>> statistics are completely wrong for that query.
> 
> That is somewhat compensated by the fact that tuples that are accessed more 
> often are also more likely to be in cache. Fetching the heap tuple to check 
> visibility is very cheap when the tuple is in cache.
> 
> I'm not sure how far that compensates it, though. I'm sure there's typically 
> nevertheless a fairly wide range of pages that have been modified since the 
> last vacuum, but not in cache anymore.

http://xkcd.org/937/ :)

Could something be added to pg_stats that tracks visibility map usefulness on a 
per-attribute basis? Perhaps another set of stats buckets that show visibility 
percentages for each stats bucket?
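
For context, this is roughly what the existing per-attribute stats buckets 
look like; the per-bucket visibility percentage suggested above is the part 
that does not exist today (the table name is just an example):

SELECT attname, n_distinct, histogram_bounds
  FROM pg_stats
 WHERE tablename = 'pgbench_accounts';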
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] index-only scans

2011-08-13 Thread Heikki Linnakangas

On 13.08.2011 23:35, Kääriäinen Anssi wrote:

"""
Now, suppose that we know that 50% of the heap pages have their
visibility map bits set.  What's the chance that this query won't need
a heap fetch?  Well, the chance is 50% *if* you assume that a row
which has been quiescent for a long time is just as likely to be
queried as one that has been recently inserted or updated.  However,
in many real-world use cases, nothing could be farther from the truth.

What do we do about that?
"""

The example is much more realistic if the query is a fetch of N latest rows 
from a table. Very common use case, and the whole relation's visibility 
statistics are completely wrong for that query.


That is somewhat compensated by the fact that tuples that are accessed 
more often are also more likely to be in cache. Fetching the heap tuple 
to check visibility is very cheap when the tuple is in cache.


I'm not sure how far that compensates it, though. There is typically still a 
fairly wide range of pages that have been modified since the last vacuum but 
are no longer in cache.



Wouldn't it be great if there was something like pg_stat_statements that would 
know the statistics per query, derived from usage...

Even if the statistics are not available per query, they could be calculated 
from the relation's usage: the weighted visibility of the pages would be 
pages_visible_when_read / total_pages_read for the relation. With that 
percentage the planner would minimize the average cost of its plans much 
better than with the plain, non-weighted visibility percentage.

For the above example, if the usage is 90% read the N latest rows and we assume 
they are never visible, the weighted visibility percentage would be 10% while 
the non-weighted visibility percentage could be 90%. Even if the visibility 
percentage would be incorrect for the queries reading old rows, by definition 
of the weighted visibility percentage there would not be too many of them.

The same idea could of course be used to calculate the effective cache hit 
ratio for each table. Cache hit ratio would have the problem of feedback loops, 
though.


Yeah, I'm not excited about making the planner and statistics more 
dynamic. Feedback loops and plan instability are not fun. I think we 
should rather be more aggressive in setting the visibility map bits.
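
For anyone who wants to see how aggressively those bits are being set today, 
the pg_visibility contrib module (a later addition, used here purely as a 
sketch with an example table name) exposes them directly:

CREATE EXTENSION pg_visibility;
-- counts of all-visible (and all-frozen) pages for the relation
SELECT * FROM pg_visibility_map_summary('pgbench_accounts');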


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] index-only scans

2011-08-13 Thread Kääriäinen Anssi
"""
Now, suppose that we know that 50% of the heap pages have their
visibility map bits set.  What's the chance that this query won't need
a heap fetch?  Well, the chance is 50% *if* you assume that a row
which has been quiescent for a long time is just as likely to be
queried as one that has been recently inserted or updated.  However,
in many real-world use cases, nothing could be farther from the truth.

What do we do about that?
"""

The example is much more realistic if the query is a fetch of N latest rows 
from a table. Very common use case, and the whole relation's visibility 
statistics are completely wrong for that query. Wouldn't it be great if there 
was something like pg_stat_statements that would know the statistics per query, 
derived from usage...

Even if the statistics are not available per query, they could be calculated 
from the relation's usage: the weighted visibility of the pages would be 
pages_visible_when_read / total_pages_read for the relation. With that 
percentage the planner would minimize the average cost of its plans much 
better than with the plain, non-weighted visibility percentage.

For the above example, if the usage is 90% read the N latest rows and we assume 
they are never visible, the weighted visibility percentage would be 10% while 
the non-weighted visibility percentage could be 90%. Even if the visibility 
percentage would be incorrect for the queries reading old rows, by definition 
of the weighted visibility percentage there would not be too many of them.

The same idea could of course be used to calculate the effective cache hit 
ratio for each table. Cache hit ratio would have the problem of feedback loops, 
though.

Of course, keeping such statistic could be more expensive than the benefit it 
gives. On the other hand, page hit percentage is already available...
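
That already-available hit percentage can be read straight out of the stats 
views; the weighted pages_visible_when_read counter is the part that would 
have to be added (a sketch):

SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS heap_hit_pct
  FROM pg_statio_user_tables
 ORDER BY heap_blks_hit + heap_blks_read DESC;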

 - Anssi


Re: [HACKERS] index-only scans

2011-08-13 Thread Robert Haas
On Fri, Aug 12, 2011 at 5:39 PM, Kevin Grittner
 wrote:
> Robert Haas  wrote:
>
>> That's one of the points I asked for feedback on in my original
>> email.  "How should the costing be done?"
>
> It seems pretty clear that there should be some cost adjustment.  If
> you can't get good numbers somehow on what fraction of the heap
> accesses will be needed, I would suggest using a magic number based
> on the assumption that half the heap access otherwise necessary will
> be done.  It wouldn't be the worst magic number in the planner.  Of
> course, real numbers are always better if you can get them.

It wouldn't be that difficult (I think) to make VACUUM and/or ANALYZE
gather some statistics; what I'm worried about is that we'd have
correlation problems.  Consider a wide table with an index on (id,
name), and the query:

SELECT name FROM tab WHERE id = 12345

Now, suppose that we know that 50% of the heap pages have their
visibility map bits set.  What's the chance that this query won't need
a heap fetch?  Well, the chance is 50% *if* you assume that a row
which has been quiescent for a long time is just as likely to be
queried as one that has been recently inserted or updated.  However,
in many real-world use cases, nothing could be farther from the truth.

What do we do about that?
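
For what it's worth, the statistic under discussion is a per-table fraction 
of all-visible heap pages; on servers that track it in 
pg_class.relallvisible, something like this shows the raw numbers (a sketch, 
and it says nothing about the recency correlation above):

SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / nullif(relpages, 0), 1) AS pct_all_visible
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY relpages DESC
 LIMIT 10;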

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans

2011-08-12 Thread Kevin Grittner
Robert Haas  wrote:
 
> That's one of the points I asked for feedback on in my original
> email.  "How should the costing be done?"
 
It seems pretty clear that there should be some cost adjustment.  If
you can't get good numbers somehow on what fraction of the heap
accesses will be needed, I would suggest using a magic number based
on the assumption that half the heap access otherwise necessary will
be done.  It wouldn't be the worst magic number in the planner.  Of
course, real numbers are always better if you can get them.
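
Spelled out, the magic number simply halves the heap-fetch component of the 
cost estimate (hypothetical numbers, shown only for the arithmetic):

SELECT 10000                AS estimated_matching_rows,
       0.5                  AS assumed_visible_fraction,
       10000 * (1 - 0.5)    AS heap_fetches_still_charged;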
 
-Kevin



Re: [HACKERS] index-only scans

2011-08-12 Thread Robert Haas
2011/8/12 PostgreSQL - Hans-Jürgen Schönig :
> is there any plan to revise the cost for index only scans compared to what it 
> is now?

That's one of the points I asked for feedback on in my original email.
 "How should the costing be done?"

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans

2011-08-12 Thread PostgreSQL - Hans-Jürgen Schönig
On Aug 12, 2011, at 10:03 PM, Heikki Linnakangas wrote:

> On 11.08.2011 23:06, Robert Haas wrote:
>> Comments, testing, review appreciated...
> 
> I would've expected this to use an index-only scan:
> 
> postgres=# CREATE TABLE foo AS SELECT generate_series(1,10) AS id;
> SELECT 10
> postgres=# CREATE INDEX i_foo ON foo (id) WHERE id = 10;
> CREATE INDEX
> postgres=# VACUUM ANALYZE foo;
> VACUUM
> postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
>                            QUERY PLAN
> -----------------------------------------------------------------
> Index Scan using i_foo on foo  (cost=0.00..8.27 rows=1 width=4)
>   Index Cond: (id = 10)
> (2 rows)
> 
> If it's not a predicate index, then it works:
> 
> postgres=# DROP INDEX i_foo;
> DROP INDEX
> postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
>                              QUERY PLAN
> ------------------------------------------------------------------------
> Index Only Scan using i_foo2 on foo  (cost=0.00..8.28 rows=1 width=4)
>   Index Cond: (id = 10)
> (2 rows)


is there any plan to revise the cost for index only scans compared to what it 
is now?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] index-only scans

2011-08-12 Thread Robert Haas
On Fri, Aug 12, 2011 at 4:03 PM, Heikki Linnakangas
 wrote:
> On 11.08.2011 23:06, Robert Haas wrote:
>>
>> Comments, testing, review appreciated...
>
> I would've expected this to use an index-only scan:
>
> postgres=# CREATE TABLE foo AS SELECT generate_series(1,10) AS id;

Ugh.  I think there's something wrong with this test:

+   if (index->amcanreturn && !index->predOK && !index->indexprs)

I think that should probably say something like (ind->indpred == NIL
|| ind->predOK).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


