Re: [HACKERS] [POC] hash partitioning
Hi Amul, On 09/28/2017 05:56 AM, amul sul wrote: It does not really do the partition pruning via constraint exclusion, and I don't think anyone is going to use the remainder in the WHERE condition to fetch data; hash partitioning is not meant for that. But I am sure that we could solve this problem using your and Beena's work toward faster partition pruning [1] and Runtime Partition Pruning [2]. Will think on these changes if they are required for the pruning feature. Could you rebase on latest master ? Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] initdb w/ restart
Hi, In the case of initdb /tmp/pgsql followed by pg_ctl -D /tmp/pgsql/ -l /tmp/logfile restart you'll get pg_ctl: PID file "/tmp/pgsql/postmaster.pid" does not exist Is server running? starting server anyway pg_ctl: could not read file "/tmp/pgsql/postmaster.opts" The attached patch changes the message to "trying to start server anyway" to highlight that it is an attempt, not something that is guaranteed to happen. Probably not a good idea to change the logic around pg_ctl.c:688, hence this suggestion. Thoughts ? Best regards, Jesper >From 9e8cdda3173a25f1e14cccd7261877f160d1b0f7 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Thu, 28 Sep 2017 15:31:24 -0400 Subject: [PATCH] Change message for restarting a server from a directory without a PID file. This accounts for the case where a restart happens after an initdb. Author: Jesper Pedersen --- src/bin/pg_ctl/pg_ctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 4e02c4cea1..f5281a36a8 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -965,7 +965,7 @@ do_restart(void) write_stderr(_("%s: PID file \"%s\" does not exist\n"), progname, pid_file); write_stderr(_("Is server running?\n")); - write_stderr(_("starting server anyway\n")); + write_stderr(_("trying to start server anyway\n")); do_start(); return; } -- 2.13.5 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitions: \d vs \d+
On 09/28/2017 09:19 AM, Maksim Milyutin wrote: E.g. "No partition constraint" vs. "Partition constraint: satisfies_hash_partition(...)". I also noticed ambiguity in printing "No partition constraint" in non-verbose mode and "Partition constraint:..." in verbose one for partition tables regardless of the type of partition. Attached small patch removes any output about partition constraint in non-verbose mode. Yeah, that could be one way. It should likely be backported to REL_10_STABLE, so the question is if we are too late in the release cycle to change that output. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Partitions: \d vs \d+
Hi,

Using hash partitions I noticed that \d gives

D=# \d T_p63
                 Table "public.T_p63"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
Partition of: T FOR VALUES WITH (modulus 64, remainder 63)
No partition constraint
Indexes:
    "T_p63" btree (X, Y)

whereas \d+ gives

D=# \d+ T_p63
                 Table "public.T_p63"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
Partition of: T FOR VALUES WITH (modulus 64, remainder 63)
Partition constraint: satisfies_hash_partition(64, 63, hashint4extended(X, '8816678312871386367'::bigint))
Indexes:
    "T_p63" btree (X, Y)

E.g. "No partition constraint" vs. "Partition constraint: satisfies_hash_partition(...)". Current master (7769fc000) with [1] and [2]. [1] https://commitfest.postgresql.org/14/1059/ [2] https://commitfest.postgresql.org/14/1089/ Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multicolumn hash indexes
On 09/26/2017 08:11 PM, Robert Haas wrote: On Tue, Sep 26, 2017 at 7:18 PM, Tom Lane wrote: Tomasz Ostrowski writes: I've noticed that hash indexes can't currently (in PG10) be multicolumn. Are they technically hard to implement, or did just nobody take up such a feature? It's not simple, particularly not if you wish that the index would support queries specifying conditions for just a subset of the indexed columns (an assumption that's buried pretty deeply in the planner, for one thing). Then you couldn't compute the hash. Whoa, that seems like moving the goalposts. Somebody could design a hash index that was intended to answer such queries, but it's not the one we've got. I think we should just aim to support equality queries on all columns. That seems like a fairly straightforward generalization of what we've already got. This would require that the applications that are using the database know about the index structure in order to pass down the right columns. I would say that in most cases applications don't know about index structures. So, multiple indexes would have to be created (col1), (col1, col2), (col1, col3), ... which isn't ideal. Maybe an initial proof-of-concept could store the hash of the first column (col1) plus the hash of all columns (col1, col2, col3) in the index, and see what requirements / design decisions would appear from that. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
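For illustration, here is a minimal C sketch of the proof-of-concept suggested above: folding per-column hash values into one combined value alongside the first column's own hash. The hash_combine() body is the one from src/include/utils/hashutils.h (also quoted later in this digest); combine_column_hashes() is a hypothetical helper, not part of any posted patch:

#include <stdint.h>

typedef uint32_t uint32;

/* Mixing step from hash_combine() in src/include/utils/hashutils.h */
static inline uint32
hash_combine(uint32 a, uint32 b)
{
    a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
    return a;
}

/*
 * Hypothetical helper: fold the per-column hash values into one value.
 * An index entry could then carry both column_hashes[0] (the first
 * column alone) and this combined value, as suggested above.
 */
static uint32
combine_column_hashes(const uint32 *column_hashes, int ncolumns)
{
    uint32 result = 0;

    for (int i = 0; i < ncolumns; i++)
        result = hash_combine(result, column_hashes[i]);
    return result;
}

With that layout, an equality query on col1 alone could probe on the first-column hash, while equality on all columns could probe on the combined value.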
Re: [HACKERS] [POC] hash partitioning
On 09/27/2017 03:05 AM, amul sul wrote: Attached rebased patch, thanks. While reading through the patch I thought it would be better to keep MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to highlight that these are "keywords" for hash partition. Also updated some of the documentation. Thanks a lot for the patch, included in the attached version. Thank you. Based on [1] I have moved the patch to "Ready for Committer". [1] https://www.postgresql.org/message-id/CA%2BTgmoYsw3pusDen4_A44c7od%2BbEAST0eYo%2BjODtyofR0W2soQ%40mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] path toward faster partition pruning
On 09/26/2017 10:33 AM, Robert Haas wrote: On Tue, Sep 26, 2017 at 9:00 AM, Jesper Pedersen wrote: Could you share your thoughts on the usage of PartitionAppendInfo's min_datum_idx / max_datum_idx ? Especially in relation to hash partitions. This brings up something that I've kind of been thinking about. There are sort of four cases when it comes to partition pruning: 1. There is exactly one matching partition. For example, this happens when there is an equality constraint on every partition column. 2. There are multiple matching partitions which are consecutive. For example, there is a single level of range partitioning with no default partition and the single partitioning column is constrained by < > <= or >=. 3. There are multiple matching partitions which are not consecutive. This case is probably rare, but it can happen if there is a default partition, if there are list partitions with multiple bounds that are interleaved (e.g. p1 allows (1, 4), p2 allows (2), p3 allows (3, 5), and the query allows values >= 4 and <= 5), if the query involves OR conditions, or if there are multiple levels of partitioning (e.g. partition by a, subpartition by b, put a range constraint on a and an equality constraint on b). 4. There are no matching partitions. One of the goals of this algorithm is to be fast. The obvious way to cater to case (3) is to iterate through all partitions and test whether each one works, returning a Bitmapset, but that is O(n). Admittedly, it might be O(n) with a pretty small constant factor, but it still seems like exactly the sort of thing that we want to avoid given the desire to scale to higher partition counts. I propose that we create a structure that looks like this: struct foo { int min_partition; int max_partition; Bitmapset *extra_partitions; }; This indicates that all partitions from min_partition to max_partition need to be scanned, and in addition any partitions in extra_partitions need to be scanned. Assuming that we only consider cases where all partition keys or a leading subset of the partition keys are constrained, we'll generally be able to get by with just setting min_partition and max_partition, but extra_partitions can be used to handle default partitions and interleaved list bounds. For equality on all partitioning columns, we can do a single bsearch of the bounds to identify the target partition at a given partitioning level, and the same thing works for a single range-bound. If there are two range-bounds (< and > or <= and >= or whatever) we need to bsearch twice. The default partition, if any and if matched, must also be included. When there are multiple levels of partitioning things get a bit more complex -- if someone wants to knock out a partition that breaks up the range, we might need to shrink the main range to cover part of it and kick the other indexes out to extra_partitions. But the good thing is that in common cases with only O(lg n) effort we can return O(1) data that describes what will be scanned. In cases where that's not practical we expend more effort but still prune with maximal effectiveness. For OLTP style applications 1) would be the common case, and with hash partitions it would be one equality constraint. So, changing the method signature to use a data type as you described above instead of the explicit min_datum_idx / max_datum_idx output parameters would be more clear. 
One could advocate (*cough*) that the hash partition patch [1] should be merged first in order to find other instances where other CommitFest entries don't account for hash partitions at the moment in their method signatures; Beena noted something similar in [2]. I know that you said otherwise [3], but this is CommitFest 1, so there is time for a revert later, and hash partitions are already useful in internal testing. [1] https://commitfest.postgresql.org/14/1059/ [2] https://www.postgresql.org/message-id/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A%40mail.gmail.com [3] http://rhaas.blogspot.com/2017/08/plans-for-partitioning-in-v11.html Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
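To make the data type concrete, a minimal C sketch following Robert's "struct foo" description above; the names PartitionPruneResult and prune_single_partition are made up for illustration and are not committed code:

typedef struct Bitmapset Bitmapset;     /* nodes/bitmapset.h in PostgreSQL */

/*
 * A dense range of partition indexes to scan, plus a bitmapset for the
 * non-consecutive leftovers (default partition, interleaved list
 * bounds, OR clauses) -- case 3 in Robert's list.
 */
typedef struct PartitionPruneResult
{
    int         min_partition;      /* first partition index to scan */
    int         max_partition;      /* last partition index to scan */
    Bitmapset  *extra_partitions;   /* non-consecutive extras, or NULL */
} PartitionPruneResult;

/*
 * Case 1 (equality on every partition column, e.g. one hash
 * partition): a single bsearch of the bounds yields one index, and the
 * result stays O(1) in the number of partitions.
 */
static PartitionPruneResult
prune_single_partition(int part_index)
{
    PartitionPruneResult result = {part_index, part_index, NULL};

    return result;
}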
Re: [HACKERS] Improve catcache/syscache performance.
On 09/26/2017 06:41 AM, tushar wrote: On 09/22/2017 11:45 AM, Andres Freund wrote: Here's a variant that cleans up the previous changes a bit, and adds some further improvements: I tested with different pgbench options with master v/s patch and found an improvement. I have applied 001 and 003 patch on PG Head ,patch 0002 was already committed. Virtual Machine configuration - Centos 6.5 x64 / 16 GB RAM / 8 VCPU core processor Scaling factor=30 pgbench -M prepared -T 200 postgres PG Head - tps = 902.225954 (excluding connections establishing). PG HEAD+patch - tps = 1001.896381 (10.97+% vs. head) pgbench -M prepared -T 300 postgres PG Head - tps = 920.108333 (excluding connections establishing). PG HEAD+patch - tps = 1023.89542 (11.19+% vs. head) pgbench -M prepared -T 500 postgres PG Head - tps = 995.178227 (excluding connections establishing) PG HEAD+patch - tps = 1078.3 (+8.34% vs. head) Later I modified the create_many_cols.sql file (previously attached) and instead of only using int , I mixed it with varchar/int4/numeric/float and run pgbench with different time duration pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 300 postgres PG Head - tps = 5540.143877 (excluding connections establishing). PG HEAD+patch - tps = 5679.713493 (2.50+% vs. head) pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 500 postgres PG Head - tps = 5519.212709 (excluding connections establishing). PG HEAD+patch - tps = 5967.059155 (8.11+% vs. head) pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 700 postgres PG Head - tps = 5640.314495(excluding connections establishing). PG HEAD+patch - tps = 6012.223147 (6.59+% vs. head) I'm also seeing a speedup on a 2S/28C/56T/256Gb + 2 x RAID10 SSD machine using -M prepared, -M prepared -N and -M prepared -S scenarios with various scale factors, and custom queries. Small typo in 0002- / commit 791961: diff --git a/src/include/utils/hashutils.h b/src/include/utils/hashutils.h index 35281689e8..366bd0e78b 100644 --- a/src/include/utils/hashutils.h +++ b/src/include/utils/hashutils.h @@ -22,7 +22,7 @@ hash_combine(uint32 a, uint32 b) /* - * Simple inline murmur hash implementation hashing a 32 bit ingeger, for + * Simple inline murmur hash implementation hashing a 32 bit integer, for * performance. */ static inline uint32 Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] path toward faster partition pruning
Hi Amit, On 09/15/2017 04:50 AM, Amit Langote wrote: On 2017/09/15 11:16, Amit Langote wrote: I will post rebased patches later today, although I think the overall design of the patch on the planner side of things is not quite there yet. Of course, your and others' feedback is greatly welcome. Rebased patches attached. Because Dilip complained earlier today about clauses of the form (const op var) not causing partition-pruning, I've added code to commute the clause where it is required. Some other previously mentioned limitations remain -- no handling of OR clauses, no elimination of redundant clauses for given partitioning column, etc. A note about 0001: this patch overlaps with 0003-Canonical-partition-scheme.patch from the partitionwise-join patch series that Ashutosh Bapat posted yesterday [1]. Because I implemented the planner-portion of this patch based on what 0001 builds, I'm posting it here. It might actually turn out that we will review and commit 0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply 0001 if you want to play with the later patches. I would certainly like to review 0003-Canonical-partition-scheme.patch myself, but won't be able to immediately (see below). Could you share your thoughts on the usage of PartitionAppendInfo's min_datum_idx / max_datum_idx ? Especially in relation to hash partitions. I'm looking at get_partitions_for_keys. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On 09/15/2017 02:30 AM, amul sul wrote: Attached rebased patch, thanks. While reading through the patch I thought it would be better to keep MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to highlight that these are "keywords" for hash partition. Also updated some of the documentation. V20 patch passes make check-world, and my testing (typical 64 partitions, and various ATTACH/DETACH scenarios). Thanks for working on this ! Best regards, Jesper >From 189a40a5ca6c7a1bc79b750cbc95584b3061fda5 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Mon, 18 Sep 2017 11:13:54 -0400 Subject: [PATCH] * Documentation updates * Use caps for MODULUS / REMAINDER when CREATE TABLE is so too --- doc/src/sgml/ddl.sgml | 10 - doc/src/sgml/ref/alter_table.sgml | 2 +- src/backend/catalog/partition.c| 8 +++ src/test/regress/expected/alter_table.out | 20 +- src/test/regress/expected/create_table.out | 34 +++--- src/test/regress/sql/alter_table.sql | 20 +- src/test/regress/sql/create_table.sql | 28 7 files changed, 61 insertions(+), 61 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 24b36caad3..e38d8fc0a0 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -2881,11 +2881,11 @@ VALUES ('Albany', NULL, NULL, 'NY'); - The table is partitioned by specifying modulus and remainder for each - partition. Each partition holds rows for which the hash value of - partition keys when divided by specified modulus produces specified - remainder. For more clarification on modulus and remainder please refer - . + The table is partitioned by specifying a modulus and a remainder for each + partition. Each partition will hold the rows for which the hash value of + the partition key divided by the specified modulus will produce the specified + remainder. Refer to + for additional clarification on modulus and remainder. 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index a6eefb8564..b5fb93edac 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -1424,7 +1424,7 @@ ALTER TABLE cities Attach a partition to hash partitioned table: ALTER TABLE orders -ATTACH PARTITION orders_p4 FOR VALUES WITH (modulus 4, remainder 3); +ATTACH PARTITION orders_p4 FOR VALUES WITH (MODULUS 4, REMAINDER 3); diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 5c11c7ecea..3696b9a711 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1694,13 +1694,13 @@ make_partition_op_expr(PartitionKey key, int keynum, * CREATE TABLE simple_hash (a int, b char(10)) PARTITION BY HASH (a, b); * * CREATE TABLE p_p1 PARTITION OF simple_hash - * FOR VALUES WITH (modulus 2, remainder 1); + * FOR VALUES WITH (MODULUS 2, REMAINDER 1); * CREATE TABLE p_p2 PARTITION OF simple_hash - * FOR VALUES WITH (modulus 4, remainder 2); + * FOR VALUES WITH (MODULUS 4, REMAINDER 2); * CREATE TABLE p_p3 PARTITION OF simple_hash - * FOR VALUES WITH (modulus 8, remainder 0); + * FOR VALUES WITH (MODULUS 8, REMAINDER 0); * CREATE TABLE p_p4 PARTITION OF simple_hash - * FOR VALUES WITH (modulus 8, remainder 4); + * FOR VALUES WITH (MODULUS 8, REMAINDER 4); * * This function will return one of the following in the form of an * expression: diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 124cbe483c..304fb97291 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3490,22 +3490,22 @@ CREATE TABLE hash_parted ( a int, b int ) PARTITION BY HASH (a custom_opclass); -CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH (modulus 4, remainder 0); +CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH (MODULUS 4, REMAINDER 0); CREATE TABLE fail_part (LIKE hpart_1); -ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (modulus 8, remainder 4); +ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (MODULUS 8, REMAINDER 4); ERROR: partition "fail_part" would overlap partition "hpart_1" -ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (modulus 8, remainder 0); +ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (MODULUS 8, REMAINDER 0); ERROR: partition "fail_part" would overlap partition "hpart_1" DROP TABLE fail_part; -- check validation when attaching hash partitions -- check that violating rows are correctly reported CREATE TABLE hpart_2 (LIKE hash_parted); INSERT INTO hpart_2 VALUES (3, 0); -ALTER TABLE hash_parted ATTACH PARTITION hpart_2 FOR VALUES WITH (modulus 4, remainder 1
Re: [HACKERS] [POC] hash partitioning
On 09/14/2017 01:52 PM, Robert Haas wrote: On Thu, Sep 14, 2017 at 1:07 PM, Jesper Pedersen wrote: Yeah, it would be nice to have a syntax like ) PARTITION BY HASH (col) WITH (AUTO_CREATE = 64); But then there also needs to be a way to create the 64 associated indexes too for everything to be easy. Well, for that, there's this proposal: http://postgr.es/m/c8fe4f6b-ff46-aae0-89e3-e936a35f0...@postgrespro.ru As several people have rightly pointed out, there's a lot of work to be done on partitioning to get it to where we want it to be. Even in v10, it's got significant benefits, such as much faster bulk-loading, but I don't hear anybody disputing the notion that a lot more work is needed. The good news is that a lot of that work is already in progress; the bad news is that a lot of that work is not done yet. But I think that's OK. We can't solve every problem at once, and I think we're moving things along here at a reasonably brisk pace. That didn't stop me from complaining bitterly to someone just yesterday that we aren't moving faster still, but unfortunately EnterpriseDB has only been able to get 12 developers to do any work at all on partitioning this release cycle, and 3 of those have so far helped only with review and benchmarking. It's a pity we can't do more, but considering how many community projects are 1-person efforts I think it's pretty good. To be clear, I know you're not (or at least I assume you're not) trying to beat me up about this, just raising a concern, and I'm not trying to beat you up either, just let you know that it is definitely on the radar screen but not there yet. Definitely not a complaint about the work being done. I think the scope of Amul's and others' work on hash partition support is where it needs to be. Improvements can always follow in a future release. My point was that it is easy to script the definition of the partitions and their associated indexes, so it is more important to focus on the core functionality with the developer / review resources available. However, it is a little bit difficult to follow the dependencies between the different partition patches, so I may not always provide sane feedback, as seen in [1]. [1] https://www.postgresql.org/message-id/579077fd-8f07-aff7-39bc-b92c855cdb70%40redhat.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On 09/14/2017 12:56 PM, Robert Haas wrote: On Thu, Sep 14, 2017 at 12:54 PM, David Fetter wrote: Should we be pointing the gun away from people's feet by making hash partitions that cover the space automagically when the partitioning scheme[1] is specified? In other words, do we have a good reason to have only some of the hash partitions so defined by default? Sure, we can add some convenience syntax for that, but I'd like to get the basic stuff working before doing that kind of polishing. If nothing else, I assume Keith Fiske's pg_partman will provide a way to magically DTRT about an hour after this goes in. But probably we can do better in core easily enough. Yeah, it would be nice to have a syntax like ) PARTITION BY HASH (col) WITH (AUTO_CREATE = 64); But then there also needs to be a way to create the 64 associated indexes too for everything to be easy. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
Hi, On 09/14/2017 12:05 PM, Robert Haas wrote: On Thu, Sep 14, 2017 at 11:39 AM, Jesper Pedersen wrote: When I do CREATE TABLE mytab ( a integer NOT NULL, b integer NOT NULL, c integer, d integer ) PARTITION BY HASH (b); and create 64 partitions; CREATE TABLE mytab_p00 PARTITION OF mytab FOR VALUES WITH (MODULUS 64, REMAINDER 0); ... CREATE TABLE mytab_p63 PARTITION OF mytab FOR VALUES WITH (MODULUS 64, REMAINDER 63); and associated indexes CREATE INDEX idx_p00 ON mytab_p00 USING btree (b, a); ... CREATE INDEX idx_p63 ON mytab_p63 USING btree (b, a); Populate the database, and do ANALYZE. Given EXPLAIN (ANALYZE, VERBOSE, BUFFERS ON) SELECT a, b, c, d FROM mytab WHERE b = 42 gives Append -> Index Scan using idx_p00 (cost rows=7) (actual rows=0) ... -> Index Scan using idx_p63 (cost rows=7) (actual rows=0) E.g. all partitions are being scanned. Of course one partition will contain the rows I'm looking for. Yeah, we need Amit Langote's work in http://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce7...@lab.ntt.co.jp to land and this patch to be adapted to make use of it. I think that's the major thing still standing in the way of this. Concerns were also raised about not having a way to see the hash function, but we fixed that in 81c5e46c490e2426db243eada186995da5bb0ba7 and hopefully this patch has been updated to use a seed (I haven't looked yet). And there was a concern about hash functions not being portable, but the conclusion of that was basically that most people think --load-via-partition-root will be a satisfactory workaround for cases where that becomes a problem (cf. commit 23d7680d04b958de327be96ffdde8f024140d50e). So this is the major remaining issue that I know about. Thanks for the information, Robert ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
Hi Amul, On 09/14/2017 04:58 AM, amul sul wrote: On Wed, Sep 13, 2017 at 7:43 PM, Jesper Pedersen This patch needs a rebase. Thanks for your note. Attached is the patch rebased on the latest master head. Also added error on creating default partition for the hash partitioned table, and updated document & test script for the same. Thanks ! When I do CREATE TABLE mytab ( a integer NOT NULL, b integer NOT NULL, c integer, d integer ) PARTITION BY HASH (b); and create 64 partitions; CREATE TABLE mytab_p00 PARTITION OF mytab FOR VALUES WITH (MODULUS 64, REMAINDER 0); ... CREATE TABLE mytab_p63 PARTITION OF mytab FOR VALUES WITH (MODULUS 64, REMAINDER 63); and associated indexes CREATE INDEX idx_p00 ON mytab_p00 USING btree (b, a); ... CREATE INDEX idx_p63 ON mytab_p63 USING btree (b, a); Populate the database, and do ANALYZE. Given EXPLAIN (ANALYZE, VERBOSE, BUFFERS ON) SELECT a, b, c, d FROM mytab WHERE b = 42 gives Append -> Index Scan using idx_p00 (cost rows=7) (actual rows=0) ... -> Index Scan using idx_p63 (cost rows=7) (actual rows=0) E.g. all partitions are being scanned. Of course one partition will contain the rows I'm looking for. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
On 09/11/2017 11:01 AM, Jesper Pedersen wrote: Thanks for working on this ! Moved to "Ready for Committer". Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
Hi Amul, On 09/08/2017 08:40 AM, amul sul wrote: Rebased 0002 against this commit & renamed to 0001, PFA. This patch needs a rebase. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] POC: Cache data in GetSnapshotData()
Hi, On 08/29/2017 05:04 AM, Mithun Cy wrote: Test setting: Server configuration: ./postgres -c shared_buffers=8GB -N 300 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 -c wal_buffers=256MB & pgbench configuration: scale_factor = 300 ./pgbench -c $threads -j $threads -T $time_for_reading -M prepared -S postgres The machine has 64 cores; with this patch I can see the improvement start after 64 clients. I have tested up to 256 clients, which shows a performance improvement of nearly 39% at max [1]. This is the best case for the patch, where the once-computed SnapshotData is reused further. The worst case seems to be small, quick write transactions which frequently invalidate the cached SnapshotData before it is reused by a subsequent GetSnapshotData call. As of now, I tested simple update tests: pgbench -M prepared -N on the same machine with the above server configuration. I do not see much change in the TPS numbers. All TPS numbers are the median of 3 runs:

Clients   TPS-With Patch 05   TPS-Base        %Diff
1         752.461117          755.186777      -0.3%
64        32171.296537        31202.153576    +3.1%
128       41059.660769        40061.929658    +2.49%

I will do some profiling and find out why this case is not costing us some performance due to the caching overhead. I have done a run with this patch on a 2S/28C/56T/256Gb w/ 2 x RAID10 SSD machine. Both for -M prepared, and -M prepared -S, I'm not seeing any improvements (1 to 375 clients); i.e. +-1%. Although the -M prepared -S case should improve, I'm not sure that the extra overhead in the -M prepared case is worth the added code complexity. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
Hi, On 09/08/2017 03:35 PM, Sokolov Yura wrote: I'm seeing -M prepared: Up to 11% improvement -M prepared -S: No improvement, no regression ("noise") -M prepared -N: Up to 12% improvement. For all runs the improvement shows up the closer you get to the number of CPU threads, or above. Although I'm not seeing the same improvements as you on very large client counts there are definitely improvements :) It is expected: - the patch "fixes NUMA": for example, it doesn't give an improvement on 1 socket at all (I've tested it using numactl to bind to 1 socket) - and certainly it gives less improvement on 2 sockets than on 4 sockets (and 28 cores vs 72 cores also makes a difference), - one of the hot points was CLogControlLock, and it was fixed with "Group mode for CLOG updates" [1] I'm planning to re-test that patch. +static inline bool +LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode, LWLockMode waitmode) I'll leave it to the Committer to decide if this method is too big to be "inline". GCC 4.9 doesn't want to inline it without the directive, and the function call is then noticeable in the profile. The attachment contains a version with all suggestions applied except the removal of "inline". Yes, ideally the method will be kept at "inline". Open questions: * spins_per_delay as extern * Calculation of skip_wait_list Currently the calculation of skip_wait_list is mostly empirical (i.e. the best I measured). Ok, good to know. I strongly think that instead of spins_per_delay something dependent on the concrete lock should be used. I tried to store it in the LWLock itself, but it was worse. Yes, LWLock should be kept as small as possible, and cache line aligned due to the cache storms, as shown by perf c2c. Recently I realized it should be stored in an array indexed by tranche, but I didn't implement it yet, and therefore didn't measure. Different constants for the LWLock could have an impact, but the constants would also be dependent on machine setup and workload. Thanks for working on this ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
Hi, On 07/18/2017 01:20 PM, Sokolov Yura wrote: I'm sending rebased version with couple of one-line tweaks. (less skip_wait_list on shared lock, and don't update spin-stat on aquiring) I have been running this patch on a 2S/28C/56T/256Gb w/ 2 x RAID10 SSD setup (1 to 375 clients on logged tables). I'm seeing -M prepared: Up to 11% improvement -M prepared -S: No improvement, no regression ("noise") -M prepared -N: Up to 12% improvement for all runs the improvement shows up the closer you get to the number of CPU threads, or above. Although I'm not seeing the same improvements as you on very large client counts there are definitely improvements :) Some comments: == lwlock.c: - + * This race is avoiding by taking lock for wait list using CAS with a old + * state value, so it could succeed only if lock is still held. Necessary flag + * is set with same CAS. + * + * Inside LWLockConflictsWithVar wait list lock is reused to protect variable + * value. So first it is acquired to check variable value, then flags are + * set with second CAS before queueing to wait list to ensure lock were not + * released yet. * This race is avoided by taking a lock for the wait list using CAS with the old * state value, so it would only succeed if lock is still held. Necessary flag * is set using the same CAS. * * Inside LWLockConflictsWithVar the wait list lock is reused to protect the variable * value. First the lock is acquired to check the variable value, then flags are * set with a second CAS before queuing to the wait list in order to ensure that the lock was not * released yet. +static void +add_lwlock_stats_spin_stat(LWLock *lock, SpinDelayStatus *delayStatus) Add a method description. + /* +* This barrier is never needed for correctness, and it is no-op +* on x86. But probably first iteration of cas loop in +* ProcArrayGroupClearXid will succeed oftener with it. +*/ * "more often" +static inline bool +LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode, LWLockMode waitmode) I'll leave it to the Committer to decide if this method is too big to be "inline". + /* +* We intentionally do not call finish_spin_delay here, cause loop above +* usually finished in queueing into wait list on contention, and doesn't +* reach spins_per_delay (and so, doesn't sleep inside of +* perform_spin_delay). Also, different LWLocks has very different +* contention pattern, and it is wrong to update spin-lock statistic based +* on LWLock contention. +*/ /* * We intentionally do not call finish_spin_delay here, because the loop above * usually finished by queuing into the wait list on contention, and doesn't * reach spins_per_delay thereby doesn't sleep inside of * perform_spin_delay. Also, different LWLocks has very different * contention pattern, and it is wrong to update spin-lock statistic based * on LWLock contention. */ s_lock.c: - + if (status->spins == 0) + /* but we didn't spin either, so ignore */ + return; Use { } for the if, or move the comment out of the nesting for readability. Open questions: --- * spins_per_delay as extern * Calculation of skip_wait_list You could run the patch through pgindent too. Passes make check-world. Status: Waiting on Author Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
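As a rough illustration of the scheme those comments describe (taking the wait-list lock with a CAS against a previously read state, so the CAS can only succeed while the lock is still held), here is a simplified sketch against the pg_atomic_* API. The flag value and the held-test below are placeholders, not the patch's actual state layout:

#include "postgres.h"
#include "port/atomics.h"

#define MY_FLAG_WAITLIST_LOCKED ((uint32) 1 << 28)     /* placeholder bit */
#define MY_LOCK_MASK            ((uint32) 0x00FFFFFF)  /* placeholder holder bits */

/*
 * Try to take the wait-list lock, but only while the main lock is
 * still held.  Because the CAS compares against the state we last
 * read, it fails (and refreshes "old") if the state changed, e.g.
 * because the lock was released in between.
 */
static bool
lock_waitlist_if_still_held(pg_atomic_uint32 *state)
{
    uint32 old = pg_atomic_read_u32(state);

    while ((old & MY_LOCK_MASK) != 0)   /* main lock still held? */
    {
        if (old & MY_FLAG_WAITLIST_LOCKED)
        {
            /* wait list locked by someone else; re-read and retry */
            old = pg_atomic_read_u32(state);
            continue;
        }
        /* on failure the CAS stores the current value into "old" */
        if (pg_atomic_compare_exchange_u32(state, &old,
                                           old | MY_FLAG_WAITLIST_LOCKED))
            return true;
    }
    return false;   /* the lock was released meanwhile */
}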
Re: [HACKERS] Fix performance of generic atomics
Hi Jeff, On 09/05/2017 03:47 PM, Jeff Janes wrote: I ran pgbench (-M prepared) with synchronous_commit 'on' and 'off' using both logged and unlogged tables. Also ran an internal benchmark which didn't show anything either. What scale factor and client count? How many cores per socket? It looks like Sokolov was just starting to see gains at 200 clients on 72 cores, using -N transaction. I have done a run with scale factor 300, and another with 3000 on a 2S/28C/56T/256Gb w/ 2 x RAID10 SSD machine; up to 200 clients. I would consider the runs as "noise" as I'm seeing +-1% for all client counts, so nothing like Yura is seeing in [1] for the higher client counts. I did a run with -N too using scale factor 300, using the settings in [1], but with same result (+-1%). [1] https://www.postgresql.org/message-id/d62d7d9d473d07e172d799d5a57e7...@postgrespro.ru Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix performance of generic atomics
Hi, On 05/25/2017 11:12 AM, Sokolov Yura wrote: I agree that the lonely semicolon looks bad. Applied your suggestion for the empty loop body (/* skip */). The patch in the first letter had while(true), but I removed it because I think it is uglier: - `while(true)` was necessary for grouping the read with the `if`, - but now there is a single statement in the loop body and it is the condition for loop exit, so it is clearly just a loop. The optimization is valid because compare_exchange always stores the old value in the `old` variable in the same atomic manner as an atomic read. I have tested this patch on a 2-socket machine, but don't see any performance change in the various runs. However, there is no regression in any case either. As such, I have marked the entry "Ready for Committer". Remember to add a version postfix to your patches such that it is easy to identify which is the latest version. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
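For reference, the loop shape being discussed, applied to a generic fetch-or. This is a simplified sketch of the pattern in src/include/port/atomics/generic.h, written against the public pg_atomic_* API rather than the _impl functions the patch actually touches; the value is read once up front, and every failed compare-exchange already stores the current value into "old", so no separate atomic re-read is needed inside the loop:

#include "postgres.h"
#include "port/atomics.h"

/* Sketch of the discussed loop; my_atomic_fetch_or_u32 is a made-up name. */
static uint32
my_atomic_fetch_or_u32(pg_atomic_uint32 *ptr, uint32 or_)
{
    uint32 old = pg_atomic_read_u32(ptr);

    /* on failure, compare-exchange refreshes "old" atomically */
    while (!pg_atomic_compare_exchange_u32(ptr, &old, old | or_))
        /* skip */ ;

    return old;     /* value before the OR, as fetch-or requires */
}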
Re: [HACKERS] Fix performance of generic atomics
On 09/05/2017 02:24 PM, Tom Lane wrote: Jesper Pedersen writes: I have tested this patch on a 2-socket machine, but don't see any performance change in the various runs. However, there is no regression either in all cases. Hm, so if we can't demonstrate a performance win, it's hard to justify risking touching this code. What test case(s) did you use? I ran pgbench (-M prepared) with synchronous_commit 'on' and 'off' using both logged and unlogged tables. Also ran an internal benchmark which didn't show anything either. Setting the entry back to "Needs Review" for additional feedback from others. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On 08/24/2017 01:21 AM, Ashutosh Sharma wrote: Done. Attached are the patches with above changes. Thanks ! Based on the feedback in this thread, I have moved the patch to "Ready for Committer". Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On 08/23/2017 07:38 AM, Amit Kapila wrote: Thanks for the new version. I again looked at the patches and fixed quite a few comments in the code and ReadMe. You have forgotten to update README for the changes in vacuum patch (0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v7). I don't have anything more to add. If you are okay with changes, then we can move it to Ready For Committer unless someone else has some more comments. Just some minor comments. README: + it's pin till the end of scan) its pin till the end of the scan) +To minimize lock/unlock traffic, hash index scan always searches entire hash To minimize lock/unlock traffic, hash index scan always searches the entire hash hashsearch.c: +static inline void _hash_saveitem(HashScanOpaque so, int itemIndex, + OffsetNumber offnum, IndexTuple itup); There are other instances of "inline" in the code base, so I guess that this is ok. +* Advance to next tuple on current page; or if there's no more, try to Advance to the next tuple on the current page; or if done, try to Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
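As a rough sketch of the page-at-a-time pattern discussed in this thread (collect all matching tuples from a page while the content lock is held, then hand them back one by one while only the pin is kept), with simplified, made-up types; the real patch uses HashScanOpaque and _hash_saveitem() as quoted above:

#include <stdbool.h>
#include <stdint.h>

#define MY_MAX_ITEMS 407    /* hypothetical cap, enough for one page */

typedef struct MyHashScanPos
{
    int      nitems;        /* matches saved from the current page */
    int      itemIndex;     /* next item to hand back */
    uint16_t offsets[MY_MAX_ITEMS]; /* page offsets of the matches */
} MyHashScanPos;

/* Called per matching tuple while the page content lock is held. */
static void
my_save_item(MyHashScanPos *pos, uint16_t offnum)
{
    pos->offsets[pos->nitems++] = offnum;
}

/* Called with only the pin held; no page re-read is needed. */
static bool
my_next_item(MyHashScanPos *pos, uint16_t *offnum)
{
    if (pos->itemIndex >= pos->nitems)
        return false;       /* page exhausted; move to the next page */
    *offnum = pos->offsets[pos->itemIndex++];
    return true;
}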
Re: [HACKERS] dtrace probes
On 04/20/2017 10:30 AM, Jesper Pedersen wrote: I think this fix is harmless and has some value in terms of consistency. One minor suggestion is that you should leave a space after typecasting. - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode); There should be a space like "(int) mode". v2 attached. I managed to attach the same patch again, so here is v3. Best regards, Jesper From 0d964df84950ca90c08ed6dd77a575d4b70ea7db Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 18 Apr 2017 11:44:18 -0400 Subject: [PATCH] Explicit cast LWLockMode, which an enum, to int in order to match the dtrace definition of the lwlock methods. Thereby all call sites will have the same definition instead of a mix between signed and unsigned. Author: Jesper Pedersen --- src/backend/storage/lmgr/lwlock.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 3e13394..c551be2 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -1226,7 +1226,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) #endif LWLockReportWaitStart(lock); - TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int) mode); for (;;) { @@ -1248,7 +1248,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) } #endif - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int) mode); LWLockReportWaitEnd(); LOG_LWDEBUG("LWLockAcquire", lock, "awakened"); @@ -1257,7 +1257,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) result = false; } - TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), (int) mode); /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; @@ -1308,14 +1308,14 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode) RESUME_INTERRUPTS(); LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed"); - TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), (int) mode); } else { /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; held_lwlocks[num_held_lwlocks++].mode = mode; - TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), (int) mode); } return !mustwait; } @@ -1387,7 +1387,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) #endif LWLockReportWaitStart(lock); - TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int) mode); for (;;) { @@ -1405,7 +1405,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) Assert(nwaiters < MAX_BACKENDS); } #endif - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int) mode); LWLockReportWaitEnd(); LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened"); @@ -1435,7 +1435,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) /* Failed to get lock, so release interrupt holdoff */ RESUME_INTERRUPTS(); LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed"); - TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), (int) mode); } else { @@ -1443,7 +1443,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) /* Add lock to list of locks held by this backend */ 
held_lwlocks[num_held_lwlocks].lock = lock; held_lwlocks[num_held_lwlocks++].mode = mode; - TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), (int) mode); } return !mustwait; -- 2.7.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dtrace probes
Hi, On 04/20/2017 09:24 AM, Amit Kapila wrote: The lwlock dtrace probes define LWLockMode as int, and the TRACE_POSTGRESQL_LWLOCK methods are called using both a variable and constant definition. This leads to a mix of argument definitions depending on the call site, as seen in probes.txt file. A fix is to explicit cast 'mode' to int such that all call sites will use the argument #2 4 signed bytes definition. Attached patch does this. I think this fix is harmless and has some value in terms of consistency. One minor suggestion is that you should leave a space after typecasting. - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode); There should be a space like "(int) mode". v2 attached. I have verified all dtraces probes for their type, and only the lock__ methods doesn't aligned with its actual types. Do you see any problem with that? Not really, but it would be more clear what the value space of each of the parameters were. Depending on the feedback I can add this patch to the open item list in order to fix it for PostgreSQL 10. Is there any commit in PG-10 which has caused this behavior? If not, then I don't think it should be added to open items of PG-10. It is really a bug fix, so it could even be back patched. Thanks for the feedback ! Best regards, Jesper >From 7175dc8e05ff703229bd6cab6b254587ffc076c8 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 18 Apr 2017 11:44:18 -0400 Subject: [PATCH] Explicit cast LWLockMode, which an enum, to int in order to match the dtrace definition of the lwlock methods. Thereby all call sites will have the same definition instead of a mix between signed and unsigned. Author: Jesper Pedersen --- src/backend/storage/lmgr/lwlock.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 3e13394..abf5fbb 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -1226,7 +1226,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) #endif LWLockReportWaitStart(lock); - TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int)mode); for (;;) { @@ -1248,7 +1248,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) } #endif - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode); LWLockReportWaitEnd(); LOG_LWDEBUG("LWLockAcquire", lock, "awakened"); @@ -1257,7 +1257,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) result = false; } - TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), (int)mode); /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; @@ -1308,14 +1308,14 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode) RESUME_INTERRUPTS(); LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed"); - TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), (int)mode); } else { /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; held_lwlocks[num_held_lwlocks++].mode = mode; - TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), (int)mode); } return !mustwait; } @@ -1387,7 +1387,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) #endif LWLockReportWaitStart(lock); - 
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int)mode); for (;;) { @@ -1405,7 +1405,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) Assert(nwaiters < MAX_BACKENDS); } #endif - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode); LWLockReportWaitEnd(); LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened"); @@ -1435,7 +1435,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) /* Failed to get lock, so release interrupt holdoff */ RESUME_INTERRUPTS(); LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed"); - TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), (int)mode); } else { @@ -1443,7 +1443,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; held_lwlocks[num_held_lwlocks++].mode = mode; - TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), (int)mode); }
[HACKERS] dtrace probes
Hi, The lwlock dtrace probes define LWLockMode as int, and the TRACE_POSTGRESQL_LWLOCK methods are called using both a variable and constant definition. This leads to a mix of argument definitions depending on the call site, as seen in probes.txt file. A fix is to explicit cast 'mode' to int such that all call sites will use the argument #2 4 signed bytes definition. Attached patch does this. I have verified all dtraces probes for their type, and only the lock__ methods doesn't aligned with its actual types. However, that would require a change to probes.d, and therefore PostgreSQL 11 material. Depending on the feedback I can add this patch to the open item list in order to fix it for PostgreSQL 10. Best regards, Jesper >From d6f5c119c057c7ff8c84f88bb4122a1ca245a7d4 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 18 Apr 2017 11:44:18 -0400 Subject: [PATCH] Explicit cast LWLockMode, which an enum, to int in order to match the dtrace definition of the lwlock methods. Thereby all call sites will have the same definition instead of a mix between signed and unsigned. Author: Jesper Pedersen --- src/backend/storage/lmgr/lwlock.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 3e13394..abf5fbb 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -1226,7 +1226,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) #endif LWLockReportWaitStart(lock); - TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int)mode); for (;;) { @@ -1248,7 +1248,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) } #endif - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode); LWLockReportWaitEnd(); LOG_LWDEBUG("LWLockAcquire", lock, "awakened"); @@ -1257,7 +1257,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) result = false; } - TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), (int)mode); /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; @@ -1308,14 +1308,14 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode) RESUME_INTERRUPTS(); LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed"); - TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), (int)mode); } else { /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; held_lwlocks[num_held_lwlocks++].mode = mode; - TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), (int)mode); } return !mustwait; } @@ -1387,7 +1387,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) #endif LWLockReportWaitStart(lock); - TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int)mode); for (;;) { @@ -1405,7 +1405,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) Assert(nwaiters < MAX_BACKENDS); } #endif - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode); LWLockReportWaitEnd(); LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened"); @@ -1435,7 +1435,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) /* Failed to get lock, so release interrupt holdoff */ RESUME_INTERRUPTS(); LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed"); - 
TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), (int)mode); } else { @@ -1443,7 +1443,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; held_lwlocks[num_held_lwlocks++].mode = mode; - TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), (int)mode); } return !mustwait; -- 2.7.4 /usr/local/bin/postgres postgresql:clog__checkpoint__start [sema 0xe2351a] location #1 0x4fc706 argument #1 1 signed bytes @ 0 location #2 0x4fc72d argument #1 1 signed bytes @ 1 /usr/local/bin/postgres postgresql:clog__checkpoint__done [sema 0xe2351c] location #1 0x4fc725 argument #1 1 signed bytes @ 0 location #2 0x4fc74c argument #1 1 signed bytes @ 1 /usr/local/bin/postgres postgresql:multixact__checkpoint__start [sema 0xe23522] location #1 0x500d07 argument #1 1 signed bytes @ 0 location #2 0x500dbc argument #1 1 signed bytes @ 1 /usr/local/bin/postgres postgresql:mu
Re: [HACKERS] bumping HASH_VERSION to 3
On 03/31/2017 02:17 PM, Robert Haas wrote: Starting a new thread about this to get more visibility. Despite the extensive work that has been done on hash indexes this release, we have thus far not made any change to the on-disk format that is not nominally backward-compatible. Commit 293e24e507838733aba4748b514536af2d39d7f2 did make a change for new hash indexes, but included backward-compatibility code so that old indexes would continue to work. However, I'd like to also commit Mithun Cy's patch to expand hash indexes more gradually -- latest version in http://postgr.es/m/cad__oujd-ibxm91zcqziayftwqjxnfqgmv361v9zke83s6i...@mail.gmail.com -- and that's not backward-compatible. It would be possible to write code to convert the old metapage format to the new metapage format introduced by that patch, and it wouldn't be very hard, but I think it would be better to NOT do that, and instead force everybody upgrading to v10 to rebuild all of their hash indexes. If we don't do that, then we'll never know whether instances of hash index corruption reported against v10 or higher are caused by defects in the new code, because there's always the chance that the hash index could have been built on a pre-v10 version, got corrupted because of the lack of WAL-logging, and then been brought up to v10+ via pg_upgrade. Forcing a reindex in v10 kills three birds with one stone: - No old, not logged, possibly corrupt hash indexes floating around after an upgrade to v10. - Can remove the backward-compatibility code added by 293e24e507838733aba4748b514536af2d39d7f2 instead of keeping it around forever. - No need to worry about doing an in-place upgrade of the metapage for the above-mentioned patch. Thoughts? +1 Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
Hi Ashutosh, On 03/29/2017 09:16 PM, Ashutosh Sharma wrote: This patch needs a rebase. Please try applying these patches on top of [1]. I think you should be able to apply it cleanly. Sorry, I think I forgot to mention this point in my earlier mail. [1] - https://www.postgresql.org/message-id/CAE9k0P%3DV2LhtyeMXd295fhisp%3DNWUhRVJ9EZQCDowWiY9rSohQ%40mail.gmail.com Thanks, that works ! As you have provided a patch for Robert's comments, and no other reviews have been posted, I'm moving this patch to "Ready for Committer" for additional committer feedback. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
Hi, On 03/27/2017 09:34 AM, Ashutosh Sharma wrote: Hi, I think you should consider refactoring this so that it doesn't need to use goto. Maybe move the while (offnum <= maxoff) logic into a helper function and have it return itemIndex. If itemIndex == 0, you can call it again. okay, Added a helper function for _hash_readpage(). Please check v4 patch attached with this mail. >> This is not a full review, but I'm out of time for the moment. No worries. I will be ready for your further review comments any time. Thanks for the review. This patch needs a rebase. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] A better way to expand hash indexes.
Hi Mithun, On 03/26/2017 01:56 AM, Mithun Cy wrote: Thanks, Amit for the review. On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: I think the one-dimensional patch has fewer places to touch, so that looks better to me. However, I think there is still hard coding and assumptions in code which we should try to improve. Great! I will continue with the spares 1-dimensional improvement. I ran some performance scenarios on the patch to see if the increased 'spares' allocation had an impact. I haven't found any regressions in that regard. Attached patch contains some small fixes, mainly to the documentation - on top of v7. Best regards, Jesper >From 5545e48ab7136f17b3d471e0ee679a6db6040865 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Mon, 27 Mar 2017 14:15:00 -0400 Subject: [PATCH] Small fixes --- src/backend/access/hash/README | 50 +++--- src/backend/access/hash/hashpage.c | 26 ++-- src/backend/access/hash/hashutil.c | 24 +- 3 files changed, 50 insertions(+), 50 deletions(-) diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README index 6721ee1..ca46de7 100644 --- a/src/backend/access/hash/README +++ b/src/backend/access/hash/README @@ -58,50 +58,50 @@ rules to support a variable number of overflow pages while not having to move primary bucket pages around after they are created. Primary bucket pages (henceforth just "bucket pages") are allocated in -power-of-2 groups, called "split points" in the code. That means on every new -split point we double the existing number of buckets. And, it seems bad to -allocate huge chunks of bucket pages all at once and we take ages to consume it. -To avoid this exponential growth of index size, we did a trick to breakup -allocation of buckets at splitpoint into 4 equal phases. If 2^x is the total -buckets need to be allocated at a splitpoint (from now on we shall call this -as splitpoint group), then we allocate 1/4th (2^(x - 2)) of total buckets at -each phase of splitpoint group. Next quarter of allocation will only happen if +power-of-2 groups, called "split points" in the code. That means at every new +split point we double the existing number of buckets. Allocating huge chunks +of bucket pages all at once isn't optimal and we will take ages to consume those. +To avoid this exponential growth of index size, we used a trick to break up +allocation of buckets at the split points into 4 equal phases. If 2^x is the total +buckets needed to be allocated at a split point (from now on we shall call this +a split point group), then we allocate 1/4th (2^(x - 2)) of total buckets at +each phase of the split point group. Next quarter of allocation will only happen if buckets of previous phase has been already consumed. Since for buckets number -< 4 we cannot further divide it in to multiple phases, the first splitpoint +< 4 we cannot further divide it into multiple phases, the first split point group 0's allocation is done as follows {1, 1, 1, 1} = 4 buckets in total, the numbers in curly braces indicate number of buckets allocated within each phase -of splitpoint group 0. In next splitpoint group 1 the allocation phases will -be as follow {1, 1, 1, 1} = 8 buckets in total. And, for splitpoint group 2 +of split point group 0. In next split point group 1 the allocation phases will +be as follows {1, 1, 1, 1} = 8 buckets in total. And, for split point group 2 and 3 allocation phase will be {2, 2, 2, 2} = 16 buckets in total and -{4, 4, 4, 4} = 32 buckets in total.
We can see that at each splitpoint group -we double the total number of buckets from previous group but in a incremental -phase. The bucket pages allocated within one phase of a splitpoint group will +{4, 4, 4, 4} = 32 buckets in total. We can see that at each split point group +we double the total number of buckets from previous group but in an incremental +phase. The bucket pages allocated within one phase of a split point group will appear consecutively in the index. This addressing scheme allows the physical location of a bucket page to be computed from the bucket number relatively easily, using only a small amount of control information. If we look at the -function _hash_spareindex for a given bucket number we first compute splitpoint -group it belongs to and then the phase with in it to which the bucket belongs -to. Adding them we get the global splitpoint phase number S to which the +function _hash_spareindex for a given bucket number we first compute the split point +group it belongs to and then the phase to which the bucket belongs +to. Adding them we get the global split point phase number S to which the bucket belongs and then simply add "hashm_spares[S] + 1" (where hashm_spares[] is an array stored in the metapage) with given bucket number to compute its physical address. The hashm_spares[S] can be interpreted as the total number of overflow pages that have been
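To make the phased allocation described above concrete, a small standalone sketch of the arithmetic (phase_alloc() is my own illustrative helper, not the patched _hash_spareindex()):

#include <stdio.h>

/*
 * Buckets allocated by one phase of split point group g: groups 0 and 1
 * use {1, 1, 1, 1}; group g >= 2 uses four phases of 2^(g - 1) buckets.
 */
static unsigned int
phase_alloc(unsigned int group)
{
    return (group <= 1) ? 1 : 1u << (group - 1);
}

int
main(void)
{
    unsigned int total = 0;

    for (unsigned int g = 0; g < 4; g++)
    {
        for (int phase = 0; phase < 4; phase++)
            total += phase_alloc(g);
        printf("group %u: %u buckets in total\n", g, total);
    }
    /* prints 4, 8, 16, 32 -- the total doubles per group, in quarters */
    return 0;
}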
Re: [HACKERS] Page Scan Mode in Hash Index
Hi, On 03/23/2017 02:11 PM, Ashutosh Sharma wrote: On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen wrote: 0001v2: In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else' part, and do the assignment inside if (so->numKilled < MaxIndexTuplesPerPage) instead. Done. Please have a look into the attached v3 patch. No new comments for 0002 and 0003. okay. Thanks. I'll keep the entry in 'Needs Review' if Alexander, or others, want to add their feedback. (Best to post the entire patch series each time) Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
Hi, On 03/22/2017 09:32 AM, Ashutosh Sharma wrote: Done. Please refer to the attached v2 version of the patch. Thanks. 1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this patch rewrites the hash index scan module to work in page-at-a-time mode. It basically introduces two new functions -- _hash_readpage() and _hash_saveitem(). The former is used to load all the qualifying tuples from a target bucket or overflow page into an items array. The latter one is used by _hash_readpage to save all the qualifying tuples found in a page into an items array. Apart from that, this patch basically cleans up _hash_first(), _hash_next() and hashgettuple(). 0001v2: In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else' part, and do the assignment inside if (so->numKilled < MaxIndexTuplesPerPage) instead. No new comments for 0002 and 0003. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
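The 0001v2 comment above, sketched out for clarity (a hypothetical shape of the hashgettuple() branch; the names follow the patch series and may differ in detail):

/*
 * Sketch of the suggestion: only the kill-tuple path needs currItem, so
 * do the assignment inside the guarded block rather than in both
 * branches of the surrounding if/else.
 */
if (so->numKilled < MaxIndexTuplesPerPage)
{
    HashScanPosItem *currItem = &so->currPos.items[so->currPos.itemIndex];

    /* remember it so a later pass can mark the entry LP_DEAD */
    so->killedItems[so->numKilled++] = *currItem;
}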
Re: [HACKERS] Page Scan Mode in Hash Index
Hi, On 02/14/2017 12:27 AM, Ashutosh Sharma wrote: Currently, Hash Index scan works tuple-at-a-time, i.e. for every qualifying tuple in a page, it acquires and releases the lock, which eventually increases the lock/unlock traffic. For example, if an index page contains 100 qualified tuples, the current hash index scan has to acquire and release the lock 100 times to read those qualified tuples, which is not good from a performance perspective, and it also impacts concurrency with VACUUM. Considering the above points, I would like to propose a patch that allows hash index scan to work in page-at-a-time mode. In page-at-a-time mode, once a lock is acquired on a target bucket page, the entire page is scanned and all the qualified tuples are saved into the backend's local memory. This reduces the lock/unlock calls for retrieving tuples from a page. Moreover, it also eliminates the problem of re-finding the position of the last returned index tuple and, more importantly, it allows VACUUM to release the lock on the current page before moving to the next page, which eventually improves its concurrency with scans. Attached patch modifies hash index scan code for page-at-a-time mode. For better readability, I have split it into 3 parts. Due to the commits on master these patches apply with hunks. The README should be updated to mention the use of page scan. hash.h needs pg_indent. 1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this patch rewrites the hash index scan module to work in page-at-a-time mode. It basically introduces two new functions -- _hash_readpage() and _hash_saveitem(). The former is used to load all the qualifying tuples from a target bucket or overflow page into an items array. The latter one is used by _hash_readpage to save all the qualifying tuples found in a page into an items array. Apart from that, this patch basically cleans up _hash_first(), _hash_next() and hashgettuple(). For _hash_next I don't see this - can you explain ? + * + * On failure exit (no more tuples), we release pin and set + * so->currPos.buf to InvalidBuffer. + * Returns true if any matching items are found else returns false. s/Returns/Return/g 2) 0002-Remove-redundant-function-_hash_step-and-some-of-the.patch: this patch basically removes the redundant function _hash_step() and some of the unused members of the HashScanOpaqueData structure. Looks good. 3) 0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index.patch: this patch basically improves the locking strategy for VACUUM in hash index. As the new hash index scan works in page-at-a-time, vacuum can release the lock on the previous page before acquiring a lock on the next page, hence, improving hash index concurrency. +* As the new hash index scan work in page at a time mode, Remove 'new'. I have also done the benchmarking of this patch and would like to share the results for the same. Firstly, I have done the benchmarking with non-unique values and I could see a performance improvement of 4-7%. For the detailed results please find the attached file 'results-non-unique values-70ff', and ddl.sql, test.sql are the test scripts used in this experimentation. The details of the non-default GUC params and the pgbench command are mentioned in the result sheet. I also did the benchmarking with unique values at 300 and 1000 scale factors and its results are provided in 'results-unique-values-default-ff'. I'm seeing similar results, and especially with write heavy scenarios.
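For reviewers following along, the control flow under discussion looks roughly like this (a schematic of the approach, not the patch's actual _hash_readpage(); _hash_saveitem() is the patch's helper, the rest are existing APIs):

/*
 * Page-at-a-time reading as described above: lock once per page, copy
 * every qualifying tuple into backend-local memory, unlock, and only
 * then hand tuples back one by one.  Tuple-at-a-time scanning instead
 * pays a lock/unlock cycle per qualifying tuple and blocks VACUUM while
 * a scan position is held.
 */
LockBuffer(buf, BUFFER_LOCK_SHARE);
page = BufferGetPage(buf);
maxoff = PageGetMaxOffsetNumber(page);

for (offnum = FirstOffsetNumber; offnum <= maxoff;
     offnum = OffsetNumberNext(offnum))
{
    IndexTuple  itup = (IndexTuple)
        PageGetItem(page, PageGetItemId(page, offnum));

    if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup))
        _hash_saveitem(so, itemIndex++, offnum, itup);  /* copy to local array */
}

LockBuffer(buf, BUFFER_LOCK_UNLOCK);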
Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 02/03/2017 11:41 AM, Jesper Pedersen wrote: contrib/pageinspect actually seems to lack *any* infrastructure for sharing functions across modules. It's time to remedy that. I propose inventing "pageinspect.h" to hold commonly visible declarations, and moving get_page_from_raw() into rawpage.c, which seems like a reasonably natural home for it. (Alternatively, we could invent a pageinspect.c file to hold utility functions, but that might be overkill.) No objections. Attached is v1 of this w/ verify_hash_page() using get_page_from_raw(). Sorry for all the problems. Disregard, as Tom has committed a fix. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi, On 02/03/2017 11:36 AM, Robert Haas wrote: On Fri, Feb 3, 2017 at 10:11 AM, Tom Lane wrote: BTW, the buildfarm is still crashing on ia64 and sparc64 members. I believe this is the same problem addressed in 84ad68d64 for pageinspect's GIN functions, to wit, the payload of a "bytea" is not maxaligned, so machines that care about alignment won't be happy when trying to fetch 64-bit values out of a bytea page image. Clearly, the fix should be to start using the get_page_from_raw() function that Peter introduced in that patch. But it's static in ginfuncs.c, which I thought was a mistake at the time, and it's proven so now. contrib/pageinspect actually seems to lack *any* infrastructure for sharing functions across modules. It's time to remedy that. I propose inventing "pageinspect.h" to hold commonly visible declarations, and moving get_page_from_raw() into rawpage.c, which seems like a reasonably natural home for it. (Alternatively, we could invent a pageinspect.c file to hold utility functions, but that might be overkill.) No objections. Attached is v1 of this w/ verify_hash_page() using get_page_from_raw(). Sorry for all the problems. Best regards, Jesper >From d674c58098580ec59f9e1e47255c68b8497178b1 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 3 Feb 2017 11:28:19 -0500 Subject: [PATCH] Add pageinspect.h --- contrib/pageinspect/brinfuncs.c | 1 + contrib/pageinspect/btreefuncs.c | 1 + contrib/pageinspect/fsmfuncs.c| 1 + contrib/pageinspect/ginfuncs.c| 21 +- contrib/pageinspect/hashfuncs.c | 3 +- contrib/pageinspect/pageinspect.h | 61 +++ contrib/pageinspect/rawpage.c | 25 7 files changed, 92 insertions(+), 21 deletions(-) create mode 100644 contrib/pageinspect/pageinspect.h diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c index 7b877e3..23136d2 100644 --- a/contrib/pageinspect/brinfuncs.c +++ b/contrib/pageinspect/brinfuncs.c @@ -9,6 +9,7 @@ */ #include "postgres.h" +#include "pageinspect.h" #include "access/htup_details.h" #include "access/brin.h" #include "access/brin_internal.h" diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c index 3f09d5f..e63b5fb 100644 --- a/contrib/pageinspect/btreefuncs.c +++ b/contrib/pageinspect/btreefuncs.c @@ -27,6 +27,7 @@ #include "postgres.h" +#include "pageinspect.h" #include "access/nbtree.h" #include "catalog/namespace.h" #include "catalog/pg_am.h" diff --git a/contrib/pageinspect/fsmfuncs.c b/contrib/pageinspect/fsmfuncs.c index 6541646..372de39 100644 --- a/contrib/pageinspect/fsmfuncs.c +++ b/contrib/pageinspect/fsmfuncs.c @@ -19,6 +19,7 @@ #include "postgres.h" +#include "pageinspect.h" #include "funcapi.h" #include "lib/stringinfo.h" #include "miscadmin.h" diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c index cea77d3..5cfcbcb 100644 --- a/contrib/pageinspect/ginfuncs.c +++ b/contrib/pageinspect/ginfuncs.c @@ -9,6 +9,7 @@ */ #include "postgres.h" +#include "pageinspect.h" #include "access/gin.h" #include "access/gin_private.h" #include "access/htup_details.h" @@ -29,26 +30,6 @@ PG_FUNCTION_INFO_V1(gin_page_opaque_info); PG_FUNCTION_INFO_V1(gin_leafpage_items); -static Page -get_page_from_raw(bytea *raw_page) -{ - int raw_page_size; - Page page; - - raw_page_size = VARSIZE(raw_page) - VARHDRSZ; - if (raw_page_size < BLCKSZ) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("input page too small (%d bytes)", raw_page_size))); - - /* make a copy so that the page is properly aligned for struct access */ - page = 
palloc(raw_page_size); - memcpy(page, VARDATA(raw_page), raw_page_size); - - return page; -} - - Datum gin_metapage_info(PG_FUNCTION_ARGS) { diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c index 83b4698..09bb9ee 100644 --- a/contrib/pageinspect/hashfuncs.c +++ b/contrib/pageinspect/hashfuncs.c @@ -10,6 +10,7 @@ #include "postgres.h" +#include "pageinspect.h" #include "access/hash.h" #include "access/htup_details.h" #include "catalog/pg_type.h" @@ -67,7 +68,7 @@ verify_hash_page(bytea *raw_page, int flags) errdetail("Expected size %d, got %d", BLCKSZ, raw_page_size))); - page = VARDATA(raw_page); + page = get_page_from_raw(raw_page); if (PageIsNew(page)) ereport(ERROR, diff --git a/contrib/pageinspect/pageinspect.h b/contrib/pageinspect/pageinspect.h new file mode 100644 index 000..9ec3aad --- /dev/null +++ b/contrib/pageinspect/pageinspect.h @@ -0,0 +1,61 @@ +/* + * pageinspect.h + * Functions to investigate the content of indexes + * + * Copyright (c) 2017, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/pageinspect.h + */ +#ifndef PAGEINSPECT_H +#define PAGEINSPECT_H + +#include "postgres.h" + +#include "funcapi.h" + +// Raw page functions + +extern Datum get_raw_page(P
Re: [HACKERS] pageinspect: Hash index support
On 02/02/2017 02:28 PM, Jesper Pedersen wrote: On 02/02/2017 02:24 PM, Robert Haas wrote: So, committed. Wow, I wish every patch had this many reviewers. Thanks Robert ! This message should have included a thank you to everybody who provided valuable feedback for this feature, and for that I'm sorry. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 02/02/2017 02:24 PM, Robert Haas wrote: So, committed. Wow, I wish every patch had this many reviewers. Thanks Robert ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum support for Hash Index
On 01/23/2017 02:53 PM, Jesper Pedersen wrote: I have done some more testing with this, and have moved the patch back to 'Needs Review' pending Amit's comments. Moved to "Ready for Committer". Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum support for Hash Index
Hi Ashutosh, On 01/20/2017 03:24 PM, Jesper Pedersen wrote: Yeah, those are the steps; just with a Skylake laptop. However, I restarted with a fresh master, with WAL v8 and MV v5, and can't reproduce the issue. I have done some more testing with this, and have moved the patch back to 'Needs Review' pending Amit's comments. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum support for Hash Index
Hi Ashutosh, On 01/20/2017 04:18 AM, Ashutosh Sharma wrote: Okay, thanks for confirming that. I would like to update you that I am not able to reproduce this issue at my end. I suspect that the steps I am following might be slightly different from yours. Could you please have a look at the steps mentioned below and confirm if there is something different that I am doing? Firstly, I am running the test case on the following git commit on HEAD: commit ba61a04bc7fefeee03416d9911eb825c4897c223 Author: Tom Lane Date: Thu Jan 19 19:52:13 2017 -0500 Avoid core dump for empty prepared statement in an aborted transaction. Brown-paper-bag bug in commit ab1f0c822: the old code here coped with null CachedPlanSource.raw_parse_tree, the new code not so much. Per report from Dave Cramer. On top of the above commit, I have applied the WAL v8 patch for hash index and the MV v5 patch. Now, with an --enable-cassert build I am following the steps below: 1) Created a 'test' database 2) psql -d test -f ~/ddl.sql where ddl.sql is, -- ddl.sql -- CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val; CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id); CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val); ANALYZE; -- ddl.sql -- 3) pgbench -M prepared -c 10 -j 10 -T 1800 -f ~/test.sql test where test.sql is, -- test.sql -- \set id random(1,10) \set val random(0,10) BEGIN; UPDATE test SET val = :val WHERE id = :id; COMMIT; -- test.sql -- Machine details are as follows: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 8 Also, it would be great if you could confirm whether you have been getting this issue repeatedly. Thanks. Yeah, those are the steps; just with a Skylake laptop. However, I restarted with a fresh master, with WAL v8 and MV v5, and can't reproduce the issue. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum support for Hash Index
Hi Ashutosh, On 01/10/2017 08:40 AM, Jesper Pedersen wrote: On 01/10/2017 05:24 AM, Ashutosh Sharma wrote: Thanks for reporting this problem. It is basically coming because i forgot to unpin the bucketbuf in hash_xlog_vacuum_one_page(). Please find the attached v5 patch that fixes the issue. The crash is now fixed, but the --- test.sql --- \set id random(1,10) \set val random(0,10) BEGIN; UPDATE test SET val = :val WHERE id = :id; COMMIT; --- test.sql --- case gives client 6 aborted in command 3 of script 0; ERROR: deadlock detected DETAIL: Process 14608 waits for ShareLock on transaction 1444620; blocked by process 14610. Process 14610 waits for ShareLock on transaction 1444616; blocked by process 14608. HINT: See server log for query details. CONTEXT: while rechecking updated tuple (12,3) in relation "test" ... using pgbench -M prepared -c 10 -j 10 -T 300 -f test.sql test I'm not seeing this deadlock with just the WAL v8 patch applied. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi, On 01/18/2017 04:54 AM, Ashutosh Sharma wrote: Is there a reason for keeping the input arguments for hash_bitmap_info() different from hash_page_items()? Yes, there are two reasons behind it. Firstly, we need metapage to identify the bitmap page that holds the information about the overflow page passed as an input to this function. Secondly, we will have to input overflow block number as an input to this function so as to determine the overflow bit number which can be used further to identify the bitmap page. + /* Read the metapage so we can determine which bitmap page to use */ + metabuf = _hash_getbuf(indexRel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); + metap = HashPageGetMeta(BufferGetPage(metabuf)); + + /* Identify overflow bit number */ + ovflbitno = _hash_ovflblkno_to_bitno(metap, ovflblkno); + + bitmappage = ovflbitno >> BMPG_SHIFT(metap); + bitmapbit = ovflbitno & BMPG_MASK(metap); + + if (bitmappage >= metap->hashm_nmaps) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("invalid overflow bit number %u", ovflbitno))); + + bitmapblkno = metap->hashm_mapp[bitmappage]; As discussed off-list (along with my patch that included some of these fixes), it would require calculation from the user to be able to provide the necessary pages through get_raw_page(). Other ideas on how to improve this are most welcome. Apart from above comments, the attached patch also handles all the comments from Mithun. Please, include a version number for your patch files in the future. Fixed in this version: * verify_hash_page: Display magic in hex, like hash_metapage_info * Update header for hash_page_type Moving the patch back to 'Needs Review'. Best regards, Jesper >From 8e5a68cfaf979bcab88fb9358b88a89bc780a277 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Wed, 18 Jan 2017 08:42:02 -0500 Subject: [PATCH] Add support for hash index in pageinspect contrib module v15 Authors: Ashutosh Sharma and Jesper Pedersen. 
--- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 574 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 76 contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 148 +++ src/backend/access/hash/hashovfl.c| 8 +- src/include/access/hash.h | 1 + 7 files changed, 810 insertions(+), 9 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index 87a28e9..052a8e1 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.5.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" REGRESS = page btree brin gin diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..f157fcc --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,574 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2017, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "catalog/pg_am.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" + +PG_FUNCTION_INFO_V1(hash_page_type); +PG_FUNCTION_INFO_V1(hash_page_stats); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_bitmap_info); +PG_FUNCTION_INFO_V1(hash_metapage_info); + +#define IS_HASH(r) ((r)->rd_rel->relam == HASH_AM_OID) + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPage
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 12/27/2016 01:58 AM, Amit Kapila wrote: After recent commits 7819ba1e and 25216c98, this patch requires a rebase. Attached is the rebased patch. This needs a rebase after commit e898437. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi, On 01/11/2017 03:16 PM, Ashutosh Sharma wrote: I have rephrased it to make it more clear. Rebased, and removed the compile warn in hashfuncs.c Best regards, Jesper >From 8a07230b89b97280f0f1d645145da1fd140969c6 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Thu, 12 Jan 2017 14:00:45 -0500 Subject: [PATCH] Add support for hash index in pageinspect contrib module v13 Authors: Ashutosh Sharma and Jesper Pedersen. --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 548 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 76 contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 148 +++ src/backend/access/hash/hashovfl.c| 52 +-- src/include/access/hash.h | 1 + 7 files changed, 806 insertions(+), 31 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index 87a28e9..d7f3645 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.5.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" REGRESS = page btree brin gin diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..25e9641 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,548 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2017, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "catalog/pg_am.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" + +PG_FUNCTION_INFO_V1(hash_page_type); +PG_FUNCTION_INFO_V1(hash_page_stats); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_bitmap_info); +PG_FUNCTION_INFO_V1(hash_metapage_info); + +#define IS_HASH(r) ((r)->rd_rel->relam == HASH_AM_OID) + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. 
+ */ +static Page +verify_hash_page(bytea *raw_page, int flags) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + + if (PageIsNew(page)) + ereport(ERROR, +(errcode(ERRCODE_INDEX_CORRUPTED), + errmsg("index table contains unexpected zero page"))); + + if (PageGetSpecialSize(page) != MAXALIGN(sizeof(HashPageOpaqueData))) + ereport(ERROR, +(errcode(ERRCODE_INDEX_CORRUPTED), + errmsg("index table contains corrupted page"))); + + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + if (!(pageopaque->hasho_flag & flags)) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a hash page of expected type"), + errdetail("Expected hash page type: %08x got: %08x.", +flags, pageopaque->hasho_flag))); + return page; +} + +/* ---
Re: [HACKERS] Retiring from the Core Team
On 01/11/2017 07:29 PM, Josh Berkus wrote: You will have noticed that I haven't been very active for the past year. My new work on Linux containers and Kubernetes has been even more absorbing than I anticipated, and I just haven't had a lot of time for PostgreSQL work. For that reason, as of today, I am stepping down from the PostgreSQL Core Team. Thanks for all your work over the years ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 01/10/2017 10:39 AM, Tom Lane wrote: Robert Haas writes: No, you're missing the point. The patch doesn't need to add pageinspect--1.6.sql at all, nor does it remove pageinspect--1.5.sql. It only needs to add pageinspect--1.5--1.6.sql and change the default version to 1.6. No other changes are required. Right, this is a new recommended process since commit 40b449ae8, which see for rationale. Use, eg, commit 11da83a0e as a model for extension update patches. Thanks for the commit ids ! Revised patched attached. Best regards, Jesper >From 2c75e4af17903c22961f23ed5455614340d0dd4e Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 3 Jan 2017 07:49:42 -0500 Subject: [PATCH] Add support for hash index in pageinspect contrib module v11 Authors: Ashutosh Sharma and Jesper Pedersen. --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 558 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 76 contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 149 +++ src/backend/access/hash/hashovfl.c| 52 +-- src/include/access/hash.h | 1 + 7 files changed, 817 insertions(+), 31 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index 87a28e9..d7f3645 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.5.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" REGRESS = page btree brin gin diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..525e780 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,558 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2017, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "catalog/pg_am.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" + +PG_FUNCTION_INFO_V1(hash_page_type); +PG_FUNCTION_INFO_V1(hash_page_stats); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_bitmap_info); +PG_FUNCTION_INFO_V1(hash_metapage_info); + +#define IS_HASH(r) ((r)->rd_rel->relam == HASH_AM_OID) + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. 
+ */ +static Page +verify_hash_page(bytea *raw_page, int flags) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + + if (PageIsNew(page)) + ereport(ERROR, +(errcode(ERRCODE_INDEX_CORRUPTED), + errmsg("index table contains unexpected zero page"))); + + if (PageGetSpecialSize(page) != MAXALIGN(sizeof(HashPageOpaqueData))) + ereport(ERROR, +(errcode(ERRCODE_INDEX_CORRUPTED), + errmsg("index table contains corrupted page"))); + + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque-
Re: [HACKERS] Microvacuum support for Hash Index
Hi Ashutosh, On 01/10/2017 05:24 AM, Ashutosh Sharma wrote: Thanks for reporting this problem. It is basically coming because i forgot to unpin the bucketbuf in hash_xlog_vacuum_one_page(). Please find the attached v5 patch that fixes the issue. The crash is now fixed, but the --- test.sql --- \set id random(1,10) \set val random(0,10) BEGIN; UPDATE test SET val = :val WHERE id = :id; COMMIT; --- test.sql --- case gives client 6 aborted in command 3 of script 0; ERROR: deadlock detected DETAIL: Process 14608 waits for ShareLock on transaction 1444620; blocked by process 14610. Process 14610 waits for ShareLock on transaction 1444616; blocked by process 14608. HINT: See server log for query details. CONTEXT: while rechecking updated tuple (12,3) in relation "test" ... using pgbench -M prepared -c 10 -j 10 -T 300 -f test.sql test Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum support for Hash Index
Hi Ashutosh, On 01/06/2017 12:54 AM, Ashutosh Sharma wrote: using pgbench -M prepared -c 10 -j 10 -T 600 -f test.sql test crashes after a few minutes with TRAP: FailedAssertion("!(LWLockHeldByMeInMode(((LWLock*) (&(bufHdr)->content_lock)), LW_EXCLUSIVE))", File: "bufmgr.c", Line: 3781) Attached v4 patch fixes this assertion failure. Yes, that fixes the problem. However (master / WAL v7 / MV v4), --- ddl.sql --- CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val; CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id); CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val); ANALYZE; --- ddl.sql --- --- test.sql --- \set id random(1,10) \set val random(0,10) BEGIN; DELETE FROM test WHERE id = :id; INSERT INTO test VALUES (:id, :val); COMMIT; --- test.sql --- gives #9 0x0098a83e in elog_finish (elevel=20, fmt=0xb6ea92 "incorrect local pin count: %d") at elog.c:1378 #10 0x007f0b33 in LockBufferForCleanup (buffer=1677) at bufmgr.c:3605 #11 0x00549390 in XLogReadBufferForRedoExtended (record=0x2afced8, block_id=1 '\001', mode=RBM_NORMAL, get_cleanup_lock=1 '\001', buf=0x7ffe3ee27c8c) at xlogutils.c:394 #12 0x004c5026 in hash_xlog_vacuum_one_page (record=0x2afced8) at hash_xlog.c:1109 #13 0x004c5547 in hash_redo (record=0x2afced8) at hash_xlog.c:1214 #14 0x0053a361 in StartupXLOG () at xlog.c:6975 #15 0x007a4ca0 in StartupProcessMain () at startup.c:216 on the slave instance in a master-slave setup. Also, the src/backend/access/README file should be updated with a description of the changes which this patch provides. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum support for Hash Index
Hi Ashutosh, On 01/04/2017 06:13 AM, Ashutosh Sharma wrote: Attached is the v3 patch rebased on PostgreSQL HEAD and the WAL v7 patch. It also takes care of all the previous comments from Jesper - [1]. With an --enable-cassert build (master / WAL v7 / MV v3) and -- ddl.sql -- CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val; CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id); CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val); ANALYZE; -- ddl.sql -- -- test.sql -- \set id random(1,10) \set val random(0,10) BEGIN; UPDATE test SET val = :val WHERE id = :id; COMMIT; -- test.sql -- using pgbench -M prepared -c 10 -j 10 -T 600 -f test.sql test crashes after a few minutes with TRAP: FailedAssertion("!(LWLockHeldByMeInMode(((LWLock*) (&(bufHdr)->content_lock)), LW_EXCLUSIVE))", File: "bufmgr.c", Line: 3781) BTW, it would be better to rename 'hashkillitems' to '_hash_kill_items' to follow the naming convention in hash.h Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi Ashutosh, On 01/05/2017 07:13 AM, Ashutosh Sharma wrote: * Readded pageinspect--1.6.sql In order to have the latest pageinspect interface in 1 file, as we need something to install from. I think there should be no problem even if we simply add the pageinspect--1.5--1.6.sql file instead of removing 1.5.sql, as with the latest changes to the create extension infrastructure in PostgreSQL it automatically finds a lower version script if unable to find the default version script and then upgrades it to the latest version. Removing --1.5.sql otherwise would give test=# CREATE EXTENSION "pageinspect"; ERROR: extension "pageinspect" has no installation script nor update path for version "1.6" I didn't get this. Do you mean to say that if you add 1.6.sql and do not remove 1.5.sql you get this error when trying to add the pageinspect module? I didn't get any such error. See below, postgres=# create extension pageinspect WITH VERSION '1.6'; CREATE EXTENSION The previous patch was using pageinspect--1.5.sql as a base, and then uses pageinspect--1.5--1.6.sql to upgrade to version 1.6. Removing pageinspect--1.5.sql, and adding pageinspect--1.6.sql with the current interface will use pageinspect--1.6.sql directly, whereas existing installations will use the upgrade scripts. Hence I don't see a reason why we should keep pageinspect--1.5.sql around when we can provide a clear interface description in a pageinspect--1.6.sql. Peter has already written a test case [1] based on your earlier patch for supporting hash index with the pageinspect module. Once the latest patch (v10) becomes stable I will share a separate patch with a test case for hash index. In fact I will try to modify an already existing patch by Peter. Ok, sounds good. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi Ashutosh, On 12/20/2016 05:55 AM, Ashutosh Sharma wrote: 1) It introduces two new functions hash_page_type() and hash_bitmap_info(). hash_page_type basically displays the type of hash page whereas hash_bitmap_info() shows the status of a bit for a particular overflow page in bitmap page of hash index. 2) The functions hash_page_stats() and hash_page_items() basically shows the information about data stored in bucket and overflow pages of hash index. If a metapage or bitmap page is passed as an input to this function it throws an error saying "Expected page type:'' got: ''". 3) It also improves verify_hash_page() function to handle any type of page in hash index. It is now being used as verify_hash_page('raw_page', ) to verify if the page passed to this function is of 'page_type' or not. Here page_type can be LH_META_PAGE, LH_BUCKET_PAGE, LH_OVERFLOW_PAGE, LH_BITMAP_PAGE. Here is an updated patch with some changes. * Rename convert_ovflblkno_to_bitno to _hash_ovflblkno_to_bitno Such that there is only 1 method, which is exposed * Readded pageinspect--1.6.sql In order to have the latest pageinspect interface in 1 file, as we need something to install from. Removing --1.5.sql otherwise would give test=# CREATE EXTENSION "pageinspect"; ERROR: extension "pageinspect" has no installation script nor update path for version "1.6" * Minor documentation changes Look it over, and maybe there is a simple test case we can add for this. Best regards, Jesper >From 46374ba3aba57d290c1940fee4a12ed31c12342f Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 3 Jan 2017 07:49:42 -0500 Subject: [PATCH] Add support for hash index in pageinspect contrib module v10 Authors: Ashutosh Sharma and Jesper Pedersen. --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 558 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 76 contrib/pageinspect/pageinspect--1.5.sql | 279 - contrib/pageinspect/pageinspect--1.6.sql | 351 contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 149 +++ src/backend/access/hash/hashovfl.c| 52 +-- src/include/access/hash.h | 1 + 9 files changed, 1168 insertions(+), 310 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index 87a28e9..ecf0df0 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" REGRESS = page btree brin gin diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..525e780 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,558 @@ +/* + * hashfuncs.c + * Functions to 
investigate the content of HASH indexes + * + * Copyright (c) 2017, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "catalog/pg_am.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" + +PG_FUNCTION_INFO_V1(hash_page_type); +PG_FUNCTION_INFO_V1(hash_page_stats); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_bitmap_info); +PG_FUNCTION_INFO_V1(hash_metapage_info); + +#define IS_HASH(r) ((r)->rd_rel->relam == HASH_AM_OID) + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hash
Re: [HACKERS] Microvacuum support for Hash Index
On 11/11/2016 12:11 AM, Ashutosh Sharma wrote: Hi Jesper, Some initial comments. _hash_vacuum_one_page: + END_CRIT_SECTION(); + _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK); The _hash_chgbufaccess() needs a comment. You also need a place where you call pfree for so->killedItems - maybe in hashkillitems(). Thanks for reviewing this patch. I would like to update you that this patch has a dependency on a patch for concurrent hash index and WAL log in hash index. So, until these two patches for hash index are stable I won't be able to share a next version of the patch for supporting microvacuum in hash index. This can be rebased on the WAL v7 patch [1]. In addition to the previous comments you need to take commit 7819ba into account. [1] https://www.postgresql.org/message-id/CAA4eK1%2BdmGNTFMnLO4EbOWJDHUq%3D%2Ba2L8T%3D72ifXeh-Kd8HOsg%40mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
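For the rebase, it may help to spell out what the killedItems machinery is expected to do; a sketch modeled on btree's _bt_killitems() (a hypothetical adaptation -- hashkillitems() and the LH_PAGE_HAS_DEAD_TUPLES flag belong to the patch under review, and the array itself is palloc'd per scan, hence the pfree comment above):

/*
 * Mark remembered dead entries LP_DEAD so a later insertion can reclaim
 * their space without a full VACUUM (simplified; offsets would need
 * re-checking if the page changed since the scan).
 */
for (i = 0; i < so->numKilled; i++)
{
    ItemId      iid = PageGetItemId(page, so->killedItems[i].indexOffset);

    ItemIdMarkDead(iid);
    killedsomething = true;
}

if (killedsomething)
{
    opaque->hasho_flag |= LH_PAGE_HAS_DEAD_TUPLES;
    MarkBufferDirtyHint(buf, true);
}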
Re: [HACKERS] pageinspect: Hash index support
Hi, On 12/20/2016 05:55 AM, Ashutosh Sharma wrote: Attached is the revised patch. Please have a look and let me know your feedback. I have also changed the status for this patch in the upcoming commitfest to "needs review". Thanks. I was planning to submit a new version next week for CF/January, so I'll review your changes with the previous feedback in mind. Thanks for working on this ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_catversion builtin function
On 12/14/2016 08:52 AM, Robert Haas wrote: But I understand your concern, so "Rejected" is ok under https://commitfest.postgresql.org/12/906/ I have a better reason for rejecting this patch: we already have this feature. rhaas=# select catalog_version_no from pg_control_system(); catalog_version_no 201612081 (1 row) Ah, perfect ! Thanks, Robert Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_catversion builtin function
On 12/13/2016 10:33 AM, Tom Lane wrote: Jesper Pedersen writes: Attached is a new builtin function that exposes the CATALOG_VERSION_NO constant under the pg_catversion() function, e.g. I'm pretty sure that we intentionally didn't expose that, reasoning that users should only care about the user-visible version number. What exactly is the argument for exposing this? I'm using it to get the catalog version from a running instance in order to figure out if a dump/restore is needed for the next daily build -- instead of keeping the catversion.h file around for each installation, with script magic. Test databases are external to PostgreSQL's test suite, and one is quite big, so "cp" is faster than dump/restore :) But I understand your concern, so "Rejected" is ok under https://commitfest.postgresql.org/12/906/ Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On 12/11/2016 11:37 PM, Amit Kapila wrote: On Sun, Dec 11, 2016 at 11:54 AM, Amit Kapila wrote: On Wed, Dec 7, 2016 at 2:02 AM, Jeff Janes wrote: With above fixes, the test ran successfully for more than a day. There was a small typo in the previous patch which is fixed in attached. I don't think it will impact the test results if you have already started the test with the previous patch, but if not, then it is better to test with attached. A mixed workload (INSERT, DELETE and VACUUM primarily) is successful here too using _v2. Thanks ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_catversion builtin function
Hi Hackers, Attached is a new builtin function that exposes the CATALOG_VERSION_NO constant under the pg_catversion() function, e.g. test=# SELECT pg_catversion(); pg_catversion --- 201612121 (1 row) Although it is mostly useful during the development cycle to verify if a dump/restore is needed, it could have other use-cases. I'm unsure of the OID assignment rules - feel free to point me towards information regarding this. I'll register this patch with the next CF. Best regards, Jesper >From 39d52f5389bf3ef1814c1f201df6531feb2a5c7f Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 13 Dec 2016 08:27:18 -0500 Subject: [PATCH] pg_catversion builtin function --- doc/src/sgml/func.sgml | 6 ++ src/backend/utils/adt/version.c | 10 ++ src/include/catalog/pg_proc.h | 3 +++ src/include/utils/builtins.h | 1 + 4 files changed, 20 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0f9c9bf..6fc78ab 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15497,6 +15497,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); text PostgreSQL version information. See also for a machine-readable version. + + + pg_catversion() + int + returns the catalog version number + diff --git a/src/backend/utils/adt/version.c b/src/backend/utils/adt/version.c index f247992..55f97fa 100644 --- a/src/backend/utils/adt/version.c +++ b/src/backend/utils/adt/version.c @@ -14,6 +14,7 @@ #include "postgres.h" +#include "catalog/catversion.h" #include "utils/builtins.h" @@ -22,3 +23,12 @@ pgsql_version(PG_FUNCTION_ARGS) { PG_RETURN_TEXT_P(cstring_to_text(PG_VERSION_STR)); } + +/* + * Return the catalog version number + */ +Datum +pgsql_catversion(PG_FUNCTION_ARGS) +{ + PG_RETURN_INT32(CATALOG_VERSION_NO); +} diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index cd7b909..b23f54a 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -235,6 +235,9 @@ DATA(insert OID = 1258 ( textcat PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 DATA(insert OID = 84 ( boolne PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "16 16" _null_ _null_ _null_ _null_ _null_ boolne _null_ _null_ _null_ )); DATA(insert OID = 89 ( version PGNSP PGUID 12 1 0 0 0 f f f f t f s s 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pgsql_version _null_ _null_ _null_ )); DESCR("PostgreSQL version string"); +DATA(insert OID = 7000 ( pg_catversion PGNSP PGUID 12 1 0 0 0 f f f f t f s s 0 0 23 "" _null_ _null_ _null_ _null_ _null_ pgsql_catversion _null_ _null_ _null_ )); +DESCR("PostgreSQL catalog version number"); + DATA(insert OID = 86 ( pg_ddl_command_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 32 "2275" _null_ _null_ _null_ _null_ _null_ pg_ddl_command_in _null_ _null_ _null_ )); DESCR("I/O"); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 7ed1623..83b2846 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -893,6 +893,7 @@ extern Datum text_format_nv(PG_FUNCTION_ARGS); /* version.c */ extern Datum pgsql_version(PG_FUNCTION_ARGS); +extern Datum pgsql_catversion(PG_FUNCTION_ARGS); /* xid.c */ extern Datum xidin(PG_FUNCTION_ARGS); -- 2.7.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum support for Hash Index
On 11/11/2016 12:11 AM, Ashutosh Sharma wrote: Thanks for reviewing this patch. I would like to update you that this patch has a dependency on a patch for concurrent hash index and WAL log in hash index. So, until these two patches for hash index are stable I won't be able to share a next version of the patch for supporting microvacuum in hash index. As the concurrent hash index patch was committed in 6d46f4, this patch needs a rebase. I have moved this submission to the next CF. Thanks for working on this ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cache Hash Index meta page.
On 09/28/2016 11:55 AM, Mithun Cy wrote: On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes wrote: > I think that this needs to be updated again for v8 of concurrent and v5 of wal Adding the rebased patch over [1] + [2] As the concurrent hash index patch was committed in 6d46f4, this patch needs a rebase. I have moved this submission to the next CF. Thanks for working on this ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum support for Hash Index
Hi, On 11/02/2016 01:38 AM, Ashutosh Sharma wrote: While replaying the delete/vacuum record on standby, it can conflict with some already running queries. Basically, the replay can remove a row which is still visible on the standby. You need to resolve conflicts similar to what we do in btree delete records (refer btree_xlog_delete). Agreed. Thanks for raising this point. I have taken care of it in the attached v2 patch. Some initial comments. _hash_vacuum_one_page: + END_CRIT_SECTION(); + _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK); The _hash_chgbufaccess() needs a comment. You also need a place where you call pfree for so->killedItems - maybe in hashkillitems(). Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
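On the standby-conflict point above, the btree precedent boils down to the following; a sketch following btree_xlog_delete() (the hash-side latestRemovedXid helper is hypothetical here):

/*
 * Before replaying a deletion that may remove rows still visible to
 * running standby queries, cancel conflicting snapshots -- same shape
 * as the btree delete redo path.
 */
if (InHotStandby)
{
    TransactionId latestRemovedXid =
        hash_xlog_vacuum_get_latestRemovedXid(record);
    RelFileNode rnode;

    XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
    ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
}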
Re: [HACKERS] pageinspect: Hash index support
On 11/01/2016 03:28 PM, Peter Eisentraut wrote: Ok, thanks for your feedback. Maybe "Returned with Feedback" is more appropriate, as there is still development left. I have applied the documentation change that introduced subsections, which seems quite useful independently. I have also committed the tests that I proposed and will work through the failures. As no new patch has been posted for the 2016-11 CF, I will close the patch entry now. Please submit an updated patch when you have the time, keeping an eye on ongoing work to update hash indexes. Agreed. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 09/29/2016 04:02 PM, Peter Eisentraut wrote: On 9/29/16 4:00 PM, Peter Eisentraut wrote: Since the commit fest is drawing to a close, I'll set this patch as returned with feedback. Actually, the CF app informs me that moving to the next CF is more appropriate, so I have done that. Ok, thanks for your feedback. Maybe "Returned with Feedback" is more appropriate, as there is still development left. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 09/29/2016 11:58 AM, Peter Eisentraut wrote: On 9/27/16 10:10 AM, Jesper Pedersen wrote: contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 346 ++ I think there is still a misunderstanding about these extension files. You only need to supply pageinspect--1.5--1.6.sql and leave all the other ones alone. Ok. Left the 'DATA' section in the Makefile, and the control file as is. Best regards, Jesper >From b48623beda6aad43116d46ff44de55bdc7547325 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 408 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 146 - 5 files changed, 609 insertions(+), 16 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..a76b683 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,408 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" + +PG_FUNCTION_INFO_V1(hash_metapage_info); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + char *type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. 
+ */ +static Page +verify_hash_page(bytea *raw_page, bool metap) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + if (metap) + { + if (pageopaque->hasho_flag != LH_META_PAGE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a metadata page"), + errdetail("Expected %08x, got %08x.", + LH_META_PAGE, pageopaque->hasho_flag))); + } + else + { + if (pageopaque->hasho_flag == LH_META_PAGE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is a metadata page"), + errdetail("Flags %08x.", + pageopaque->hasho_flag))); + } + + return page; +} + +/* - + * GetHashPageStatistics() + * + * C
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 09/25/2016 01:00 AM, Amit Kapila wrote: The attached patch fixes the problem; we now perform full page writes for bitmap pages. Apart from that, I have rebased the patch on the latest concurrent index patch [1]. I have updated the README as well to reflect the WAL logging related information for the different operations. With the attached patch, all the review comments and issues found so far are addressed. I have been running various tests and applications with this patch together with the CHI v8 patch [1]. As I haven't seen any failures and don't currently have additional feedback, I'm moving this patch to "Ready for Committer" for their feedback. If others have comments, please move the patch status back in the CommitFest application. [1] https://www.postgresql.org/message-id/CAA4eK1%2BX%3D8sUd1UCZDZnE3D9CGi9kw%2Bkjxp2Tnw7SX5w8pLBNw%40mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
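For reference, the essential property under test can be sketched as follows (hypothetical object names; assumes a primary/standby pair with streaming replication already set up):

-- on the primary
CREATE TABLE w AS SELECT generate_series(1, 100000) AS id;
CREATE INDEX w_hash ON w USING hash (id);
-- on the standby, once the records have been replayed,
-- the hash index should be usable for scans:
SET enable_seqscan = off;
EXPLAIN SELECT * FROM w WHERE id = 42;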
Re: [HACKERS] Hash Indexes
On 09/20/2016 09:02 AM, Amit Kapila wrote: On Fri, Sep 16, 2016 at 11:22 AM, Amit Kapila wrote: I do want to work on it, but it is always possible that due to some other work this might get delayed. Also, I think there is always a chance that while doing that work we face some problem due to which we might not be able to use that optimization. So I will go with your suggestion of removing hashscan.c and its usage for now, and then if required we will pull it back. If nobody else thinks otherwise, I will update this in the next patch version. In the attached patch, I have removed the support for hashscans. I think it might improve performance by a few percent (especially for single row fetch transactions), as it eliminates the registration and destruction of hashscans. I have been running various tests and applications with this patch together with the WAL v5 patch [1]. As I haven't seen any failures and don't currently have additional feedback, I'm moving this patch to "Ready for Committer" for their feedback. If others have comments, please move the patch status back in the CommitFest application. [1] https://www.postgresql.org/message-id/CAA4eK1KE%3D%2BkkowyYD0vmch%3Dph4ND3H1tViAB%2B0cWTHqjZDDfqg%40mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 09/26/2016 10:45 PM, Peter Eisentraut wrote: On 9/26/16 1:39 PM, Jesper Pedersen wrote: Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally readable in this case. But, I can change the patch if needed. The point is that to use BuildTupleFromCStrings() you need to convert numbers to strings, and then they are converted back. This is not a typical way to write row-returning functions. Ok. Changed: * BuildTupleFromCStrings -> xyzGetDatum * 'type' field: char -> text w/ full description * Removed 'type' information from documentation Best regards, Jesper >From d6dd73ffef9deafc938b9fe7119db0928494cbf9 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 408 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 346 ++ contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 146 - 7 files changed, 955 insertions(+), 295 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..a76b683 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,408 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" + +PG_FUNCTION_INFO_V1(hash_metapage_info); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + char *type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. 
+ */ +static Page +verify_hash_page(bytea *raw_page, bool metap) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + if (metap) + { + if (pageopaque->hasho_flag != LH_META_PAGE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a metadata page"), + errdetail("Expected %08x, got %08x.", + LH_META_PAGE, pageopaque->hasho_flag))); + } + else + { + if (pageopaque->hasho_flag == LH_
Re: [HACKERS] pageinspect: Hash index support
On 09/23/2016 01:56 AM, Amit Kapila wrote: While looking at the patch, I noticed the below code, which seems somewhat problematic: + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData); + + /* page type (flags) */ + if (opaque->hasho_flag & LH_META_PAGE) + stat->type = 'm'; + else if (opaque->hasho_flag & LH_OVERFLOW_PAGE) + stat->type = 'v'; + else if (opaque->hasho_flag & LH_BUCKET_PAGE) + stat->type = 'b'; + else if (opaque->hasho_flag & LH_BITMAP_PAGE) + stat->type = 'i'; In the above code, it appears that you are trying to calculate the max_avail space for all pages in the same way. Don't you need to calculate it differently for bitmap and meta pages, as they don't share the same format as the other types of pages? Correct; however, the max_avail calculation was removed in v6, since we don't display the average item size anymore. Thanks for the feedback ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi, On 09/23/2016 12:10 AM, Peter Eisentraut wrote: On 9/21/16 9:30 AM, Jesper Pedersen wrote: Attached is v5, which adds basic page verification. There are still some things that I found that appear to follow the old style (btree) conventions instead of the new style (brin, gin) conventions. - Rename hash_metap to hash_metapage_info. Done. - Don't use BuildTupleFromCStrings(). (There is nothing inherently wrong with it, but why convert numbers to strings and back again when it can be done more directly.) Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally readable in this case. But, I can change the patch if needed. - Documentation for hash_page_stats still has the blkno argument. Fixed. - In hash_metap, the magic field could be of type bytea, so that it displays in hex. Or text like the brin functions. Changed to 'text'. Some other comments: - The hash_metap result fields spares and mapp should be arrays of integer. B-tree and BRIN use a 'text' field as output, so left as is. (Incidentally, the comment in hash.h talks about bitmaps[] but I think it means hashm_mapp[].) - As of very recently, we don't need to move pageinspect--1.5.sql to pageinspect--1.6.sql anymore. Just add pageinspect--1.5--1.6.sql. We need to rename the pageinspect--1.5.sql file and add the new methods, right ? Or ? - In the documentation for hash_page_stats(), the "+" sign is misaligned. Fixed. - In hash_page_items, the fields itemlen, nulls, vars are always 16, false, false. So perhaps there is no need for them. Similarly, the avg_item_size in hash_page_stats is always 16. Fixed, by removing them. We can add them back later if needed. - The data field could be of type bytea. Left as is, for the same reasons as 'spares' and 'mapp'. - Add a pointer in the documentation to the relevant header file. Done. Bonus: - Add subsections in the documentation (general functions, B-tree functions, etc.) Done. - Add tests. Thanks for the review !
Best regards, Jesper >From 24c3ff3cb48ca2076d589eda9027a0abe73b5aad Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 400 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 346 ++ contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 151 +- 7 files changed, 952 insertions(+), 295 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..96e72a9 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,400 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "funcapi.h" +#include "miscadmin.h" + +PG_FUNCTION_INFO_V1(hash_metapage_info); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + char type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + Bl
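A usage sketch for the renamed function (block 0 of a hash index is the metapage; the selected column names are assumed from the HashMetaPageData fields discussed in this thread):

CREATE TABLE t AS SELECT generate_series(1, 100000) AS id;
CREATE INDEX t_hash ON t USING hash (id);
SELECT magic, maxbucket, ovflpoint, spares, mapp
  FROM hash_metapage_info(get_raw_page('t_hash', 0));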
Re: [HACKERS] pageinspect: Hash index support
On 09/21/2016 08:43 AM, Michael Paquier wrote: page_stats / page_items should not be used on the metadata page. As these functions are marked as superuser only it is expected that people provides the correct input, especially since the raw page structure is used as the input. btree functions use the block number to do some sanity checks. You cannot do that here as only bytea functions are available, but you could do it in verify_hash_page by looking at the opaque data and look at LH_META_PAGE. Then add a boolean argument into verify_hash_page to see if the caller expects a meta page or not and just issue an error. Actually it would be a good idea to put in those safeguards, even if I agree with you that calling those functions is at the risk of the user... Could you update the patch in this sense? I had fun doing the same tests, aka running the items and stats functions on a meta page, and the meta function on a non-meta page, but at my surprise I did not see a crash, so perhaps I was lucky and perhaps that was because of OSX. Attached is v5, which add basic page verification. Thanks for the feedback ! Best regards, Jesper >From 05fe7b9056bcbb2f29809798229640499576c3a9 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 422 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 63 contrib/pageinspect/pageinspect--1.5.sql | 279 - contrib/pageinspect/pageinspect--1.6.sql | 338 + contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 104 +++ 7 files changed, 933 insertions(+), 285 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..3ee57dc --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,422 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "funcapi.h" +#include "miscadmin.h" + +PG_FUNCTION_INFO_V1(hash_metap); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; 
+ uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + char type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. + */ +static Page +verify_hash_page(bytea *raw_page, bool metap) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_p
Re: [HACKERS] pageinspect: Hash index support
On 09/21/2016 02:14 AM, Michael Paquier wrote: Adjusted in v4. Thanks for the new version. Code/doc will need an update once the CHI patch goes in. If you are referring to the WAL support, I guess that this patch or the other patch could just rebase and adjust as needed. It is the main patch [1] that defines the new constants for the page types. But I'll submit an update for pageinspect when needed. hash_page_items and hash_page_stats share a lot of common points with their btree equivalents; still, doing a refactoring would just impact the visibility of the code, and it is wanted as educational in this module, so let's go with things the way you suggest. Ok. + + The type information will be 'm' for a metadata page, + 'v' for an overflow page, 'b' for a bucket page, + 'i' for a bitmap page, and 'u' for an unused page. + Other functions don't go into this level of detail, so I would just rip out this paragraph. I'll add an annotation for this part, and leave it for the committer to decide, since Jeff wanted documentation for the 'type' information. The patch looks in pretty good shape to me, so I am switching it to ready for committer. Thanks for your feedback ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
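A sketch that decodes the type letter when walking a whole index with the v4 interface (skips block 0, the metapage; assumes the default 8kB block size and an index named t_hash):

SELECT blkno,
       CASE s.type
            WHEN 'm' THEN 'metadata'
            WHEN 'v' THEN 'overflow'
            WHEN 'b' THEN 'bucket'
            WHEN 'i' THEN 'bitmap'
            ELSE 'unused'
       END AS page_type,
       s.live_items,
       s.free_size
  FROM generate_series(1, pg_relation_size('t_hash') / 8192 - 1) AS blkno,
       LATERAL hash_page_stats(get_raw_page('t_hash', blkno::int)) AS s;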
Re: [HACKERS] pageinspect: Hash index support
Hi, On 09/21/2016 07:24 AM, Ashutosh Sharma wrote: The development branch is @ https://github.com/jesperpedersen/postgres/tree/pageinspect_hash I am working on centOS 7. I am still facing the issue when applying your patch using "git apply" command but if i use 'patch' command it works fine however i am seeing some warning messages with 'patch' command as well. git apply w/ v4 works here, so you will have to investigate what happens on your side. I continued reviewing your v4 patch and here are some more comments: 1). Got below error if i pass meta page to hash_page_items(). Does hash_page_items() accept only bucket or bitmap page as input? postgres=# SELECT * FROM hash_page_items(get_raw_page('hash_i4_index', 0)); ERROR: invalid memory alloc request size 18446744073709551593 page_stats / page_items should not be used on the metadata page. As these functions are marked as superuser only it is expected that people provides the correct input, especially since the raw page structure is used as the input. 2). Server crashed when below query was executed on a code that was build with cassert-enabled. postgres=# SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0)); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> \q The callstack is as follows: (gdb) bt #0 0x7fef794f15f7 in raise () from /lib64/libc.so.6 #1 0x7fef794f2ce8 in abort () from /lib64/libc.so.6 #2 0x00967381 in ExceptionalCondition (conditionName=0x7fef699c942d "!(((id)->lp_len != 0))", errorType=0x7fef699c92d4 "FailedAssertion", fileName=0x7fef699c9397 "hashfuncs.c", lineNumber=123) at assert.c:54 #3 0x7fef699c6ed6 in GetHashPageStatistics (page=0x1060c2c "", stat=0x7ffd76846230) at hashfuncs.c:123 #4 0x7fef699c70a4 in hash_page_stats (fcinfo=0x7ffd768463e0) at hashfuncs.c:169 #5 0x00682e6b in ExecMakeTableFunctionResult (funcexpr=0x10457f0, econtext=0x1044e28, argContext=0x104ee38, expectedDesc=0x1047640, randomAccess=0 '\000') at execQual.c:2216 #6 0x006a88e2 in FunctionNext (node=0x1044d10) at nodeFunctionscan.c:94 #7 0x0068a3d9 in ExecScanFetch (node=0x1044d10, accessMtd=0x6a882b , recheckMtd=0x6a8c10 ) at execScan.c:95 #8 0x0068a448 in ExecScan (node=0x1044d10, accessMtd=0x6a882b , recheckMtd=0x6a8c10 ) at execScan.c:145 #9 0x006a8c45 in ExecFunctionScan (node=0x1044d10) at nodeFunctionscan.c:268 #10 0x0067f0cf in ExecProcNode (node=0x1044d10) at execProcnode.c:449 #11 0x0067b40b in ExecutePlan (estate=0x1044bf8, planstate=0x1044d10, use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=0, direction=ForwardScanDirection, dest=0x10444c8) at execMain.c:1567 #12 0x00679527 in standard_ExecutorRun (queryDesc=0xfac778, direction=ForwardScanDirection, count=0) at execMain.c:338 #13 0x006793ab in ExecutorRun (queryDesc=0xfac778, direction=ForwardScanDirection, count=0) at execMain.c:286 #14 0x00816b3e in PortalRunSelect (portal=0xfa49e8, forward=1 '\001', count=0, dest=0x10444c8) at pquery.c:948 #15 0x008167d1 in PortalRun (portal=0xfa49e8, count=9223372036854775807, isTopLevel=1 '\001', dest=0x10444c8, altdest=0x10444c8, completionTag=0x7ffd76846c60 "") at pquery.c:789 #16 0x00810a27 in exec_simple_query (query_string=0x1007018 "SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0));") at postgres.c:1094 #17 0x00814b5e in PostgresMain (argc=1, argv=0xfb1d08, dbname=0xf873d8 "postgres", username=0xfb1b70 "edb") at postgres.c:4070 #18 
0x0078990c in BackendRun (port=0xfacb50) at postmaster.c:4260 Same deal here. 3). Could you please let me know, what does the hard coded values '*5' and '+1' represents in the below lines of code. You have used them when allocating memory for storing spare pages allocated before certain split point and block number of bitmap pages inside hash_metap(). spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1); mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1); I guess it would be better to add some comments if using any hard coded values. It is the space needed to output the values. 4). Inside hash_page_stats(), you are first calling verify_hash_page() to verify if it is a hash page or not and No, we assume that the page is a valid hash page structure, since there is an explicit cast w/o any further checks. then calling GetHashPageStatistics() to get the page stats wherein you are trying to check what is the type of hash page. Below is the code snippet for this, + /* page type (flags) */ + if (opaque->hasho_flag & LH_META_PAGE) + stat->type = 'm'; + else if (opaque->hasho_flag & LH_OVERFLOW_PAGE) + stat->type = 'v'; + else if (opaque->hasho_flag & LH_BUCKET_PAGE) + stat->type = 'b'; + else if (opaque-
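Once the metap safeguard discussed here landed in v5, both misuse cases turn into clean errors instead of crashes (a sketch; message wording per verify_hash_page in the patch above):

SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0));
-- ERROR:  page is a metadata page
SELECT * FROM hash_metap(get_raw_page('hash_i4_index', 1));
-- ERROR:  page is not a metadata page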
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 09/16/2016 02:42 AM, Amit Kapila wrote: Okay, thanks for pointing out the same. I have fixed it. Apart from that, I have changed _hash_alloc_buckets() to initialize the page instead of making it completely zero, because of problems discussed in another related thread [1]. I have also updated the README. Thanks. This needs a rebase against the CHI v8 [1] patch. [1] https://www.postgresql.org/message-id/CAA4eK1+X=8sud1uczdzne3d9cgi9kw+kjxp2tnw7sx5w8pl...@mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi, On 09/20/2016 01:46 PM, Ashutosh Sharma wrote: I am getting a fatal error along with some warnings when trying to apply the v3 patch using the "git apply" command. [ashu@localhost postgresql]$ git apply 0001-pageinspect-Hash-index-support_v3.patch 0001-pageinspect-Hash-index-support_v3.patch:29: trailing whitespace. brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) 0001-pageinspect-Hash-index-support_v3.patch:36: trailing whitespace. DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ 0001-pageinspect-Hash-index-support_v3.patch:37: trailing whitespace. pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ 0001-pageinspect-Hash-index-support_v3.patch:38: trailing whitespace. pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ 0001-pageinspect-Hash-index-support_v3.patch:39: trailing whitespace. pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql fatal: git apply: bad git-diff - expected /dev/null on line 46 Which platform are you on ? The development branch is @ https://github.com/jesperpedersen/postgres/tree/pageinspect_hash Also, I think the USAGE for hash_metap() is referring to hash_metap_bytea(). Please correct it. I have just started reviewing the patch, and will keep posting my comments upon further review. Fixed in v4. Thanks for the review. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 09/20/2016 12:45 PM, Jeff Janes wrote: Is the 2nd "1" in this call needed? SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1) As far as I can tell, that argument is only used to stuff into the output field "blkno", it is not used to instruct the interpretation of the raw page itself. It doesn't seem worthwhile to have the parameter that only echos back to the user what the user already put in (twice). The only existing funtions which take the blkno argument are those that don't use the get_raw_page style. Also, should we document what the single letter values mean in the hash_page_stats.type column? It is not obvious that 'i' means bitmap, for example. Adjusted in v4. Code/doc will need an update once the CHI patch goes in. Best regards, Jesper >From 1f27a2bb28cc6dfea9cba015d7cceab768f67d0a Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 403 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 63 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 338 + contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 104 +++ 7 files changed, 914 insertions(+), 285 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..6efbf22 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,403 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "funcapi.h" +#include "miscadmin.h" + +PG_FUNCTION_INFO_V1(hash_metap); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + char type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the 
attempt. + * A pointer to the page is returned. + */ +static Page +verify_hash_page(bytea *raw_page) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + return page; +} + +/* - + * GetHashPageStatistics() + * + * Collect statistics of single hash page + * - + */ +static void +GetHashPageStatistics(Page page, HashPageStat *stat) +{ + PageHeader phdr = (PageHeader) page; + OffsetNumber maxoff = PageGetMaxOffsetNumber(pa
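With the echoed block number dropped in v4, the call reduces to (sketch):

-- the page to inspect is identified solely by get_raw_page
SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1));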
Re: [HACKERS] pageinspect: Hash index support
On 09/20/2016 03:19 AM, Michael Paquier wrote: You did not get right the comments from Alvaro upthread. The following functions are added with this patch: function hash_metap(text) function hash_metap_bytea(bytea) function hash_page_items(text,integer) function hash_page_items_bytea(bytea) function hash_page_stats(text,integer) function hash_page_stats_bytea(bytea,integer) Now the following set of functions would be sufficient: function hash_metapage_info(bytea) function hash_page_items(bytea) function hash_page_stats(bytea) The last time pageinspect has been updated, when BRIN functions have been added, it has been discussed to just use (bytea) as an argument interface and just rely on get_raw_page() to get the pages wanted, so I think that we had better stick with that and keep things simple. Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked for both. Attached is v3 with only the bytea based methods. Alvaro, Michael and Jeff - Thanks for the review ! Best regards, Jesper >From 0aff82ccb40f00efe9e48cacef9c8a45c1327da2 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 408 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 64 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 339 + contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 100 +++ 7 files changed, 917 insertions(+), 285 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..4d2cd2a --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,408 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "funcapi.h" +#include "miscadmin.h" + +PG_FUNCTION_INFO_V1(hash_metap); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + char 
type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. + */ +static Page +verify_hash_page(bytea *raw_page) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + return page; +} + +/* - + * GetHashPageStatistics() + *
Re: [HACKERS] pageinspect: Hash index support
On 09/14/2016 04:21 PM, Jeff Janes wrote: I suggest that pageinspect functions are more convenient to use via the get_raw_page interface, that is, instead of reading the buffer themselves, the buffer is handed over from elsewhere and they receive it as bytea. This enables other use cases such as grabbing a page from one server and examining it in another one. I've never needed to do that, but it does sound like it might be useful. But it is also annoying to have to do that when you want to examine a current server. Could we use overloading, so that it can be used in either way? For hash_page_items, the 'data' is always a 32 bit integer, isn't it? (unlike other pageinspect features, where the data could be any user-defined data type). If so, I'd rather see it in plain hex (without the spaces, without the trailing zeros) + /* page type (flags) */ + if (opaque->hasho_flag & LH_UNUSED_PAGE) + stat->type = 'u'; This can never be true, because LH_UNUSED_PAGE is zero. You should make this be the fall-through case, and have LH_META_PAGE be explicitly tested for. This version adds the overloaded get_raw_page based methods. However, I'm not verifying the page other than page_id, as hasho_flag can contain multiple flags in Amit's patches. And, I don't see having different page ids as a benefit - at least not part of this patch. We should probably just have one of the method sets, so I'll send a v3 with the set voted for. I kept the 'data' field as is, for now. Best regards, Jesper >From 0dc44e4b3cc1d31c53684a41fbd7959978e69089 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 766 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 111 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 386 + contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 103 7 files changed, 1372 insertions(+), 285 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..380203c --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,766 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * 
contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "catalog/namespace.h" +#include "catalog/pg_am.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" +#include "utils/rel.h" + + +PG_FUNCTION_INFO_V1(hash_metap_bytea); +PG_FUNCTION_INFO_V1(hash_metap); +PG_FUNCTION_INFO_V1(hash_page_items_bytea); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats_bytea); +PG_FUNCTION_INFO_V1(hash_page_stats); + +#define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX) +#define IS_HASH(r) ((r)->rd_rel->relam == HASH_AM_OID) + +/* note: BlockNumber is unsigned, hence can't be negative */ +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \ + if ( RelationGetNumberOfBlocks(rel) <= (BlockNumber) (blkno) ) \ + elog(ERROR, "block number out of range"); } + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + char type; + + /* opaque data */ + B
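Since, as Jeff notes, the 'data' column is effectively the 32-bit hash code of the indexed value, it can be cross-checked against the type's hash function (a sketch using the bytea interface the thread settled on, and the hash_i4_index name from the tests elsewhere in this thread):

-- expected hash code for a given key
SELECT hashint4(42);
-- compare against the data column of the bucket page holding that key
-- (note: hashint4 returns a signed integer, while the page item shows
-- the raw stored representation)
SELECT * FROM hash_page_items(get_raw_page('hash_i4_index', 1));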
Re: [HACKERS] Hash Indexes
On 09/16/2016 03:18 AM, Amit Kapila wrote: Attached is a run with 1000 rows. I think 1000 is still too few; you probably want to run it with 100,000 or more rows. I suspect that the reason why you are seeing the large difference between btree and hash index is that the range of values is narrow and there may be many overflow pages. Attached is 100,000. I think for CHI it would be Robert's and others' feedback. For WAL, there is [1]. I have fixed your feedback for WAL and posted the patch. Thanks ! I think the remaining thing to handle for the Concurrent Hash Index patch is to remove the usage of hashscan.c from the code if no one objects to it; do let me know if I am missing something here. Like Robert said, hashscan.c can always come back, and it would take a call-stack out of the 'am' methods. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
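The narrow-range effect described above is easy to reproduce (a sketch; with only ten distinct values, each bucket chains through many overflow pages and the build slows down accordingly):

CREATE TABLE narrow AS
  SELECT s AS id, s % 10 AS val FROM generate_series(1, 100000) AS s;
CREATE INDEX narrow_hash ON narrow USING hash (val);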
Re: [HACKERS] Hash Indexes
On 09/15/2016 02:03 AM, Amit Kapila wrote: Same thing here - where the fields involving the hash index aren't updated. Do you mean that for such cases you also see a 40-60% gain? No, UPDATEs are around 10-20% for our cases. I have done a run to look at the concurrency / TPS aspect of the implementation - to try something different from Mark's work on testing the pgbench setup. With definitions as above, with SELECT as -- select.sql -- \set id random(1,10) BEGIN; SELECT * FROM test WHERE id = :id; COMMIT; and UPDATE/Indexed with an index on 'val', and finally UPDATE/Nonindexed w/o one. [1] [2] [3] new_hash is the patched version; old_hash is the existing hash implementation on master. btree is master too. The machine is a 28C/56T with 256Gb RAM and 2 x RAID10 SSD for data + wal. Clients ran with -M prepared. [1] https://www.postgresql.org/message-id/caa4ek1+erbp+7mdkkahjzwq_dtdkocbpt7lswfwcqvuhbxz...@mail.gmail.com [2] https://www.postgresql.org/message-id/CAD__OujvYghFX_XVkgRcJH4VcEbfJNSxySd9x=1wp5vylvk...@mail.gmail.com [3] https://www.postgresql.org/message-id/caa4ek1juyr_ab7bxfnsg5+jqhiwgklkgacfk9bfd4mlffk6...@mail.gmail.com I don't know if you find this useful due to the small number of rows, but let me know if there are other tests I can run, f.ex. bump the number of rows. It might be useful to test with a higher number of rows, because with so little data contention is not visible. Attached is a run with 1000 rows. But I think in general, with your, Jeff's and my own tests, it is clear that there is a significant win for read-only cases and for read-write cases where the index column is not updated. Also, we don't find any regression as compared to HEAD, which is sufficient to prove the worth of the patch. Very much agreed. I think we should not forget that one of the other main reasons for this patch is to allow WAL logging for hash indexes. Absolutely. There are scenarios that will benefit from switching to a hash index. I think for now we have done sufficient tests for this patch to ensure its benefit; now if any committer wants to see something more we can surely do it. Ok. I think the important thing at this stage is to find out what more (if anything) is left to make this patch "ready for committer". I think for CHI it would be Robert's and others' feedback. For WAL, there is [1]. [1] https://www.postgresql.org/message-id/5f8b4681-1229-92f4-4315-57d780d9c128%40redhat.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
Hi, On 09/14/2016 07:24 AM, Amit Kapila wrote: On Wed, Sep 14, 2016 at 12:29 AM, Jesper Pedersen wrote: On 09/13/2016 07:26 AM, Amit Kapila wrote: Attached, new version of patch which contains the fix for problem reported on write-ahead-log of hash index thread [1]. I have been testing patch in various scenarios, and it has a positive performance impact in some cases. This is especially seen in cases where the values of the indexed column are unique - SELECTs can see a 40-60% benefit over a similar query using b-tree. Here, I think it is better if we have the data comparing the situation of hash index with respect to HEAD as well. What I mean to say is that you are claiming that after the hash index improvements SELECT workload is 40-60% better, but where do we stand as of HEAD? The tests I have done are with a copy of a production database using the same queries sent with a b-tree index for the primary key, and the same with a hash index. Those are seeing a speed-up of the mentioned 40-60% in execution time - some involve JOINs. Largest of those tables is 390Mb with a CHAR() based primary key. UPDATE also sees an improvement. Can you explain this more? Is it more compare to HEAD or more as compare to Btree? Isn't this contradictory to what the test in below mail shows? Same thing here - where the fields involving the hash index aren't updated. In cases where the indexed column value isn't unique, it takes a long time to build the index due to the overflow page creation. Also in cases where the index column is updated with a high number of clients, ala -- ddl.sql -- CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val; CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id); CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val); ANALYZE; -- test.sql -- \set id random(1,10) \set val random(0,10) BEGIN; UPDATE test SET val = :val WHERE id = :id; COMMIT; w/ 100 clients - it takes longer than the b-tree counterpart (2921 tps for hash, and 10062 tps for b-tree). Thanks for doing the tests. Have you applied both concurrent index and cache the meta page patch for these tests? So from above tests, we can say that after these set of patches read-only workloads will be significantly improved even better than btree in quite-a-few useful cases. Agreed. However when the indexed column is updated, there is still a large gap as compare to btree (what about the case when the indexed column is not updated in read-write transaction as in our pgbench read-write transactions, by any chance did you ran any such test?). I have done a run to look at the concurrency / TPS aspect of the implementation - to try something different than Mark's work on testing the pgbench setup. With definitions as above, with SELECT as -- select.sql -- \set id random(1,10) BEGIN; SELECT * FROM test WHERE id = :id; COMMIT; and UPDATE/Indexed with an index on 'val', and finally UPDATE/Nonindexed w/o one. [1] [2] [3] is new_hash - old_hash is the existing hash implementation on master. btree is master too. Machine is a 28C/56T with 256Gb RAM with 2 x RAID10 SSD for data + wal. Clients ran with -M prepared. 
[1] https://www.postgresql.org/message-id/caa4ek1+erbp+7mdkkahjzwq_dtdkocbpt7lswfwcqvuhbxz...@mail.gmail.com [2] https://www.postgresql.org/message-id/CAD__OujvYghFX_XVkgRcJH4VcEbfJNSxySd9x=1wp5vylvk...@mail.gmail.com [3] https://www.postgresql.org/message-id/caa4ek1juyr_ab7bxfnsg5+jqhiwgklkgacfk9bfd4mlffk6...@mail.gmail.com I don't know if you find this useful due to the small number of rows, but let me know if there are other tests I can run, f.ex. bump the number of rows. I think we need to focus on improving cases where index columns are updated, but it is better to do that work as a separate patch. Ok. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On 09/13/2016 07:26 AM, Amit Kapila wrote: Attached is a new version of the patch, which contains the fix for the problem reported on the write-ahead-log of hash index thread [1]. I have been testing the patch in various scenarios, and it has a positive performance impact in some cases. This is especially seen in cases where the values of the indexed column are unique - SELECTs can see a 40-60% benefit over a similar query using b-tree. UPDATE also sees an improvement. In cases where the indexed column value isn't unique, it takes a long time to build the index due to the overflow page creation. Also, in cases where the index column is updated with a high number of clients, a la -- ddl.sql -- CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val; CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id); CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val); ANALYZE; -- test.sql -- \set id random(1,10) \set val random(0,10) BEGIN; UPDATE test SET val = :val WHERE id = :id; COMMIT; w/ 100 clients - it takes longer than the b-tree counterpart (2921 tps for hash, and 10062 tps for b-tree). Jeff mentioned upthread the idea of moving the lock to a bucket meta page instead of having it on the main meta page. Likely a question for the assigned committer. Thanks for working on this ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
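For comparison, the b-tree numbers quoted above presumably come from the same table and workload with just the access method swapped (a hypothetical counterpart of ddl.sql):

-- ddl-btree.sql (hypothetical) --
CREATE INDEX IF NOT EXISTS idx_id ON test USING btree (id);
CREATE INDEX IF NOT EXISTS idx_val ON test USING btree (val);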
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 09/13/2016 07:41 AM, Amit Kapila wrote: README: +in_complete split flag. The reader algorithm works correctly, as it will scan What flag ? The in-complete-split flag, which indicates that the split has to be finished for that particular bucket. The values of these flags are LH_BUCKET_NEW_PAGE_SPLIT and LH_BUCKET_OLD_PAGE_SPLIT for the new and old bucket respectively. It is set in hasho_flag in the special area of the page. I have slightly expanded the definition in the README, but I am not sure if it is a good idea to mention flags defined in hash.h. Let me know if it is still unclear or you want something additional to be added to the README. I think it is better now. hashxlog.c: hash_xlog_move_page_contents hash_xlog_squeeze_page Both have "bukcetbuf" (-> "bucketbuf"), and + if (BufferIsValid(bukcetbuf)); -> + if (BufferIsValid(bucketbuf)) and indent the following line: LockBufferForCleanup(bukcetbuf); hash_xlog_delete has the "if" issue too. Fixed all the above cosmetic issues. I meant there is an extra ';' on the "if" statements: + if (BufferIsValid(bukcetbuf)); <-- in hash_xlog_move_page_contents, hash_xlog_squeeze_page and hash_xlog_delete. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On 09/12/2016 10:42 PM, Amit Kapila wrote: The following script hangs on idx_val creation - just with v5, WAL patch not applied. Are you sure it is actually hanging? I see 100% CPU for a few minutes, but the index eventually completes ok for me (v5 patch applied to today's master). It completed for me as well. The second index creation is taking more time and CPU, because it is just inserting duplicate values, which need a lot of overflow pages. Yeah, sorry for the false alarm. It just took 3m45s to complete on my machine. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
Hi, On 09/07/2016 05:58 AM, Amit Kapila wrote: Okay, I have fixed this issue as explained above. Apart from that, I have fixed another issue reported by Mark Kirkwood upthread and a few other issues found during internal testing by Ashutosh Sharma. The locking issue reported by Mark and Ashutosh is that the patch didn't maintain the locking order while adding an overflow page as it maintains in other write operations (lock the bucket pages first and then the metapage to perform the write operation). I have added comments in _hash_addovflpage() to explain the locking order used in the modified patch. During stress testing with pgbench using a master-standby setup, we found an issue which indicates that the WAL replay machinery doesn't expect completely zeroed pages (see the explanation of RBM_NORMAL mode atop XLogReadBufferExtended). Previously, before freeing the overflow page we were zeroing it; now I have changed it to just initialize the page such that the page will be empty. Apart from the above, I have added support for old snapshot threshold in the hash index code. Thanks to Ashutosh Sharma for doing the testing of the patch and helping me in analyzing some of the above issues. I forgot to mention in my initial mail that Robert and I had some off-list discussions about the design of this patch, many thanks to him for providing inputs. Some initial feedback. README: +in_complete split flag. The reader algorithm works correctly, as it will scan What flag ? hashxlog.c: hash_xlog_move_page_contents hash_xlog_squeeze_page Both have "bukcetbuf" (-> "bucketbuf"), and + if (BufferIsValid(bukcetbuf)); -> + if (BufferIsValid(bucketbuf)) and indent the following line: LockBufferForCleanup(bukcetbuf); hash_xlog_delete has the "if" issue too. hash.h: Move the XLog related content to hash_xlog.h Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On 09/01/2016 11:55 PM, Amit Kapila wrote: I have fixed all other issues you have raised. Updated patch is attached with this mail. The following script hangs on idx_val creation - just with v5, WAL patch not applied. Best regards, Jesper zero.sql Description: application/sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cache Hash Index meta page.
On 09/05/2016 02:50 PM, Mithun Cy wrote: On Sep 2, 2016 7:38 PM, "Jesper Pedersen" wrote:

Could you provide a rebased patch based on Amit's v5 ?

Please find the patch, based on Amit's v5. I have fixed the following things:

1. In "_hash_first" we now check if (opaque->hasho_prevblkno == InvalidBlockNumber) to see if the bucket is from an older-version hash index that has been upgraded. Since, as of now, InvalidBlockNumber is one greater than the maximum value "metap->hashm_maxbucket" can be set to (see _hash_expandtable), we can distinguish it from the rest. I tested the upgrade issue reported by Amit; it is fixed now.

2. One case in which a bucket's hasho_prevblkno is used is the backward scan. So now, before testing for the previous block number, I test whether the current page is a bucket page, and if so we end the bucket scan (see changes in _hash_readprev). In the other places where hasho_prevblkno is used it is not for a bucket page, so I have not put in any extra check to verify whether it is a bucket page.

I think that the

+ pageopaque->hasho_prevblkno = metap->hashm_maxbucket;

trick should be documented in the README, as hashm_maxbucket is defined as uint32 whereas hasho_prevblkno is a BlockNumber. (All bucket variables should probably use the Bucket definition instead of uint32.)

For the archives, this patch conflicts with the WAL patch [1].

[1] https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
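For the archives, a condensed sketch of the check described in 1. and in the earlier mail. The function and the cached_maxbucket parameter are hypothetical; only the field names come from the hash AM:

#include "postgres.h"
#include "access/hash.h"

/* Illustrative only: can the cached metapage be trusted for this
 * bucket page? */
static bool
cached_metap_is_valid(Page page, uint32 cached_maxbucket)
{
    HashPageOpaque opaque = (HashPageOpaque) PageGetSpecialPointer(page);

    /* A pre-upgrade bucket page never had maxbucket stamped into
     * hasho_prevblkno; InvalidBlockNumber is greater than any value
     * hashm_maxbucket can take, so the two cases cannot collide. */
    if (opaque->hasho_prevblkno == InvalidBlockNumber)
        return false;

    /* The bucket has not been split since the metapage was cached. */
    return cached_maxbucket >= opaque->hasho_prevblkno;
}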
Re: [HACKERS] Cache Hash Index meta page.
On 07/22/2016 06:02 AM, Mithun Cy wrote:

I have created a patch to cache the meta page of a hash index in backend-private memory. This saves reading the meta page buffer every time we want to find the bucket page; in the "_hash_first" call, we currently read the meta page buffer twice just to make sure the bucket was not split after we found the bucket page. With this patch the meta page buffer read is skipped if the bucket has not been split after caching the meta page.

The idea is to cache the meta page data in rd_amcache and store the maxbucket number in hasho_prevblkno of the bucket's primary page (which would otherwise always be invalid, so it is reused here for this purpose). When we do a hash lookup for the bucket page, if the locally cached maxbucket number is greater than or equal to the bucket page's maxbucket number, then we know the bucket has not been split after we cached the meta page, and we can avoid reading the meta page buffer.

I have attached the benchmark results and perf stats (refer to hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2: benchmark results]). There we can see improvements at higher client counts, as lwlock contention due to the buffer reads grows with more clients. If I apply the same patch on top of Amit's concurrent hash index patch [1], we can see improvements at lower client counts also; Amit's patch removed a heavyweight page lock which was the bottleneck at lower client counts.

Could you provide a rebased patch based on Amit's v5 ?

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
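A rough sketch of the caching side as described above, assuming the usual rd_amcache conventions; the helper name is hypothetical and this is not the submitted patch:

#include "postgres.h"
#include "access/hash.h"
#include "utils/memutils.h"
#include "utils/rel.h"

/* Illustrative only: copy the metapage into relcache-private memory
 * on first use, so later lookups can consult the copy instead of
 * re-reading the metapage buffer. */
static HashMetaPage
get_cached_metap(Relation rel, Buffer metabuf)
{
    if (rel->rd_amcache == NULL)
    {
        HashMetaPage metap = HashPageGetMeta(BufferGetPage(metabuf));

        rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
                                             sizeof(HashMetaPageData));
        memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
    }

    return (HashMetaPage) rel->rd_amcache;
}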
Re: [HACKERS] Hash Indexes
On 08/05/2016 07:36 AM, Amit Kapila wrote: On Thu, Aug 4, 2016 at 8:02 PM, Mithun Cy wrote:

I did some basic testing of the same. In that I found one issue with cursors.

Thanks for the testing. The reason for the failure was that the patch didn't take into account the fact that for scrolling cursors, a scan can reacquire the lock and pin on the bucket buffer multiple times. I have fixed it such that we release the pin on the bucket buffers after we scan the last overflow page in the bucket. The attached patch fixes the issue for me, let me know if you still see the issue.

Needs a rebase.

hashinsert.c
+* reuse the space. There is no such apparent benefit from finsihing the -> finishing

hashpage.c
+ * retrun the buffer, else return InvalidBuffer. -> return
+ if (blkno == P_NEW) + elog(ERROR, "hash AM does not use P_NEW"); Left over ?
+ * for unlocking it. -> for unlocking them.

hashsearch.c
+* bucket, but not pin, then acuire the lock on new bucket and again -> acquire

hashutil.c
+ * half. It is mainly required to finsh the incomplete splits where we are -> finish

Ran some tests on a CHAR() based column which showed good results. Will have to compare with a run with the WAL patch applied.

make check-world passes.

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pageinspect: Hash index support
Hi, Attached is a patch that adds support for hash indexes in pageinspect. The functionality (hash_metap, hash_page_stats and hash_page_items) follows the B-tree functions. This patch will need an update once Amit's and Mithun's work on Concurrent Hash Indexes is committed to account for the new meta-page constants. I'll create a CommitFest entry for this submission. Feedback is most welcome. Best regards, Jesper >From 55262d5fa3822afbae94f4989627dd65e91fe098 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 472 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 64 contrib/pageinspect/pageinspect--1.5.sql | 279 --- contrib/pageinspect/pageinspect--1.6.sql | 339 ++ contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 100 ++ 7 files changed, 981 insertions(+), 285 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..598b7b2 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,472 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "catalog/namespace.h" +#include "catalog/pg_am.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" +#include "utils/rel.h" + + +PG_FUNCTION_INFO_V1(hash_metap); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +#define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX) +#define IS_HASH(r) ((r)->rd_rel->relam == HASH_AM_OID) + +/* note: BlockNumber is unsigned, hence can't be negative */ +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \ + if ( RelationGetNumberOfBlocks(rel) <= (BlockNumber) (blkno) ) \ + elog(ERROR, "block number out of range"); } + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + char type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + 
Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* - + * GetHashPageStatistics() + * + * Collect statistics of single hash page + * - + */ +static void +GetHashPageStatistics(BlockNumber blkno, Buffer buffer, HashPageStat *stat) +{ + Page page = BufferGetPage(buffer); + PageHeader phdr = (PageHeader) page; + OffsetNumber maxoff = PageGetMaxOffsetNumber(page); + HashPageOpaque opaque = (HashPageOpaque) PageGetSpecialPointer(page); + int item_size = 0; + int off; + + stat->blkno = blkno; + + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData); + + stat->dead_items = stat->live_items = 0; + + stat->page_size = PageGetPageSize(page); + + /* page type (flags) */ + if (opaque->hasho_flag & LH_UNUSED_PAGE) + stat->type = 'u'; + else if (opaque->hasho_flag & LH_OVERFLOW_PAGE) + stat->type = 'v'; + else if (opaque->hasho_flag & LH_BUCKET_PAGE) + stat->type = 'b'; + else if (opaque->hasho_flag & LH_BITMAP_PAGE) + stat->type = 'i'; + else + stat->type = 'm'; + + /* hash page opaqu
Re: [HACKERS] Speedup twophase transactions
Hi,

On 04/13/2016 10:31 AM, Stas Kelvich wrote: On 13 Apr 2016, at 01:04, Michael Paquier wrote:

That's too late for 9.6 unfortunately, don't forget to add that in the next CF!

Fixed patch attached. There already was infrastructure to skip currently held locks during replay of standby_redo(), and I've extended that with a check for prepared xids.

The reason why I'm still active on this thread is that I see real problems in deploying 9.6 in its current state. Let me stress my concern: the current state of things _WILL_BREAK_ async replication under a substantial load of two-phase transactions on the master. And a lot of J2EE apps fall into that category, as they wrap most of their transactions in prepare/commit. The slave server will just keep increasing its lag and will never catch up. It is possible to deal with that by switching to synchronous replication or by inserting triggers with pg_sleep on the master, but that doesn't look like normal system behaviour.

Having discussed this with Noah off-list, I think we should revisit this for 9.6 due to the async replica lag as shown in [1]. The performance improvement for the master node is shown in [2].

As I see it there are 3 options to resolve this (in one way or another):

* Leave as is, and document the behaviour in the release notes/documentation
* Apply the patch for slaves
* Revert the changes done to twophase.c during the 9.6 cycle

All have pros/cons for the release. The latest slave patch by Stas is at [3]. Thoughts from others on the matter would be appreciated.

[1] http://www.postgresql.org/message-id/e7497864-de11-4099-83f5-89fb97af5...@postgrespro.ru
[2] http://www.postgresql.org/message-id/5693f703.3000...@redhat.com
[3] https://commitfest.postgresql.org/10/523/

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On 04/08/2016 02:37 PM, Robert Haas wrote: On Fri, Apr 8, 2016 at 8:49 AM, Jesper Pedersen wrote: On 04/07/2016 02:29 AM, Michael Paquier wrote:

So recovery is conflicting here. My guess is that this patch is missing some lock cleanup.

With the test case attached in my case the COMMIT PREPARED record does not even get replayed.

Should we create an entry for the open item list [0] for this, due to the replication lag [1] ? CommitFest entry [2] Original commit [3] Cc'ed RMT.

If there is something you think needs to be fixed that is a new issue in 9.6, then yes you should. I don't quite understand what that thing is from reading this, so please make sure to describe it clearly.

Michael, you seem to have the necessary permission for this. Could you add an entry ?

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On 04/08/2016 02:42 PM, Robert Haas wrote: On Tue, Jan 26, 2016 at 7:43 AM, Stas Kelvich wrote: Hi, Thanks for reviews and commit! I apologize for being clueless here, but was this patch committed? It's still marked as "Needs Review" in the CommitFest application. There are 2 parts to this - both in the same email thread. Part 1 [0] dealt with 2-phase commits on the master node. Part 2 [1] deals with replaying on slaves, which currently shows lag. There is still an open item found by Michael, so part 2 isn't ready to be moved to "Ready for Committer" yet. [0] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=978b2f65aa1262eb4ecbf8b3785cb1b9cf4db78e [1] https://commitfest.postgresql.org/9/523/ Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On 04/07/2016 02:29 AM, Michael Paquier wrote: So recovery is conflicting here. My guess is that this patch is missing some lock cleanup. With the test case attached in my case the COMMIT PREPARED record does not even get replayed. Should we create an entry for the open item list [0] for this, due to the replication lag [1] ? CommitFest entry [2] Original commit [3] Cc'ed RMT. [0] https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items [1] http://www.postgresql.org/message-id/e7497864-de11-4099-83f5-89fb97af5...@postgrespro.ru [2] https://commitfest.postgresql.org/9/523/ [3] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=978b2f65aa1262eb4ecbf8b3785cb1b9cf4db78e Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 04/01/2016 04:39 PM, Andres Freund wrote: On April 1, 2016 10:25:51 PM GMT+02:00, Jesper Pedersen wrote: Hi, On 03/31/2016 06:21 PM, Andres Freund wrote: On March 31, 2016 11:13:46 PM GMT+02:00, Jesper Pedersen wrote:

I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.

Yes please. I think the lock variant is realistic, the lockless one isn't.

I have done a run with -M prepared on unlogged tables, running 10min per data point, up to 300 connections. Using data + wal on HDD. I'm not seeing a difference between with and without USE_CONTENT_LOCK -- all points are within +/- 0.5%. Let me know if there are other tests I can perform.

How do either compare to just 0002 applied?

0001 + 0002 compared to 0001 + 0002 + 0003 (either way) were pretty much the same +/- 0.5% on the HDD run.

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Hi,

On 03/31/2016 06:21 PM, Andres Freund wrote: On March 31, 2016 11:13:46 PM GMT+02:00, Jesper Pedersen wrote:

I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.

Yes please. I think the lock variant is realistic, the lockless one isn't.

I have done a run with -M prepared on unlogged tables, running 10min per data point, up to 300 connections. Using data + wal on HDD. I'm not seeing a difference between with and without USE_CONTENT_LOCK -- all points are within +/- 0.5%.

Let me know if there are other tests I can perform.

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Hi,

On 03/30/2016 07:09 PM, Andres Freund wrote:

Yes. That looks good.

My testing shows that increasing the number of buffers can both increase throughput and reduce latency variance. The former is a smaller effect with one of the discussed patches applied; the latter seems to actually increase in scale (with increased throughput).

I've attached patches to:
0001: Increase the max number of clog buffers
0002: Implement 64bit atomics fallback and optimize read/write
0003: Edited version of Simon's clog scalability patch

WRT 0003 - still clearly WIP - I've:
- made group_lsn pg_atomic_u64*, to allow for tear-free reads
- split the content lock from the IO lock
- made SimpleLruReadPage_optShared always return with only the share lock held
- implemented a different, experimental concurrency model for SetStatusBit using cmpxchg. A define USE_CONTENT_LOCK controls which variant is used.

I've tested this and saw it outperform Amit's approach, especially when using a read/write mix rather than only reads. I saw over a 30% increase on a large EC2 instance with -btpcb-like@1 -bselect-only@3. But that's in a virtualized environment, not very good for reproducibility.

Amit, could you run benchmarks on your bigger hardware? Both with USE_CONTENT_LOCK commented out and in?

I think we should go for 1) and 2) unconditionally. And then evaluate whether to go with your approach, or 3) from above. If the latter, we have some cleanup to do :)

I have been testing Amit's patch in various setups and workloads, with up to 400 connections on a 2 x Xeon E5-2683 (28C/56T @ 2 GHz), not seeing an improvement, but no regression either.

Testing with 0001 and 0002 does show up to a 5% improvement when using a HDD for data + wal - about 1% when using 2 x RAID10 SSD - unlogged.

I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.

Thanks for your work on this !

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
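For reference, the cmpxchg-based model boils down to a standard compare-and-swap retry loop. A generic sketch, not the actual 0003 code (the real code updates a two-bit transaction status inside a clog byte, and the function name here is hypothetical):

#include "postgres.h"
#include "port/atomics.h"

/* Illustrative only: set bits in a shared word without holding a
 * content lock, retrying until the CAS succeeds. */
static void
set_status_bits(volatile pg_atomic_uint32 *word, uint32 bits)
{
    uint32      oldval = pg_atomic_read_u32(word);
    uint32      newval;

    do
    {
        newval = oldval | bits;
        /* on failure, oldval is refreshed to the current value */
    } while (!pg_atomic_compare_exchange_u32(word, &oldval, newval));
}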
Re: [HACKERS] Speedup twophase transactions
On 03/30/2016 09:19 AM, Stas Kelvich wrote:

> +++ b/src/test/recovery/t/006_twophase.pl
> @@ -0,0 +1,226 @@
> +# Checks for recovery_min_apply_delay
> +use strict;
> This description is wrong, this file has been copied from 005.

Yep, done.

> +my $node_master = get_new_node("Candie");
> +my $node_slave = get_new_node('Django');
> Please let's use node names that are more descriptive.

Hm, it's hard to create descriptive names because the test changes master/slave roles for those nodes several times during the test. It's possible to call them "node1" and "node2" but I'm not sure that improves anything. But anyway I'm not insisting on those particular names and will agree with any reasonable suggestion.

> +# Switch to synchronous replication
> +$node_master->append_conf('postgresql.conf', qq(
> +synchronous_standby_names = '*'
> +));
> +$node_master->restart;
> Reloading would be fine.

Good catch, done.

> + /* During replay that lock isn't really necessary, but let's take
> it anyway */
> + LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
> + for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> + {
> + gxact = TwoPhaseState->prepXacts[i];
> + proc = &ProcGlobal->allProcs[gxact->pgprocno];
> + pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
> +
> + if (TransactionIdEquals(xid, pgxact->xid))
> + {
> + gxact->locking_backend = MyBackendId;
> + MyLockedGxact = gxact;
> + break;
> + }
> + }
> + LWLockRelease(TwoPhaseStateLock);
> Not taking ProcArrayLock here?

All accesses to 2pc dummies in ProcArray are covered by TwoPhaseStateLock, so I think that's safe. Also, I've deleted the comment above that block; it's probably more confusing than descriptive.

> The comment at the top of XlogReadTwoPhaseData is incorrect.

Yep, fixed.

> RecoverPreparedFromXLOG and RecoverPreparedFromFiles have a lot of
> code in common, having this duplication is not good, and you could
> simplify your patch.

I reworked the patch to avoid duplicated code between RecoverPreparedFromXLOG/RecoverPreparedFromFiles and also between FinishPreparedTransaction/XlogRedoFinishPrepared.

Patch applies with hunks, and the test cases are passing.

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On 03/18/2016 12:50 PM, Stas Kelvich wrote: On 11 Mar 2016, at 19:41, Jesper Pedersen wrote:

Thanks for the review, Jesper. Some comments:

* The patch needs a rebase against the latest TwoPhaseFileHeader change

Done.

* Rework the check.sh script into a TAP test case (src/test/recovery), as suggested by Alvaro and Michael down thread

Done. Originally I thought about reducing the number of tests (11 right now), but now, after some debugging, I'm more convinced that it is better to include them all, as they are really testing different code paths.

* Add documentation for RecoverPreparedFromXLOG

Done.

Thanks, Stas. I have gone over this version, and tested with --enable-tap-tests + make check in src/test/recovery, which passes.

Simon, do you want to move this entry to "Ready for Committer" and take the 2REVIEWER section as part of that, or leave it in "Needs Review" and update the thread ?

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 03/15/2016 01:17 AM, Amit Kapila wrote:

I have updated the comments and changed the name of a variable from "all_trans_same_page" to "all_xact_same_page", as pointed out offlist by Alvaro.

I have done a run, and don't see any regressions. Intel Xeon 28C/56T @ 2GHz w/ 256GB + 2 x RAID10 (data + xlog) SSD. I can provide perf / flamegraph profiles if needed.

Thanks for working on this !

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers