Re: [HACKERS] Hash tables in dynamic shared memory

2016-11-19 Thread John Gorman
I reviewed the dht-v2.patch and found that it is in excellent shape.

Benchmarking shows that this performs somewhat faster than
dynahash, which is surprising because it is doing DSA address
translations on the fly.

One area where this could run faster is reducing the amount of
time the hash spends in the RESIZE_IN_PROGRESS state.

When a hash partition reaches 75% full, a resize begins by allocating
an empty new hash bucket array with double the number of buckets.
This begins the RESIZE_IN_PROGRESS state, during which there are both
an old and a new hash bucket array.

During the RESIZE state all operations such as find, insert,
delete and iterate run slower, because items must be looked up in
two hash bucket arrays.

Whenever a new item is inserted into the hash while it is in the
RESIZE state, the current code takes the time to move one item
from the old hash partition to the new hash partition. Since an
exclusive lock is already held on the partition during insertion,
this is an efficient time to do the resize cleanup work.

However, since we only clean up one item per insertion, we do not
complete the cleanup and free the old hash bucket array until we
are due to start yet another resize operation.

So we are effectively always in the RESIZE state, which slows down
operations and wastes some space.

Here are the timings for insert and find in nanoseconds on a
Macbook Pro. The insert_resize figure includes the resize work to
move one item from the old to the new hash bucket array.

insert_resize: 945.98
insert_clean:  450.39
find_resize:   348.90
find_clean:    293.16

The attached dht-v2-resize-cleanup.patch applies on top of
the dht-v2.patch and speeds up the resize cleanup process so that
hashes spend most of their time in the clean state.

It does this by cleaning up more than one old item during
inserts: this patch cleans up three old items per insert.

There is also the case where a hash receives all of its inserts
at the beginning and the rest of the workload is all finds. This
patch also cleans up two items for each find.
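
To make that concrete, here is a minimal sketch of the cleanup loop;
the struct and helper names are hypothetical stand-ins for the dht.c
internals, not code from the patch:

typedef struct dht_partition
{
    int         old_items_remaining;    /* entries left in old bucket array */
} dht_partition;

extern void resize_move_one_item(dht_partition *p);     /* hypothetical */
extern void resize_finish(dht_partition *p);            /* frees old array */

/* Migrate up to nitems entries; caller already holds the partition lock. */
static void
resize_cleanup(dht_partition *p, int nitems)
{
    while (nitems-- > 0 && p->old_items_remaining > 0)
        resize_move_one_item(p);

    if (p->old_items_remaining == 0)
        resize_finish(p);       /* leave RESIZE_IN_PROGRESS for good */
}

Inserts would call resize_cleanup(part, DHT_INSERT_CLEANS) and finds
resize_cleanup(part, DHT_SEARCH_CLEANS), using the constants shown below.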

The downside of doing extra cleanup early is some additional
latency. Here are the experimental figures and the approximate
formulas for different numbers of cleanups for inserts. Note that
we cannot go lower than one cleanup per insert.

Resize work in inserts: 729.79 insert + 216.19 * cleanups

cleanups  nanoseconds
1          945.98
2         1178.13
3         1388.73
4         1617.04
5         1796.91

Here are the timings for different numbers of cleanups for finds.
Note that since we do not hold an exclusive partition lock on a find,
there is the additional overhead of taking that lock.

Resize work in finds: 348.90 dirty_find + 233.45 lock + 275.78 * cleanups

cleanups  nanoseconds
0          348.90
1          872.88
2         1138.98
3         1448.94
4         1665.31
5         1975.91

Here are the new effective latencies during the resize vs. the clean states:

#define DHT_INSERT_CLEANS 3
#define DHT_SEARCH_CLEANS 2

insert_resize: 1388.73  -- 3 cleaned
insert_clean:   450.39
find_resize:   1138.98  -- 2 cleaned
find_clean:     293.16

Overall performance will be faster because operations no longer have
to examine two hash bucket arrays most of the time.

--

John Gorman
http://www.enterprisedb.com


dht-v2-resize-cleanup.patch
Description: Binary data



Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-05 Thread John Gorman
On Tue, Oct 4, 2016 at 10:11 PM, Tomas Vondra wrote:

> For GenSlab the situation is less clear, as there probably are ways to make
> it work, but I'd vote to keep it simple for now, and simply do elog(ERROR)
> in the realloc() methods - both for Slab and GenSlab. The current use case
> (reorderbuffer) does not need that, and it seems like a can of worms to me.


Good plan. Realloc can be added later if there is an actual use case.
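
For illustration, the rejection could look roughly like this in both
allocators (the method signature follows the MemoryContextMethods
convention; the actual patch may differ):

static void *
SlabRealloc(MemoryContext context, void *pointer, Size size)
{
    /* Slab (and GenSlab) simply reject repalloc() until a use case appears */
    elog(ERROR, "slab allocator does not support realloc()");
    return NULL;                /* keep compiler quiet */
}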


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-04 Thread John Gorman
SlabContext has a parent context. It can delegate requests that
it cannot handle to the parent context, which is GenSlab. GenSlab
can send them to the sister aset context.

This could handle all reallocs so there will be no surprises.

Remind me again why we cannot realloc in place for sizes
smaller than chunkSize? GenSlab is happy to allocate smaller
sizes and put them into the fixed size chunks.

Maybe SlabAlloc can be happy with sizes up to chunkSize.

if (size <= set->chunkSize)
    return MemoryContextAlloc(set->slab, size);
else
    return MemoryContextAlloc(set->aset, size);

On Sat, Oct 1, 2016 at 10:15 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:

> On 10/02/2016 12:23 AM, John Gorman wrote:
>
>> I reproduced the quadratic pfree performance problem and verified that
>> these patches solved it.
>>
>> The slab.c data structures and functions contain no quadratic components.
>>
>> I noticed the sizing loop in SlabContextCreate() and came up with
>> a similar formula to determine chunksPerBlock that you arrived at.
>>
>> Firstly, I've realized there's an issue when chunkSize gets too
>>> large - once it exceeds blockSize, the SlabContextCreate() fails
>>> as it's impossible to place a single chunk into the block. In
>>> reorderbuffer, this may happen when the tuples (allocated in
>>> tup_context) get larger than 8MB, as the context uses
>>> SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>>>
>>> But maybe there's a simpler solution - we may simply cap the
>>> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
>>> AllocSet handles those requests in a special way - for example
>>> instead of tracking them in freelist, those chunks got freed
>>> immediately.
>>>
>>
>> I like this approach because it fixes the performance problems
>> with smaller allocations and doesn't change how larger
>> allocations are handled.
>>
>>
> One more comment about GenSlab, particularly about unpredictability of the
> repalloc() behavior. It works for "large" chunks allocated in the AllocSet
> part, and mostly does not work for "small" chunks allocated in the
> SlabContext. Moreover, the chunkSize changes over time, so for two chunks
> of the same size, repalloc() may work on one of them and fail on the other,
> depending on time of allocation.
>
> With SlabContext it's perfectly predictable - repalloc() call fails unless
> the chunk size is exactly the same as before (which is perhaps a bit
> pointless, but if we decide to fail even in this case it'll work 100% time).
>
> But with GenSlabContext it's unclear whether the call fails or not.
>
> I don't like this unpredictability - I'd much rather have consistent
> failures (making sure people don't do repalloc() with GenSlab). But I
> don't see a nice way to achieve that - the repalloc() call does not go
> through GenSlabRealloc() at all, but directly to SlabRealloc() or
> AllocSetRealloc().
>
> The best solution I can think of is adding an alternate version of
> AllocSetMethods, pointing to a different AllocSetRealloc implementation.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-01 Thread John Gorman
I reproduced the quadratic pfree performance problem and verified that
these patches solved it.

The slab.c data structures and functions contain no quadratic components.

I noticed the sizing loop in SlabContextCreate() and came up with
a similar formula to determine chunksPerBlock to the one you arrived at.
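
For reference, a sketch of such a formula; the header struct names are
stand-ins for the real slab.c definitions, so treat the details as
assumptions:

Size    headerSize = MAXALIGN(sizeof(SlabBlockData));
Size    fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + chunkSize);
int     chunksPerBlock = (blockSize - headerSize) / fullChunkSize;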

> Firstly, I've realized there's an issue when chunkSize gets too
> large - once it exceeds blockSize, the SlabContextCreate() fails
> as it's impossible to place a single chunk into the block. In
> reorderbuffer, this may happen when the tuples (allocated in
> tup_context) get larger than 8MB, as the context uses
> SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>
> But maybe there's a simpler solution - we may simply cap the
> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
> AllocSet handles those requests in a special way - for example
> instead of tracking them in freelist, those chunks got freed
> immediately.

I like this approach because it fixes the performance problems
with smaller allocations and doesn't change how larger
allocations are handled.

In slab.c it looks like a line in the top comments could be clearer.
Perhaps this is what is meant:

< *  (plus alignment), now wasting memory.
> *  (plus alignment), not wasting memory.

In slab.c some lines are over 80 characters and could be folded.

It would be nice to give each patch version a unique file name.

Nice patch, I enjoyed reading it!

Best, John

John Gorman

On Mon, Sep 26, 2016 at 10:10 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:

> Hi,
>
> Attached is v2 of the patch, updated based on the review. That means:
>
> - Better comment explaining how free chunks are tracked in Slab context.
>
> - Removed the unused SlabPointerIsValid macro.
>
> - Modified the comment before SlabChunkData, explaining how it relates
>   to StandardChunkHeader.
>
> - Replaced the two Assert() calls with elog().
>
> - Implemented SlabCheck(). I've ended up with quite a few checks there,
>   checking pointers between the context, block and chunks, checks due
>   to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the
>   number of free chunks (bitmap, freelist vs. chunk header).
>
> - I've also modified SlabContextCreate() to compute chunksPerBlock a
>   bit more efficiently (use a simple formula instead of the loop, which
>   might be a bit too expensive for large blocks / small chunks).
>
>
> I haven't done any changes to GenSlab, but I do have a few notes:
>
> Firstly, I've realized there's an issue when chunkSize gets too large -
> once it exceeds blockSize, the SlabContextCreate() fails as it's impossible
> to place a single chunk into the block. In reorderbuffer, this may happen
> when the tuples (allocated in tup_context) get larger than 8MB, as the
> context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>
> For Slab the elog(ERROR) is fine as both parameters are controlled by the
> developer directly, but GenSlab computes the chunkSize on the fly, so we
> must not let it fail like that - that'd result in unpredictable failures,
> which is not very nice.
>
> I see two ways to fix this. We may either increase the block size
> automatically - e.g. instead of specifying chunkSize and
> blockSize when creating the Slab, specify chunkSize and chunksPerBlock (and
> then choose the smallest 2^k block large enough). For example with
> chunkSize=96 and chunksPerBlock=1000, we'd get 128kB blocks, as that's the
> closest 2^k block larger than 96000 bytes.
>
> But maybe there's a simpler solution - we may simply cap the chunkSize (in
> GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles those
> requests in a special way - for example instead of tracking them in
> freelist, those chunks got freed immediately.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


[HACKERS] Status of 64 bit atomics

2016-05-27 Thread John Gorman
Hi All

Someone recently told me that the PostgreSQL atomics library is incomplete
for 64 bit operations such as pg_atomic_fetch_add_u64() and should not be
used.

Can someone definitively confirm whether it is okay to rely on the 64 bit
atomics, or whether it is better to protect 64 bit operations with a
spinlock?
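
For concreteness, here is a sketch of the two alternatives in question,
using only calls that exist in the tree (treat it as an illustration,
not a recommendation either way):

#include "postgres.h"
#include "port/atomics.h"
#include "storage/spin.h"

static pg_atomic_uint64 counter;    /* option 1: initialized once with
                                     * pg_atomic_init_u64(&counter, 0) */
static slock_t mutex;               /* option 2: initialized once with
                                     * SpinLockInit(&mutex) */
static uint64 plain_counter;

void
bump_atomic(void)
{
    /* emulated with a spinlock on platforms lacking native 64 bit support */
    pg_atomic_fetch_add_u64(&counter, 1);
}

void
bump_locked(void)
{
    SpinLockAcquire(&mutex);
    plain_counter += 1;
    SpinLockRelease(&mutex);
}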

Thanks!
John


Re: [HACKERS] Re: [COMMITTERS] pgsql: Respect TEMP_CONFIG when running contrib regression tests.

2016-02-27 Thread John Gorman
On Sat, Feb 27, 2016 at 9:25 AM, Robert Haas wrote:

> On Sat, Feb 27, 2016 at 7:08 PM, Andrew Dunstan wrote:
>
> > Perhaps what we need to do is modify pg_regress.c slightly to allow more
> > than one --temp-config argument. But that could be done later.
>
> Well, I'm pretty interested in using --temp-config for parallelism
> testing; I want to be able to run the whole regression test suite with
> a given --temp-config.  I'm in agreement with this change but if it
> doesn't play well with that need, I suppose I'll be writing that
> pg_regress.c patch sooner rather than later.


Here is a patch to allow pg_regress to include several --temp-config files.


pg_regress-temp-configs-v1.patch
Description: Binary data



[HACKERS] errmsg() clobbers errno

2015-05-19 Thread John Gorman
Hi All

While debugging an extension I discovered that the errmsg()
function zeros out errno.

This is annoying because if the process of assembling a meaningful
error message happens to call errmsg() before calling strerror(),
we lose the strerror information. This is exactly the time when we
want to preserve any available error state.

I am attaching a patch to preserve errno across errmsg() calls.
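
The shape of the fix is simply to save and restore errno around the
existing work. A minimal sketch, where errmsg_body() is a hypothetical
stand-in for whatever errmsg() already does internally:

#include <errno.h>
#include <stdarg.h>

int
errmsg(const char *fmt,...)
{
    int     save_errno = errno; /* message assembly may clobber errno */
    int     result;
    va_list args;

    va_start(args, fmt);
    result = errmsg_body(fmt, args);    /* hypothetical existing worker */
    va_end(args);

    errno = save_errno;         /* restore so strerror() still works */
    return result;
}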

Does this seem like a good idea?

Best, John


errmsg-errno-v1.patch
Description: Binary data



[HACKERS] Incompatible trig error handling

2015-04-29 Thread John Gorman
Two of the trigonometry functions have differing error condition behavior
between Linux and OSX. The Linux behavior follows the standard set by the
other trig functions.

On Linux:

SELECT asin(2);
 ERROR:  input is out of range



SELECT acos(2);
 ERROR:  input is out of range


On OSX:

SELECT asin(2);
  asin
 --
   NaN
 (1 row)



SELECT acos(2);
  acos
 --
   NaN
 (1 row)


The attached patch brings OSX into line with the expected behaviour and the
additional regression tests verify this.
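
For reference, the change for asin() would look something like this,
assuming it mirrors the explicit range checks other float functions
already do (the actual patch may differ in detail):

#include "postgres.h"

#include <math.h>

#include "fmgr.h"

Datum
dasin(PG_FUNCTION_ARGS)
{
    float8      arg1 = PG_GETARG_FLOAT8(0);

    /* raise the same error on every platform instead of returning NaN */
    if (arg1 < -1.0 || arg1 > 1.0)
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("input is out of range")));

    PG_RETURN_FLOAT8(asin(arg1));
}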

Is this worth fixing, and if so, what is the next step?

Best, John


trig-v1.patch
Description: Binary data



[HACKERS] Compile warnings on OSX 10.10 clang 6.0

2015-04-03 Thread John Gorman
Hi All

I am getting compile warnings on OSX 10.10 from clang 6.0:

clang: warning: argument unused during compilation: '-pthread'

The five warnings occur where we are building a -dynamiclib and
the -pthread argument is not necessary:

./src/interfaces/libpq/
./src/interfaces/ecpg/pgtypeslib/
./src/interfaces/ecpg/ecpglib/
./src/interfaces/ecpg/compatlib/
./src/interfaces/ecpg/preproc/

This is interfering with using -Wall -Werror to catch warnings.

Any opinions as to whether this is worth fixing and, if so,
what the cleanest approach might be?

Thanks, John


Re: [HACKERS] Compile warnings on OSX 10.10 clang 6.0

2015-04-03 Thread John Gorman
I have confirmed that -Wno-unused-command-line-argument
suppresses the -pthread warning for clang 6.0 and does not
trigger a warning in gcc 4.9.
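
For example, something along these lines should do it (the invocation
is illustrative, not a prescribed recipe):

./configure CC=clang CFLAGS="-Wall -Werror -Wno-unused-command-line-argument"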

Works for me!

John

On Fri, Apr 3, 2015 at 5:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  Peter Eisentraut pete...@gmx.net writes:
  These warnings also happen with older versions of clang.  No idea how
  to fix yet.  I'm thinking that clang should be fixed, because these
  warnings are stupid.

  Yeah, they're utterly stupid; whoever put them in obviously doesn't
  have a clue about typical Makefile construction.  I wonder if next
  we'll see complaints about unnecessary -D or -I switches.

  Having said that, I did look awhile ago about how we might get rid of
  them, and it seems not easy; for starters we would need to drop the
  assumption that CFLAGS can always be included when linking.  Also,
  AFAICT -pthread sometimes *is* required when linking; so it's
  not even very obvious when to suppress the switch, even if we could
  do so without wholesale rearrangement of our FLAGS handling.

 On the other hand, there's often more than one way to skin a cat.
 It occurred to me that maybe we could just turn off this class of warning,
 and after some experimentation I found out that
 -Wno-unused-command-line-argument does that, at least in the version
 of clang that Apple's currently shipping.

 Who's for enabling that if the compiler takes it?

 regards, tom lane



Re: [HACKERS] Parallel Seq Scan

2015-01-13 Thread John Gorman
On Sun, Jan 11, 2015 at 6:00 PM, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Jan 11, 2015 at 6:01 AM, Stephen Frost sfr...@snowman.net wrote:
  So, for my 2c, I've long expected us to parallelize at the relation-file
  level for these kinds of operations.  This goes back to my other
  thoughts on how we should be thinking about parallelizing inbound data
  for bulk data loads but it seems appropriate to consider it here also.
  One of the issues there is that 1G still feels like an awful lot for a
  minimum work size for each worker and it would mean we don't parallelize
  for relations less than that size.

 Yes, I think that's a killer objection.


One approach that has worked well for me is to break big jobs into much
smaller bite size tasks. Each task is small enough to complete quickly.

We add the tasks to a task queue and spawn a generic worker pool which eats
through the task queue items.

This solves a lot of problems (a toy sketch of the pattern follows the list).

- Small to medium jobs can be parallelized efficiently.
- No need to split big jobs perfectly.
- We don't get into a situation where we are waiting around for a worker to
finish chugging through a huge task while the other workers sit idle.
- Worker memory footprint is tiny so we can afford many of them.
- Worker pool management is a well known problem.
- Worker spawn time disappears as a cost factor.
- The worker pool becomes a shared resource that can be managed and
reported on and becomes considerably more predictable.
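
As a toy illustration of the pattern in plain pthreads (purely to show
the shape of a task queue plus worker pool; a real implementation would
use background workers and shared memory queues):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NTASKS   100            /* many small bite size tasks */
#define NWORKERS 4              /* generic worker pool */

static int next_task = 0;       /* shared queue cursor */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

static void *
worker(void *arg)
{
    for (;;)
    {
        int         task;

        pthread_mutex_lock(&qlock);
        task = (next_task < NTASKS) ? next_task++ : -1;     /* claim a task */
        pthread_mutex_unlock(&qlock);

        if (task < 0)
            return NULL;        /* queue drained, worker retires */

        printf("worker %ld: task %d\n", (long) (intptr_t) arg, task);
    }
}

int
main(void)
{
    pthread_t   threads[NWORKERS];
    long        i;

    for (i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) (intptr_t) i);
    for (i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}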


Re: [HACKERS] Parallel Seq Scan

2015-01-13 Thread John Gorman
On Tue, Jan 13, 2015 at 7:25 AM, John Gorman johngorm...@gmail.com wrote:



 On Sun, Jan 11, 2015 at 6:00 PM, Robert Haas robertmh...@gmail.com
 wrote:

 On Sun, Jan 11, 2015 at 6:01 AM, Stephen Frost sfr...@snowman.net
 wrote:
  So, for my 2c, I've long expected us to parallelize at the relation-file
  level for these kinds of operations.  This goes back to my other
  thoughts on how we should be thinking about parallelizing inbound data
  for bulk data loads but it seems appropriate to consider it here also.
  One of the issues there is that 1G still feels like an awful lot for a
  minimum work size for each worker and it would mean we don't parallelize
  for relations less than that size.

 Yes, I think that's a killer objection.


 One approach that has worked well for me is to break big jobs into much
 smaller bite size tasks. Each task is small enough to complete quickly.

 We add the tasks to a task queue and spawn a generic worker pool which
 eats through the task queue items.

 This solves a lot of problems.

 - Small to medium jobs can be parallelized efficiently.
 - No need to split big jobs perfectly.
 - We don't get into a situation where we are waiting around for a worker
 to finish chugging through a huge task while the other workers sit idle.
 - Worker memory footprint is tiny so we can afford many of them.
 - Worker pool management is a well known problem.
 - Worker spawn time disappears as a cost factor.
 - The worker pool becomes a shared resource that can be managed and
 reported on and becomes considerably more predictable.


I forgot to mention that a running task queue can provide metrics such as
current utilization, current average throughput, current queue length and
estimated queue wait time. These can become dynamic cost factors in
deciding whether to parallelize.


[HACKERS] Fractions in GUC variables

2014-12-07 Thread John Gorman
This patch implements the first wiki/Todo Configuration Files item:
"Consider normalizing fractions in postgresql.conf, perhaps using '%'."

The Fractions in GUC variables discussion is here.

http://www.postgresql.org/message-id/467132cf.9020...@enterprisedb.com

This patch implements expressing GUC variables as percents in
postgresql.conf.

autovacuum_vacuum_scale_factor = 20%   # percent of table size before vacuum
autovacuum_analyze_scale_factor = 10%  # percent of table size before analyze

As you can see, the postgresql.conf file and the documentation read more
naturally.  I added a regression test to guc.sql. The SQL interface also
accepts both numeric and percent forms, although the percent form must be
quoted because '%' is an operator (a parsing sketch follows the examples
below).

show cursor_tuple_fraction; -- 10%
set cursor_tuple_fraction = .15; -- 15%
set cursor_tuple_fraction = '33%'; -- 33%
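
A sketch of the parsing idea; parse_fraction() is a hypothetical name,
not the actual guc.c entry point:

#include <stdbool.h>
#include <stdlib.h>

/* Parse "0.15" or "15%" into a fraction; returns false on bad input. */
static bool
parse_fraction(const char *value, double *result)
{
    char       *end;
    double      val = strtod(value, &end);

    if (end == value)
        return false;           /* no number at all */
    while (*end == ' ')
        end++;
    if (*end == '%')
    {
        val /= 100.0;
        end++;
        while (*end == ' ')
            end++;
    }
    if (*end != '\0')
        return false;           /* trailing garbage */
    *result = val;
    return true;
}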

I tagged four configuration variables to display as percents.

The attached patch applies cleanly against master and passes all regression
tests including two new tests in guc.sql.


guc_percent-v1.patch
Description: Binary data
