Re: [HACKERS] Implementing pg_receivewal --no-sync

2017-10-25 Thread Kuntal Ghosh
On Wed, Oct 25, 2017 at 6:07 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> Hi all,
>
> After thinking a bit on the subject, I have decided to submit a patch
> to do $subject. This makes pg_receivewal more consistent with
> pg_basebackup. This option is mainly useful for testing, something
> that becomes way more doable since support for --endpos has been
> added.
>
> Unsurprisingly, --synchronous and --no-sync are incompatible options.
+   
+By default, pg_receivewal flushes a WAL segment's
+contents each time a feedback message is sent to the server depending
+on the interval of time defined by
+--status-interval.
IMHO, it's okay to remove the part 'depending on
the ... --status-interval'.

+This option causes
+pg_receivewal to not issue such flushes waiting,
Did you mean 'to not issue such flush waitings'?


+ [ 'pg_receivewal', '-D', $stream_dir, '--synchronous', '--no-sync' ],
+ 'failure if --synchronous specified without --no-sync');
s/without/with


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2017-10-13 Thread Kuntal Ghosh
On Fri, Oct 13, 2017 at 1:58 PM, Haribabu Kommi
<kommi.harib...@gmail.com> wrote:
>
>
> On Fri, Oct 13, 2017 at 11:55 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>
>> On Thu, Oct 12, 2017 at 8:00 PM, Haribabu Kommi
>> <kommi.harib...@gmail.com> wrote:
>>
>> That seems like a strange choice of API.  I think it should more
>> integrated with the scan logic.  For example, if I'm doing an index
>> scan, and I get a TID, then I should be able to just say "here's a
>> TID, give me any tuples associated with that TID that are visible to
>> the scan snapshot".  Then for the current heap it will do
>> heap_hot_search_buffer, and for zheap it will walk the undo chain and
>> return the relevant tuple from the chain.
>
>
> OK, Understood.
> I will check along these lines and come up with storage API's.
>
I've some doubts regarding the following function hook:

+typedef bool (*hot_search_buffer_hook) (ItemPointer tid, Relation relation,
+Buffer buffer, Snapshot snapshot, HeapTuple heapTuple,
+bool *all_dead, bool first_call);

As per my understanding, with the HOT feature, a new tuple placed on the
same page and with all indexed columns the same as its parent row
version does not get new index entries (README.HOT). For some other
storage engine, if we maintain the older version in different storage
(undo, for example) and don't require a new index entry, should we still
call it a HOT chain? If not, IMHO, we may have something like
*search_buffer_hook(tid,...,storageTuple,...). Depending on the
underlying storage, one can traverse the HOT chain, the undo chain, or
some other multi-version strategy for non-key updates.

After a successful index search, most of the index AMs set
(HeapTupleData)xs_ctup->t_self of IndexScanDescData to the tupleid in
the storage. IMHO, some changes are needed here to make it generic.

@@ -328,47 +376,27 @@ ExecStoreTuple(HeapTuple tuple,
  Assert(tuple != NULL);
  Assert(slot != NULL);
  Assert(slot->tts_tupleDescriptor != NULL);
+ Assert(slot->tts_storageslotam != NULL);
  /* passing shouldFree=true for a tuple on a disk page is not sane */
  Assert(BufferIsValid(buffer) ? (!shouldFree) : true);
For some storage engine, isn't it possible that the buffer is valid
and the tuple to be stored is formed in memory (for example, a tuple
formed from UNDO, an in-memory decrypted tuple, etc.)?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] "inconsistent page found" with checksum and wal_consistency_checking enabled

2017-09-20 Thread Kuntal Ghosh
On Wed, Sep 20, 2017 at 11:05 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Wed, Sep 20, 2017 at 2:26 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> On Wed, Sep 20, 2017 at 10:22 AM, Michael Paquier
>> <michael.paqu...@gmail.com> wrote:
>>> On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal <aagra...@pivotal.io> wrote:
>>>> Currently, page checksum is not masked by Page masking routines used by
>>>> wal_consistency_checking facility. So, when running `make installcheck` 
>>>> with
>>>> data checksum enabled and wal_consistency_checking='all', it easily and
>>>> consistently FATALs with "inconsistent page found".
>>>
>>> Indeed. This had better be fixed before PG10 is out. I am adding an open 
>>> item.
>>>
>> Oops, surprising! How did we miss that previously? If page LSNs
>> are different, checksums will be different as well. Anyhow, nice catch
>> and thanks for the patch.
>
> That happens. We have really covered maaany points during many rounds
> of reviews, still I am not surprised to see one or two things that
> fell into the cracks.
Yup, that's true. :-)


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] "inconsistent page found" with checksum and wal_consistency_checking enabled

2017-09-19 Thread Kuntal Ghosh
On Wed, Sep 20, 2017 at 10:22 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal <aagra...@pivotal.io> wrote:
>> Currently, page checksum is not masked by Page masking routines used by
>> wal_consistency_checking facility. So, when running `make installcheck` with
>> data checksum enabled and wal_consistency_checking='all', it easily and
>> consistently FATALs with "inconsistent page found".
>
> Indeed. This had better be fixed before PG10 is out. I am adding an open item.
>
Oops, surprising! How did we miss that previously? If page LSNs
are different, checksums will be different as well. Anyhow, nice catch
and thanks for the patch.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-15 Thread Kuntal Ghosh
On Thu, Sep 14, 2017 at 11:33 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2017-09-12 00:19:48 -0700, Andres Freund wrote:
>> Hi,
>>
>> I've recently seen a benchmark in which pg_mbcliplen() showed up
>> prominently. Which it will basically in any benchmark with longer query
>> strings, but fast queries. That's not that uncommon.
>>
>> I wonder if we could avoid the cost of pg_mbcliplen() from within
>> pgstat_report_activity(), by moving some of the cost to the read
>> side. pgstat values are obviously read far less frequently in nearly all
>> cases that are performance relevant.
>>
>> Therefore I wonder if we couldn't just store a querystring that's
>> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
>> read side.  I think that should work because all *server side* encodings
>> store character lengths in the *first* byte of a multibyte character
>> (at least one clientside encoding, gb18030, doesn't behave that way).
>>
>> That'd necessitate an added memory copy in pg_stat_get_activity(), but
>> that seems fairly harmless.
>>
>> Faults in my thinking?
>
> Here's a patch that implements that idea.  Seems to work well.  I'm a
> bit loathe to add proper regression tests for this, seems awfully
> dependent on specific track_activity_query_size settings.  I did confirm
> using gdb that I see incomplete characters before
> pgstat_clip_activity(), but not after.
>
> I've renamed st_activity to st_activity_raw to increase the likelihood
> that potential external users of st_activity notice and adapt. Increases
> the noise, but imo to a very bareable amount. Don't feel strongly
> though.
>
Hello,

The patch looks good to me. I've done some regression testing with a
custom script on my local system. The script contains the following
statement:
SELECT 'aaa..' as col;

Test 1
---
duration: 300 seconds
clients/threads: 1

On HEAD TPS: 13181
+9.30% 0.27%  postgres  postgres [.] pgstat_report_activity
+8.88% 0.02%  postgres  postgres [.] pg_mbcliplen
+7.76% 4.77%  postgres  postgres [.] pg_encoding_mbcliplen
+4.06% 4.06%  postgres  postgres [.] pg_utf_mblen

With the patch TPS:13628 (+3.39%)
+0.36% 0.21%  postgres  postgres  [.] pgstat_report_activity

Test 2
---
duration: 300 seconds
clients/threads: 8

On HEAD TPS: 53084
+   12.17% 0.20%  postgres  postgres [.] pgstat_report_activity
+   11.83% 0.02%  postgres  postgres [.] pg_mbcliplen
+   11.19% 8.03%  postgres  postgres [.] pg_encoding_mbcliplen
+3.74% 3.73%  postgres  postgres [.] pg_utf_mblen

With the patch TPS: 63949 (+20.4%)
+0.40% 0.25%  postgres  postgres  [.] pgstat_report_activity

This shows the significance of this patch in terms of the performance
improvement of pgstat_report_activity. Are there any other tests I
should run for the same?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

2017-09-13 Thread Kuntal Ghosh
On Tue, Sep 12, 2017 at 9:47 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Aleksander Alekseev <a.aleks...@postgrespro.ru> writes:
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, passed
>> Implements feature:   tested, passed
>> Spec compliant:   tested, passed
>> Documentation:tested, passed
>
>> Also I didn't manage to find any typos or obvious mistakes in the code.
>
> Thanks for reviewing!  I assume this is against the prior version of the
> patch, though, not the one I just posted with updates for contrib.
> Do you want to look at those?
>
I've performed the regression tests for the same. It passed all the
test cases. Also, I've verified the feature implementation using the
queries posted by you earlier and some of my own test cases. It is
working as expected.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-11 Thread Kuntal Ghosh
On Tue, Sep 12, 2017 at 9:06 AM, Haribabu Kommi
<kommi.harib...@gmail.com> wrote:
>
>
> On Tue, Sep 12, 2017 at 2:04 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
> wrote:
>>
>> On Wed, Sep 6, 2017 at 9:16 AM, Haribabu Kommi <kommi.harib...@gmail.com>
>> wrote:
>> >
>> > Attached the latest patch and performance report.
>> >
>> While looking into the patch, I realized that a normal backend has to
>> check almost 10 if conditions in the worst case inside XLogWrite (6 in
>> the am_background_process method, 1 for write, 1 for blocks, 2 for
>> fsyncs), just to decide whether it is a normal backend or not. The
>> count is higher for the walwriter process. Although I've not done any
>> performance tests, IMHO, it might add further overhead on the
>> XLogWrite Lock.
>>
>> I was thinking whether it is possible to collect all the stats in
>> XLogWrite() irrespective of the type of backend and update the shared
>> counters at once at the end of the function. Thoughts?
>
>
> Thanks for the review.
> Yes, I agree with you that the stats update can be done at the end
> of the function to avoid some overhead. Updated patch is attached.
>
Thanks for the patch.
+ * Check whether the current process is a normal backend or not.
+ * This function checks for the background processes that does
+ * some WAL write activity only and other background processes
+ * are not considered. It considers all the background workers
+ * as WAL write activity workers.
+ *
+ * Returns false - when the current process is a normal backend
+ *true - when the current process a background process/worker
+ */
+static bool
+am_background_process()
+{
+   /* check whether current process is a background process/worker? */
+   if (!AmBackgroundWriterProcess() ||
+   !AmCheckpointerProcess() ||
+   !AmStartupProcess() ||
+   !IsBackgroundWorker ||
+   !am_walsender ||
+   !am_autovacuum_worker)
+   {
+   return false;
+   }
+
+   return true;
+}
I think you have to do an AND operation here instead of OR, don't you?
Another point is that the function description could start with
'Check whether the current process is a background process/worker.'

> There is an overhead with IO time calculation. Is the view good
> enough without IO columns?
I'm not sure how the IO columns are useful for tuning WAL write/fsync
performance from a user's perspective. But they're definitely useful
for developing/improving things in XLogWrite.

>
> And also during my tests, I didn't observe any other background
> processes performing the xlogwrite operation, the values are always
> zero. Is it fine to merge them with backend columns?
>
Apart from the WAL writer process, I don't see any reason why you should
track other background processes separately from normal backends.
However, I may be missing some important point.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-11 Thread Kuntal Ghosh
On Wed, Sep 6, 2017 at 9:16 AM, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
>
> Attached the latest patch and performance report.
>
While looking into the patch, I realized that a normal backend has to
check almost 10 if conditions in the worst case inside XLogWrite (6 in
the am_background_process method, 1 for write, 1 for blocks, 2 for
fsyncs), just to decide whether it is a normal backend or not. The
count is higher for the walwriter process. Although I've not done any
performance tests, IMHO, it might add further overhead on the
XLogWrite Lock.

I was thinking whether it is possible to collect all the stats in
XLogWrite() irrespective of the type of backend and update the shared
counters at once at the end of the function. Thoughts?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] parallelize queries containing initplans

2017-08-09 Thread Kuntal Ghosh
On Wed, Aug 9, 2017 at 10:24 AM, Haribabu Kommi
<kommi.harib...@gmail.com> wrote:
>
> I tested the latest patch and the parallel plan is getting choose for most
> of
> the init plans.
>
Thanks for testing.

> For the following query the parallel plan is not chosen. The query contains
> an init plan that refer the outer node.
>
> postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
> t2 where t1.k = (select max(k) from t3 where t3.i=t1.i));
>  QUERY PLAN
> -
>  Seq Scan on t1  (cost=0.00..22426.28 rows=448 width=12) (actual
> time=8.335..132.557 rows=2 loops=1)
>Filter: (SubPlan 2)
>Rows Removed by Filter: 894
>SubPlan 2
>  ->  Result  (cost=16.27..31.26 rows=999 width=4) (actual
> time=0.146..0.146 rows=0 loops=896)
>One-Time Filter: (t1.k = $1)
>InitPlan 1 (returns $1)
>  ->  Aggregate  (cost=16.25..16.27 rows=1 width=4) (actual
> time=0.145..0.145 rows=1 loops=896)
>->  Seq Scan on t3  (cost=0.00..16.25 rows=2 width=4)
> (actual time=0.131..0.144 rows=0 loops=896)
>  Filter: (i = t1.i)
>  Rows Removed by Filter: 900
>->  Seq Scan on t2  (cost=16.27..31.26 rows=999 width=4) (actual
> time=0.012..0.013 rows=10 loops=2)
>  Planning time: 0.272 ms
>  Execution time: 132.623 ms
> (14 rows)
>
An observation is that the filter at the Result node can't be pushed down
to the sequential scan on t2 because the filter is on t1. So, it has
to scan the complete t2 relation and send all the tuples to the upper
node, a worst case for parallelism. Probably, this is the reason the
optimizer doesn't pick a parallel plan for the above case.

Just for clarification, do you see any changes in the plan after
forcing parallelism (parallel_tuple_cost, parallel_setup_cost,
min_parallel_table_scan_size = 0)?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

2017-07-09 Thread Kuntal Ghosh
On Fri, Jul 7, 2017 at 2:53 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Kuntal Ghosh <kuntalghosh.2...@gmail.com> writes:
Wow. Thank you for the wonderful explanation.

>> On Thu, Jul 6, 2017 at 3:45 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> if histogram_bounds are assigned as,
>> {0,9,19,29,39,49,59,69,79,89,99,109,119,129,139,...}
>> ...
>> So, if val=low, then histfrac = (bucket_current - 1)/num_of_buckets
>> which means it assumes val is included in the previous bucket.
>
> I looked at that again and realized that one of the answers I was missing
> lies in the behavior of analyze.c's compute_scalar_stats().  When it has
> collected "nvals" values and wants to make a histogram containing
> "num_hist" entries, it does this:
>
>  * The object of this loop is to copy the first and last values[]
>  * entries along with evenly-spaced values in between.  So the
>  * i'th value is values[(i * (nvals - 1)) / (num_hist - 1)].
>
> (where i runs from 0 to num_hist - 1).  Because the "/" denotes integer
> division, this effectively means that for all but the first entry,
> we are taking the last value out of the corresponding bucket of samples.
> That's obviously true for the last histogram entry, which will be the very
> last sample value, and it's also true for earlier ones --- except for the
> *first* histogram entry, which is necessarily the first one in its bucket.
> So the first histogram bucket is actually a tad narrower than the others,
> which is visible in the typical data you showed above: 0 to 9 is only 9
> counts wide, but all the remaining buckets are 10 counts wide.  That
> explains why we need a scale adjustment just in the first bucket.
>
Agreed. But I have some doubts regarding the amount of adjustment needed.
+  if (i == 1)
+  histfrac += eq_selec * (1.0 - binfrac);
For predicate x<=p, when p lies in the first bucket, the following is the
amount by which binfrac is off from the actual one.
(Assume the same histogram boundaries, i.e., 0,9,19,...)

For p=0, (1/10)-(0/9) = 1/10 * (1 - 0)
For p=1, (2/10)-(1/9) = 1/10 * (1 - 1/9)
For p=2, (3/10)-(2/9) = 1/10 * (1 - 2/9)
For p=3, (4/10)-(3/9) = 1/10 * (1 - 3/9)

In general, the difference is 1/(high + 1 - low) * (1.0 - binfrac), and
the difference in histfrac is
(1.0 - binfrac) / (high + 1 - low) / num_of_hist_buckets.

So, the above adjustment for the first bucket looks correct when
otherdistinct ~ (high + 1 - low) * num_of_hist_buckets.

Is that always true, or am I missing something?

> I think I'm also beginning to grasp why scalarineqsel's basic estimation
> process produces an estimate for "x <= p" rather than some other condition
> such as "x < p".  For clarity, imagine that we're given p equal to the
> last histogram entry.  If the test operator is in fact "<=", then it will
> say "true" for every histogram entry, and we'll fall off the end of the
> histogram and return 1.0, which is correct.  If the test operator is "<",
> it will return "true" for all but the last entry, so that we end up with
> "i" equal to sslot.nvalues - 1 --- but we will compute binfrac = 1.0,
> if convert_to_scalar() produces sane answers, again resulting in
> histfrac = 1.0.  Similar reasoning applies for ">" and ">=", so that in
> all cases we arrive at histfrac = 1.0 if p equals the last histogram
> entry.  More generally, if p is equal to some interior histogram entry,
> say the k'th one out of n total, we end up with histfrac = (k-1)/(n-1)
> no matter which operator we probe with, assuming that convert_to_scalar()
> correctly gives us a binfrac of 0.0 or 1.0 depending on which of the
> adjacent bins we end up examining.  So, remembering that the histogram
> entries occupy the right ends of their buckets, the proper interpretation
> of that is that it's the probability of "x <= p".  (This is wrong for the
> first histogram entry, but that's why we need a correction there.)
>
True. In general, histfrac = (k-1)/(n-1) + binfrac, where binfrac
depends on the linear interpolation.

For p = some histogram boundary, it'll always be the probability of
"x<=p". When p lies inside the bucket and we have a flat distribution,
it'll also be the probability of "x<=p". But when we have high
variance inside a bucket and p lies inside the bucket, linear
interpolation fails to capture the actual probability. But, as you've
mentioned earlier, I guess that's not a new problem.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Error while copying a large file in pg_rewind

2017-07-07 Thread Kuntal Ghosh
On Fri, Jul 7, 2017 at 7:49 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Thu, Jul 6, 2017 at 8:51 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>> On Thu, Jul 6, 2017 at 4:18 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> 
>> wrote:
>>> But, I'm a little concerned/doubtful about the following part of the code.
>>> +/*
>>> + * Converts an int64 from network byte order to native format.
>>> + */
>>> +static int64
>>> +pg_recvint64(int64 value)
>>> +{
>>> +   union
>>> +   {
>>> +   int64   i64;
>>> +   uint32  i32[2];
>>> +   } swap;
>>> +   int64   result;
>>> +
>>> +   swap.i64 = value;
>>> +
>>> +   result = (uint32) ntohl(swap.i32[0]);
>>> +   result <<= 32;
>>> +   result |= (uint32) ntohl(swap.i32[1]);
>>> +
>>> +   return result;
>>> +}
>>> Does this always work correctly irrespective of the endianness of the
>>> underlying system?
>>
>> I think this will have problem, we may need to do like
>>
>> and reverse complete array if byte order is changed
>
> This comes from the the implementation of 64b-large objects, which was
> first broken on big-endian systems, until Tom fixed it with the
> following commit, and this looks fine to me:
> commit: 26fe56481c0f7baa705f0b3265b5a0676f894a94
> author: Tom Lane <t...@sss.pgh.pa.us>
> date: Mon, 8 Oct 2012 18:24:32 -0400
> Code review for 64-bit-large-object patch.
>
Great. Besides, the logic of pg_recvint64 looks the same as that of
fe_recvint64() defined in pg_basebackup.

I don't have any more inputs on this patch, and it looks good to me.
So, I'm moving the status to Ready for Committer.

> At some point it would really make sense to group all things under the
> same banner (64-b LO, pg_basebackup, and now pg_rewind).
>
+1. Implementation-wise, I prefer pg_recvint64 to fe_recvint64.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Error while copying a large file in pg_rewind

2017-07-06 Thread Kuntal Ghosh
On Wed, Jul 5, 2017 at 9:35 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Tue, Jul 4, 2017 at 4:41 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> 
> wrote:
>> I've not yet started the patch and it may take some time for me to
>> understand and write
>> the patch in a correct way. Since, you've almost written the patch,
>> IMHO, please go ahead
>> and submit the patch. I'll happily review and test it. :-)
>>
>> Thanks for the notes.
>
> OK, thanks. Here you go.
>
Thanks for the patch. It looks good and solves the existing issues.

But I'm a little concerned/doubtful about the following part of the code.
+/*
+ * Converts an int64 from network byte order to native format.
+ */
+static int64
+pg_recvint64(int64 value)
+{
+   union
+   {
+   int64   i64;
+   uint32  i32[2];
+   } swap;
+   int64   result;
+
+   swap.i64 = value;
+
+   result = (uint32) ntohl(swap.i32[0]);
+   result <<= 32;
+   result |= (uint32) ntohl(swap.i32[1]);
+
+   return result;
+}
Does this always work correctly irrespective of the endianness of the
underlying system?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

2017-07-06 Thread Kuntal Ghosh
On Thu, Jul 6, 2017 at 3:45 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> (Pokes at it some more...) Oh, interesting: it behaves that way except
>> when p is exactly the lowest histogram entry.
>
+ /*
+ * In the first bin (i==1), add a fudge factor that ensures
+ * that histfrac is at least eq_selec.  We do this because we
+ * know that the first histogram entry does satisfy the
+ * inequality (if !isgt) or not satisfy it (if isgt), so our
+ * estimate here should certainly not be zero even if binfrac
+ * is zero.  (XXX experimentally this is the correct way to do
+ * it, but why isn't it a linear adjustment across the whole
+ * histogram rather than just the first bin?)
+ */
Given that the values are distinct (I have some doubts about the
real-number case),

if histogram_bounds are assigned as,
{0,9,19,29,39,49,59,69,79,89,99,109,119,129,139,...}

I think the buckets are defined as,
0 < bucket1 <= 9
9 < bucket2 <=19
19 < bucket3 <= 29 and so on.

Because, the histfrac is calculated as follows:

histfrac = (double) (bucket_current - 1) + (val - low) / (high - low);
(where bucket_current is obtained by doing a binary search on
histogram_bounds.)
histfrac /= (double) (nvalues - 1);

So, if val=low, then histfrac = (bucket_current - 1)/num_of_buckets,
which means it assumes val is included in the previous bucket.

This means that it always fails to calculate the selectivity for the
lowest histogram boundary. Hence, we need the adjustment only for the
first bucket.

Do you think my reasoning justifies your concern?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

2017-07-04 Thread Kuntal Ghosh
On Tue, Jul 4, 2017 at 10:56 PM, Kuntal Ghosh
<kuntalghosh.2...@gmail.com> wrote:
> On Tue, Jul 4, 2017 at 9:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Kuntal Ghosh <kuntalghosh.2...@gmail.com> writes:
>>> On Tue, Jul 4, 2017 at 9:23 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> ... I have to admit that I've failed to wrap my brain around exactly
>>>> why it's correct.  The arguments that I've constructed so far seem to
>>>> point in the direction of applying the opposite correction, which is
>>>> demonstrably wrong.  Perhaps someone whose college statistics class
>>>> wasn't quite so long ago can explain this satisfactorily?
>>
>>> I guess that you're referring to the last case, i.e.
>>> explain analyze select * from tenk1 where thousand between 10 and 10;
>>
>> No, the thing that is bothering me is why it seems to be correct to
>> apply a positive correction for ">=", a negative correction for "<",
>> and no correction for "<=" or ">".  That seems weird and I can't
>> construct a plausible explanation for it.  I think it might be a
>> result of the fact that, given a discrete distribution rather than
>> a continuous one, the histogram boundary values should be understood
>> as having some "width" rather than being zero-width points on the
>> distribution axis.  But the arguments I tried to fashion on that
>> basis led to other rules that didn't actually work.
>>
>> It's also possible that this logic is in fact wrong and it just happens
>> to give the right answer anyway for uniformly-distributed cases.
>>
> So, here are two points I think:
> 1. When should we apply(add/subtract) the correction?
> 2. What should be the correction?
>
> The first point:
> there can be further two cases,
> a) histfrac - actual_selectivity(p<=0) = 0.
Sorry for the typo. I meant (p<=10) for all the cases.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

2017-07-04 Thread Kuntal Ghosh
On Tue, Jul 4, 2017 at 9:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Kuntal Ghosh <kuntalghosh.2...@gmail.com> writes:
>> On Tue, Jul 4, 2017 at 9:23 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> ... I have to admit that I've failed to wrap my brain around exactly
>>> why it's correct.  The arguments that I've constructed so far seem to
>>> point in the direction of applying the opposite correction, which is
>>> demonstrably wrong.  Perhaps someone whose college statistics class
>>> wasn't quite so long ago can explain this satisfactorily?
>
>> I guess that you're referring to the last case, i.e.
>> explain analyze select * from tenk1 where thousand between 10 and 10;
>
> No, the thing that is bothering me is why it seems to be correct to
> apply a positive correction for ">=", a negative correction for "<",
> and no correction for "<=" or ">".  That seems weird and I can't
> construct a plausible explanation for it.  I think it might be a
> result of the fact that, given a discrete distribution rather than
> a continuous one, the histogram boundary values should be understood
> as having some "width" rather than being zero-width points on the
> distribution axis.  But the arguments I tried to fashion on that
> basis led to other rules that didn't actually work.
>
> It's also possible that this logic is in fact wrong and it just happens
> to give the right answer anyway for uniformly-distributed cases.
>
So, here are the two points, I think:
1. When should we apply (add/subtract) the correction?
2. What should be the correction?

The first point:
there can be further two cases,
a) histfrac - actual_selectivity(p<=0) = 0.
For this case, I have an explanation for why applying the correction in
the above way works correctly. (This is the case with tenk1.)
Since histfrac correctly calculates the selectivity for (p<=0),
hist_selec will either be <= or > (isgt is true). Hence, there is no
need to apply the correction. A negative correction is needed for the
less than operator (sel(<=10) - sel(=10)). Similarly, a positive
correction is needed for the greater than and equals operators
(sel(>10) + sel(=10)).

b) histfrac - actual_selectivity(p<=0) != 0.
This is possible when we have high variance in the histogram buckets. I
guess things may go wrong here. Please consider the following example:
UPDATE tenk1 SET thousand=11 where thousand=10;
VACUUM ANALYZE tenk1;
explain analyze select * from tenk1 where thousand between 10 and 10;
 Bitmap Heap Scan on tenk1  (cost=4.39..39.52 rows=10 width=244)
(actual time=0.018..0.018 rows=0 loops=1)

The second point:
In this case, the correction is calculated correctly as the selectivity
of (p=0) because of the uniform distribution. Hence, it works. When we
don't have a uniform distribution, the current calculation of the
correction may prove to be an over-estimation most of the time.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

2017-07-04 Thread Kuntal Ghosh
On Tue, Jul 4, 2017 at 9:23 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Aside from the mind-bendingly-tedious changes in pg_operator.h, the meat
> of the patch is in selfuncs.c's ineq_histogram_selectivity(), which now
> applies a correction for equal values in the cases where we were getting
> it wrong before.  While this logic seems experimentally correct (see
> above), I have to admit that I've failed to wrap my brain around exactly
> why it's correct.  The arguments that I've constructed so far seem to
> point in the direction of applying the opposite correction, which is
> demonstrably wrong.  Perhaps someone whose college statistics class
> wasn't quite so long ago can explain this satisfactorily?
>
I guess that you're referring to the last case, i.e.
explain analyze select * from tenk1 where thousand between 10 and 10;

IMHO, following are the things that I've understood.
The predicate internally gets translated to predicate A (p >= 10) and
predicate B (p <= 10).
In ineq_histogram_selectivity,
For predicate A, hist_selec = p
For predicate B, hist_selec = 1 - p
In clauselist_selectivity,
we calculate the selectivity as ((p) + (1 - p)) - 1 = 0, rounded off to 1.0e-10.

After your changes,
In ineq_histogram_selectivity,
For predicate A, hist_selec = p + correction (since isgt = iseq)
For predicate B, hist_selec = 1 - p
In clauselist_selectivity,
we calculate the selectivity as ((p + correction) + (1 - p)) - 1 = correction.

The correction is calculated as 1 / num_distinct_values = .001.

Since the thousand column of tenk1 is uniformly distributed, this
turns out to be the exact selectivity (rows = .001 * 10000 = 10).

Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Error while copying a large file in pg_rewind

2017-07-04 Thread Kuntal Ghosh
On Tue, Jul 4, 2017 at 12:44 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Tue, Jul 4, 2017 at 3:25 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> 
> wrote:
>> On Mon, Jul 3, 2017 at 6:50 PM, Michael Paquier
>> <michael.paqu...@gmail.com> wrote:
>>> pg_basebackup/ with fe_recvint64() has its own way to do things, as
>>> does the large object things in libpq. I would think that at least on
>>> HEAD things could be gathered with a small set of routines. It is
>>> annoying to have a third copy of the same thing. OK that's not
>>> invasive but src/common/ would be a nice place to put things.
>>>
>>> -if (PQgetlength(res, 0, 1) != sizeof(int32))
>>> +if (PQgetlength(res, 0, 1) != sizeof(long long int))
>>> This had better be int64.
>>
>> Thank you. I'll do the changes and submit the revised patch. I've
>> added an entry in commitfest for the same.
>
> Yeah... I was halfway into writing a patch myself. As it seems that
> you are planning to submit a new version, and because it won't be nice
> to step on your toes, I'll wait for your updated version.
>
I've not yet started the patch, and it may take some time for me to
understand the problem and write the patch correctly. Since you've
almost written the patch, IMHO, please go ahead and submit it. I'll
happily review and test it. :-)

Thanks for the notes.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Error while copying a large file in pg_rewind

2017-07-04 Thread Kuntal Ghosh
On Mon, Jul 3, 2017 at 6:50 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> pg_basebackup/ with fe_recvint64() has its own way to do things, as
> does the large object things in libpq. I would think that at least on
> HEAD things could be gathered with a small set of routines. It is
> annoying to have a third copy of the same thing. OK that's not
> invasive but src/common/ would be a nice place to put things.
>
> -if (PQgetlength(res, 0, 1) != sizeof(int32))
> +if (PQgetlength(res, 0, 1) != sizeof(long long int))
> This had better be int64.

Thank you. I'll do the changes and submit the revised patch. I've
added an entry in commitfest for the same.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Error while copying a large file in pg_rewind

2017-07-04 Thread Kuntal Ghosh
On Tue, Jul 4, 2017 at 4:12 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Tue, Jul 4, 2017 at 4:27 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
>>> On 7/3/17 09:53, Tom Lane wrote:
>>>> Hm.  Before we add a bunch of code to deal with that, are we sure we
>>>> *want* it to copy such files?  Seems like that's expending a lot of
>>>> data-transfer work for zero added value --- consider e.g. a server
>>>> with a bunch of old core files laying about in $PGDATA.  Given that
>>>> it's already excluded all database-data-containing files, maybe we
>>>> should just set a cap on the plausible size of auxiliary files.
>>
>>> It seems kind of lame to fail on large files these days, even if they
>>> are not often useful in the particular case.
>>
>> True.  But copying useless data is also lame.
>
> We don't want to complicate pg_rewind code with filtering
> capabilities, so if the fix is simple I think that we should include
> it and be done. That will avoid future complications as well.
>
Yeah, I agree. In the above case, it's a core dump file, so copying it
to the master seems to be of no use. But even if we add some filtering
capabilities, it's difficult to decide which files to skip and which
files to copy.

>>> Also, most of the segment and file sizes are configurable, and we have
>>> had reports of people venturing into much larger file sizes.
>>
>> But if I understand the context correctly, we're not transferring relation
>> data files this way anyway.  If we do transfer WAL files this way, we
>> could make sure to set the cutoff larger than the WAL segment size.
>
> WAL segments are not transferred. Only the WAL data of the the target
> data folder is gone through to find all the blocks that have been
> touched from the last checkpoint before WAL forked.
>
> Now, I think that this is broken for relation files higher than 2GB,
> see fetch_file_range where the begin location is an int32.
> --
Okay. So, if relation blocks beyond a 2GB offset differ between the
source and target directories, we have a problem.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Error while copying a large file in pg_rewind

2017-07-03 Thread Kuntal Ghosh
Hello all,

pg_rewind throws the following error when there is a large file in the
slave server's data directory.

unexpected result while sending file list: ERROR:  value "214800"
is out of range for type integer
CONTEXT:  COPY fetchchunks, line 2402, column begin: "214800"

How to reproduce

1. Set up replication between Server A(master) and Server B(slave)

2. Promote the slave server(Server B )

3. Stop the old master (Server A)

4. Create a large file in the newly promoted master's (Server B) data
directory using the below command

 dd if=/dev/zero of=large.file bs=1024 count=4000000

 [root@localhost data]# dd if=/dev/zero of=large.file bs=1024 count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 8.32263 s, 492 MB/s

5. Execute pg_rewind command from old master(server A)

 ./pg_rewind -D /home/enterprisedb/master/ --debug --progress
--source-server="port=5661 user=enterprisedb dbname=edb"

IMHO, it seems to be a bug in pg_rewind.

As mentioned in pg_rewind documentation, there are few files which are
copied in whole.

"Copy all other files such as pg_xact and configuration files from the
source cluster to the target cluster (everything except the relation
files)." -- https://www.postgresql.org/docs/devel/static/app-pgrewind.html

Those files are copied at most CHUNKSIZE (default 1000000) bytes at a
time. In the process, pg_rewind creates a table with the following
schema and loads information about blocks that need to be copied.

CREATE TEMPORARY TABLE fetchchunks(path text, begin int4, len int4);

postgres=# select * from fetchchunks where begin != 0;
  path   |  begin   |   len
-+--+-
 pg_wal/000000010000000000000002 | 1000000 | 1000000
 pg_wal/000000010000000000000002 | 2000000 | 1000000
 pg_wal/000000010000000000000002 | 3000000 | 1000000
 pg_wal/000000010000000000000002 | 4000000 | 1000000
..
and so on.

The range of begin is from -2147483648 to +2147483647. For a 4GB
file, begin definitely goes beyond 2147483647, and it throws the
following error:
unexpected result while sending file list: ERROR:  value "214800"
is out of range for type integer
CONTEXT:  COPY fetchchunks, line 2659, column begin: "214800"

I guess we have to change the data type to bigint. Also, we need an
ntohl()-style conversion for 8-byte values. I've attached a
script to reproduce the error and a draft patch.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


standby-server-setup.sh
Description: Bourne shell script


fix_copying_large_file_pg_rewind_v1.patch
Description: application/download



Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-23 Thread Kuntal Ghosh
On Fri, Jun 23, 2017 at 3:01 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Thu, Jun 22, 2017 at 4:29 AM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>>> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
>>> <kuntalghosh.2...@gmail.com> wrote:
>>>>> IMHO, It's not a good idea to use DSM call to verify the DSA handle.
>>>>>
>>>> Okay. Is there any particular scenario you've in mind where this may fail?
>>>
>>> It's not about failure, but about the abstraction.  When we are using
>>> the DSA we should not directly access the DSM which is under DSA.
>>>
>> Okay. I thought that I've found at least one usage of
>> dsm_find_mapping() in the code. :-)
>>
>> But, I've some more doubts.
>> 1. When should we use dsm_find_mapping()? (The first few lines of
>> dsm_attach is same as dsm_find_mapping().)
>> 2. As a user of dsa, how should we check whether my dsa handle is
>> already attached? I guess this is required because, if a user tries to
>> re-attach a dsa handle,  it's punishing the user by throwing an error
>> and the user wants to avoid such errors.
>
> I thought about this when designing the DSA API.  I couldn't think of
> any good reason to provide an 'am-I-already-attached?' function
> equivalent to dsm_find_mapping.  It seemed to me that the client code
> shouldn't ever be in any doubt about whether it's attached, and that
> wilfully or absent-mindedly throwing away dsa_area pointers and having
> to ask for them again doesn't seem like a very good design.  I suspect
> the same applies to dsm_find_mapping, and I don't see any callers in
> the source tree or indeed anywhere on the internet (based on a quick
> Google search).  But I could be missing something.
>
Thanks a lot for the clarification.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Thu, Jun 22, 2017 at 9:48 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Thu, Jun 22, 2017 at 1:29 AM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>>> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
>>> <kuntalghosh.2...@gmail.com> wrote:
>>>>> IMHO, It's not a good idea to use DSM call to verify the DSA handle.
>>>>>
>>>> Okay. Is there any particular scenario you've in mind where this may fail?
>>>
>>> It's not about failure, but about the abstraction.  When we are using
>>> the DSA we should not directly access the DSM which is under DSA.
>>>
>> Okay. I thought that I've found at least one usage of
>> dsm_find_mapping() in the code. :-)
>>
>> But, I've some more doubts.
>> 1. When should we use dsm_find_mapping()? (The first few lines of
>> dsm_attach is same as dsm_find_mapping().)
>> 2. As a user of dsa, how should we check whether my dsa handle is
>> already attached? I guess this is required because, if a user tries to
>> re-attach a dsa handle,  it's punishing the user by throwing an error
>> and the user wants to avoid such errors.
>
> From a logical point of view, there is nothing preventing the use of
> dsm_find_mapping() on a DSA handle, still the API layering looks wrong
> if you want to check for an existing mapping. So why not defining a
> new API, like dsa_find_mapping() that just wraps dsm_find_mapping()
> but has its own error handling? This would offer more flexibility for
> the future.
Yeah, that sounds reasonable. Alternatively, dsa_attach could throw the error conditionally.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Kuntal Ghosh
On Thu, Jun 22, 2017 at 6:46 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Wed, Jun 21, 2017 at 4:57 PM, jasrajd <jasr...@microsoft.com> wrote:
>> We are also seeing contention on the walwritelock and repeated writes to the
>> same offset if we move the flush outside the lock in the Azure environment.
>> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
>> bandwidth. Is there more work being done in this area?
>
> As of now, there is no patch in the development queue for Postgres 11
> that is dedicated to this particularly lock contention. There is a
> patch for LWlocks in general with power PC, but that's all:
> https://commitfest.postgresql.org/14/984/
>
> Not sure if Kuntal has plans to submit again this patch. It is
> actually a bit sad to not see things moving on and use an approach to
> group flushes.
As of now, I've no plans to re-submit the patch. Actually, I'm not
sure what I should try next. I would love to get some advice/direction
regarding this.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Kuntal Ghosh
On Thu, Dec 22, 2016 at 11:29 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
>
> How do these counts compare to the other wait events? For example
> CLogControlLock, which is what Amit's patch [1] is about?
>
> [1]
> https://www.postgresql.org/message-id/flat/84c22fbb-b9c4-a02f-384b-b4feb2c67193%402ndquadrant.com
>
Hello Tomas,

I'm really sorry for this late reply; I somehow missed the thread.
Actually, I had seen some performance improvement when this patch was
combined with the CLogControlLock patch, but it then turned out that
all the improvements were due to the CLogControlLock patch alone.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>>> IMHO, It's not a good idea to use DSM call to verify the DSA handle.
>>>
>> Okay. Is there any particular scenario you've in mind where this may fail?
>
> It's not about failure, but about the abstraction.  When we are using
> the DSA we should not directly access the DSM which is under DSA.
>
Okay. I thought that I've found at least one usage of
dsm_find_mapping() in the code. :-)

But, I've some more doubts.
1. When should we use dsm_find_mapping()? (The first few lines of
dsm_attach is same as dsm_find_mapping().)
2. As a user of dsa, how should we check whether my dsa handle is
already attached? I guess this is required because, if a user tries to
re-attach a dsa handle,  it's punishing the user by throwing an error
and the user wants to avoid such errors.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 7:07 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Wed, Jun 21, 2017 at 6:50 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> I think we can just check dsm_find_mapping() to check whether the dsm
>> handle is already attached. Something like,
>>
>> }
>> -   else
>> +   else if(!dsm_find_mapping(AutoVacuumShmem->av_dsa_handle))
>> {
>> AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
>> dsa_pin_mapping(AutoVacuumDSA);
>>
>> Thoughts?
>
> IMHO, It's not a good idea to use DSM call to verify the DSA handle.
>
Okay. Is there any particular scenario you've in mind where this may fail?



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Incorrect documentation about pg_stat_activity

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 6:05 PM, Yugo Nagata <nag...@sraoss.co.jp> wrote:
>
> Attached is a patch for the documentation fix.
>
Please attach the patch as well. :-)


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 5:45 PM, Yugo Nagata <nag...@sraoss.co.jp> wrote:
> Hi,
>
> As I report in another thread[1], I found the autovacuum launcher occurs
> the following error in PG 10 when this received SIGINT. I can repuroduce
> this by pg_cancel_backend or `kill -2 `.
>
> 2017-06-21 13:56:07.010 JST [32483] ERROR:  canceling statement due to user 
> request
> 2017-06-21 13:56:08.022 JST [32483] ERROR:  can't attach the same segment 
> more than once
> 2017-06-21 13:56:09.034 JST [32483] ERROR:  can't attach the same segment 
> more than once
> 2017-06-21 13:56:10.045 JST [32483] ERROR:  can't attach the same segment 
> more than once
> ...
>
> This errors continue until this process is terminated or the server is 
> restarted.
>
> When SIGINT is issued, the process exits from the main loop and returns
> to sigsetjmp, and calls dsa_attach() before entering into the loop again,
> this causes the error.
>
> We can fix it by calling dsa_attach() before sigsetjmp. Attached is the patch.
>
I think we can just check dsm_find_mapping() to check whether the dsm
handle is already attached. Something like,

}
-   else
+   else if(!dsm_find_mapping(AutoVacuumShmem->av_dsa_handle))
{
AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
    dsa_pin_mapping(AutoVacuumDSA);

Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Why does logical replication launcher set application_name?

2017-06-03 Thread Kuntal Ghosh
On Sat, Jun 3, 2017 at 8:53 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 6/2/17 15:08, Peter Eisentraut wrote:
>> On 5/30/17 23:10, Peter Eisentraut wrote:
>>> Here is a proposed solution that splits bgw_name into bgw_type and
>>> bgw_name_extra.  bgw_type shows up in pg_stat_activity.backend_type.
>>> Uses of application_name are removed, because they are no longer
>>> necessary to identity the process type.
>>>
>>> This code appears to be buggy because I sometimes get NULL results of
>>> the backend_type lookup, implying that it couldn't find the background
>>> worker slot.  This needs another look.
>>
AFAICU, when we register a background worker using
RegisterDynamicBackgroundWorker, it's not included in
BackgroundWorkerList (the postmaster's list of registered
background workers, kept in private memory). Instead, we use only
BackgroundWorkerSlots. Perhaps this is the reason that backend_type
is NULL for parallel workers.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Why does logical replication launcher set application_name?

2017-06-03 Thread Kuntal Ghosh
On Sat, Jun 3, 2017 at 2:14 AM, Petr Jelinek
<petr.jeli...@2ndquadrant.com> wrote:
> On 02/06/17 21:05, Peter Eisentraut wrote:
>> On 6/2/17 02:31, Masahiko Sawada wrote:
>>> I'd say current patch makes the user difficult to
>>> distinguish between apply worker and table sync worker.
>>
>> We could arguably make apply workers and sync workers have different
>> bgw_type values.  But if you are interested in that level of detail, you
>> should perhaps look at pg_stat_subscription.  pg_stat_activity only
>> contains the "common" data, and the process-specific data is in other views.
>>
>
> Agreed with this.
>
> However, I am not sure about the bgw_name_extra. I think I would have
> preferred keeping full bgw_name field which would be used where full
> name is needed and bgw_type where only the worker type is used. The
> concatenation just doesn't sit well with me, especially if it requires
> the bgw_name_extra to start with space.
>
+1.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] "create publication..all tables" ignore 'partition not supported' error

2017-06-01 Thread Kuntal Ghosh
On Thu, Jun 1, 2017 at 6:56 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 5/31/17 02:17, Kuntal Ghosh wrote:
>> On Wed, May 31, 2017 at 12:58 AM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>>
>>> I'd say we can fix this issue by just changing the query. Attached
>>> patch changes the query so that it can handle publication name
>>> correctly, the query gets complex, though.
>>>
>> In is_publishable_class function, there are four conditions to decide
>> whether this is a publishable class or not.
>>
>> 1. relkind == RELKIND_RELATION
>> 2. IsCatalogClass()
>> 3. relpersistence == 'p'
>> 4. relid >= FirstNormalObjectId /* Skip tables created during initdb */
>>
>> I think the modified query should have a check for the fourth condition as 
>> well.
>
> The query should be fixed like in the attached patch.
> pg_get_publication_tables() ends up calling is_publishable_class()
> internally.
>
Works fine.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] "create publication..all tables" ignore 'partition not supported' error

2017-05-31 Thread Kuntal Ghosh
On Wed, May 31, 2017 at 12:58 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> I'd say we can fix this issue by just changing the query. Attached
> patch changes the query so that it can handle publication name
> correctly, the query gets complex, though.
>
In is_publishable_class function, there are four conditions to decide
whether this is a publishable class or not.

1. relkind == RELKIND_RELATION
2. IsCatalogClass()
3. relpersistence == 'p'
4. relid >= FirstNormalObjectId /* Skip tables created during initdb */

I think the modified query should have a check for the fourth condition as well.

Added to open items.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Create subscription with `create_slot=false` and incorrect slot name

2017-05-23 Thread Kuntal Ghosh
On Tue, May 23, 2017 at 10:56 AM, Euler Taveira <eu...@timbira.com.br> wrote:
> 2017-05-22 17:52 GMT-03:00 Dmitry Dolgov <9erthali...@gmail.com>:
>>
>> Maybe this question was already raised before, but I couldn't find a
>> discussion. When I'm creating a subscription with `create_slot=false`
>> looks
>> like it's possible to pass a slot name with invalid characters. In this
>> particular case both publisher and subscriber were on the same machine:
>>
>> =# CREATE SUBSCRIPTION test_sub CONNECTION 'dbname=db host=host
>> port=port user=user' PUBLICATION test_pub WITH (create_slot=false,
>> slot_name='test slot');
>> NOTICE:  0: synchronized table states
>> LOCATION:  CreateSubscription, subscriptioncmds.c:443
>> CREATE SUBSCRIPTION
>> Time: 5.887 ms
>>
> The command succeed even if slot_name is invalid. That's because slot_name
> isn't validated. ReplicationSlotValidateName() should be called in
> parse_subscription_options() to avoid a pilot error. IMHO we should prevent
> a future error (use of invalid slot name).
>
+1. Since slot_name can be provided even when create_slot is set to
false, it should be validated as well while creating the subscription.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_dump ignoring information_schema tables which used in Create Publication.

2017-05-22 Thread Kuntal Ghosh
On Mon, May 22, 2017 at 5:22 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
> On 05/22/2017 05:12 PM, Kuntal Ghosh wrote:
>>
>> pg_dump ignores anything created under object name "pg_*" or
>> "information_schema".
>
> In this below scenario  , I am able to see - pg_dump catch the information
> of table which is created under information_schema
>
> --
> -- Name: e1; Type: VIEW; Schema: public; Owner: centos
> --
>
> CREATE VIEW e1 AS
>  SELECT abc.n
>FROM information_schema.abc;
> 
>
The view is created in the public schema; hence, you're able to take a
dump of it. However, you'll probably get an error saying
"relation information_schema.abc doesn't exist" while restoring the
dump.

For publications, the creation (CREATE PUBLICATION) and the addition
of tables (ALTER PUBLICATION ... ADD TABLE) are separate. Hence, the
dump can skip all the relations created under information_schema or
anything under a "pg_*" schema.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Increasing parallel workers at runtime

2017-05-22 Thread Kuntal Ghosh
On Mon, May 22, 2017 at 2:54 PM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:
> On Wed, May 17, 2017 at 2:57 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Tue, May 16, 2017 at 2:14 PM, Ashutosh Bapat
>> <ashutosh.ba...@enterprisedb.com> wrote:
>>> On Mon, May 15, 2017 at 9:23 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>>
>>> Also, looking at the patch, it doesn't look like it take enough care
>>> to build execution state of new worker so that it can participate in a
>>> running query. I may be wrong, but the execution state initialization
>>> routines are written with the assumption that all the workers start
>>> simultaneously?
>>>
>>
>> No such assumptions, workers started later can also join the execution
>> of the query.
>>
> If we are talking of run-time allocation of workers I'd like to
> propose an idea to safeguard parallelism from selectivity-estimation
> errors. Start each query (if it qualifies for the use of parallelism)
> with a minimum number of workers (say 2) irrespective of the #planned
> workers. Then as query proceeds and we find that there is more work to
> do, we allocate more workers.
>
> Let's get to the details a little, we'll have following new variables,
> - T_int - a time interval at which we'll periodically check if the
> query requires more workers,
> - work_remaining - a variable which estimates the work yet to do. This
> will use the selectivity estimates to find the total work done and the
> remaining work accordingly. Once, the actual number of rows crosses
> the estimated number of rows, take maximum possible tuples for that
> operator as the new estimate.
>
> Now, we'll check at gather, after each T_int if the work is remaining
> and allocate another 2 (say) workers. This way we'll keep on adding
> the workers in small chunks and not in one go. Thus, saving resources
> in case over-estimation is done.
>
I understand your concern about selectivity estimation errors, which
affect the number of planned workers as well. But in that case, I
would like to fix the optimizer so that it calculates the number of
workers correctly. If the optimizer thinks that we should start with n
workers, we probably SHOULD start with n workers.

However, errors in selectivity estimation (the root of all evil, the
Achilles heel of query optimization, according to Guy Lohman et al.
:)) can always prove the optimizer wrong. In that case, +1 for your
suggested approach of dynamically adding or killing workers based on
the estimated work left to do.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_dump ignoring information_schema tables which used in Create Publication.

2017-05-22 Thread Kuntal Ghosh
Hello,

pg_dump ignores anything created under a schema named "pg_*" or
"information_schema". I guess you will not have a "CREATE TABLE"
definition for information_schema.abc either. Related code:

else if (strncmp(nsinfo->dobj.name, "pg_", 3) == 0 ||
 strcmp(nsinfo->dobj.name, "information_schema") == 0)
{
/* Other system schemas don't get dumped */
nsinfo->dobj.dump_contains = nsinfo->dobj.dump = DUMP_COMPONENT_NONE;
}

Hence, there is no point in creating a publication for it in the dump.

On Mon, May 22, 2017 at 4:22 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
> Hi,
>
> pg_dump is ignoring tables which created under information_schema schema
> for  CREATE PUBLICATION .
>
> postgres=# create database  test;
> CREATE DATABASE
> postgres=# \c test
> You are now connected to database "test" as user "centos".
> test=# create table information_schema.abc(n int);
> CREATE TABLE
> test=# create publication test for table information_schema.abc;
> CREATE PUBLICATION
> test=# select * from pg_publication_tables;
>  pubname |     schemaname     | tablename
> ---------+--------------------+-----------
>  test    | information_schema | abc
> (1 row)
>
> test=# \q
> [centos@centos-cpula regress]$ pg_dump -Fp  test > /tmp/a.a
> [centos@centos-cpula regress]$ cat /tmp/a.a|grep publication -i
> -- Name: test; Type: PUBLICATION; Schema: -; Owner: centos
> CREATE PUBLICATION test WITH (publish = 'insert, update, delete');
> ALTER PUBLICATION test OWNER TO centos;
> [centos@centos-cpula regress]$
>
> --
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] "create publication..all tables" ignore 'partition not supported' error

2017-05-22 Thread Kuntal Ghosh
Yeah, it's a bug. While showing the table definition, we use the
following query to list the related publications:
"SELECT pub.pubname\n"
  " FROM pg_catalog.pg_publication pub\n"
  " LEFT JOIN pg_catalog.pg_publication_rel pr\n"
  "  ON (pr.prpubid = pub.oid)\n"
  "WHERE pr.prrelid = '%s' OR pub.puballtables\n"
  "ORDER BY 1;"

When pub.puballtables is TRUE, we should also check whether the
relation is publishable or not (something like the
is_publishable_class() function in pg_publication.c).

However, I'm not sure whether this is the correct way to solve the problem.

On Mon, May 22, 2017 at 2:39 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
> Hi,
>
> I observed that - "create publication..all tables" ignore 'partition not
> supported' error
>
> \\create a partition table
>
> You are now connected to database "s" as user "centos".
> s=# CREATE TABLE measurement (
> s(# city_id int not null,
> s(# logdate date not null,
> s(# peaktemp    int,
> s(# unitsales   int
> s(# ) PARTITION BY RANGE (logdate);
> CREATE TABLE
> s=#
>
> \\try to publish only this table
>
> s=# create publication p for table  measurement;
> ERROR:  "measurement" is a partitioned table
> DETAIL:  Adding partitioned tables to publications is not supported.
> HINT:  You can add the table partitions individually.
>
> \\try to create publication for all tables
> s=# create publication p for all tables ;
> CREATE PUBLICATION
> s=# \d+ measurement
>  Table "public.measurement"
>   Column   |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
> -----------+---------+-----------+----------+---------+---------+--------------+-------------
>  city_id   | integer |           | not null |         | plain   |              |
>  logdate   | date    |           | not null |         | plain   |              |
>  peaktemp  | integer |           |          |         | plain   |              |
>  unitsales | integer |           |          |         | plain   |              |
> Partition key: RANGE (logdate)
> Publications:
> "p"
>
> Publication 'p' has been set against partition table ,which is not
> supported.
>
> --
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-16 Thread Kuntal Ghosh
Added to open item lists.

On Tue, May 16, 2017 at 6:35 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Mon, May 15, 2017 at 8:09 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> On Mon, May 15, 2017 at 8:06 PM, Kuntal Ghosh
>> <kuntalghosh.2...@gmail.com> wrote:
>>> On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>>> wrote:
>>>> On Mon, May 15, 2017 at 6:41 PM, tushar <tushar.ah...@enterprisedb.com> 
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> Server Crashes if we try to provide slot_name='none' at the time of 
>>>>> creating
>>>>> subscription -
>>>>>
>>>>> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
>>>>> user=centos password=f' publication abc with (slot_name='none');
>>>>> NOTICE:  synchronized table states
>>>>> server closed the connection unexpectedly
>>>>> This probably means the server terminated abnormally
>>>>> before or while processing the request.
>>>>> The connection to the server was lost. Attempting reset: Failed.
>>>>> !>
>>>>>
>>>>
>>>> Thank you for reporting.
>>>> I think create_slot and enabled should be set to false forcibly when
>>>> slot_name = 'none'. Attached patch fixes it, more test and regression
>>>> test case are needed though.
>>>>
>>> I think the patch doesn't solve the problem completely. For example,
>>> postgres=# create subscription sub3 connection 'dbname=postgres
>>> port=5432 user=edb password=edb' publication abc with
>>> (slot_name='none',create_slot=true);
>>> ERROR:  slot_name = NONE and create slot are mutually exclusive options
>>> postgres=# create subscription sub3 connection 'dbname=postgres
>>> port=5432 user=edb password=edb' publication abc with
>>> (create_slot=true,slot_name='none');
>>> ERROR:  could not connect to the publisher: could not connect to
>>> server: Connection refused
>>> Is the server running locally and accepting
>>> connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
>>>
>>> Changing the order of subscription parameter changes the output. I
>>> think the modifications should be done at the end of the function.
>>> Attached a patch for the same.
>>>
>>
>> Yes, your patch fixes it correctly.
>>
>
> I've updated Kuntal's patch, added regression test for option
> combination and updated documentation.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Kuntal Ghosh
On Mon, May 15, 2017 at 5:04 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Mon, May 15, 2017 at 8:22 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> On Mon, May 15, 2017 at 4:39 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>
>> While testing with logical replication, I've found that the server
>> hangs if we create a subscription to the same server. For example,
>>
>> $ ./pg_ctl -D Test start -o "-p 5432"
>> $ ./psql -p 5432 postgres -c "CREATE PUBLICATION abc for all tables
>> with (publish='delete');"
>> $ ./psql -p 5432 postgres -c "create subscription sub connection
>> 'dbname=postgres port=5432 user=edb password=edb' publication abc with
>> (slot_name='abcslot');"
>> NOTICE:  synchronized table states
>> LOG:  logical decoding found initial starting point at 0/162DF18
>> DETAIL:  Waiting for transactions (approximately 1) older than 560 to end.
>>
>> And, it hangs. Is this an expected behavior?
>
> I guess this is what we're discussing on [1].
> Petr answered on that,
> ===
> Yes that's result of how logical replication slots work, the transaction
> that needs to finish is your transaction. It can be worked around by
> creating the slot manually via the SQL interface for example and create
> the subscription using WITH (NOCREATE SLOT, SLOT NAME = 'your slot') .
> ===
>
> [1] https://www.postgresql.org/message-id/20170426165954.GK14000%40momjian.us
>
Ohh I see. Thank you.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Kuntal Ghosh
On Mon, May 15, 2017 at 4:39 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:

While testing with logical replication, I've found that the server
hangs if we create a subscription to the same server. For example,

$ ./pg_ctl -D Test start -o "-p 5432"
$ ./psql -p 5432 postgres -c "CREATE PUBLICATION abc for all tables
with (publish='delete');"
$ ./psql -p 5432 postgres -c "create subscription sub connection
'dbname=postgres port=5432 user=edb password=edb' publication abc with
(slot_name='abcslot');"
NOTICE:  synchronized table states
LOG:  logical decoding found initial starting point at 0/162DF18
DETAIL:  Waiting for transactions (approximately 1) older than 560 to end.

And, it hangs. Is this an expected behavior?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Kuntal Ghosh
On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Mon, May 15, 2017 at 6:41 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
>> Hi,
>>
>> Server Crashes if we try to provide slot_name='none' at the time of creating
>> subscription -
>>
>> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
>> user=centos password=f' publication abc with (slot_name='none');
>> NOTICE:  synchronized table states
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>>
>
> Thank you for reporting.
> I think create_slot and enabled should be set to false forcibly when
> slot_name = 'none'. Attached patch fixes it, more test and regression
> test case are needed though.
>
I think the patch doesn't solve the problem completely. For example,
postgres=# create subscription sub3 connection 'dbname=postgres
port=5432 user=edb password=edb' publication abc with
(slot_name='none',create_slot=true);
ERROR:  slot_name = NONE and create slot are mutually exclusive options
postgres=# create subscription sub3 connection 'dbname=postgres
port=5432 user=edb password=edb' publication abc with
(create_slot=true,slot_name='none');
ERROR:  could not connect to the publisher: could not connect to
server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Changing the order of the subscription parameters changes the output. I
think the modifications should be done at the end of the function.
Attached is a patch for the same.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


fix_bug_of_parse_option_v1.patch
Description: application/download



Re: [HACKERS] Create publication syntax is not coming properly in pg_dump / pg_dumpall

2017-05-15 Thread Kuntal Ghosh
On Mon, May 15, 2017 at 3:06 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Mon, May 15, 2017 at 5:04 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
>> Hi,
>>
>> I observed that in pg_dump/pg_dumpall - 'create publication' syntax is not
>> coming properly if only specified value  is mentioned in publish.
>>
>> Testcase to reproduce -
>>
>> \\create a publication
>>
>> postgres=# CREATE PUBLICATION abc for all tables with (publish='insert');
>> CREATE PUBLICATION
>>
>> \\take the plain dump
>>
>> [centos@centos-cpula bin]$ ./pg_dump -FP -p 5000 postgres  > /tmp/a.a
>>
>> \\check the syntax
>>
>> [centos@centos-cpula bin]$ cat /tmp/a.a |grep 'create publication abc' -i
>> CREATE PUBLICATION abc FOR ALL TABLES WITH (publish = 'insert, , ');
>>
>> \\try to execute the same syntax against psql terminal
>>
>> postgres=# CREATE PUBLICATION abc FOR ALL TABLES WITH (publish = 'insert, ,
>> ');
>> ERROR:  invalid publish list
>>
>> Same is valid for pg_dumpall as well..
>>
>
> Thank you for reporting.
>
> Hm, It's a bug of pg_dump. Attached patch should fix both pg_dump and
> pg_dumpall.
>
I've reproduced the bug, and the patch fixes the issue for me. I also
tested with different combinations of insert, delete and update.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Why does logical replication launcher set application_name?

2017-04-11 Thread Kuntal Ghosh
On Wed, Apr 12, 2017 at 6:02 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Apr 11, 2017 at 8:11 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Wed, Apr 12, 2017 at 3:40 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I notice looking at pg_stat_activity that the logical replication launcher
>>> sets its application_name to "logical replication launcher".  This seems
>>> inconsistent (no other standard background process sets application_name),
>>> redundant with other columns that already tell you what it is, and an
>>> unreasonable consumption of horizontal space in the tabular output.
>>> Can we drop that?  If we do have to have something like that, what about
>>> putting it in the "query" field where it's much less likely to be
>>> substantially wider than any other entry in the column?
>>
>> It seems to me that the logic behind that is to be able to identify
>> easily the logical replication launcher in pg_stat_activity, so using
>> the query field instead sounds fine for me as we need another way than
>> backend_type to guess what is this bgworker.
>
> Wait, why do we need two ways?
>
For backend_type = 'background worker', application_name shows the name
of the background worker (BackgroundWorker->bgw_name). I think we need
some way to distinguish among different background workers, but
application_name may not be the correct field for showing that information.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Kuntal Ghosh
On Tue, Apr 11, 2017 at 2:36 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri <nehakhat...@gmail.com> wrote:
>> On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>> 1. Forget BGW_NEVER_RESTART workers in
>>> ResetBackgroundWorkerCrashTimes() rather than leaving them around to
>>> be cleaned up after the conclusion of the restart, so that they go
>>> away before rather than after shared memory is reset.
>>
+1 for the solution.

>> Now with this, would it still be required to forget BGW_NEVER_RESTART
>> workers in maybe_start_bgworker():
>>
>> if (rw->rw_crashed_at != 0)
>> {
>> if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
>> {
>> ForgetBackgroundWorker();
>> continue;
>> }
>>  ..
>> }
>
> Well, as noted before, not every background worker failure leads to a
> crash-and-restart; for example, a non-shmem-connected worker or one
> that exits with status 1 will set rw_crashed_at but will not cause a
> crash-and-restart cycle.   It's not 100% clear to me whether the code
> you're talking about can be reached in those cases, but I wouldn't bet
> against it.
I think that for the above-mentioned background workers, we follow a
different path to call ForgetBackgroundWorker():
CleanupBackgroundWorker()
  -> ReportBackgroundWorkerExit()
    -> ForgetBackgroundWorker()
But I'm not sure about any other cases.

Should we include the assert statement as well?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Kuntal Ghosh
On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri <nehakhat...@gmail.com> wrote:
>> The problem here seem to be the change in the max_parallel_workers value
>> while the parallel workers are still under execution. So this poses two
>> questions:
>>
>> 1. From usecase point of view, why could there be a need to tweak the
>> max_parallel_workers exactly at the time when the parallel workers are at
>> play.
>> 2. Could there be a restriction on tweaking of max_parallel_workers while
>> the parallel workers are at play? At least do not allow setting the
>> max_parallel_workers less than the current # of active parallel workers.
>
> Well, that would be letting the tail wag the dog.  The maximum value
> of max_parallel_workers is only 1024, and what we're really worried
> about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
> daylight.  How about just creating a #define that's used by guc.c as
> the maximum for the GUC, and here we assert that we're <= that value?
>
I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
asserted that the absolute difference between the two counts is <= that
value, i.e.,
abs((int) (register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;

Because it's possible that the register count has overflowed while the
terminate count has not. In that case the difference, taken as an
unsigned integer, can be near UINT32_MAX. Hence, we need the absolute
value of the difference after casting it to a signed integer. This
value should be less than the max_parallel_workers upper limit.

I've attached both the patches for better accessibility. PFA.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-parallel-worker-counts-after-a-crash.patch
Description: binary/octet-stream


0002-Add-GUC-for-max_parallel_workers-upper-limit.patch
Description: binary/octet-stream



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-06 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
>>> I'm probably missing something, but I don't quite understand how these
>>> values actually survive the crash. I mean, what I observed is OOM followed
>>> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
>>> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
>>> some reason?
>> AFAICU, during crash recovery, we wait for all non-syslogger children
>> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
>> StartupDataBase. While starting the startup process we check if any
>> bgworker is scheduled for a restart.
>>
>
> In general, your theory appears right, but can you check how it
> behaves in standby server because there is a difference in how the
> startup process behaves during master and standby startup?  In master,
> it stops after recovery whereas in standby it will keep on running to
> receive WAL.
>
While performing StartupDatabase, both the master and the standby
behave in a similar way until the postmaster spawns the startup process.
On the master, the startup process completes its job and dies. As a
result, the reaper is called, which in turn calls maybe_start_bgworker().
On the standby, after getting a valid snapshot, the startup process
sends the postmaster a signal to enable connections. The signal handler
in the postmaster calls maybe_start_bgworker().
In maybe_start_bgworker(), if we find a crashed bgworker (crashed_at !=
0) with the BGW_NEVER_RESTART flag, we call ForgetBackgroundWorker() to
forget the bgworker process.

I've attached a patch that adds an argument to ForgetBackgroundWorker()
to indicate a crash situation, based on which we can take the necessary
actions. I've not included the Assert statement in this patch.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-parallel-worker-counts-after-a-crash_v1.patch
Description: binary/octet-stream



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>>> Did you intend to attach that patch to this email?
>>>
>> Actually, I'm confused how we should ensure (register_count >
>> terminate_count) invariant. I think there can be a system crash what
>> Tomas has suggested up in the thread.
>>
>> Assert(parallel_register_count - parallel_terminate_count <=
>> max_parallel_workers);
>> Backend 1 > SET max_parallel_worker = 8;
>> Backend 1 > Execute a long running parallel query q1 with number of
>> parallel worker spawned is say, 4.
>
> At this point, parallel_register_count should be equal to
> parallel_terminate_count.  4 workers were started, and 4 have
> terminated.
>
Actually, I'm referring to the case when q1 is still running. In that
case, parallel_register_count = 4 and parallel_terminate_count = 0.

>> Backend 2> SET max_parallel_worker = 3;
Now, parallel_register_count - parallel_terminate_count = 4 >
max_parallel_worker.

>> Backend 2 > Try to execute any parallel query q2 with number of
>> parallel worker spawned > 0.
>
Hence, the assert will fail here.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> 
> wrote:
>> Yes. But, as Robert suggested up in the thread, we should not use
>> (parallel_register_count = 0) as an alternative to define a bgworker
>> crash. Hence, I've added an argument named 'wasCrashed' in
>> ForgetBackgroundWorker to indicate a bgworker crash.
>
> Did you intend to attach that patch to this email?
>
Actually, I'm confused about how we should ensure the (register_count >
terminate_count) invariant. I think there can be a system crash like
the one Tomas suggested upthread.

Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
Backend 2> SET max_parallel_worker = 3;
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.

The above assert statement will bring down the server unnecessarily
while executing q2. If the assert statement were not there, it could
have gone ahead without launching any workers.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
>>>
>>> The comment says that the counters are allowed to overflow, i.e. after a
>>> long uptime you might get these values
>>>
>>>  parallel_register_count = UINT_MAX + 1 = 1
>>>  parallel_terminate_count = UINT_MAX
>>>
>>> which is fine, because the C handles the overflow during subtraction and
>>> so
>>>
>>>  parallel_register_count - parallel_terminate_count = 1
>>>
>>> But the assert is not doing subtraction, it's comparing the values
>>> directly:
>>>
>>>  Assert(parallel_register_count >= parallel_terminate_count);
>>>
>>> and the (perfectly valid) values trivially violate this comparison.
>>>
>> Thanks for the explanation. So, we can't use the above assert
>> statement. Even the following assert statement will not be helpful:
>> Assert(parallel_register_count - parallel_terminate_count >= 0);
>> Because, it'll fail to track the scenario when parallel_register_count
>> is not overflowed, still less than parallel_terminate_count. :(
>>
>
> Actually, that assert would work, because C does handle overflows on uint
> values during subtraction just fine. That is,
>
> (UINT_MAX+x) - UINT_MAX = x
>
> Also, the comment about overflows before BackgroundWorkerArray claims this
> is the case.
>
Agreed on the overflowed case. But my concern is when an overflow has
not yet happened:

Suppose,
uint parallel_register_count = 1; /* didn't overflow */
uint parallel_terminate_count = 2; /* didn't overflow */

Assert(parallel_register_count - parallel_terminate_count >= 0);
We want the assert statement to fail here, but I think it won't, since
the unsigned subtraction wraps -1 around to UINT_MAX, which is
(trivially) greater than 0, no?



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
>
>
> On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:
>>
>> AFAICU, during crash recovery, we wait for all non-syslogger children
>> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
>> StartupDataBase. While starting the startup process we check if any
>> bgworker is scheduled for a restart. If a bgworker is crashed and not
>> meant for a restart(parallel worker in our case), we call
>> ForgetBackgroundWorker() to discard it.
>>
>
> OK, so essentially we reset the counters, getting
>
> parallel_register_count = 0
> parallel_terminate_count = 0
>
> and then ForgetBackgroundWworker() runs and increments the terminate_count,
> breaking the logic?
>
> And you're using (parallel_register_count > 0) to detect whether we're still
> in the init phase or not. Hmmm.
>
Yes. But, as Robert suggested upthread, we should not use
(parallel_register_count == 0) as an alternative way to detect a
bgworker crash. Hence, I've added an argument named 'wasCrashed' to
ForgetBackgroundWorker to indicate a bgworker crash.

>>>
>>> In any case, the comment right before BackgroundWorkerArray says this:
>>>
>>>   * These counters can of course overflow, but it's not important here
>>>   * since the subtraction will still give the right number.
>>>
>>> which means that this assert
>>>
>>> +   Assert(BackgroundWorkerData->parallel_register_count >=
>>> +   BackgroundWorkerData->parallel_terminate_count);
>>>
>>> is outright broken, just like any other attempts to rely on simple
>>> comparisons of these two counters, no?
>>>
>> IIUC, the assert statements ensures that register count should always
>> be greater than or equal to the terminate count.
>> Whereas, the comment refers to the fact that register count and
>> terminate count indicate the total number of parallel workers spawned
>> and terminated respectively since the server has been started. At
>> every session, we're not resetting the counts, hence they can
>> overflow. But, their subtraction should give you the number of active
>> parallel worker at any instance.
>> So, I'm not able to see how the assert is broken w.r.t the aforesaid
>> comment. Am I missing something here?
>>
>
> The comment says that the counters are allowed to overflow, i.e. after a
> long uptime you might get these values
>
> parallel_register_count = UINT_MAX + 1 = 1
> parallel_terminate_count = UINT_MAX
>
> which is fine, because the C handles the overflow during subtraction and so
>
> parallel_register_count - parallel_terminate_count = 1
>
> But the assert is not doing subtraction, it's comparing the values directly:
>
> Assert(parallel_register_count >= parallel_terminate_count);
>
> and the (perfectly valid) values trivially violate this comparison.
>
Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
because it will fail to catch the scenario where parallel_register_count
has not overflowed but is still less than parallel_terminate_count. :(

It seems to me the only thing we can make sure of is that
parallel_register_count and parallel_terminate_count are both 0 in
ForgetBackgroundWorker in case of a crash.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri <nehakhat...@gmail.com> wrote:

> I feel there should be an assert if
> (BackgroundWorkerData->parallel_register_count -
> BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers)
>
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
Backend 2> SET max_parallel_worker = 3;
Backend 2 > Execute a long running parallel query q2 with number of
parallel worker spawned > 0.

The above assert statement will bring down the server unnecessarily
while executing q2.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> On 04/04/2017 06:52 PM, Robert Haas wrote:
>>
>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
>> wrote:
>>>
>>> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmh...@gmail.com>
>>> wrote:
>>>>
>>>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>>>> <kuntalghosh.2...@gmail.com> wrote:
>>>>>
>>>>> 2. the server restarts automatically, initialize
>>>>> BackgroundWorkerData->parallel_register_count and
>>>>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>>>>> After that, it calls ForgetBackgroundWorker and it increments
>>>>> parallel_terminate_count.
>>>>
>>>>
>>>> Hmm.  So this seems like the root of the problem.  Presumably those
>>>> things need to be reset AFTER forgetting any background workers from
>>>> before the crash.
>>>>
>>> IMHO, the fix would be not to increase the terminated parallel worker
>>> count whenever ForgetBackgroundWorker is called due to a bgworker
>>> crash. I've attached a patch for the same. PFA.
>>
>>
>> While I'm not opposed to that approach, I don't think this is a good
>> way to implement it.  If you want to pass an explicit flag to
>> ForgetBackgroundWorker telling it whether or not it should performing
>> the increment, fine.  But with what you've got here, you're
>> essentially relying on "spooky action at a distance".  It would be
>> easy for future code changes to break this, not realizing that
>> somebody's got a hard dependency on 0 having a specific meaning.
>>
>
> I'm probably missing something, but I don't quite understand how these
> values actually survive the crash. I mean, what I observed is OOM followed
> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
> some reason?
AFAIU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shared memory (i.e., call BackgroundWorkerShmemInit)
and perform StartupDataBase. While starting the startup process, we
check whether any bgworker is scheduled for a restart. If a bgworker
has crashed and is not meant to be restarted (a parallel worker in our
case), we call ForgetBackgroundWorker() to discard it.

>
> In any case, the comment right before BackgroundWorkerArray says this:
>
>  * These counters can of course overflow, but it's not important here
>  * since the subtraction will still give the right number.
>
> which means that this assert
>
> +   Assert(BackgroundWorkerData->parallel_register_count >=
> +   BackgroundWorkerData->parallel_terminate_count);
>
> is outright broken, just like any other attempts to rely on simple
> comparisons of these two counters, no?
>
IIUC, the assert statement ensures that the register count is always
greater than or equal to the terminate count.
The comment, on the other hand, refers to the fact that the register
count and the terminate count indicate the total number of parallel
workers spawned and terminated, respectively, since the server was
started. The counts are never reset between sessions, so they can
overflow. But their subtraction should still give the number of active
parallel workers at any instant.
So, I'm not able to see how the assert is broken w.r.t. the aforesaid
comment. Am I missing something here?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-03 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> 2. the server restarts automatically, initialize
>> BackgroundWorkerData->parallel_register_count and
>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>> After that, it calls ForgetBackgroundWorker and it increments
>> parallel_terminate_count.
>
> Hmm.  So this seems like the root of the problem.  Presumably those
> things need to be reset AFTER forgetting any background workers from
> before the crash.
>
IMHO, the fix would be to not increment the terminated parallel worker
count whenever ForgetBackgroundWorker is called due to a bgworker
crash. I've attached a patch for the same. PFA.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-parallel-worker-counts-after-a-crash.patch
Description: binary/octet-stream



Re: [HACKERS] increasing the default WAL segment size

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 10:40 AM, Beena Emerson <memissemer...@gmail.com> wrote:
> On 30 Mar 2017 15:10, "Kuntal Ghosh" <kuntalghosh.2...@gmail.com> wrote:

>> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set
>> as
>> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
>> inbuilt value.
> Several methods are declared and defined in different tools to fetch
> the size of wal-seg-size.
> In pg_standby.c,
> RetrieveXLogSegSize() - /* Set XLogSegSize from the WAL file header */
>
> In pg_basebackup/streamutil.c,
> RetrieveXLogSegSize(PGconn *conn) - /* set XLogSegSize using
> SHOW wal_segment_size */
>
> In pg_waldump.c,
> ReadXLogFromDir(char *archive_loc)
> RetrieveXLogSegSize(char *archive_path) /* Scan through the archive
> location to set XLogSegsize from the first WAL file */
>
> IMHO, it's better to define a single method in xlog.c and based on the
> different strategy, it can retrieve the XLogSegsize on behalf of
> different modules. I've suggested the same in my first set review and
> I'll still vote for it. For example, in xlog.c, you can define
> something as following:
> bool RetrieveXLogSegSize(RetrieveStrategy rs, void* ptr)
>
> Now based on the RetrieveStrategy(say Conn, File, Dir), you can cast
> the void pointer to the appropriate type. So, when a new tool needs to
> retrieve XLogSegSize, it can just call this function instead of
> defining a new RetrieveXLogSegSize method.
>
> It's just a suggestion from my side. Is there anything I'm missing
> which can cause the aforesaid approach not to be working?
> Apart from that, I've nothing to add here.
>
>
>
> I do not think a generalised function is a good idea. Besides, I feel the
> respective approaches are best kept in the modules used also because the
> internal code is not easily accessible by utils.
>
Ah, I wonder what the reason could be. Anyway, I'll leave that decision
to the committer. I'm moving the status to Ready for Committer.

I've only tested the patch on my 64-bit Linux system. It still needs
testing in other environments.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 5:43 AM, Neha Khatri <nehakhat...@gmail.com> wrote:
>
> On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
> wrote:
>>
>> On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh
>> <kuntalghosh.2...@gmail.com> wrote:
>> >
>> > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute
>> > any parallel query.
>> >  In LaunchParallelWorkers, you can see
>> >nworkers = n nworkers_launched = n (n>0)
>> > But, all the workers will crash because of the assert statement.
>> > 2. the server restarts automatically, initialize
>> > BackgroundWorkerData->parallel_register_count and
>> > BackgroundWorkerData->parallel_terminate_count in the shared memory.
>> > After that, it calls ForgetBackgroundWorker and it increments
>> > parallel_terminate_count. In LaunchParallelWorkers, we have the
>> > following condition:
>> > if ((BackgroundWorkerData->parallel_register_count -
>> >  BackgroundWorkerData->parallel_terminate_count) >=
>> > max_parallel_workers)
>> > DO NOT launch any parallel worker.
>> > Hence, nworkers = n nworkers_launched = 0.
>> parallel_register_count and parallel_terminate_count, both are
>> unsigned integer. So, whenever the difference is negative, it'll be a
>> well-defined unsigned integer and certainly much larger than
>> max_parallel_workers. Hence, no workers will be launched. I've
>> attached a patch to fix this.
>
>
> The current explanation of active number of parallel workers is:
>
>  * The active
>  * number of parallel workers is the number of registered workers minus the
>  * terminated ones.
>
> In the situations like you mentioned above, this formula can give negative
> number for active parallel workers. However a negative number for active
> parallel workers does not make any sense.
Agreed.

> I feel it would be better to explain in code that in what situations, the
> formula
> can generate a negative result and what that means.
I think that we need to find a fix so that it never generates a
negative result. The last patch I submitted handles the negative value
correctly, but surely that's not enough.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh
<kuntalghosh.2...@gmail.com> wrote:
>
> 1. Put an Assert(0) in ParallelQueryMain(), start server and execute
> any parallel query.
>  In LaunchParallelWorkers, you can see
>nworkers = n nworkers_launched = n (n>0)
> But, all the workers will crash because of the assert statement.
> 2. the server restarts automatically, initialize
> BackgroundWorkerData->parallel_register_count and
> BackgroundWorkerData->parallel_terminate_count in the shared memory.
> After that, it calls ForgetBackgroundWorker and it increments
> parallel_terminate_count. In LaunchParallelWorkers, we have the
> following condition:
> if ((BackgroundWorkerData->parallel_register_count -
>  BackgroundWorkerData->parallel_terminate_count) >=
> max_parallel_workers)
> DO NOT launch any parallel worker.
> Hence, nworkers = n nworkers_launched = 0.
parallel_register_count and parallel_terminate_count are both unsigned
integers. So, whenever the difference would be negative, it wraps
around to a well-defined unsigned integer that is certainly much
larger than max_parallel_workers. Hence, no workers will be launched.
I've attached a patch to fix this.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


fix-workers-launched.patch
Description: binary/octet-stream



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 12:32 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> Hi,
>>
>> While doing some benchmarking, I've ran into a fairly strange issue with OOM
>> breaking LaunchParallelWorkers() after the restart. What I see happening is
>> this:
>>
>> 1) a query is executed, and at the end of LaunchParallelWorkers we get
>>
>> nworkers=8 nworkers_launched=8
>>
>> 2) the query does a Hash Aggregate, but ends up eating much more memory due
>> to n_distinct underestimate (see [1] from 2015 for details), and gets killed
>> by OOM
>>
>> 3) the server restarts, the query is executed again, but this time we get in
>> LaunchParallelWorkers
>>
>> nworkers=8 nworkers_launched=0
>>
>> There's nothing else running on the server, and there definitely should be
>> free parallel workers.
>>
>> 4) The query gets killed again, and on the next execution we get
>>
>> nworkers=8 nworkers_launched=8
>>
>> again, although not always. I wonder whether the exact impact depends on OOM
>> killing the leader or worker, for example.
>
> I don't know what's going on but I think I have seen this once or
> twice myself while hacking on test code that crashed.  I wonder if the
> DSM_CREATE_NULL_IF_MAXSEGMENTS case could be being triggered because
> the DSM control is somehow confused?
>
I think I've run into the same problem while working on parallelizing
plans containing InitPlans. You can reproduce the scenario with the
following steps:

1. Put an Assert(0) in ParallelQueryMain(), start server and execute
any parallel query.
 In LaunchParallelWorkers, you can see
   nworkers = n nworkers_launched = n (n>0)
But, all the workers will crash because of the assert statement.
2. The server restarts automatically and initializes
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in shared memory.
After that, it calls ForgetBackgroundWorker, which increments
parallel_terminate_count. In LaunchParallelWorkers, we have the
following condition:
if ((BackgroundWorkerData->parallel_register_count -
 BackgroundWorkerData->parallel_terminate_count) >=
max_parallel_workers)
DO NOT launch any parallel worker.
Hence, nworkers = n nworkers_launched = 0.

I thought the parallel workers were crashing because of my own mistake,
so this seemed like the expected behavior. Sorry for that.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2017-03-30 Thread Kuntal Ghosh
On Tue, Mar 28, 2017 at 1:06 AM, Beena Emerson <memissemer...@gmail.com> wrote:
> On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
>>
>> At this point, I suggest splitting this patch up into several
>> potentially less controversial pieces.
>>
>> One big piece is that we currently don't support segment sizes larger
>> than 64 GB, for various internal arithmetic reasons.  Your patch appears
>> to address that.  So I suggest isolating that.  Assuming it works
>> correctly, I think there would be no great concern about it.
>>
>> The next piece would be making the various tools aware of varying
>> segment sizes without having to rely on a built-in value.
>>
>> The third piece would then be the rest that allows you to set the size
>> at initdb
>>
>> If we take these in order, we would make it easier to test various sizes
>> and see if there are any more unforeseen issues when changing sizes.  It
>> would also make it easier to do performance testing so we can address
>> the original question of what the default size should be.
>
>
> PFA the patches divided into 3 parts:
Thanks for splitting the patches.
01-add-XLogSegmentOffset-macro.patch is the same as before and looks good.

> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to mb.
looks good.

> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
> inbuilt value.
Several methods are declared and defined in different tools to fetch
the WAL segment size.
In pg_standby.c:
RetrieveXLogSegSize() /* Set XLogSegSize from the WAL file header */

In pg_basebackup/streamutil.c:
RetrieveXLogSegSize(PGconn *conn) /* Set XLogSegSize using
SHOW wal_segment_size */

In pg_waldump.c:
ReadXLogFromDir(char *archive_loc)
RetrieveXLogSegSize(char *archive_path) /* Scan through the archive
location and set XLogSegSize from the first WAL file */

IMHO, it's better to define a single method in xlog.c that retrieves
the XLogSegSize on behalf of the different modules based on the
strategy. I suggested the same in my first review, and I'll still vote
for it. For example, in xlog.c, you could define something like:
bool RetrieveXLogSegSize(RetrieveStrategy rs, void *ptr)

Based on the RetrieveStrategy (say Conn, File, Dir), it can cast the
void pointer to the appropriate type. So, when a new tool needs to
retrieve XLogSegSize, it can just call this function instead of
defining a new RetrieveXLogSegSize method.

It's just a suggestion from my side. Is there anything I'm missing
that would keep the aforesaid approach from working?
Apart from that, I have nothing to add here.

> 04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
> make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
> instead of XLOG_SEG_SIZE
looks good.

>>
>> One concern I have is that your patch does not contain any tests.  There
>> should probably be lots of tests.
>
>
> 05-initdb_tests.patch adds tap tests to initialize cluster with different
> wal_segment_size and then check the config values. What other tests do you
> have in mind? Checking the various tools?
Nothing from me to add here.

I've nothing to add here for the attached set of patches. On top of
these, David's patch can be used for preserving LSNs in the file
naming for all segment sizes.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-27 Thread Kuntal Ghosh
Thank you, Robert, for committing the patch.

commit fc70a4b0df38bda6a13941f1581f25fbb643c7f3

I've changed the status to Committed.

On Mon, Mar 27, 2017 at 6:09 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Sat, Mar 25, 2017 at 5:26 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> On Fri, Mar 24, 2017 at 9:23 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> I think this is still not good.  The places where pgstat_bestart() has
>>> been added are not even correct.  For example, the call added to
>>> BackgroundWriterMain() occurs after the section that does
>>> error-recovery, so it would get repeated every time the background
>>> writer recovers from an error.  There are similar problems elsewhere.
>>> Furthermore, although in theory there's an idea here that we're making
>>> it no longer the responsibility of InitPostgres() to call
>>> pgstat_bestart(), the patch as proposed only removes one of the two
>>> calls, so we really don't even have a consistent practice.  I think
>>> it's better to go with the idea of having InitPostgres() be
>>> responsible for calling this for regular backends, and
>>> AuxiliaryProcessMain() for auxiliary backends.  That involves
>>> substantially fewer calls to pgstat_bestart() and they are spread
>>> across only two functions, which IMHO makes fewer bugs of omission a
>>> lot less likely.
>>
>> Agreed. Calling it from  InitPostgres() and AuxiliaryProcessMain()
>> seems correct because of the following two reasons as you've mentioned
>> up in the thread:
>> 1. security-filtering should be left to some higher-level facility
>> that can make policy decisions rather than being hard-coded in the
>> individual modules.
>> 2. makes fewer bugs of omission a lot less likely.
>
> Okay, fine for me.
>
>>> - I modified the code to tolerate a NULL return from
>>> AuxiliaryPidGetProc().  I am pretty sure that without that there's a
>>> race condition that could lead to a crash if somebody tried to call
>>> this function just as an auxiliary process was terminating.
>>
>> Wow. Haven't thought of that. If it's called after
>> AuxiliaryProcKill(), a crash is evident.
>
> This one is a good catch.
> --
> Michael



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-25 Thread Kuntal Ghosh
On Fri, Mar 24, 2017 at 9:23 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 7:29 AM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Thu, Mar 23, 2017 at 8:19 PM, Kuntal Ghosh
>> <kuntalghosh.2...@gmail.com> wrote:
>>> Hence, to be consistent with others, bgworker processes can be
>>> initialized from BackgroundWorkerInitializeConnectionBy[Oid].
>>
>> Yeah, I am fine with that. Thanks for the updated versions.
>
> I think this is still not good.  The places where pgstat_bestart() has
> been added are not even correct.  For example, the call added to
> BackgroundWriterMain() occurs after the section that does
> error-recovery, so it would get repeated every time the background
> writer recovers from an error.  There are similar problems elsewhere.
> Furthermore, although in theory there's an idea here that we're making
> it no longer the responsibility of InitPostgres() to call
> pgstat_bestart(), the patch as proposed only removes one of the two
> calls, so we really don't even have a consistent practice.  I think
> it's better to go with the idea of having InitPostgres() be
> responsible for calling this for regular backends, and
> AuxiliaryProcessMain() for auxiliary backends.  That involves
> substantially fewer calls to pgstat_bestart() and they are spread
> across only two functions, which IMHO makes fewer bugs of omission a
> lot less likely.
Agreed. Calling it from InitPostgres() and AuxiliaryProcessMain()
seems correct for the following two reasons, as you've mentioned
upthread:
1. Security filtering should be left to some higher-level facility
that can make policy decisions rather than being hard-coded in the
individual modules.
2. It makes bugs of omission a lot less likely.

> Updated patch with that change attached.  In addition to that
> modification, I made some other alterations:
>
> - I changed the elog() for the can't-happen case in pgstat_bestart()
> from PANIC to FATAL.  The contents of shared memory aren't corrupted,
> so I think PANIC is overkill.
Agreed and duly noted for future.

> - I tweaked the comment in WalSndLoop() just before the
> pgstat_report_activity() call to accurately reflect what the effect of
> that call now is.
>
> - I changed the column ordering in pg_stat_get_activity() to put
> backend_type with the other columns that appear in pg_stat_activity,
> rather than (as the patch did) grouping it with the ones that appear
> in pg_stat_ssl.
Thank you.

> - I modified the code to tolerate a NULL return from
> AuxiliaryPidGetProc().  I am pretty sure that without that there's a
> race condition that could lead to a crash if somebody tried to call
> this function just as an auxiliary process was terminating.
Wow. Haven't thought of that. If it's called after
AuxiliaryProcKill(), a crash is evident.

> - I updated the documentation slightly.
Looks good to me.

> - I rebased over some conflicting commits.
Sorry for the inconvenience. It seems the conflicting changes
occurred within a few hours after I posted the patch.

> If there aren't objections, I plan to commit this version.
Thank you for looking into the patch and making the necessary changes.
All the changes look good to me; I've tested the feature and it passes
all the regression tests.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-24 Thread Kuntal Ghosh
On Fri, Mar 24, 2017 at 2:17 PM, Kuntal Ghosh
<kuntalghosh.2...@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
>> On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2...@gmail.com> wrote:
>>> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
>>> <kuntalghosh.2...@gmail.com> wrote:
>>>> Hello,
>>>> In Windows, if one needs to take a dump in plain text format (this is
>>>> the default option, or can be specified using -Fp) with some level of
>>>> compression (-Z[0-9]), an output file has to
>>>> be specified. Otherwise, if the output is redirected to stdout, it'll
>>>> create a corrupted dump (cmd is set to ASCII mode, so it'll put
>>>> carriage returns in the file).
>>> To reproduce the issue, please use the following command in windows cmd:
>>>
>>> pg_dump -Z 9 test > E:\test_xu.backup
>>> pg_dump -Fp -Z 9 test > E:\test_xu.backup
>>
>> This is a known problem. It is not specific to PostgreSQL, it affects
>> any software that attempts to use stdin/stdout on Windows via cmd,
>> where it is not 8-bit clean.
>>
>> We don't just refuse to run with stdout as a destination because it's
>> perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
>> as I know, tell whether it's being invoked by cmd or something else.
> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
> mode, cmd pokes its nose and does CR / LF conversions on its own. So,
> whenever we want compression on a plain-text dump file, we can set the
> stdout mode to O_BINARY. Is it a wrong approach?
With help from Ashutosh Sharma, I tested this in a Windows
environment. Sadly, it still doesn't work. :( IMHO, we should document
the issue somewhere.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-24 Thread Kuntal Ghosh
On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2...@gmail.com> wrote:
>> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
>> <kuntalghosh.2...@gmail.com> wrote:
>>> Hello,
>>> In Windows, if one needs to take a dump in plain text format (this is
>>> the default option, or can be specified using -Fp) with some level of
>>> compression (-Z[0-9]), an output file has to
>>> be specified. Otherwise, if the output is redirected to stdout, it'll
>>> create a corrupted dump (cmd is set to ASCII mode, so it'll put
>>> carriage returns in the file).
>> To reproduce the issue, please use the following command in windows cmd:
>>
>> pg_dump -Z 9 test > E:\test_xu.backup
>> pg_dump -Fp -Z 9 test > E:\test_xu.backup
>
> This is a known problem. It is not specific to PostgreSQL, it affects
> any software that attempts to use stdin/stdout on Windows via cmd,
> where it is not 8-bit clean.
>
> We don't just refuse to run with stdout as a destination because it's
> perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
> as I know, tell whether it's being invoked by cmd or something else.
AFAIU, if we use binary mode, the output is stored byte for byte. In
ASCII (text) mode, cmd pokes its nose in and does CR/LF conversions on
its own. So, whenever we want compression on a plain-text dump written
to stdout, we can set the stdout mode to O_BINARY. Is that a wrong
approach?

> If you have concrete ideas on how to improve this they'd be welcomed.
> Is there anywhere you expected to find info in the docs? Do you know
> of a way to detect in Windows if the output stream is not 8-bit clean
> from within the application program? ... other?
Actually, I'm not that familiar with the Windows environment. But I
couldn't find any note to users in the pg_dump documentation regarding
this issue: in cmd, if someone needs a plain-text dump in compressed
format, they should specify the output file, otherwise they may run
into the above problem. However, if a dump is already corrupted due to
this issue, a fix is provided in [1]. Should we include this in the
documentation?



[1] http://www.gzip.org/
Use fixgz.c to remove the extra CR (carriage return) bytes.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-24 Thread Kuntal Ghosh
On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
<kuntalghosh.2...@gmail.com> wrote:
> Hello,
> In Windows, if one needs to take a dump in plain text format (this is
> the default option, or can be specified using -Fp) with some level of
> compression (-Z[0-9]), an output file has to
> be specified. Otherwise, if the output is redirected to stdout, it'll
> create a corrupted dump (cmd is set to ASCII mode, so it'll put
> carriage returns in the file).
To reproduce the issue, please use one of the following commands in
the Windows cmd:

pg_dump -Z 9 test > E:\test_xu.backup
pg_dump -Fp -Z 9 test > E:\test_xu.backup


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-23 Thread Kuntal Ghosh
Hello,
On Windows, if one needs to take a dump in plain-text format (this is
the default option, and can also be specified using -Fp) with some
level of compression (-Z[0-9]), an output file has to be specified.
Otherwise, if the output is redirected to stdout, pg_dump creates a
corrupted dump (cmd is set to ASCII mode, so it puts carriage returns
in the file).

I'm referring the following part of pg_dump code:

/*
 * On Windows, we need to use binary mode to read/write non-text archive
 * formats.  Force stdin/stdout into binary mode if that is what we are
 * using.
 */
#ifdef WIN32
if (fmt != archNull &&
(AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
{
if (mode == archModeWrite)
setmode(fileno(stdout), O_BINARY);
else
setmode(fileno(stdin), O_BINARY);
}
#endif

For the plain-text format, fmt is set to archNull, in which case
binary mode will not be forced (I think). To fix this, I've attached a
patch that adds one extra check to the 'if' condition to also test the
compression level. PFA.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-pg_dump-for-Windows-cmd.patch
Description: binary/octet-stream



Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-23 Thread Kuntal Ghosh
On Wed, Mar 22, 2017 at 9:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 12:20 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Mar 22, 2017 at 1:31 AM, Michael Paquier
>> <michael.paqu...@gmail.com> wrote:
>>> Okay, switched as ready for committer. One note for the committer
>>> though: keeping the calls of pgstat_bestart() out of
>>> BackgroundWorkerInitializeConnection() and
>>> BackgroundWorkerInitializeConnectionByOid() keeps users the
>>> possibility to not have backends connected to the database show up in
>>> pg_stat_activity. This may matter for some users (cloud deployment for
>>> example). I am as well in favor in keeping the work of those routines
>>> minimal, without touching at pgstat.
>>
>> I think that's just inviting bugs of omission, in both core and
>> extension code.  I think it'd be much better to do this in a
>> centralized place.
>
> I mean, your argument boils down to "somebody might want to
> deliberately hide things from pg_stat_activity".  But that's not
> really a mode we support in general, and supporting it only for
> certain cases doesn't seem like something that this patch should be
> about.  We could add an option to BackgroundWorkerInitializeConnection
> and BackgroundWorkerInitializeConnectionByOid to suppress it, if it's
> something that somebody wants, but actually I'd be more inclined to
> think that everybody (who has a shared memory connection) should go
> into the machinery and then security-filtering should be left to some
> higher-level facility that can make policy decisions rather than being
> hard-coded in the individual modules.
>
> But I'm slightly confused as to how this even arises.  Background
> workers already show up in pg_stat_activity output, or at least I sure
> think they do.  So why does this patch need to make any change to that
> case at all?
When we initialize a process through InitPostgres, control returns
from the function at different locations based on the process's type
(as soon as the process's relevant information is initialized).
Processes return from InitPostgres in the following order:

IsAutoVacuumLauncherProcess
walsender not connected to a DB
bgworker not connected to a DB
Other processes using InitPostgres

Before the patch, not all of these processes were shown in
pg_stat_activity. Hence, pgstat_bestart() needed to be called at only
two locations in InitPostgres to initialize the entry for the
respective process. But, since we're increasing the types of processes
shown in pg_stat_activity, it seems reasonable to move the
pgstat_bestart() call to each proc's main entry point. I've followed the
same approach for auxiliary processes as well.

AutovacuumLauncher - AutoVacLauncherMain()
bgwriter - BackgroundWriterMain()
checkpointer - CheckpointerMain()
startup - StartupProcessMain()
walwriter - WalWriterMain()
walreceiver - WalReceiverMain()
walsender - InitWalSender()

Hence, to be consistent with others, bgworker processes can be
initialized from BackgroundWorkerInitializeConnectionBy[Oid].

I've attached the updated patches which reflect the above change. PFA.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch
Description: binary/octet-stream


0002-Expose-stats-for-all-backends.patch
Description: binary/octet-stream


0003-Add-backend_type-column-in-pg_stat_get_activity.patch
Description: binary/octet-stream



Re: [HACKERS] increasing the default WAL segment size

2017-03-22 Thread Kuntal Ghosh
On Wed, Mar 22, 2017 at 3:14 PM, Beena Emerson <memissemer...@gmail.com> wrote:
> PFA an updated patch which fixes a minor bug I found. It only increases the
> string size in pretty_wal_size function.
> The 01-add-XLogSegmentOffset-macro.patch has also been rebased.
Thanks for the updated versions. Here is a partial review of the patch:

In pg_standby.c and pg_waldump.c,
+ XLogPageHeader hdr = (XLogPageHeader) buf;
+ XLogLongPageHeader NewLongPage = (XLogLongPageHeader) hdr;
+
+ XLogSegSize = NewLongPage->xlp_seg_size;
It waits until the file is at least XLOG_BLCKSZ, then gets the
expected final size from XLogPageHeader. This looks really clean
compared to the previous approach.
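For reference, the read of xlp_seg_size from the long page header can be sketched in miniature. This is an illustrative Python sketch, not the actual C code; it assumes the XLogLongPageHeaderData field layout from xlog_internal.h on a little-endian 64-bit build, and the magic value used below is a placeholder, not a real XLOG_PAGE_MAGIC:

```python
import struct

# Assumed field layout of XLogLongPageHeaderData (see xlog_internal.h):
# xlp_magic, xlp_info (uint16); xlp_tli (uint32); xlp_pageaddr (uint64);
# xlp_rem_len (uint32, followed by 4 alignment padding bytes);
# xlp_sysid (uint64); xlp_seg_size, xlp_xlog_blcksz (uint32).
LONG_HDR = struct.Struct("<HHIQI4xQII")

def read_seg_size(first_page: bytes) -> int:
    """Pull the advertised segment size out of a WAL file's first page."""
    fields = LONG_HDR.unpack_from(first_page)
    return fields[6]  # xlp_seg_size

# Build a synthetic first page advertising 16 MB segments (magic value is
# illustrative only).
page = LONG_HDR.pack(0xD000, 0x0002, 1, 0, 0, 42, 16 * 1024 * 1024, 8192)
print(read_seg_size(page))  # 16777216
```

The point being illustrated: once the first XLOG_BLCKSZ bytes are available, the expected final size is fully determined by the header, with no guessing from file sizes.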

+ * Verify that the min and max wal_size meet the minimum requirements.
Better to write min_wal_size and max_wal_size.

+ errmsg("Insufficient value for \"min_wal_size\"")));
"min_wal_size %d is too low" may be? Use lower case for error
messages. Same for max_wal_size.

+ /* Set the XLogSegSize */
+ XLogSegSize = ControlFile->xlog_seg_size;
+
A call to IsValidXLogSegSize() will be good after this, no?

+ /* Update variables using XLogSegSize */
+ check_wal_size();
The method name looks somewhat misleading compared to the comment for
it, doesn't it?

+ * allocating space and reading ControlFile.
s/and/for

+ {"TB", GUC_UNIT_MB, 1024 * 1024},
+ {"GB", GUC_UNIT_MB, 1024},
+ {"MB", GUC_UNIT_MB, 1},
+ {"kB", GUC_UNIT_MB, -1024},
@@ -2235,10 +2231,10 @@ static struct config_int ConfigureNamesInt[] =
  {"min_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
  gettext_noop("Sets the minimum size to shrink the WAL to."),
  NULL,
- GUC_UNIT_XSEGS
+ GUC_UNIT_MB
  },
- &min_wal_size,
- 5, 2, INT_MAX,
+ &min_wal_size_mb,
+ DEFAULT_MIN_WAL_SEGS * 16, 2, INT_MAX,
  NULL, NULL, NULL
  },

@@ -2246,10 +2242,10 @@ static struct config_int ConfigureNamesInt[] =
  {"max_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
  gettext_noop("Sets the WAL size that triggers a checkpoint."),
  NULL,
- GUC_UNIT_XSEGS
+ GUC_UNIT_MB
  },
- &max_wal_size,
- 64, 2, INT_MAX,
+ &max_wal_size_mb,
+ DEFAULT_MAX_WAL_SEGS * 16, 2, INT_MAX,
  NULL, assign_max_wal_size, NULL
  },
This patch introduces a new guc_unit having values in MB for
max_wal_size and min_wal_size. I'm not sure about the upper limit
which is set to INT_MAX for 32-bit systems as well. Is it needed to
define something like MAX_MEGABYTES similar to MAX_KILOBYTES?
It is worth mentioning that GUC_UNIT_KB can't be used in this case
since MAX_KILOBYTES is INT_MAX / 1024 (< 2 GB) on 32-bit systems.
That's not a sufficient range for min_wal_size/max_wal_size.
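The arithmetic behind that limit is easy to check. A small Python sketch, assuming a 32-bit int so that INT_MAX = 2^31 - 1:

```python
INT_MAX = 2**31 - 1  # assuming a 32-bit int

# With GUC_UNIT_KB, guc.c caps settings at MAX_KILOBYTES = INT_MAX / 1024
# on 32-bit platforms, i.e. just under 2 GB.
max_kilobytes = INT_MAX // 1024              # 2097151 kB
kb_ceiling_gb = (max_kilobytes * 1024) / 2**30
print(f"GUC_UNIT_KB ceiling: {kb_ceiling_gb:.3f} GB")

# With GUC_UNIT_MB, the same int expresses values up to INT_MAX MB
# (roughly 2048 TB), which is plenty for min_wal_size/max_wal_size.
mb_ceiling_tb = INT_MAX / 2**20
print(f"GUC_UNIT_MB ceiling: {mb_ceiling_tb:.0f} TB")
```

So a kB-denominated GUC tops out below 2 GB on such platforms, while an MB-denominated one comfortably covers any realistic WAL size.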

While testing with pg_waldump, I got the following error.
bin/pg_waldump -p master/pg_wal/ -s 0/0100
Floating point exception (core dumped)
Stack:
#0  0x004039d6 in ReadPageInternal ()
#1  0x00404c84 in XLogFindNextRecord ()
#2  0x00401e08 in main ()
I think that the problem is in following code:
/* parse files as start/end boundaries, extract path if not specified */
if (optind < argc)
{

+ if (!RetrieveXLogSegSize(full_path))
...
}
In this case, RetrieveXLogSegSize is conditionally called. So, if the
condition is false, XLogSegSize will not be initialized.
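The failure mode is easy to reproduce in miniature: computing a segment offset with an uninitialized (zero) segment size divides by zero. A Python sketch of why validating the size before use matters; is_valid_seg_size here is a hypothetical stand-in for a guard like IsValidXLogSegSize, with an assumed valid range of 1 MB to 1 GB powers of two:

```python
def is_valid_seg_size(size: int) -> bool:
    """Accept only powers of two in an assumed sane range (1 MB .. 1 GB)."""
    return 1024 * 1024 <= size <= 1024 * 1024 * 1024 and size & (size - 1) == 0

def seg_offset(lsn: int, seg_size: int) -> int:
    """Byte offset of an LSN within its segment, guarded against a zero size."""
    if not is_valid_seg_size(seg_size):
        raise ValueError(f"invalid WAL segment size: {seg_size}")
    return lsn % seg_size  # would die with division by zero if seg_size were 0

print(seg_offset(0x1000000, 16 * 1024 * 1024))  # 0
```

With the guard, a never-initialized XLogSegSize produces a clean error instead of a crash in the offset arithmetic.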

I'm yet to review pg_basebackup module and I'll try to finish that ASAP.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-21 Thread Kuntal Ghosh
On Tue, Mar 21, 2017 at 10:52 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
Thank you for the review.

> Unfortunately this is true only for background workers that connect to
> a database. And this would break for bgworkers that do not do that.
> The point to fix is here:
> +   if (MyBackendId != InvalidBackendId)
> +   {
> [...]
> +   else if (IsBackgroundWorker)
> +   {
> +   /* bgworker */
> +   beentry->st_backendType = B_BG_WORKER;
> +   }
> Your code is assuming that a bgworker will always be setting
> MyBackendId which is not necessarily true, and you would trigger the
> assertion down:
> Assert(MyAuxProcType != NotAnAuxProcess);
> So you need to rely on IsBackgroundWorker in priority I think. I would
> suggest as well to give up calling pgstat_bestart() in
> BackgroundWorkerInitializeConnection[ByOid] and let the workers do
> this work by themselves. This gives more flexibility. You would need
> to update the logical replication worker and worker_spi for that as
> well.
We reserve a slot for each possible BackendId, plus one for each
possible auxiliary process type. For a non-auxiliary process,
BackendId is used to refer to the backend status in the PgBackendStatus
array. So, a bgworker without any BackendId can't initialize its
entry in the PgBackendStatus array. In simple terms, it will not be shown
in pg_stat_activity. I've added some comments regarding this in
pgstat_bestart().
And, any bgworker having valid BackendId will have either a valid
userid or BOOTSTRAP_SUPERUSERID.

> If you want to test this configuration, feel free to use this background 
> worker:
> https://github.com/michaelpq/pg_plugins/tree/master/hello_world
> This just prints an entry to the logs every 10s, and does not connect
> to any database. Adding a call to pgstat_bestart() in hello_main
> triggers the assertion.
>
In this case, pgstat_bestart() shouldn't be called from this module as
the spawned bgworker will have invalid BackendId. By the way, thanks
for sharing the repo link. Found a lot of interesting things to
explore and learn. :)

>>> @@ -808,6 +836,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
>>> nulls[12] = true;
>>> nulls[13] = true;
>>> nulls[14] = true;
>>> +   nulls[23] = true;
>>> }
>>> That's not possible to have backend_type set as NULL, no?
>>
>> Yes, backend_type can't be null. But, I'm not sure whether it should
>> be visible to a user with insufficient privileges. Anyway, I've made
>> it visible to all user for now.
>>
>> Please find the updated patches in the attachment.
>
> Yeah, hiding it may make sense...
Modified.

> /* The autovacuum launcher is done here */
> if (IsAutoVacuumLauncherProcess())
> +   {
> return;
> +   }
> Unnecessary noise here.
>
Fixed.

> Except for the two issues pointed out in this email, I am pretty cool
> with this patch. Nice work.
Thank you. :)

Please find the updated patches.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch
Description: application/download


0002-Expose-stats-for-all-backends.patch
Description: application/download


0003-Add-backend_type-column-in-pg_stat_get_activity.patch
Description: application/download



Re: [HACKERS] Two phase commit in ECPG

2017-03-17 Thread Kuntal Ghosh
On Fri, Mar 17, 2017 at 4:34 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Fri, Mar 17, 2017 at 12:17 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> On Tue, Mar 14, 2017 at 1:35 AM, Michael Meskes <mes...@postgresql.org> 
>> wrote:
>>>> Previous 002 patch lacked to add describing PREPARE TRANSACTION.
>>>> Attached updated 002 patch.
>>>
>>> I just committed both patches and a backport of the bug fix itself.
>>> Thanks again for finding and fixing.
>> Regression tests for sql/twophase is failing while performing the test
>> with make installcheck.
>> + [NO_PID]: ecpg_check_PQresult on line 32: bad response - ERROR:
>> prepared transactions are disabled
>> + HINT:  Set max_prepared_transactions to a nonzero value.
>>
>> Setting max_prepared_transactions accordingly fixes the issue. But,
>> I'm not sure whether this test should be performed by installcheck by
>> default or should only be performed by make
>> installcheck-prepared-txns.
>>
>
> Thank you for pointing out.
> Yeah, I agree that the twophase regression test should not be
> performed by default as long as the default value of
> max_prepared_transactions is 0. Similar to this, the isolation check
> regression test does same thing. Attached patch removes sql/twophase
> from schedule file and add new type of regression test.
>
The patch looks good. I've performed installcheck and
installcheck-prepared-txns. Both work as expected.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-17 Thread Kuntal Ghosh
On Thu, Mar 16, 2017 at 1:18 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Wed, Mar 15, 2017 at 9:14 PM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> I've attached the updated patches.
>
> Thanks for the new versions. This begins to look really clear.
Thanks again for the review.

> Having some activity really depends on the backend type (see
> autovacuum workers for example which fill in the query field), so my
> 2c here is that we let things as your patch proposes. If at some point
> it makes sense to add something in the query field, we could always
> discuss it separately and patch it accordingly.
+1.

> +/* Total number of backends including auxiliary */
> +#define NumBackendStatSlots (MaxBackends + NUM_AUXPROCTYPES)
> +
> This variable remains localized in pgstat.c, so let's define it there.
Done.

> +  bgworker, background writer,
> That's really bike-shedding, but we could say here "background worker"
> and be consistent with the rest.
Done.

> +/* Total number of backends including auxiliary */
> +#define NumBackendStatSlots (MaxBackends + NUM_AUXPROCTYPES)
> This could be a bit more precise, telling as well that MaxBackends
> includes autovacuum workers and background workers.
Done.

> - * --
> + *
> + * Each auxiliary process also maintains a PgBackendStatus struct in shared
> + * memory.
>   */
> Better to not delete this line, this prevents pgindent to touch this
> comment block.
Good to know. Fixed.

> Did you try if this patch worked with EXEC_BACKEND? Sorry I don't have
> a Windows workstation at hand now, but as AuxiliaryProcs is
> NON_EXEC_STATIC...
Thanks to Ashutosh for testing the patches on Windows. It's working fine.

> +   /* We have userid for client-backends and wal-sender processes */
> +   if (beentry->st_backendType == B_BACKEND ||
> beentry->st_backendType == B_WAL_SENDER)
> +   beentry->st_userid = GetSessionUserId();
> +   else
> +   beentry->st_userid = InvalidOid;
> This can be true as well for bgworkers defining a role OID when
> connecting with BackgroundWorkerInitializeConnection().
Fixed.

> +   /*
> +* Before returning, report autovacuum launcher process in the
> +* PgBackendStatus array.
> +*/
> +   pgstat_bestart();
> return;
> Wouldn't that be better in AutoVacLauncherMain()?
Agreed and done that way.

> +   /*
> +* Before returning, report the background worker process in the
> +* PgBackendStatus array.
> +*/
> +   if (!bootstrap)
> +   pgstat_bestart();
> Ditto with BackgroundWriterMain().
Perhaps you meant BackgroundWorkerInitializeConnection and
BackgroundWorkerInitializeConnectionByOid. Done.

> @@ -808,6 +836,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
> nulls[12] = true;
> nulls[13] = true;
> nulls[14] = true;
> +   nulls[23] = true;
> }
> That's not possible to have backend_type set as NULL, no?
Yes, backend_type can't be null. But, I'm not sure whether it should
be visible to a user with insufficient privileges. Anyway, I've made
it visible to all user for now.

Please find the updated patches in the attachment.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch
Description: binary/octet-stream


0002-Expose-stats-for-all-backends.patch
Description: binary/octet-stream


0003-Add-backend_type-column-in-pg_stat_get_activity.patch
Description: binary/octet-stream



Re: [HACKERS] Two phase commit in ECPG

2017-03-17 Thread Kuntal Ghosh
On Tue, Mar 14, 2017 at 1:35 AM, Michael Meskes <mes...@postgresql.org> wrote:
>> Previous 002 patch lacked to add describing PREPARE TRANSACTION.
>> Attached updated 002 patch.
>
> I just committed both patches and a backport of the bug fix itself.
> Thanks again for finding and fixing.
Regression tests for sql/twophase is failing while performing the test
with make installcheck.
+ [NO_PID]: ecpg_check_PQresult on line 32: bad response - ERROR:
prepared transactions are disabled
+ HINT:  Set max_prepared_transactions to a nonzero value.

Setting max_prepared_transactions accordingly fixes the issue. But,
I'm not sure whether this test should be performed by installcheck by
default or should only be performed by make
installcheck-prepared-txns.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] parallelize queries containing initplans

2017-03-15 Thread Kuntal Ghosh
On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> Based on that idea, I have modified the patch such that it will
> compute the set of initplans Params that are required below gather
> node and store them as bitmap of initplan params at gather node.
> During set_plan_references, we can find the intersection of external
> parameters that are required at Gather or nodes below it with the
> initplans that are passed from same or above query level. Once the set
> of initplan params are established, we evaluate those (if they are not
> already evaluated) before execution of gather node and then pass the
> computed value to each of the workers.   To identify whether a
> particular param is parallel safe or not, we check if the paramid of
> the param exists in initplans at same or above query level.  We don't
> allow to generate gather path if there are initplans at some query
> level below the current query level as those plans could be
> parallel-unsafe or undirect correlated plans.

I would like to mention different test scenarios with InitPlans that
we've considered while developing and testing of the patch.

An InitPlan is a subselect that doesn't take any reference from its
immediate outer query level and it returns a param value. For example,
consider the following query:

            QUERY PLAN
----------------------------------
 Seq Scan on t1
   Filter: (k = $0)
   allParams: $0
   InitPlan 1 (returns $0)
     ->  Aggregate
           ->  Seq Scan on t3
In this case, the InitPlan is evaluated once when the filter is
checked for the first time. For subsequent checks, we need not
evaluate the initplan again since we already have the value. In our
approach, we parallelize the sequential scan by inserting a Gather
node on top of parallel sequential scan node. At the Gather node, we
evaluate the InitPlan before spawning the workers and pass this value
to the worker using dynamic shared memory. This yields the following
plan:
            QUERY PLAN
----------------------------------
 Gather
   Workers Planned: 2
   Params Evaluated: $0
   InitPlan 1 (returns $0)
     ->  Aggregate
           ->  Seq Scan on t3
   ->  Parallel Seq Scan on t1
         Filter: (k = $0)
As Amit mentioned up in the thread, at a Gather node, we evaluate only
those InitPlans that are attached to this query level or any higher
one and are used under the Gather node. extParam at a node includes
the InitPlan params that should be passed from an outer node. I've
attached a patch to show extParams and allParams for each node. Here
is the output with that patch:
            QUERY PLAN
----------------------------------
 Gather
   Workers Planned: 2
   Params Evaluated: $0
   allParams: $0
   InitPlan 1 (returns $0)
     ->  Finalize Aggregate
           ->  Gather
                 Workers Planned: 2
                 ->  Partial Aggregate
                       ->  Parallel Seq Scan on t3
   ->  Parallel Seq Scan on t1
         Filter: (k = $0)
         allParams: $0
         extParams: $0
In this case, $0 is included in extParam of parallel sequential scan
and the InitPlan corresponding to this param is attached to the same
query level that contains the Gather node. Hence, we evaluate $0 at
Gather and pass it to workers.
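The set arithmetic described above (intersect the params required below the Gather with the InitPlan params from the same or a higher query level) can be sketched with plain sets. This is illustrative Python, not the planner code, and the function name is made up for the example:

```python
def params_to_evaluate_at_gather(ext_params, initplan_params):
    """InitPlan params the Gather must evaluate and ship to its workers:
    the params required below the Gather (extParam) that come from
    InitPlans attached to the same or a higher query level."""
    return ext_params & initplan_params

# Nodes under the Gather need $0 and $1; only $0 comes from an InitPlan at
# this query level or above, so only $0 is evaluated at the Gather.
print(params_to_evaluate_at_gather({0, 1}, {0}))  # {0}
```

A param outside the intersection is either not needed under the Gather or not something the Gather can compute, so nothing else is evaluated there.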

But generating a plan like this requires marking an InitPlan
param as parallel-safe. We can't mark all params as parallel-safe
because of correlated subselects. Hence, in
max_parallel_hazard_walker, the only params marked safe are InitPlan
params from the current or an outer query level. An InitPlan param from an
inner query level isn't marked safe since we can't evaluate it at
any Gather node above the current node (where the param is used). As
mentioned by Amit, we also don't allow generation of a gather path if
there are InitPlans at some query level below the current query level,
as those plans could be parallel-unsafe or undirect correlated plans.

I've attached a script file and its output containing several
scenarios relevant to InitPlans. I've also attached the patch for
displaying extParam and allParam at each node. This patch can be
applied on top of pq_pushdown_initplan_v3.patch. Please find the
attachments.


> This restricts some of the cases for parallelism like when initplans
> are below gather node, but the patch looks better. We can open up
> those cases if required in a separate patch.
+1. Unfortunately, this patch doesn't enable parallelism for all
possible cases with InitPlans. Our objective is to keep things simple
and clean. Still, TPC-H q22 runs 2.5~3 times faster with this patch.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


script.out
Description: Binary data


script.sql
Description: application/sql


0002-Show-extParams-and-allParams-in-explain.patch
Description: binary/octet-stream


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-15 Thread Kuntal Ghosh
On Tue, Mar 14, 2017 at 1:50 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> "writer" would be better if defined as "background writer" instead?
> You are forgetting in this list autovacuum workers and the startup
> process, the latter is important for nodes in recovery.
>
Modified "writer" as "background writer". Included autovacuum workers
and startup process.

> + Type of the current server process. Possible types are:
> +  autovacuum launcher, bgworker, checkpointer, client backend,
> +  wal receiver, wal sender, wal writer and writer.
> + 
> There should be markups around those terms. Shouldn't "wal writer" and
> "wal sender" and "wal receiver" not use any space? On HEAD a WAL
> sender is defined as "walsender".
Enclosed each type in . Removed space from "wal sender" and
"wal receiver".

> For WAL senders, the "query" field is still filled with "walsender".
> This should be removed.
Fixed.

> @@ -365,6 +368,8 @@ CheckpointerMain(void)
>  */
> AbsorbFsyncRequests();
>
> +   pgstat_report_activity(STATE_RUNNING, NULL);
> +
> if (got_SIGHUP)
> It seems to me that what we want to know here are only the wait
> events, so I think that you had better drop this portion of the patch.
> It is hard to decide if an auxiliary process is idle or running, and
> this routine is a concept that applies to database backends running
> queries.
Removed this part as suggested.

I've attached the updated patches.
In 0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch,
I've extended BackendStatusArray to store auxiliary processes.
Backends use slots indexed in the range from 1 to MaxBackends
(inclusive), so we can use MaxBackends + AuxBackendType + 1 as the
index of the slot for an auxiliary process. Also, I've added a
backend_type to describe the type of the backend process. The type
includes:
* autovacuum launcher
* autovacuum worker
* background writer
* bgworker
* client backend
* checkpointer
* startup
* walreceiver
* walsender
* walwriter
In 0002-Expose-stats-for-all-backends.patch, I've added the required
code for reporting activity of different auxiliary processes,
autovacuum launcher and bgworker processes.
In 0003-Add-backend_type-column-in-pg_stat_get_activity.patch, I've
added a column named backend_type in pg_stat_get_activity to show the
type of the process to user.

There are some pg_stat_* functions where showing all the backends
doesn't make much sense. For example,
postgres=# SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   pg_stat_get_backend_activity(s.backendid) AS query
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
  pid  |  query
---+--
 17300 | SELECT pg_stat_get_backend_pid(s.backendid) AS pid, +
   |pg_stat_get_backend_activity(s.backendid) AS query   +
   | FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
 16925 | 
 16927 | 
 16926 | 
 16929 | 
IMHO, this scenario can be easily avoided by filtering backends using
backend_type. I'm not sure whether we should add any logic in the code
for handling such cases. Thoughts?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0003-Add-backend_type-column-in-pg_stat_get_activity.patch
Description: binary/octet-stream


0002-Expose-stats-for-all-backends.patch
Description: binary/octet-stream


0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch
Description: binary/octet-stream



Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-14 Thread Kuntal Ghosh
On Wed, Mar 15, 2017 at 12:32 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Mar 13, 2017 at 10:36 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Couple of review comments,,
>>
>> You may also need to update the documentation as now we are also going
>> to support wal consistency check for hash index. The current
>> documentation does not include hash index.
>>
>> +only records originating from those resource managers.  Currently,
>> +the supported resource managers are heap,
>> +heap2, btree, gin,
>> +gist, sequence, spgist,
>> +brin, and generic. Only
>
> Did that, committed this.  Also ran pgindent over hash_mask() and
> fixed an instance of dubious capitalization.
Thanks Robert for the commit.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-14 Thread Kuntal Ghosh
On Tue, Mar 14, 2017 at 1:50 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> Here is a first pass on this patch.
Thanks Michael for the review.

>
>  void
> -pgstat_bestart(void)
> +pgstat_procstart(void)
> I would not have thought that this patch justifies potentially
> breaking extensions.
Since I'm using this method to start all kind of processes including
client backends, auxiliary procs etc., I thought of changing the name
as above. But, this surely doesn't justify breaking extensions. So,
I'll change it back to pgstat_bestart.


> @@ -365,6 +368,8 @@ CheckpointerMain(void)
>  */
> AbsorbFsyncRequests();
>
> +   pgstat_report_activity(STATE_RUNNING, NULL);
> +
> if (got_SIGHUP)
> It seems to me that what we want to know here are only the wait
> events, so I think that you had better drop this portion of the patch.
> It is hard to decide if an auxiliary process is idle or running, and
> this routine is a concept that applies to database backends running
> queries.
I see your point. I'll remove this part.

> -static LocalPgBackendStatus *localBackendStatusTable = NULL;
> +
> +/* Status for backends and auxiliary processes */
> +static LocalPgBackendStatus *localProcStatusTable = NULL;
> I don't quite understand why you need to have two layers here,
> wouldn't it be more simple to just extend localBackendStatusTable with
> enough slots for the system processes? That would be also less
> bug-prone. You need to be careful to have a correct mapping to
> include:
> - auxiliary processes, up to 4 slots used.
> - bgworker processes, decided by GUC.
> - autovacuum workers, decided by GUC.
I have extended localBackendStatusTable with slots for non-backend
processes, but renamed it localProcStatusTable since it includes all
processes. I'll keep the variable name as localBackendStatusTable in
the updated patch to avoid any confusion.
I've extended BackendStatusArray to store auxiliary processes.
Backends use slots indexed in the range from 1 to MaxBackends
(inclusive), so we use MaxBackends + AuxProcType + 1 as the index of
the slot for an auxiliary process.
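The slot mapping can be illustrated with a tiny sketch. The values below are illustrative (MaxBackends and the auxiliary-process list come from the server configuration and the AuxProcType enum; the names and ordering here are assumptions for the example):

```python
MAX_BACKENDS = 100  # illustrative value of MaxBackends

# Order mirrors an assumed AuxProcType enum; these processes get the
# slots after the regular backends.
AUX_PROC_TYPES = ["startup", "bgwriter", "checkpointer", "walwriter", "walreceiver"]

def backend_slot(backend_id: int) -> int:
    """Regular backends use slots 1..MaxBackends (1-based BackendId)."""
    assert 1 <= backend_id <= MAX_BACKENDS
    return backend_id

def aux_proc_slot(aux_type: str) -> int:
    """Auxiliary process slot = MaxBackends + AuxProcType + 1."""
    return MAX_BACKENDS + AUX_PROC_TYPES.index(aux_type) + 1

print(aux_proc_slot("startup"))       # 101
print(aux_proc_slot("checkpointer"))  # 103
```

Because the auxiliary slots sit past MaxBackends, they never collide with client-backend slots, and each auxiliary process type gets exactly one fixed slot.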

Additionally, to store the index of currently active client backends
from localBackendStatusTable, I've added an array named
localBackendStatusIndex. This is useful for generating a set of
currently active client backend ID numbers (from 1 to the number of
active client backends). These IDs are used for some pgstat_*
functions relevant to client processes, e.g.,
pg_stat_get_backend_activity, pg_stat_get_backend_client_port etc.

Any suggestions?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Kuntal Ghosh
On Fri, Mar 10, 2017 at 3:11 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2017-03-09 16:37:29 -0500, Tom Lane wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>> > On Thu, Mar 9, 2017 at 2:30 PM, Peter Eisentraut
>> > <peter.eisentr...@2ndquadrant.com> wrote:
>> >> In practice, I think it's common to do a quick select * from
>> >> pg_stat_activity to determine whether a database instance is in use.
>>
>> > I thought of the same kind of thing, and it was discussed upthread.
>> > There seemed to be more votes for keeping it all in one view, but that
>> > could change if more people vote.
>>
>> I've not been paying much attention to this thread, but it seems like
>> something that would help Peter's use-case and have other uses as well
>> is a new column that distinguishes different process types --- user
>> session, background worker, autovacuum worker, etc.
>
> The patches upthread add precisely such a column.
>
The patch exposes auxiliary processes, autovacuum launcher and
bgworker along with other backends in pg_stat_activity. It also adds
an extra column, named proc_type (suggested by Craig
and Robert), to indicate the type of process in pg_stat_activity view.
proc_type includes:

* client backend
* autovacuum launcher
* wal sender
* bgworker
* writer
* checkpointer
* wal writer
* wal receiver

Here is the present output with the relevant columns.
postgres=# SELECT wait_event_type, wait_event, state, proc_type FROM
pg_stat_activity;
 wait_event_type | wait_event  | state  |  proc_type
-+-++-
 Activity| AutoVacuumMain  | idle   | autovacuum launcher
 Activity| LogicalLauncherMain | idle   | bgworker
 Activity| WalSenderMain   | idle   | wal sender
 | | active | client backend
 Activity| BgWriterMain| idle   | writer
 Activity    | CheckpointerMain| idle   | checkpointer
 Activity| WalWriterMain   | idle   | wal writer
(7 rows)


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Kuntal Ghosh
Hello Amit,

On Tue, Mar 7, 2017 at 4:24 PM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> Hi Kuntal,
>
> Patches apply and compile fine.  Works as advertised.
>
> Some minor comments on the patches themselves.
>
Thanks for the review.

> In 0001:
>
> - * pgstat_bestart() -
> + * pgstat_procstart() -
> + *
> + *  Initialize this process's entry in the PgBackendStatus array.
> + *  Called from InitPostgres and AuxiliaryProcessMain.
>
> Not being called from AuxiliaryProcessMain().  Maybe leftover comment from
> a previous version.  Actually I see that in patch 0002, Main() functions
> of various auxiliary processes call pgstat_procstart, not
> AuxiliaryProcessMain.
>
Fixed.

> + * user-defined functions which expects ids of backends starting from
> 1 to
>
> s/expects/expect/g
>
Fixed.

> +/*
> + * AuxiliaryPidGetProc -- get PGPROC for an auxiliary process
> + * given its PID
> + *
> + * Returns NULL if not found.
> + */
> +PGPROC *
> +AuxiliaryPidGetProc(int pid)
> +{
> +PGPROC *result;
>
> Initialize to NULL so that the comment above is true. :)
>
Fixed.

> In 0002:
>
> @@ -248,6 +248,9 @@ BackgroundWriterMain(void)
>   */
>  prev_hibernate = false;
>
> +/* report walwriter process in the PgBackendStatus array */
> +pgstat_procstart();
> +
>
> s/walwriter/writer/g
Fixed.

> Patch 0004 should update monitoring.sgml.
Added.

I've attached the updated patches. PFA.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-non-backend-processes-in-pg_stat_get.patch
Description: binary/octet-stream


0002-Expose-stats-for-auxiliary-processes-in-pg_stat_get_.patch
Description: binary/octet-stream


0003-Expose-stats-for-autovacuum-launcher-and-bgworker.patch
Description: binary/octet-stream


0004-Add-proc_type-column-in-pg_stat_get_activity.patch
Description: binary/octet-stream



Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-08 Thread Kuntal Ghosh
On Fri, Mar 3, 2017 at 9:44 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, Feb 28, 2017 at 11:06 AM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> Hello everyone,
>>
>> I've attached a patch which implements WAL consistency checking for
>> hash indexes. This feature is going to be useful for developing and
>> testing of WAL logging for hash index.
>>
>
> 2.
> + else if ((opaque->hasho_flag & LH_BUCKET_PAGE) ||
> + (opaque->hasho_flag & LH_OVERFLOW_PAGE))
> + {
> + /*
> + * In btree bucket and overflow pages, it is possible to modify the
> + * LP_FLAGS without emitting any WAL record. Hence, mask the line
> + * pointer flags.
> + * See hashgettuple() for details.
> + */
> + mask_lp_flags(page);
> + }
>
> Again, this mechanism is also modified by patch "Microvacuum support
> for hash index", so above changes needs to be adjusted accordingly.
> Comment referring to btree is wrong, you need to refer hash.
I've corrected the text in the comment and rebased the patch on the
latest hash index patch for WAL logging [1]. As discussed in the
thread, the Microvacuum patch can be rebased on top of this patch.


[1] 
https://www.postgresql.org/message-id/CAA4eK1%2BmvCucroWQwX3S7aBR%3D0yBJGF%2BjQz4x4Cx9QVsMFTZUw%40mail.gmail.com
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-wal_consistency_checking-for-hash-index_v1.patch
Description: binary/octet-stream



Re: [HACKERS] Performance degradation in TPC-H Q18

2017-03-05 Thread Kuntal Ghosh
On Sat, Mar 4, 2017 at 11:09 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sat, Mar 4, 2017 at 5:56 AM, Andres Freund <and...@anarazel.de> wrote:
>> attached is a patch to address this problem, and the one reported by
>> Dilip.  I ran a lot of TPC-H and other benchmarks, and so far this
>> addresses all the performance issues, often being noticeably faster than
>> with the dynahash code.
>>
>> Comments?
+errdetail("size %f/members %f: factor %.2f",
+  (double)tb->size, (double)tb->members,
+  (double) tb->members / tb->size),
In SH_STAT, we use 'filled' instead of 'factor'. For displaying size
and members, there is no need to convert them to double.


> I'm still not convinced that raising the fillfactor like this is going
> to hold up in testing, but I don't mind you committing it and we'll
> see what happens.  If it is possible to survive with a fillfactor that
> high, it certainly has some advantages, and it's obviously more likely
> to work out with the other logic you've added.  I think we need a lot
> more people playing with this to know whether any given approach is
> right, and I think getting something committed will help more than any
> amount of theoretical discussion.
+1

> I think DEBUG1 is far too high for something that could occur with
> some frequency on a busy system; I'm fairly strongly of the opinion
> that you ought to downgrade that by a couple of levels, say to DEBUG3
> or so.
+1

I've tested with TPC-H query 18 and it's working fine.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Performance degradation in TPC-H Q18

2017-03-02 Thread Kuntal Ghosh
On Fri, Mar 3, 2017 at 8:41 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund <and...@anarazel.de> wrote:
>> the resulting hash-values aren't actually meaningfully influenced by the
>> IV. Because we just xor with the IV, most hash-value that without the IV
>> would have fallen into a single hash-bucket, fall into a single
>> hash-bucket afterwards as well; just somewhere else in the hash-range.
>
> Wow, OK.  I had kind of assumed (without looking) that setting the
> hash IV did something a little more useful than that.  Maybe we should
> do something like struct blah { int iv; int hv; }; newhv =
> hash_any(&blah, sizeof(blah)).
>
Sounds good. I've seen a post from Thomas Munro suggesting an
alternative approach for combining hash values in execGrouping.c [1].
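To illustrate the point, here is a tiny self-contained sketch of the difference between XOR-ing an IV into a hash value and re-hashing the pair, using a stand-in 32-bit mixer (the MurmurHash3 finalizer) in place of PostgreSQL's hash_any():

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in 32-bit mixer (MurmurHash3 finalizer), used here instead of
 * PostgreSQL's hash_any(). */
static uint32_t
mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

/* XOR-ing the IV only relocates buckets: values that collide in the
 * low bits without the IV still collide with it. */
static uint32_t
combine_xor(uint32_t iv, uint32_t hv)
{
	return iv ^ hv;
}

/* Re-hashing the (iv, hv) pair, as suggested above, is not constrained
 * to preserve that bucket structure. */
static uint32_t
combine_rehash(uint32_t iv, uint32_t hv)
{
	return mix32(mix32(iv) + hv);
}
```

With a bucket mask of 0xff, combine_xor maps hash values 0x10 and 0x1010 to the same bucket for every IV, which is exactly the clustering problem described above; combine_rehash has no such structural guarantee.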

>> In addition to that it seems quite worthwhile to provide an iterator
>> that's not vulnerable to this.  An approach that I am, seemingly
>> successfully, testing is to iterate the hashtable in multiple (in my
>> case 23, because why not) passes, accessing only every nth element. That
>> allows the data to be inserted in a lot less "dense" fashion.  But
>> that's more an optimization, so I'll just push something like the patch
>> mentioned in the thread already.
>>
>> Makes some sense?
>
> Yep.
>
Yes, it makes sense. Quadratic probing is another approach, but it
would require an extra shift op every time we want to find the next or
prev location during a collision.
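For reference, a common form of quadratic probing uses triangular-number offsets; this is only a sketch of that scheme, not code from simplehash.h:

```c
#include <assert.h>

/* i-th probe slot for quadratic probing with triangular-number
 * offsets: home, home+1, home+3, home+6, ...  With a power-of-two
 * table size this sequence visits every slot exactly once.  The
 * offset can also be maintained incrementally (slot += ++step),
 * which is the extra per-collision arithmetic mentioned above. */
static unsigned
quad_probe(unsigned home, unsigned i, unsigned mask)
{
	return (home + i * (i + 1) / 2) & mask;
}
```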

[1] 
https://www.postgresql.org/message-id/caeepm%3d3rdgjfxw4ckvj0oemya2-34b0qhng1xv0vk7tgpjg...@mail.gmail.com
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Performance degradation in TPC-H Q18

2017-02-28 Thread Kuntal Ghosh
On Wed, Mar 1, 2017 at 10:53 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2017-03-01 10:47:45 +0530, Kuntal Ghosh wrote:
>> if (insertdist > curdist)
>> {
>> swap the entry to be inserted with the current entry.
>> Try to insert the current entry in the same logic.
>> }
>>
>> So, the second approach will not cause all the followers to be shifted by 1.
>
> How not? You'll have to do that game until you found a free slot.
You can skip increasing the curdist of some elements for which it is
already pretty high. Suppose we have the following hash table and an
element e,
_,_,_,i,_,_,j,j+1,j+2,_,_
We want to insert e at i, but there is a collision and we start
probing. At j, the condition if (insertdist > curdist) is satisfied,
and we swap e with the element at j. Now, we try to insert e (= the
element previously at j), and so on. In this process, if the elements
at j+1, j+2 already have a high distance from their optimal locations,
we can skip those elements and move ahead. But, in the existing
approach, we increase their distances further. Thoughts?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Performance degradation in TPC-H Q18

2017-02-28 Thread Kuntal Ghosh
On Wed, Mar 1, 2017 at 9:38 AM, Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> On 2017-03-01 09:33:07 +0530, Kuntal Ghosh wrote:
>> On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund <and...@anarazel.de> wrote:
>> >> So, I was looking for other alternatives and I've found one called
>> >> RobinHood hashing.
>> >
>> > simplehash.h implements robin hood hashing.
>
>> But, it doesn't implement the swapping idea, right?
>
> It does, that's the if (insertdist > curdist) block in SH_INSERT.
> Unless I misunderstand what you're proposing?
>
I think the idea of the existing implementation is:

if (insertdist > curdist)
{
find an empty space at the end of the cluster.
shift all the followers by 1 position to make space for the element to
be inserted.
insert the element at the current place.
}

What I'm trying to say is:

if (insertdist > curdist)
{
swap the entry to be inserted with the current entry.
Try to insert the current entry in the same logic.
}

So, the second approach will not cause all the followers to be shifted by 1.
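To make the difference concrete, here is a minimal self-contained sketch of the swap-based insert (a toy table with unsigned keys and 0 marking an empty slot — not simplehash.h's actual structures):

```c
#include <assert.h>

#define TBL_SIZE 16				/* power of two, for mask-based modulo */
#define EMPTY	 0				/* key 0 marks an empty slot */

typedef struct
{
	unsigned	key;
} Entry;

static unsigned
hash_pos(unsigned key)
{
	return key & (TBL_SIZE - 1);
}

/* distance of a slot from the key's optimal position, wrapping around */
static unsigned
distance(unsigned key, unsigned slot)
{
	return (slot + TBL_SIZE - hash_pos(key)) & (TBL_SIZE - 1);
}

/* Robin Hood insert with swapping: instead of shifting a whole run of
 * followers, swap the incoming entry with the resident one whenever the
 * incoming entry is farther from its home slot, then continue inserting
 * the displaced entry from the current position. */
static void
rh_insert(Entry *tbl, unsigned key)
{
	unsigned	slot = hash_pos(key);
	unsigned	dist = 0;

	for (;;)
	{
		unsigned	curdist;

		if (tbl[slot].key == EMPTY)
		{
			tbl[slot].key = key;
			return;
		}
		curdist = distance(tbl[slot].key, slot);
		if (dist > curdist)
		{
			/* swap: the displaced entry is re-inserted from here on */
			unsigned	tmp = tbl[slot].key;

			tbl[slot].key = key;
			key = tmp;
			dist = curdist;
		}
		slot = (slot + 1) & (TBL_SIZE - 1);
		dist++;
	}
}
```

Each swap only displaces a single entry at a time, so followers whose distance is already high are passed over rather than pushed forward wholesale.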

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Performance degradation in TPC-H Q18

2017-02-28 Thread Kuntal Ghosh
On Wed, Mar 1, 2017 at 9:33 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund <and...@anarazel.de> wrote:
>> That's without the patch in
>> http://archives.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
>> ? With that patch it should complete without that change, right?
>>
> No, it's with the patch in the above link which is
> 0001-Resize-simplehash.h-table-in-case-of-long-runs.patch. But, I've
> kept the SH_FILLFACTOR to 0.8 and SH_MAX_FILLFACTOR to 0.95. I'll do
> another round of testing with the constants provided by you.
>
I've tested with the patch
0001-Resize-simplehash.h-table-in-case-of-long-runs.patch and the
results are same, i.e., hash table grows even when it is only 10-12%
filled. I've attached the logfile for reference.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


logfile
Description: Binary data



Re: [HACKERS] Performance degradation in TPC-H Q18

2017-02-28 Thread Kuntal Ghosh
On Wed, Mar 1, 2017 at 9:19 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2017-03-01 09:13:15 +0530, Kuntal Ghosh wrote:
>> On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <and...@anarazel.de> wrote:
>> >> BTW, another option to consider is lowering the target fillfactor.
>> >> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
>> >> issue.  Obviously, it's better to detect when we need a lower
>> >> fillfactor than to always use a lower one, but obviously the tighter
>> >> you pack the hash table, the more likely it is that you're going to
>> >> have these kinds of problems.
>> >
>> > Yea, that'd be one approach, but I feel better dynamically increasing
>> > the fillfactor like in the patch. Even with a lower fillfactor you could
>> > see issues, and as you say a higher fillfactor is nice [TM]; after the
>> > patch I played with *increasing* the fillfactor, without being able to
>> > measure a downside.
>
>> I've tested with different fill factors and the query got completed
>> with fill factor 0.6.
>
> That's without the patch in
> http://archives.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
> ? With that patch it should complete without that change, right?
>
No, it's with the patch in the above link which is
0001-Resize-simplehash.h-table-in-case-of-long-runs.patch. But, I've
kept the SH_FILLFACTOR to 0.8 and SH_MAX_FILLFACTOR to 0.95. I'll do
another round of testing with the constants provided by you.

>> With linear probing, the performance degrades more quickly at high
>> fill factors because of primary clustering, a tendency for one
>> collision to cause more nearby collisions. So, increasing the fill
>> factor further doesn't seem to be a good idea.
>
> Well, the idea is increasing it, but *also* detecting clustering and
> adaptively resizing earlier.
>
>
>> So, I was looking for other alternatives and I've found one called
>> RobinHood hashing.
>
> simplehash.h implements robin hood hashing.
>
>
But, it doesn't implement the swapping idea, right?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Performance degradation in TPC-H Q18

2017-02-28 Thread Kuntal Ghosh
On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <and...@anarazel.de> wrote:
>> BTW, another option to consider is lowering the target fillfactor.
>> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
>> issue.  Obviously, it's better to detect when we need a lower
>> fillfactor than to always use a lower one, but obviously the tighter
>> you pack the hash table, the more likely it is that you're going to
>> have these kinds of problems.
>
> Yea, that'd be one approach, but I feel better dynamically increasing
> the fillfactor like in the patch. Even with a lower fillfactor you could
> see issues, and as you say a higher fillfactor is nice [TM]; after the
> patch I played with *increasing* the fillfactor, without being able to
> measure a downside.
I've tested with different fill factors and the query got completed
with fill factor 0.6.

With linear probing, performance degrades more quickly at high fill
factors because of primary clustering, a tendency for one collision to
cause more nearby collisions. So, increasing the fill factor further
doesn't seem to be a good idea. Besides, when I tested with the patch,
I saw the hash table grow even when only 10-12% of it was filled.

LOG:  size: 8388608, members: 845056, filled: 0.100739, total chain:
735723, max chain: 37, avg chain: 0.870620, total_collisions: 219101,
max_collisions: 6, avg_collisions: 0.259274
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 16777216, members: 845056, filled: 0.050369, total chain:
197047, max chain: 6, avg chain: 0.233176, total_collisions: 121185,
max_collisions: 5, avg_collisions: 0.143405
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 16777216, members: 2127649, filled: 0.126818, total chain:
1471327, max chain: 35, avg chain: 0.691527, total_collisions: 486310,
max_collisions: 6, avg_collisions: 0.228567
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 33554432, members: 2127649, filled: 0.063409, total chain:
413073, max chain: 7, avg chain: 0.194145, total_collisions: 265766,
max_collisions: 5, avg_collisions: 0.124911
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;

This seems bad. We're wasting a lot of memory in the hash table.

Apart from that, I was thinking about the following optimization in the code.
/*
 * If the bucket is not empty, we either found a match (in which case
 * we're done), or we have to decide whether to skip over or move the
 * colliding entry. When the colliding element's distance to its
 * optimal position is smaller than the to-be-inserted entry's, we
 * shift the colliding entry (and its followers) forward by one.
 */
I'm worried about the fact that we are increasing the chain length of
each follower. It may happen that we've increased the chain length of
the last element by a huge factor.

So, I was looking for other alternatives and I've found one called
RobinHood hashing. In this approach, when you probe for a position to
insert a new element, if the probe length for the existing element is
less than the current probe length for the element being inserted,
swap the two elements and keep going. It leads to a low variance for
the chain lengths rather than the last one. Is this approach already
considered?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] WAL Consistency checking for hash indexes

2017-02-27 Thread Kuntal Ghosh
Hello everyone,

I've attached a patch which implements WAL consistency checking for
hash indexes. This feature is going to be useful for developing and
testing of WAL logging for hash index.

Note that, this patch should be applied on top of the following WAL
logging for hash index patch:
https://www.postgresql.org/message-id/CAA4eK1%2B%2BP%2BjVZC7MC3xzw5uLCpva9%2BgsBpd3semuWffWAftr5Q%40mail.gmail.com

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-wal_consistency_checking-for-hash-index.patch
Description: application/download



Re: [HACKERS] increasing the default WAL segment size

2017-02-27 Thread Kuntal Ghosh
On Tue, Feb 28, 2017 at 9:45 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 2/24/17 6:30 AM, Kuntal Ghosh wrote:
>>
>> * You're considering any WAL file with a power of 2 as valid. Suppose,
>> the correct WAL seg size is 64mb. For some reason, the server
>> generated a 16mb invalid WAL file(maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file which I think
>> is incorrect. Do you agree with that?
>
>
> Detecting correct WAL size based on the size of a random WAL file seems like
> a really bad idea to me.
+1



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2017-02-24 Thread Kuntal Ghosh
On Fri, Feb 24, 2017 at 12:47 PM, Beena Emerson <memissemer...@gmail.com> wrote:
>
> Hello,
>
> The recent commit  c29aff959dc64f7321062e7f33d8c6ec23db53d has again changed
> the code and the second patch cannot be applied cleanly. Please find
> attached the rebased 02 patch. 01 patch is the same .
>
I've done an initial review of the patch. The objective of the patch
is to make wal-segsize an initdb-time parameter instead of a
compile-time parameter.

The patch introduces following three different techniques to expose
the XLogSize to different modules:

1. Directly read XLogSegSize from the control file
This is used by default, i.e., StartupXLOG() and looks good to me.

2. Run the SHOW wal_segment_size command to fetch and set the XLogSegSize

+   if (!RetrieveXLogSegSize(conn))
+   disconnect_and_exit(1);
+
You need the same logic in pg_receivewal.c as well.

3. Retrieve the XLogSegSize by reading the file size of WAL files
+   if (private.inpath != NULL)
+   sprintf(full_path, "%s/%s", private.inpath, fname);
+   else
+   strcpy(full_path, fname);
+
+   stat(full_path, &fst);
+
+   if (!IsValidXLogSegSize(fst.st_size))
+   {
+   fprintf(stderr,
+   _("%s: file size %d is invalid \n"),
+   progname, (int) fst.st_size);
+
+   return EXIT_FAILURE;
+
+   }
+
+   XLogSegSize = (int) fst.st_size;
I see couple of issues with this approach:

* You should check the return value of stat() before going ahead.
Something like,
if (stat(filename, &fst) < 0)
error "file doesn't exist"

* You're considering any WAL file whose size is a power of 2 as valid.
Suppose the correct WAL segment size is 64MB. For some reason, the
server generated an invalid 16MB WAL file (maybe it crashed while
creating the WAL file). Your code seems to treat this as a valid file,
which I think is incorrect. Do you agree with that?

Is it possible to unify these different techniques of reading
XLogSegSize into a generalized function, with proper documentation
describing the scope and limitations of each approach?
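As a sketch of points 1 and 3 combined — check stat()'s return value before trusting st_size, then validate the size (the size-range constants here are illustrative assumptions, not PostgreSQL's actual limits):

```c
#include <stdio.h>
#include <assert.h>
#include <sys/stat.h>

/* Illustrative bounds on the WAL segment size. */
#define MIN_SEGSZ (1024L * 1024L)			/* 1MB */
#define MAX_SEGSZ (1024L * 1024L * 1024L)	/* 1GB */

/* Valid segment sizes are powers of two within the allowed range. */
static int
is_valid_segsize(long size)
{
	return size >= MIN_SEGSZ && size <= MAX_SEGSZ &&
		(size & (size - 1)) == 0;
}

/* Derive the segment size from a WAL file, checking stat() first. */
static long
get_segsize(const char *path)
{
	struct stat fst;

	if (stat(path, &fst) < 0)
	{
		fprintf(stderr, "could not stat \"%s\"\n", path);
		return -1;
	}
	if (!is_valid_segsize((long) fst.st_size))
	{
		fprintf(stderr, "file size %ld is invalid\n", (long) fst.st_size);
		return -1;
	}
	return (long) fst.st_size;
}
```

Note this still has the limitation discussed above: a truncated-but-power-of-two file would pass the check.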

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Performance degradation in TPC-H Q18

2017-02-21 Thread Kuntal Ghosh
 currently big one, in
+ * hash-table order.
+ *
+ * FIXME: compute boundaries in a more principled manner.
+ */

After applying the patch, TPC-H Q18 and the simplified query, both
completes in time. Although the patch works in this scenario, we need
a smarter way to trigger the hash expansion. The problem with the
patch is that it triggers a hash expansion even when the filled
percentage is pretty low, causing unnecessary memory consumption.

Thoughts?

[1] 
https://www.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
0001-Resize-simplehash.h-table-in-case-of-long-runs.patch


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Passing query string to workers

2017-02-20 Thread Kuntal Ghosh
On Mon, Feb 20, 2017 at 10:11 AM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:
> On Sun, Feb 19, 2017 at 10:11 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> +   query_data = (char *) palloc0(strlen(estate->es_queryString) + 1);
>> +   strcpy(query_data, estate->es_queryString);
>>
>> It's unnecessary to copy the query string here; you're going to use it
>> later on in the very same function.  That code can just refer to
>> estate->es_queryString directly.
>
>
> Done.
+   char   *query_data;
+   query_data = estate->es_sourceText;
estate->es_sourceText is a const char * variable. Assigning this const
pointer to a non-const pointer violates const-correctness. So, either
you should declare query_data as const char *, or, as Robert
suggested, you can use estate->es_sourceText directly.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Passing query string to workers

2017-02-16 Thread Kuntal Ghosh
On Thu, Feb 16, 2017 at 5:47 PM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:
> Other that that I updated some comments and other cleanup things. Please
> find the attached patch for the revised version.
Looks good.

It has passed the regression tests (with and without the regress
mode). The query is displayed for parallel workers in
pg_stat_activity, in log statements, and in failed-query statements.
Moved the status to Ready for Committer.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Passing query string to workers

2017-02-16 Thread Kuntal Ghosh
On Sat, Feb 11, 2017 at 8:38 AM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:
>
>>
>> Another question is don't we need to set debug_query_string in worker?
>
> In the updated version I am setting it in ParallelQueryMain.
>
Ahh, I missed that. debug_query_string is used to show the log
statements. Hence, it should be set.

> Please find the attached file for the revised version.
>
+   queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT);
+   debug_query_string = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT);
+   pgstat_report_activity(STATE_RUNNING, shm_toc_lookup(toc,
PARALLEL_KEY_QUERY_TEXT));
Just one lookup is sufficient.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2017-02-16 Thread Kuntal Ghosh
On Mon, Feb 6, 2017 at 11:09 PM, Beena Emerson <memissemer...@gmail.com> wrote:
>
> Hello,
> PFA the updated patches.
I've started reviewing the patches.
01-add-XLogSegmentOffset-macro.patch looks clean to me. I'll post my
detailed review after looking into the second patch. But, both the
patches needs a rebase based on the commit 85c11324cabaddcfaf3347df7
(Rename user-facing tools with "xlog" in the name to say "wal").

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-16 Thread Kuntal Ghosh
On Mon, Feb 13, 2017 at 1:01 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:

Thanks for the explanation. I've looked into the referred code. I'm
still in doubt. vacuumed_pages is incremented only when there are no
indexes, i.e. nindexes=0. Now, look at the following part in the
patch.

+   /*
+* Do post-vacuum cleanup and statistics update for each index if
+* the number of vacuumed page exceeds threshold.
+*/
+   cleanupidx_thresh = (float4) nblocks * vacuum_cleanup_index_scale;
+
+   elog(DEBUG3, "%s: vac: %d (threshold %0.f)",
+RelationGetRelationName(onerel), nblocks, cleanupidx_thresh);
+   if (vacuumed_pages >= cleanupidx_thresh)
+   {
+   for (i = 0; i < nindexes; i++)
+   lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+   }
So, unless vacuum_cleanup_index_scale_thresh is zero,
lazy_cleanup_index will never be called. IMO, this seems to be
incorrect. Besides, I've tested with a non-zero (0.5)
vacuum_cleanup_index_scale_thresh, and the regression tests for brin
and gin fail (make installcheck).

+   {"vacuum_cleanup_index_scale_factor", PGC_USERSET,
CLIENT_CONN_STATEMENT,
+gettext_noop("Number of pages containing dead tuple
prior to vacuum as a fraction of relpages."),
+   NULL
+   },
+   &vacuum_cleanup_index_scale,
+   0.0, 0.0, 100.0,
+   NULL, NULL, NULL
+   },
The maximum value for vacuum_cleanup_index_scale_factor should be 1
instead of 100. As the code indicates, it is certainly not treated as
a percentage fraction of relpages.
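The first observation can be sketched as follows: since vacuumed_pages is incremented only in the nindexes == 0 path, it stays 0 for any table that has indexes, and the comparison then passes only when the threshold itself is 0:

```c
#include <assert.h>

/* Sketch of the threshold test in the patch.  vacuumed_pages is the
 * counter that the existing code increments only when nindexes == 0,
 * so it is 0 whenever the table has indexes. */
static int
should_cleanup_indexes(double scale, int nblocks, int vacuumed_pages)
{
	double		cleanupidx_thresh = (double) nblocks * scale;

	return vacuumed_pages >= cleanupidx_thresh;
}
```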

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-02-15 Thread Kuntal Ghosh
On Mon, Feb 13, 2017 at 8:52 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> As discussed, attached are refactoring patches and a patch to enable
> WAL for the hash index on top of them.

0006-Enable-WAL-for-Hash-Indexes.patch needs to be rebased after
commit 8da9a226369e9ceec7cef1.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-02-15 Thread Kuntal Ghosh
Hello everyone,

As discussed in this thread, I've attached a set of patches to show
auxiliary processes, autovacuum launcher and bgworker along with other
backends in pg_stat_activity. For now, I've extended
BackendStatusArray to store auxiliary processes. Backends use slots
indexed in the range from 1 to MaxBackends (inclusive), so we use
MaxBackends + AuxProcType + 1 as the index of the slot for an
auxiliary process. However, BackendStatusArray should be renamed to
something like 'ProcStatusArray' along with many others in pgstat.c
and pgstatfuncs.c(Ex: LocalPgBackendStatus etc). But, that needs a
major code refactoring. I can do the changes if we agree with that.

I've also kept a local array, named localBackendStatusIndex, which
stores the indexes of currently active backends from
BackendStatusArray. It assigns ids to currently active backends from 1
to the number of active backends. (It is required in some pgstat_*
functions, for example pg_stat_get_backend_idset.) Hence, we are not
affecting the outputs of other SQL functions apart from
pg_stat_activity and pg_stat_get_activity.
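The slot assignment can be sketched as below; the MaxBackends value and the enum ordering here are illustrative stand-ins (the real AuxProcType lives in miscadmin.h and MaxBackends is computed at startup):

```c
#include <assert.h>

#define MAX_BACKENDS 100		/* illustrative; really computed at startup */

/* Illustrative subset of auxiliary process types. */
typedef enum
{
	CheckpointerProcess = 0,
	BgWriterProcess,
	WalWriterProcess,
	WalReceiverProcess,
	NUM_AUXPROCTYPES
} AuxProcType;

/* Regular backends occupy slots 1..MaxBackends of the status array;
 * auxiliary processes are appended after them, one fixed slot per
 * AuxProcType, giving MaxBackends + AuxProcType + 1. */
static int
aux_status_slot(AuxProcType type)
{
	return MAX_BACKENDS + (int) type + 1;
}
```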

I've also added an extra column, named proc_type (suggested by Craig
and Robert), to indicate the type of process in pg_stat_activity view.
proc_type includes:

* client backend
* autovacuum launcher
* wal sender
* bgworker
* writer
* checkpointer
* wal writer
* wal receiver

Here is the present output with the relevant columns. (backend_start
is not shown since it takes too much space.)

postgres=# select pid, usesysid, application_name, wait_event_type,
wait_event, state, proc_type from pg_stat_activity;
  pid   | usesysid |       application_name       | wait_event_type |     wait_event      | state  |      proc_type
--------+----------+------------------------------+-----------------+---------------------+--------+---------------------
 109945 |          |                              | Activity        | AutoVacuumMain      | idle   | autovacuum launcher
 109947 |          | logical replication launcher | Activity        | LogicalLauncherMain | idle   | bgworker
 109962 |       10 | walreceiver                  | Activity        | WalSenderMain       | idle   | wal sender
 109976 |       10 | psql                         |                 |                     | active | client backend
 109943 |          |                              | Activity        | BgWriterMain        | idle   | writer
 109942 |          |                              | Activity        | CheckpointerMain    | idle   | checkpointer
 109944 |          |                              | Activity        | WalWriterMain       | idle   | wal writer
(7 rows)

Whereas, the output of other pgstat_* functions remains unchanged. For example,

postgres=# SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
   pg_stat_get_backend_activity(s.backendid) AS current_query
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
 procpid |   current_query
-+---
  120713 | SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,  +
 |pg_stat_get_backend_activity(s.backendid) AS current_query+
 | FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;


Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-non-backend-processes-in-pg_stat_get.patch
Description: application/download


0002-Expose-stats-for-auxiliary-processes-in-pg_stat_get_.patch
Description: application/download


0003-Expose-stats-for-autovacuum-launcher-and-bgworker.patch
Description: application/download


0004-Add-proc_type-column-in-pg_stat_get_activity.patch
Description: application/download



Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-10 Thread Kuntal Ghosh
On Wed, Jan 4, 2017 at 1:51 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:

> Attached patch introduces new GUC parameter parameter
> vacuum_cleanup_index_scale_factor which specifies the fraction of the
> table pages containing dead tuple needed to trigger a cleaning up
> indexes. The default is 0.0, which means that the cleanup index is not
> invoked if no update on table. In other word, if table is completely
> frozen then lazy vacuum can skip the index scans as well. Increasing
> this value could reduce total time of lazy vacuum but the statistics
> and the free space map of index are not updated.
>
I was looking into your patch and trying to understand how the
following piece of code works.
+if (vacuumed_pages > cleanupidx_thresh)
+{
+for (i = 0; i < nindexes; i++)
+lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+}
So, you are skipping the deletion of index entries if the clean-up
index threshold is not reached. But, you are removing all dead tuples
from the heap pointed to by the same index. Hence, the index will
contain entries with invalid references. How does that work? How will
you remove those index entries later? (I'm a newbie.)

+This parameter can only be set anywhere.
Oxymoron. :-)


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Passing query string to workers

2017-02-10 Thread Kuntal Ghosh
On Tue, Feb 7, 2017 at 10:19 AM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:
> Thanks a lot Kuntal for the review, please find attached patch for the
> revised version.
Few comments on the patch:

There are some spacing issues in the code. For example,
+   estate->es_queryString = (char
*)palloc0(strlen(queryDesc->sourceText) + 1);
+   /*Estimate space for query text. */
pgindent might be helpful to track all such changes.

+#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE010)
I'm uncomfortable with declaring the same macro in two files
(parallel.c and execParallel.c). My suggestion would be to move
pgstat_report_activity into ParallelQueryMain instead of
ParallelWorkerMain. Then you can remove the macro definition from
parallel.c. Thoughts?

And, the value of the macro seems pretty random to me. IMO, it should
be UINT64CONST(0xE007).

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WAL consistency check facility

2017-02-08 Thread Kuntal Ghosh
On Thu, Feb 9, 2017 at 2:26 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> 
> wrote:
>> Thank you Robert for the review. Please find the updated patch in the
>> attachment.
>
> I have committed this patch after fairly extensive revisions:
>
Thank you, Robert, for the above corrections and commit. Thanks to
Michael Paquier, Peter Eisentraut, Amit Kapila, Álvaro Herrera, and
Simon Riggs for taking their time to complete the patch. It was a
great learning experience for me.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WAL consistency check facility

2017-02-07 Thread Kuntal Ghosh
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>> I've attached the patch with the modified changes. PFA.
>
> Can this patch check contrib/bloom?
>
Added support for generic resource manager type.

> +/*
> + * Mask some line pointer bits, particularly those marked as
> + * used on a master and unused on a standby.
> + */
>
> Comment doesn't explain why we need to do this.
>
Added comments.

> +/*
> + * For GIN_DELETED page, the page is initialized to empty.
> + * Hence mask everything.
> + */
> +if (opaque->flags & GIN_DELETED)
> +memset(page_norm, MASK_MARKER, BLCKSZ);
> +else
> +mask_unused_space(page_norm);
>
> If the page is initialized to empty, why do we need to mask
> anything/everything?  Anyway, it doesn't seem right to mask the
> GinPageOpaque structure itself. Or at least it doesn't seem right to
> mask the flags word.
>
Modified accordingly.

>
> +if (!HeapTupleHeaderXminFrozen(page_htup))
> +page_htup->t_infomask |= HEAP_XACT_MASK;
> +else
> +page_htup->t_infomask |= HEAP_XMAX_COMMITTED |
> HEAP_XMAX_INVALID;
>
> Comment doesn't address this logic.  Also, the "else" case shouldn't
> exist at all, I think.
Added comments. I think the "else" part is needed for xmax.

>
> +/*
> + * For a speculative tuple, the content of t_ctid is conflicting
> + * between the backup page and current page. Hence, we set it
> + * to the current block number and current offset.
> + */
>
> Why does it differ?  Is that a bug?
>
Added comments.

>
> +/*
> + * During replay of a btree page split, we don't set the BTP_SPLIT_END
> + * flag of the right sibling and initialize th cycle_id to 0 for the same
> + * page.
> + */
>
> A reference to the name of the redo function might be helpful here
> (and in some other places), unless it's just ${AMNAME}_redo.  "th" is
> a typo for "the".
Corrected.

> +appendStringInfoString(buf, " (full page
> image, apply)");
> +else
> +appendStringInfoString(buf, " (full page image)");
>
> How about "(full page image)" and "(full page image, for WAL verification)"?
>
> Similarly in XLogDumpDisplayRecord, I think we should assume that
> "FPW" means what it has always meant, and leave that output alone.
> Instead, distinguish the WAL-consistency-checking case when it
> happens, e.g. "(FPW for consistency check)".
>
Corrected.

> +checkConsistency(XLogReaderState *record)
>
> How about CheckXLogConsistency()?
>
> + * If needs_backup is true or wal consistency check is enabled for
>
> ...or WAL checking is enabled...
>
> + * If WAL consistency is enabled for the resource manager of
>
> If WAL consistency checking is enabled...
>
> + * Note: when a backup block is available in XLOG with BKPIMAGE_APPLY flag
>
> with the BKPIMAGE_APPLY flag
Modified accordingly.

>
> - * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended
> - * with all-zeroes pages up to the referenced block number.  In
> - * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value
> + * In RBM_ZERO_* modes, if the page doesn't exist or BKPIMAGE_APPLY flag
> + * is not set for the backup block, the relation is extended with all-zeroes
> + * pages up to the referenced block number.
>
> OK, I'm puzzled by this.  Surely we don't want the WAL consistency
> checking facility to cause the relation to be extended like this.  And
> I don't see why it would, because the WAL consistency checking happens
> after the page changes have already been applied.  I wonder if this
> hunk is in error and should be dropped.
>
Dropped comments.

> +PageXLogRecPtrSet(phdr->pd_lsn, PG_UINT64_MAX);
> +phdr->pd_prune_xid = PG_UINT32_MAX;
> +phdr->pd_flags |= PD_PAGE_FULL | PD_HAS_FREE_LINES;
> +phdr->pd_flags |= PD_ALL_VISIBLE;
> +#define MASK_MARKER0xFF
> (and many others)
>
> Why do we mask by setting bits rather than clearing bits?  My
> intuition would have been to zero things we want to mask, rather than
> setting them to one.
>
Modified accordingly so that we can zero things during masking instead
of setting them to one.

> +{
> +newwalconsistency[i] = tru

Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Kuntal Ghosh
On Wed, Feb 1, 2017 at 8:01 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas <robertmh...@gmail.com> wrote:

>> +/*
>> + * Leave if no masking functions defined, this is possible in the case
>> + * resource managers generating just full page writes, comparing an
>> + * image to itself has no meaning in those cases.
>> + */
>> +if (RmgrTable[rmid].rm_mask == NULL)
>> +return;
>>
>> ...and also...
>>
>> +/*
>> + * This feature is enabled only for the resource managers where
>> + * a masking function is defined.
>> + */
>> +for (i = 0; i <= RM_MAX_ID; i++)
>> +{
>> +if (RmgrTable[i].rm_mask != NULL)
>>
>> Why do we assume that the feature is only enabled for RMs that have a
>> mask function?  Why not instead assume that if there's no masking
>> function, no masking is required?
>
> Not all RMs work on full pages. Tracking in smgr.h the list of RMs
> that are no-ops when masking things is easier than having empty
> routines declared all over the code base. Don't you think so?
>
Robert's suggestion surely makes the approach more general. But the
existing approach makes it easier to decide the RMs for which we
support the consistency check facility. Surely, we can use a list to
track the RMs which should (or should not) be masked. But I think we
always have to mask at least the LSN of the pages, don't we?

>> +void(*rm_mask) (char *page, BlockNumber blkno);
>>
>> Could the page be passed as type "Page" rather than a "char *" to make
>> things more convenient for the masking functions?  If not, could those
>> functions at least do something like "Page page = (Page) pagebytes;"
>> rather than "Page page_norm = (Page) page;"?
>
> xlog_internal.h is used as well by frontends, this makes the header
> dependencies cleaner. (I have looked at using Page when hacking this
> stuff, but the header dependencies are not worth it, I don't recall
> all the details though this was a couple of months back).
+1



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



