Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-25 Thread Jignesh Shah
On Mon, Sep 25, 2017 at 11:37 AM, Joshua D. Drake wrote:

> On 09/25/2017 11:31 AM, Alvaro Hernandez wrote:
>
>>
>>
> Whether or not they are included in a managed environment is generally
>>> based on two things:
>>>
>>> 1. Safety (why RDS doesn't allow certain C extensions)
>>> 2. Community/Popularity (Exactly why RDS has PostGIS)
>>> A. Demand with a prerequisite of #1
>>>
>>
>>  This is very clear. Now tell me: how many output plugins do you see
>> included in RDS? And in GCP's PostgreSQL? Azure Postgres? Heroku?
>>
>
> From RDS:
>
> Logical Replication for PostgreSQL on Amazon RDS
>
> Beginning with PostgreSQL version 9.4, PostgreSQL supports the streaming
> of WAL changes using logical replication slots. Amazon RDS supports logical
> replication for a PostgreSQL DB instance version 9.4.9 and higher and 9.5.4
> and higher. Using logical replication, you can set up logical replication
> slots on your instance and stream database changes through these slots to a
> client like pg_recvlogical. Logical slots are created at the database level
> and support replication connections to a single database.
>
> PostgreSQL logical replication on Amazon RDS is enabled by a new
> parameter, a new replication connection type, and a new security role. The
> client for the replication can be any client that is capable of
> establishing a replication connection to a database on a PostgreSQL DB
> instance.
>
> The most common clients for PostgreSQL logical replication are AWS
> Database Migration Service or a custom-managed host on an AWS EC2 instance.
> The logical replication slot knows nothing about the receiver of the
> stream; there is no requirement that the target be a replica database. Note
> that if you set up a logical replication slot and do not read from the
> slot, data can be written to your DB instance's storage and you can quickly
> fill up the storage on your instance.
>
> """
>
> I don't see why others wouldn't be available either. In fact, I am not
> sure why you couldn't use the JSON ones now. (Although I have not tested
> it).
>
> JD
>
>
>
>

Also, to add: Amazon RDS for PostgreSQL does support non-core plugins. The
wal2json output plugin for logical decoding is supported for versions 9.6.3+
and 9.5.7+ (link).
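
(A minimal SQL-level sketch of exercising such a slot, assuming the wal2json
plugin is installed and logical replication is enabled on the instance; the
slot name test_slot and the table some_table are made up.)

-- create a logical slot that uses the wal2json output plugin
SELECT * FROM pg_create_logical_replication_slot('test_slot', 'wal2json');

-- make a change so there is something to decode
INSERT INTO some_table VALUES (1);

-- read the decoded changes as JSON; NULL, NULL means no LSN or row limit
SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);

-- an unread slot retains WAL (see the storage warning quoted above),
-- so keep an eye on how much is being held back and drop the slot when done
SELECT slot_name, plugin,
       pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;

SELECT pg_drop_replication_slot('test_slot');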

Regards,
Jignesh


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-06 Thread Jignesh Shah
On Wed, Apr 4, 2012 at 7:06 PM, Josh Berkus j...@agliodbs.com wrote:
 On 4/4/12 4:02 PM, Tom Lane wrote:
 Greg Stark st...@mit.edu writes:
 On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Why is this pgbench run accessing so much unhinted data that is > 1
 million transactions old? Do you believe those numbers? Looks weird.

 I think this is in the nature of the workload pgbench does. Because
 the updates are uniformly distributed, not concentrated 90% in 10% of
 the buffers like most real-world systems, (and I believe pgbench only
 does index lookups) the second time a tuple is looked at is going to
 average N/2 transactions later where N is the number of tuples.

 That's a good point, and it makes me wonder whether pgbench is the right
 test case to be micro-optimizing around.  It would be a good idea to at
 least compare the numbers for something with more locality of reference.

 Jignesh, would DVDstore help for this?


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com

I will try this out next week. I am scrounging for decent hardware
and I think I have found some, though I will only get access to it on Monday.

The way you can get locality in DVDStore is to do the build with, say, a
10GB size, but for the actual run specify the size as 1GB; that should
give you the 10% active population scenario.


Regards,
Jignesh



Re: [HACKERS] [WIP] Double-write with Fast Checksums

2012-01-18 Thread Jignesh Shah
              9.2 + DW patch
              ---
              FPW off  FPW on  DW on/FPW off
              CK on    CK on   CK on
 one disk:     11078   10394    3296  [1G shared_buffers, 8G RAM]
 sep log disk: 13605   12015    3412

 one disk:      7731    6613    2670  [1G shared_buffers, 2G RAM]
 sep log disk:  6752    6129    2722



On my single hard disk test with the write cache turned off I see
different results than what Dan sees.
DBT2 50-warehouse, 1hr steady state, with shared_buffers=1G and
checkpoint_segments=128 as common settings on 8GB RAM
(checkpoints were on for all cases), with 8 cores.

FPW off: 3942.25 NOTPM
FPW on:  3613.37 NOTPM
DW on:   3479.15 NOTPM

I retried it with 2 cores as well and got similar results. So high
evictions do have a slightly higher penalty than FPW.

My run somehow did not collect the background writer stats, so I don't
have that comparison for these runs, but I have fixed it for the next
runs.

Regards,
Jignesh



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 4:00 AM, Jesper Krogh jes...@krogh.cc wrote:
 On 2011-12-22 09:42, Florian Weimer wrote:

 * David Fetter:

 The issue is that double writes needs a checksum to work by itself,
 and page checksums more broadly work better when there are double
 writes, obviating the need to have full_page_writes on.

 How desirable is it to disable full_page_writes?  Doesn't it cut down
 recovery time significantly because it avoids read-modify-write cycles
 with a cold cache?

 What are the downsides of having full_page_writes enabled, except for
 log volume? The manual mentions something about speed, but it is
 a bit unclear where that would come from, since the full pages must
 be somewhere in memory when being worked on anyway.



I thought I would share some of my perspective on this checksum +
doublewrite topic from a performance point of view.

Currently, from what I see in our tests based on dbt2, DVDStore, etc.,
checksums do not impact scalability or total measured throughput. They
do increase CPU cycles, depending on the algorithm used, but not by
enough to cause problems. The doublewrite change will be the big
performance win compared to full_page_writes. For example, compared to
other databases our WAL traffic is one of the highest, and most of it
is attributable to full_page_writes. The reason full_page_writes is
necessary in production (at least without worrying about the
replication impact) is that if a write fails, we can recover that whole
page from the WAL as-is and just put it back out there. (In fact, I
believe that is what recovery does.) However, the net impact is that
during heavy OLTP the runtime impact on WAL is high due to the high
traffic, and compared to other databases the utilization is high. This
also has a huge impact on transaction response time the first time a
page is changed, which matters in all OLTP environments because by
nature the transactions hit random pages.
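
(As a hedged illustration of measuring that WAL overhead yourself: the
sketch below compares the WAL generated by the same burst of updates with
full_page_writes on versus off. The pgbench_accounts table is just an
example workload, and the LSN functions use their 9.x-era names.)

CHECKPOINT;                                -- so the next page touches are first touches
SELECT pg_current_xlog_insert_location();  -- note the starting LSN
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid <= 10000;
SELECT pg_current_xlog_insert_location();  -- ending LSN; the gap is the WAL generated
-- Repeat with full_page_writes toggled the other way (it can be changed with
-- a config reload) and compare; the difference is mostly full page images.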

When we use doublewrite with checksums, we can safely disable
full_page_writes, causing a HUGE reduction in WAL traffic without loss
of reliability on a write fault, since there are always two writes
(implementation details discussable). Since the double writes
themselves are sequential, bundling multiple such writes further
reduces the write time. The biggest improvement is that these writes
are now done during CHECKPOINT WRITES rather than during TRANSACTION
COMMIT, which drastically improves transaction performance for OLTP
applications while you still get the reliability that is needed.

Typically, performance in terms of throughput looks like
tps(full_page_writes=on) < tps(full_page_writes=off).
With the double write and CRC we see
tps(full_page_writes=on) < tps(doublewrite) < tps(full_page_writes=off),
which is a big win for production systems that want the reliability of
full_page_writes.

Another side effect is that response times are more level, unlike with
full_page_writes where the response time varies from about 0.5ms to 5ms
depending on whether that particular transaction needs to write a full
page to WAL or not. With doublewrite it can always be around 0.5ms
rather than showing a huge deviation in transaction performance. With
this, folks measuring the 90th percentile response time will see a huge
relief in trying to meet their SLAs.

Also, from a WAL perspective, I like to put the WAL on its own
LUN/spindle/VMDK etc. The net result of the reduced WAL traffic is that
my utilization drops, which means the same hardware can now handle
higher WAL traffic in terms of IOPS, so WAL itself becomes less of a
bottleneck. Typically this shows up as a reduction in transaction
response times and an increase in tps until some other bottleneck
becomes the gating factor.

So overall this is a big win.

Regards,
Jignesh



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 11:16 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Jignesh Shah jks...@gmail.com wrote:

 When we use doublewrite with checksums, we can safely disable
 full_page_writes, causing a HUGE reduction in WAL traffic without
 loss of reliability on a write fault, since there are always two
 writes (implementation details discussable).

 The "always" there surprised me.  It seemed to me that we only need
 to do the double-write where we currently do full page writes or
 unlogged writes.  In thinking about your message, it finally struck

Currently PG only does a full page write for the first change that
dirties a page after a checkpoint. This scheme works because all later
changes are relative to that full page image, so if a page write fails
it can recreate the page using the full page write plus all the delta
changes from WAL.

In the doublewrite implementation, every checkpoint write is double
written, so if the first write (to the doublewrite area) fails then the
original page is not corrupted, and if the second write (to the actual
data page) fails, it can be recovered from the earlier write. Now,
while it is true that there are 2X writes during checkpoint, I can
argue that there are the same 2X writes right now, except that 1X of
the writes goes to WAL DURING TRANSACTION COMMIT. Also, since the
doublewrite area is generally written as its own file it is essentially
sequential, so it doesn't have the same write latencies as the actual
checkpoint writes. So if you look at the net amount of writes, it is
the same. For unlogged tables, doing the doublewrite is not much of a
penalty either, even though they were not being logged to WAL before;
doing the double write for them still keeps them safe and gives those
tables resilience even though it is not required. The net result is
that the underlying page is never irrecoverable due to failed writes.


 me that this might require a WAL record to be written with the
 checksum (or CRC; whatever we use).  Still, writing a WAL record
 with a CRC prior to the page write would be less data than the full
 page.  Doing double-writes instead for situations without the torn
 page risk seems likely to be a net performance loss, although I have
 no benchmarks to back that up (not having a double-write
 implementation to test).  And if we can get correct behavior without
 doing either (the checksum WAL record or the double-write), that's
 got to be a clear win.

I am not sure why one would want to write the checksum to WAL.
As for the double writes, in fact there is no net loss, because
(a) the writes to the doublewrite area are sequential, so the write
calls are relatively fast and, unlike full_page_writes, do not add any
latency to transactions; and
(b) the doublewrite area can be moved to a different location so there
is no extra stress on the default tablespace, if you are worried about
that spindle handling 2X writes (just as with full_page_writes this is
mitigated by moving pg_xlog to a different spindle).

My own tests support that the net result is almost as fast as
full_page_writes=off, though not quite the same due to the extra write
(which gives you the desired reliability), and way better than
full_page_writes=on.
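
(For reproducing that kind of A/B run, a small hedged sketch:
full_page_writes can be changed in postgresql.conf and picked up with a
reload between runs, without a restart.)

-- after editing full_page_writes in postgresql.conf between benchmark runs:
SELECT pg_reload_conf();
SHOW full_page_writes;   -- confirm which setting the next run will use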


Regards,
Jignesh






 -Kevin



Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 3:04 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah jks...@gmail.com wrote:
 In the double write implementation, every checkpoint write is double
 writed,

 Unless I'm quite thoroughly confused, which is possible, the double
 write will need to happen the first time a buffer is written following
 each checkpoint.  Which might mean the next checkpoint, but it could
 also be sooner if the background writer kicks in, or in the worst case
 a buffer has to do its own write.



Logically the double write happens for every checkpoint write and it
gets fsynced. Implementation-wise, you can handle those pages in
chunks, as we do with sets of pages, and sync them once, and yes, it
still performs better than full_page_writes. As long as you compare
with full_page_writes=on, the scheme is always much better. If you
compare it with the performance of full_page_writes=off it is slightly
slower, but then you lose the reliability. So performance testers like
me, who always turn off full_page_writes during benchmark runs anyway,
will not see any impact. However, folks in production who are rightly
scared to turn off full_page_writes will have a way to increase
performance without worrying about failed writes.

 Furthermore, we can't *actually* write any pages until they are
 written *and fsync'd* to the double-write buffer.  So the penalty for
 the background writer failing to do the right thing is going to go up
 enormously.  Think about VACUUM or COPY IN, using a ring buffer and
 kicking out its own pages.  Every time it evicts a page, it is going
 to have to doublewrite the buffer, fsync it, and then write it for
 real.  That is going to make PostgreSQL 6.5 look like a speed demon.

Like I said, implementation-wise it depends on how many such pages you
sync simultaneously, and the real tests show that it is actually much
faster than one would expect.

 The background writer or checkpointer can conceivably dump a bunch of
 pages into the doublewrite area and then fsync the whole thing in
 bulk, but a backend that needs to evict a page only wants one page, so
 it's pretty much screwed.


Generally, where you pay the penalty is a trade-off. I would argue
that you are making me pay for the full page write at the first
transaction commit that changes the page, which I can never avoid, and
the result is a transaction response time that is unacceptable, since
the deviation from a similar transaction that modifies an already-dirty
page is a lot less. However, I can avoid page evictions by choosing a
bigger bufferpool (not that I necessarily want to do that, but I have
that choice without losing reliability).

Regards,
Jignesh



 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch

2011-06-07 Thread Jignesh Shah
On Mon, Jun 6, 2011 at 11:20 PM, Jignesh Shah jks...@gmail.com wrote:


 Okay, I tried it out with the sysbench read scaling test.
 Note that I had tried that earlier on 9.0:
 http://jkshah.blogspot.com/2010/11/postgresql-90-simple-select-scaling.html

 On that test I found that running it on anything bigger than 4 cores
 led to decreased performance.
 Redoing the same test with 100 users on a 4 vCPU virtual machine with
 8GB RAM and 1M rows I get
   transactions:                        17870082 (59566.46 per sec.)
 which is in line with the best number on 9.0.
 This test hardly had any idle CPUs.

 However, where it made a huge impact was doing the same test on my 8
 vCPU VM with 8GB RAM, where I get
    transactions:                        33274594 (110914.85 per sec.)

 which is a whopping 1.8x scaling for 2x the vCPUs (from 4 to 8).
 My idle CPU was less than 7%, which, taking into consideration that
 the useful work is in line with my expectations, is really impressive.
 (Plus, the last time I did MySQL it was around 95K or so for the
 same test.)


 Next step: DBT-2.



I tried with a warehouse size of 50, all cached in memory, and my
initial tests with DBT-2 using 8 vCPUs do not show any major changes
for a quick 10 minute run. I did eliminate write bottlenecks for this
test so as to stress the locks (using full_page_writes=off,
synchronous_commit=off, etc.). I also have a large enough bufferpool to
fit the whole 50-warehouse DB in memory.

Without patch score:  29088 NOTPM
With patch score:     30161 NOTPM

It could be that I have other problems in the setup. One of the things
I noticed is that there are too many idle-in-transaction connections
being reported, which tells me something else is becoming a bottleneck
here :-) I also tested with multiple clients, with similar results:
postgresql shows multiple backends idle in transaction or with a FETCH
in the waiting state, while the clients show themselves waiting in
pqSocketCheck, as shown below for example.

#0  0x7fc4e83a43c6 in poll () from /lib64/libc.so.6
#1  0x7fc4e8abd61a in pqSocketCheck ()
#2  0x7fc4e8abd730 in pqWaitTimed ()
#3  0x7fc4e8abc215 in PQgetResult ()
#4  0x7fc4e8abc398 in PQexecFinish ()
#5  0x004050e1 in execute_new_order ()
#6  0x0040374f in process_transaction ()
#7  0x00403519 in db_worker ()
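
(A hedged sketch, using the 9.1-era catalog column names, of queries that
can be used to spot those symptoms on the server side; nothing here is
specific to the patch.)

-- backends sitting idle in transaction or blocked waiting on a lock
SELECT procpid, waiting, current_query
FROM pg_stat_activity
WHERE waiting OR current_query = '<IDLE> in transaction';

-- lock requests that have not been granted
SELECT locktype, relation::regclass, mode, pid
FROM pg_locks
WHERE NOT granted;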


So yes, for DBT-2 I think this is inconclusive, since there could still
be other bottlenecks in play (networking included).
But overall, yes, I like the sysbench read scaling numbers quite a bit.


Regards,
Jignesh



Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch

2011-06-06 Thread Jignesh Shah
On Mon, Jun 6, 2011 at 2:49 PM, Josh Berkus j...@agliodbs.com wrote:

 That's an improvement of about ~3.5x.  According to the vmstat output,
 when running without the patch, the CPU state was about 40% idle.
 With the patch, it dropped down to around 6%.

 Wow!  That's fantastic.

 Jignesh, are you in a position to test any of Robert's work using DBT or
 other benchmarks?

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



I missed the discussion. Can you send me the patch (will it work
with 9.1 beta?)? I can do a before-and-after with DBT-2 and let you
know.
I can also test it with the sysbench read test, which also has a
relation locking bottleneck.

Thanks.

Regards,
Jignesh



Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch

2011-06-06 Thread Jignesh Shah
On Mon, Jun 6, 2011 at 2:49 PM, Josh Berkus j...@agliodbs.com wrote:

 That's an improvement of about ~3.5x.  According to the vmstat output,
 when running without the patch, the CPU state was about 40% idle.
 With the patch, it dropped down to around 6%.

 Wow!  That's fantastic.

 Jignesh, are you in a position to test any of Robert's work using DBT or
 other benchmarks?

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Okay, I tried it out with the sysbench read scaling test.
Note that I had tried that earlier on 9.0:
http://jkshah.blogspot.com/2010/11/postgresql-90-simple-select-scaling.html

On that test I found that running it on anything bigger than 4 cores
led to decreased performance.
Redoing the same test with 100 users on a 4 vCPU virtual machine with
8GB RAM and 1M rows I get
   transactions:                        17870082 (59566.46 per sec.)
which is in line with the best number on 9.0.
This test hardly had any idle CPUs.

However, where it made a huge impact was doing the same test on my 8
vCPU VM with 8GB RAM, where I get
   transactions:                        33274594 (110914.85 per sec.)

which is a whopping 1.8x scaling for 2x the vCPUs (from 4 to 8).
My idle CPU was less than 7%, which, taking into consideration that
the useful work is in line with my expectations, is really impressive.
(Plus, the last time I did MySQL it was around 95K or so for the
same test.)

Also note that in my earlier case 60K was the max irrespective of the
hardware I threw at it. With this fastlock patch that no longer seems
to be the limit :-)

This gain is impressive.

Next step: DBT-2.

Regards,
Jignesh




[HACKERS] possible concurrency bug or mistake in understanding read-committed behavior

2010-11-16 Thread Jignesh Shah
Hello  All,


I have recently been using sysbench with PostgreSQL 9.0 and 8.4.5,
doing some tests on 8-core systems with SSDs.

I seem to be hitting some problems with the read-write tests, and I am
hoping to find out whether this is a possible concurrency bug or
expected behavior.


Using sysbench with 1M rows and 80+ threads with

 --oltp-dist-type=special --oltp-table-size=1000000
--oltp-read-only=off --oltp-test-mode=complex --max-requests=0
--max-time=300 --num-threads=80 --test=oltp

Causes :

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 80

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are
returned in 75 pct cases)
Using BEGIN for starting transactions
Using auto_inc on the id column
Threads started!
FATAL: query execution failed: 8386864
FATAL: database error, exiting...
Done.


With error reported in pg_log:
ERROR:  duplicate key value violates unique constraint "sbtest_pkey"


The way sysbench works is that in a transaction it deletes a row and
inserts the same row back. So within the transaction's own logic it
should not hit this error, since it just deleted the row.
What's happening is a timing interaction with a transaction in some
other process working with the same id.  The error is pretty easy to
reproduce.
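
(For reference, the per-transaction statement sequence on the colliding id,
reconstructed from the condensed logs below with the bound parameter shown
inline; the transaction's other SELECT statements are omitted, as in the
logs.)

BEGIN;
UPDATE sbtest SET k = k + 1 WHERE id = 500865;
DELETE FROM sbtest WHERE id = 500865;
INSERT INTO sbtest VALUES (500865, 0, ' ', 'aaffrreeyy');
COMMIT;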

Here are some condensed logs with respect to the two processes involved


 445373 tm:2010-11-16 16:06:55.526 UTC db:sbtest pid:14244 LOG:
duration: 0.004 ms execute sbstmt13377649542683423: BEGIN
 441773 tm:2010-11-16 16:06:55.527 UTC db:sbtest pid:14307 LOG:
duration: 0.003 ms execute sbstmt899357721350501620: BEGIN

...other statements..removed...

 445440 tm:2010-11-16 16:06:55.829 UTC db:sbtest pid:14244 LOG:
duration: 16.683 ms execute sbstmt610923974845906481: UPDATE sbtest
set k=k+1 where id=$1
 445441 tm:2010-11-16 16:06:55.829 UTC db:sbtest pid:14244 DETAIL:
parameters: $1 = '500865'
 445444 tm:2010-11-16 16:06:55.830 UTC db:sbtest pid:14244 LOG:
duration: 0.058 ms execute sbstmt81255943842609644: DELETE from sbtest
where id=$1
 445445 tm:2010-11-16 16:06:55.830 UTC db:sbtest pid:14244 DETAIL:
parameters: $1 = '500865'

 441840 tm:2010-11-16 16:06:55.831 UTC db:sbtest pid:14307 LOG:
duration: 17.812 ms execute sbstmt375950798157093453: UPDATE sbtest
set k=k+1 where id=$1
 441841 tm:2010-11-16 16:06:55.831 UTC db:sbtest pid:14307 DETAIL:
parameters: $1 = '500865'

 445448 tm:2010-11-16 16:06:55.831 UTC db:sbtest pid:14244 LOG:
duration: 0.084 ms execute sbstmt388529267384136315: INSERT INTO
sbtest values($1,0,'
','aaffrreeyy')
 445449 tm:2010-11-16 16:06:55.831 UTC db:sbtest pid:14244 DETAIL:
parameters: $1 = '500865'
 445451 tm:2010-11-16 16:06:55.880 UTC db:sbtest pid:14244 LOG:
duration: 48.647 ms execute sbstmt271576009267161387: COMMIT

 441844 tm:2010-11-16 16:06:55.880 UTC db:sbtest pid:14307 LOG:
duration: 48.889 ms execute sbstmt718364989767970105: DELETE from
sbtest where id=$1
 441845 tm:2010-11-16 16:06:55.880 UTC db:sbtest pid:14307 DETAIL:
parameters: $1 = '500865'

 445452 tm:2010-11-16 16:06:55.881 UTC db:sbtest pid:14244 LOG:
duration: 0.021 ms bind sbstmt13377649542683423: BEGIN
 445453 tm:2010-11-16 16:06:55.881 UTC db:sbtest pid:14244 LOG:
duration: 0.005 ms execute sbstmt13377649542683423: BEGIN

 441846 tm:2010-11-16 16:06:55.881 UTC db:sbtest pid:14307 LOG:
duration: 0.036 ms bind sbstmt16151529281767205: INSERT INTO sbtest
values($1,0,' ','aaffrreeyy')
 441847 tm:2010-11-16 16:06:55.882 UTC db:sbtest pid:14307 DETAIL:
parameters: $1 = '500865'
 441848 tm:2010-11-16 16:06:55.882 UTC db:sbtest pid:14307 ERROR:
duplicate key value violates unique constraint sbtest_pkey
 441849 tm:2010-11-16 16:06:55.882 UTC db:sbtest pid:14307 DETAIL: Key
(id)=(500865) already exists.
 441850 tm:2010-11-16 16:06:55.882 UTC db:sbtest pid:14307 STATEMENT:
INSERT INTO sbtest values($1,0,'
','aaffrreeyy')

The logs are condensed, so please rerun it with sysbench to get the
full logs for all 80+ processes if required.



So reducing the above, what I get in READ COMMITTED isolation is
(session 1 = pid 14244, session 2 = pid 14307):

Session 1: BEGIN
Session 2: BEGIN
Session 1: DELETE row;
Session 1: INSERT row;
Session 1: COMMIT;
Session 2: DELETE row;
Session 2: INSERT row;
Session 2: COMMIT;

If I read this right, and that is what is happening here, then since
the COMMIT happens before the second DELETE, that DELETE should have
deleted the row again and the INSERT should have been successful.

So I tried this manually with two psql sessions:
BEGIN;
   BEGIN;
   SELECT xmin,xmax,id from sbtest where id=500815;
      xmin   | xmax |   id
    ---------+------+--------
     2566416 |    0 | 500815
   (1 row)
select xmin,xmax,id from sbtest where id=500815;
   xmin   | xmax |   id
 ---------+------+--------
  2576468 |    0 | 500815
 (1 row)


DELETE FROM sbtest WHERE id=500815;
DELETE 1
select xmin,xmax,id from sbtest where id=500815;
   xmin   |  xmax   |   id
 ---------+---------+--------
  2576468 | 2576518 | 500815
 (1 row)

[HACKERS] Re: possible concurrency bug or mistake in understanding read-committed behavior

2010-11-16 Thread Jignesh Shah
Actually, cutting down my mail to something more readable:

Let's consider two transactions:

Session 1: BEGIN;
Session 2: BEGIN;
Session 1: DELETE FROM sbtest WHERE id=500815;
Session 1: INSERT INTO sbtest values(500815,0,'','aaffrreeyy');
Session 2: DELETE FROM sbtest WHERE id=500815;   -- hangs/waits
Session 1: END;
Session 1: COMMIT
Session 2: DELETE 0   -- returns success but doesn't delete any rows;
                      -- it doesn't roll back the transaction
Session 2: INSERT INTO sbtest values(500815,0,'','aaffrreeyy');
Session 2: ERROR: duplicate key value violates unique constraint "sbtest_pkey"
Session 2: END;
Session 2: ROLLBACK

The question is: should the DELETE fail (since the row it targeted no
longer exists) and cause a rollback, or succeed with DELETE 0?

The other way I tried was doing the DELETE before the INSERT in the
first transaction, with the same results. So irrespective of the
ordering, if the second DELETE starts before the first transaction
commits, the answer is ERROR: duplicate key.
What is happening is that the DELETE tries to delete the old row and
the INSERT then collides with the new row.
I am not sure which way should be right in READ COMMITTED.
What are the official READ COMMITTED semantics for this?

Thanks.
Regards,
Jignesh



[HACKERS] Performance of Parser?

2007-01-13 Thread Jignesh Shah

Hello All,

I am using the latest 8.2 source, which I compiled with Sun Studio 11 and
tested on Solaris 10 11/06 against an app server. I found that the CPU
utilization was higher than I expected and started digging into it.


Apparently the top CPU usage comes from the following stack trace, which
accounts for roughly 10-15% of the total CPU that postgresql uses.


Anyway, a real developer might make more sense of this than I can:


 libc_psr.so.1`memcpy+0x524
 postgres`SearchCatCache+0x24
 postgres`getBaseType+0x2c
 postgres`find_coercion_pathway+0x14
 postgres`can_coerce_type+0x58
 postgres`func_match_argtypes+0x24
 postgres`oper_select_candidate+0x14
 postgres`make_op+0x1a8
 postgres`transformAExprAnd+0x1c
 postgres`transformWhereClause+0x18
 postgres`transformUpdateStmt+0x94
 postgres`transformStmt+0x1dc
 postgres`do_parse_analyze+0x18
 postgres`parse_analyze_varparams+0x30
 postgres`exec_parse_message+0x2fc
 postgres`PostgresMain+0x117c
 postgres`BackendRun+0x248
 postgres`BackendStartup+0xf4
 postgres`ServerLoop+0x4c8
 postgres`PostmasterMain+0xca0


FUNCTION                          COUNT    PCNT
postgres`can_coerce_type              1    0.1%
postgres`find_coercion_pathway       11    0.9%
postgres`SearchCatCache              43    3.4%
libc_psr.so.1`memcpy                136   10.6%

The app server is basically using a bunch of prepared statements that the
server should be executing directly without doing the parsing again.
Since it is the parser module that invokes the catalog search, does
anybody know how to improve the can_coerce_type function in order to
avoid doing the same comparisons again and again for the same kind of
statements?


I also wanted to check whether postgresql stores prepared statements at
the server level, or does it parse each incoming prepared statement again?
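
(A minimal SQL-level sketch of a server-side prepared statement, with made-up
table and column names: the statement is parsed and analyzed once per
session, and later EXECUTEs reuse it, which is the behavior in question for
the protocol-level prepared statements the app server sends.)

PREPARE upd_stmt (int) AS
  UPDATE mytable SET counter = counter + 1 WHERE id = $1;

EXECUTE upd_stmt(42);
EXECUTE upd_stmt(43);

-- prepared statements known to this session (view added in 8.2)
SELECT name, statement FROM pg_prepared_statements;

DEALLOCATE upd_stmt;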


Any insight into what it is attempting to do, and what the possible
workarounds might be, would help here.


Regards,
Jignesh
