Re: [GENERAL] Stats update difference between VACUUM ANALYZE and ANALYZE in 9.2?
On Monday, September 26, 2016 9:44 AM, Tom Lane wrote:

>> Paul Jones writes:
>> For a freshly pg_restore'd 9.2 database, would VACUUM ANALYZE update
>> statistics any better than just an ANALYZE?
>
> VACUUM would have caused the page-all-visible flags to get set for all
> pages of unchanging tables. I don't recall whether ANALYZE has any side
> effects on those flags at all, but it certainly would not have set them
> for pages it didn't even visit, which would be most.
>
> Net result is that the pg_class.relallvisible fractions didn't get high
> enough to persuade the planner that index-only scans would be effective.
> I guess you could call that a statistic, but it's really about the
> contents of the tables' free space maps.
>
> regards, tom lane

This is good to know. I think we will be running VACUUM ANALYZE from now on after restore instead of just ANALYZE.

I do note that sect. 49.11 claims that ANALYZE updates pg_class.relallvisible. I don't know if this is a documentation problem in light of what you explained.

PJ
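For reference, a quick way to watch this on a freshly restored table is to compare pg_class.relallvisible before and after the vacuum; the table name below is a placeholder, not one from the thread:

-- Before: after a plain ANALYZE, relallvisible stays low because the
-- visibility-map bits are only set by VACUUM.
SELECT relname, relpages, relallvisible FROM pg_class WHERE relname = 'some_big_table';

VACUUM ANALYZE some_big_table;

-- After: relallvisible should now be close to relpages, which is what makes
-- the planner consider index-only scans.
SELECT relname, relpages, relallvisible FROM pg_class WHERE relname = 'some_big_table';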
Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?
On Tuesday, July 19, 2016 6:19 AM, Teodor Sigaev wrote:

> CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);
>
> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=1 width=1261)
>                                  (actual time=2157.118..1259550.327 rows=909091 loops=1)
>   Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>   Rows Removed by Index Recheck: 4360296
>   Heap Blocks: exact=37031 lossy=872059

Hmm, it looks like work_mem is too small, because the lossy heap block count is too big.

Ok, thanks.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru
                 WWW: http://www.sigaev.ru/
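A quick way to test that suggestion against the same query (the 256MB figure is an arbitrary assumption for illustration, not a value from the thread):

SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}';
-- With enough work_mem the bitmap stays exact, so "Heap Blocks:" should show
-- few or no lossy pages and far fewer rows removed by the recheck.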
Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?
On Monday, July 18, 2016 10:14 PM, Kisung Kim wrote:

Hi,

I recently tested the YCSB benchmark too. But contrary to my expectation, PG (9.5) is slower than MongoDB 3.2. Paul said that making the table with the no-logging option improved the performance, and that it might be equal to MongoDB's behavior. But the MongoDB documentation says it writes a journal log too. So I think turning logging off is not fair. Am I wrong about MongoDB's behavior?

My understanding is that, even with Mongo journaling, it is not as reliable as Postgres. So, I felt that using unlogged tables leveled the playing field for Postgres.

PJ

(C) Bitnine, Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail: ks...@bitnine.net
Office phone: 070-4800-5890, 408-606-8602
US mobile phone: 408-805-2192

2016-03-19 5:05 GMT+09:00 :

On Tuesday, March 15, 2016 7:39 PM, "p...@cmicdo.com" wrote:

> Your results are close enough to mine, I think, to prove the point.
> And, I agree that the EDB benchmark is not necessarily reflective of a
> real-world scenario.
>
> However, the cache I'm referring to is PG's shared_buffer cache.
> You can see the first run of the select causing a lot of disk reads.
> The second identical run reads purely from shared_buffers.
>
> What I don't understand is, why does a slightly different select from
> the *same* table during the same session cause shared_buffers to be
> blown out and re-read??
>
> I will see if I can try YCSB next week (I'm in workshops all week...)
>
> Thanks!

I was able to try YCSB today on both PG 9.5.1 and Mongo 3.2. At first, PG was running 4 times slower than Mongo. Then I remembered about unlogged tables (which I think is the way Mongo is all the time), and remade the PG table as UNLOGGED. In a 50/50 read/update test over 1M records, PG ran in 0.62 of the time of Mongo.
PG Load:
--------
[OVERALL], RunTime(ms), 104507.0
[OVERALL], Throughput(ops/sec), 9568.737022400413
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 293.0
[CLEANUP], MinLatency(us), 293.0
[CLEANUP], MaxLatency(us), 293.0
[CLEANUP], 95thPercentileLatency(us), 293.0
[CLEANUP], 99thPercentileLatency(us), 293.0
[INSERT], Operations, 100.0
[INSERT], AverageLatency(us), 101.329235
[INSERT], MinLatency(us), 88.0
[INSERT], MaxLatency(us), 252543.0
[INSERT], 95thPercentileLatency(us), 121.0
[INSERT], 99thPercentileLatency(us), 141.0
[INSERT], Return=OK, 100

PG Run:
-------
[OVERALL], RunTime(ms), 92763.0
[OVERALL], Throughput(ops/sec), 10780.16019318047
[READ], Operations, 499922.0
[READ], AverageLatency(us), 79.1722428698877
[READ], MinLatency(us), 69.0
[READ], MaxLatency(us), 19935.0
[READ], 95thPercentileLatency(us), 94.0
[READ], 99thPercentileLatency(us), 112.0
[READ], Return=OK, 499922
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 222.0
[CLEANUP], MinLatency(us), 222.0
[CLEANUP], MaxLatency(us), 222.0
[CLEANUP], 95thPercentileLatency(us), 222.0
[CLEANUP], 99thPercentileLatency(us), 222.0
[UPDATE], Operations, 500078.0
[UPDATE], AverageLatency(us), 98.96430156895525
[UPDATE], MinLatency(us), 83.0
[UPDATE], MaxLatency(us), 26655.0
[UPDATE], 95thPercentileLatency(us), 127.0
[UPDATE], 99thPercentileLatency(us), 158.0
[UPDATE], Return=OK, 500078

Mongo Load:
-----------
[OVERALL], RunTime(ms), 133308.0
[OVERALL], Throughput(ops/sec), 7501.425270801452
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 1822.0
[CLEANUP], MinLatency(us), 1822.0
[CLEANUP], MaxLatency(us), 1822.0
[CLEANUP], 95thPercentileLatency(us), 1822.0
[CLEANUP], 99thPercentileLatency(us), 1822.0
[INSERT], Operations, 100.0
[INSERT], AverageLatency(us), 130.830678
[INSERT], MinLatency(us), 90.0
[INSERT], MaxLatency(us), 7147519.0
[INSERT], 95thPercentileLatency(us), 159.0
[INSERT], 99thPercentileLatency(us), 226.0
[INSERT], Return=OK, 100

Mongo Run:
----------
[OVERALL], RunTime(ms), 149150.0
[OVERALL], Throughput(ops/sec), 6704.65973851827
[READ], Operations, 500837.0
[READ], AverageLatency(us), 98.13153980237084
[READ], MinLatency(us), 69.0
[READ], MaxLatency(us), 28271.0
[READ], 95thPercentileLatency(us), 166.0
[READ], 99thPercentileLatency(us), 186.0
[READ], Return=OK, 500837
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 2387.0
[CLEANUP], MinLatency(us), 2386.0
[CLEANUP], MaxLatency(us), 2387.0
[CLEANUP], 95thPercentileLatency(us), 2387.0
[CLEANUP], 99thPercentileLatency(us), 2387.0
[UPDATE], Operations, 499163.0
[UPDATE], AverageLatency(us), 195.21505600375028
[UPDATE], MinLatency(us), 118.0
[UPDATE], MaxLatency(us), 4513791.0
[UPDATE], 95thPercentileLatency(us), 211.0
[UPDATE], 99thPercentileLatency(us), 252.0
[UPDATE], Return=OK, 499163

> On Monday, March 14, 2016 3:34 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> Hi, Paul
>
> I agree with Oleg, EDB benchmarks are strange sometimes.
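For what it's worth, a sketch of the UNLOGGED setup described above; the column layout is only an assumption based on the table the YCSB JDBC binding conventionally uses, it is not given anywhere in the thread:

CREATE UNLOGGED TABLE usertable (
    ycsb_key  VARCHAR(255) PRIMARY KEY,
    field0 TEXT, field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT,
    field5 TEXT, field6 TEXT, field7 TEXT, field8 TEXT, field9 TEXT
);

-- On 9.5 an existing table can also be switched in place:
ALTER TABLE usertable SET UNLOGGED;   -- ALTER TABLE ... SET LOGGED reverts it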
Re: [GENERAL] Question about shared_buffer cache behavior
On Friday, March 18, 2016 4:54 PM, Andreas Kretschmer wrote:

>> Paul Jones wrote on 18 March 2016 at 21:24:
>>
>> In Postgres 9.5.1 with a shared_buffer cache of 7Gb, a SELECT from
>
> The first query reads only the tuples from the heap that match the
> where-condition. The 2nd query, with a different where-condition, reads
> other rows than the first query.
>
> Keep in mind: an index search reads the index and pulls the rows that
> match the condition from the heap, no more.

Ok, thanks! I understand now!

> Regards
> --
> Andreas Kretschmer
> http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
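One way to see exactly what each query touches in shared_buffers (the table and column names here are hypothetical):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table WHERE key_col = 'some-value';
-- "Buffers: shared hit=..." counts pages found in shared_buffers;
-- "read=..." counts pages that had to come from the OS cache or disk.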
Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?
On Tuesday, March 15, 2016 7:39 PM, "p...@cmicdo.com" wrote:

> Your results are close enough to mine, I think, to prove the point.
> And, I agree that the EDB benchmark is not necessarily reflective of a
> real-world scenario.
>
> However, the cache I'm referring to is PG's shared_buffer cache.
> You can see the first run of the select causing a lot of disk reads.
> The second identical run reads purely from shared_buffers.
>
> What I don't understand is, why does a slightly different select from
> the *same* table during the same session cause shared_buffers to be
> blown out and re-read??
>
> I will see if I can try YCSB next week (I'm in workshops all week...)
>
> Thanks!

I was able to try YCSB today on both PG 9.5.1 and Mongo 3.2. At first, PG was running 4 times slower than Mongo. Then I remembered about unlogged tables (which I think is the way Mongo is all the time), and remade the PG table as UNLOGGED. In a 50/50 read/update test over 1M records, PG ran in 0.62 of the time of Mongo.

PG Load:
--------
[OVERALL], RunTime(ms), 104507.0
[OVERALL], Throughput(ops/sec), 9568.737022400413
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 293.0
[CLEANUP], MinLatency(us), 293.0
[CLEANUP], MaxLatency(us), 293.0
[CLEANUP], 95thPercentileLatency(us), 293.0
[CLEANUP], 99thPercentileLatency(us), 293.0
[INSERT], Operations, 100.0
[INSERT], AverageLatency(us), 101.329235
[INSERT], MinLatency(us), 88.0
[INSERT], MaxLatency(us), 252543.0
[INSERT], 95thPercentileLatency(us), 121.0
[INSERT], 99thPercentileLatency(us), 141.0
[INSERT], Return=OK, 100

PG Run:
-------
[OVERALL], RunTime(ms), 92763.0
[OVERALL], Throughput(ops/sec), 10780.16019318047
[READ], Operations, 499922.0
[READ], AverageLatency(us), 79.1722428698877
[READ], MinLatency(us), 69.0
[READ], MaxLatency(us), 19935.0
[READ], 95thPercentileLatency(us), 94.0
[READ], 99thPercentileLatency(us), 112.0
[READ], Return=OK, 499922
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 222.0
[CLEANUP], MinLatency(us), 222.0
[CLEANUP], MaxLatency(us), 222.0
[CLEANUP], 95thPercentileLatency(us), 222.0
[CLEANUP], 99thPercentileLatency(us), 222.0
[UPDATE], Operations, 500078.0
[UPDATE], AverageLatency(us), 98.96430156895525
[UPDATE], MinLatency(us), 83.0
[UPDATE], MaxLatency(us), 26655.0
[UPDATE], 95thPercentileLatency(us), 127.0
[UPDATE], 99thPercentileLatency(us), 158.0
[UPDATE], Return=OK, 500078

Mongo Load:
-----------
[OVERALL], RunTime(ms), 133308.0
[OVERALL], Throughput(ops/sec), 7501.425270801452
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 1822.0
[CLEANUP], MinLatency(us), 1822.0
[CLEANUP], MaxLatency(us), 1822.0
[CLEANUP], 95thPercentileLatency(us), 1822.0
[CLEANUP], 99thPercentileLatency(us), 1822.0
[INSERT], Operations, 100.0
[INSERT], AverageLatency(us), 130.830678
[INSERT], MinLatency(us), 90.0
[INSERT], MaxLatency(us), 7147519.0
[INSERT], 95thPercentileLatency(us), 159.0
[INSERT], 99thPercentileLatency(us), 226.0
[INSERT], Return=OK, 100

Mongo Run:
----------
[OVERALL], RunTime(ms), 149150.0
[OVERALL], Throughput(ops/sec), 6704.65973851827
[READ], Operations, 500837.0
[READ], AverageLatency(us), 98.13153980237084
[READ], MinLatency(us), 69.0
[READ], MaxLatency(us), 28271.0
[READ], 95thPercentileLatency(us), 166.0
[READ], 99thPercentileLatency(us), 186.0
[READ], Return=OK, 500837
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 2387.0
[CLEANUP], MinLatency(us), 2386.0
[CLEANUP], MaxLatency(us), 2387.0
[CLEANUP], 95thPercentileLatency(us), 2387.0
[CLEANUP], 99thPercentileLatency(us), 2387.0
[UPDATE], Operations, 499163.0
[UPDATE], AverageLatency(us), 195.21505600375028
[UPDATE], MinLatency(us), 118.0
[UPDATE], MaxLatency(us), 4513791.0
[UPDATE], 95thPercentileLatency(us), 211.0
[UPDATE], 99thPercentileLatency(us), 252.0
[UPDATE], Return=OK, 499163

> On Monday, March 14, 2016 3:34 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> Hi, Paul
>
> I agree with Oleg, EDB benchmarks are strange sometimes. I did the same benchmarks
> several months ago. I never noticed the cache influence back then, so I tried to
> reproduce your situation now (on 5*10^6 records though). I started to play with the
> db cache (using `echo 3 > /proc/sys/vm/drop_caches`), and I see a difference in
> execution time for two subsequent queries, but the `explain` info is almost
> identical, e.g. `shared hit & read`:
Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?
Your results are close enough to mine, I think, to prove the point. And, I agree that the EDB benchmark is not necessarily reflective of a real-world scenario.

However, the cache I'm referring to is PG's shared_buffer cache. You can see the first run of the select causing a lot of disk reads. The second identical run reads purely from shared_buffers.

What I don't understand is, why does a slightly different select from the *same* table during the same session cause shared_buffers to be blown out and re-read??

I will see if I can try YCSB next week (I'm in workshops all week...)

Thanks!

On Monday, March 14, 2016 3:34 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:

Hi, Paul

I agree with Oleg, EDB benchmarks are strange sometimes. I did the same benchmarks several months ago. I never noticed the cache influence back then, so I tried to reproduce your situation now (on 5*10^6 records though). I started to play with the db cache (using `echo 3 > /proc/sys/vm/drop_caches`), and I see a difference in execution time for two subsequent queries, but the `explain` info is almost identical, e.g. `shared hit & read`:

```
benchmark=# explain (buffers, analyze, verbose) select data from json_tables where data @> '{"name": "AC3 Case Red"}';
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.json_tables  (cost=102.74..19001.47 rows=4999 width=1257) (actual time=740.556..215956.655 rows=454546 loops=1)
   Output: data
   Recheck Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
   Rows Removed by Index Recheck: 2114606
   Heap Blocks: exact=31624 lossy=422922
   Buffers: shared hit=1371 read=41
   ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..101.49 rows=4999 width=0) (actual time=731.010..731.010 rows=454547 loops=1)
         Index Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
         Buffers: shared hit=1371 read=1005
 Planning time: 6.352 ms
 Execution time: 216075.830 ms
(11 rows)

benchmark=# explain (buffers, analyze, verbose) select data from json_tables where data @> '{"name": "AC3 Case Red"}';
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.json_tables  (cost=102.74..19001.47 rows=4999 width=1257) (actual time=222.476..10692.703 rows=454546 loops=1)
   Output: data
   Recheck Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
   Rows Removed by Index Recheck: 2114606
   Heap Blocks: exact=31624 lossy=422922
   Buffers: shared hit=1371 read=41
   ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..101.49 rows=4999 width=0) (actual time=214.736..214.736 rows=454547 loops=1)
         Index Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
         Buffers: shared hit=1371 read=1005
 Planning time: 0.089 ms
 Execution time: 10767.739 ms
(11 rows)
```

But I see almost the same execution time from the mongodb `explain` (216075ms for pg and 177784ms for mongo, which isn't so much, I think):

```
DBQuery.shellBatchSize = 100;
db.json_tables.find({"name": "AC3 Case Red"}).explain(true)
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "benchmark.json_tables",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "name" : {
                "$eq" : "AC3 Case Red"
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "name" : 1
                },
                "indexName" : "name_1",
                "isMultiKey" : false,
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "forward",
                "indexBounds" : {
                    "name" : [
                        "[\"AC3 Case Red\", \"AC3 Case Red\"]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 454546,
```
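Since the puzzle here is cold versus warm shared_buffers, one way to take the first-run penalty out of such comparisons is to preload the relations before timing. This is only a sketch and assumes the pg_prewarm contrib extension (available since 9.4), which the thread itself never mentions:

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('json_tables');       -- pull the heap into shared_buffers
SELECT pg_prewarm('json_tables_idx');   -- and the GIN index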
Re: [GENERAL] Anyone compare PG 9.5 and MongoDB 3.2?
Very helpful!! Thanks!!

On Tuesday, March 1, 2016 9:32 AM, Peter Devoy wrote:

> MongoDB has released 3.2 with their WiredTiger storage. Has anyone
> benchmarked 9.5 against it, and for JSONB elements several MB in size?
>
> PJ

Hi Paul

I do not have an answer for you but there is a great talk here in which someone explains why they moved from a NoSQL stack to Postgres:

https://www.youtube.com/watch?v=UgcC_bY4rPg

If I recall correctly, JSON functionality was touched upon but, if you have not seen it, the whole talk is worth a watch. Hope this helps in some way.

Kind regards

Peter Devoy
Re: [GENERAL] ERROR: could not read block 3 in file "base/12511/12270"
> On 12/23/2015 04:17 PM, Paul Jones wrote:
>>
>> I have been having disk errors that have corrupted something in
>> my postgres database. Other databases work ok:
>
> This isn't the best characterization...the "postgres" data is not a "system"
> database but rather a convenient default user database. Maybe I'm being
> overly picky here, but seeing "system" in this context does have a
> connotation that we really don't want to impart onto the "postgres" database.
>
> It is named such because the default user is likewise "postgres", and most
> utilities, when not provided with a database name, will use the O/S user's
> name, which, for administrative tasks, is likely to be "postgres" (you really
> shouldn't use root for DB-admin stuff), and thus those commands will be able
> to connect without much, if any, additional options supplied.
>
> Its presence, absence, or modification in no way alters the fundamental
> operation of PostgreSQL; though its lack may frustrate users acclimated to
> using said defaults.

This was one of the big lessons I learned from this. All this time I was under the mistaken impression that it was special.

> David J.
Re: [GENERAL] ERROR: could not read block 3 in file "base/12511/12270"
On Wednesday, December 23, 2015 6:45 PM, Tom Lane wrote:

> Paul Jones writes:
>> I have been having disk errors that have corrupted something in
>> my postgres database. Other databases work ok:
>
>> postgres=# SELECT pg_catalog.pg_is_in_recovery();
>> ERROR: could not read block 3 in file "base/12511/12270": read only 4096
>> of 8192 bytes
>
> Hm. Evidently you've got a partially truncated file for some system
> catalog or index. It's fairly hard to estimate the consequences of
> that without knowing which one it is. Please see if this works:
>
> $ export PGOPTIONS="-c ignore_system_indexes=true"
> $ psql -U postgres
>
> # show ignore_system_indexes;
> (should say "on")
>
> # select relname, relkind from pg_class where pg_relation_filenode(oid) = 12270;

paul@kitanglad:~$ export PGOPTIONS="-c ignore_system_indexes=true"
paul@kitanglad:~$ psql -U postgres
psql (9.4.5)
Type "help" for help.

postgres=# show ignore_system_indexes;
 ignore_system_indexes
-----------------------
 on
(1 row)

postgres=# select relname, relkind from pg_class where pg_relation_filenode(oid) = 12270;
      relname      | relkind
-------------------+---------
 pg_proc_oid_index | i
(1 row)

postgres=# reindex index pg_proc_oid_index;
REINDEX
postgres=# \q

paul@kitanglad:~$ unset PGOPTIONS
paul@kitanglad:~$ psql -U postgres
psql (9.4.5)
Type "help" for help.

postgres=# SELECT pg_catalog.pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

So, it was an index and was quickly fixed. Thanks!

> If that works, and it tells you filenode 12270 is an index, you're in
> luck: just REINDEX that index and you're done (at least with this problem,
> there might be more lurking behind it). Don't forget to unset PGOPTIONS
> afterwards.
>
>> Since this is the "postgres" database, dropping and re-creating it
>> doesn't seem possible.
>
> Sure it is, as long as you issue the commands from a non-broken database:
>
> # drop database postgres;
> DROP DATABASE
> # create database postgres with template template0;
> CREATE DATABASE
>
> If you don't have any custom objects in the postgres database, this would
> be by far the easiest way out.

Good to know! I thought there was something special about "postgres". I have not modified it from what initdb put there.

> regards, tom lane
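A related shortcut on 9.4 and later, not part of Tom's recipe but possibly handy: the filenode from the error message can be mapped straight back to a relation instead of scanning pg_class by hand (run it in the affected database; the 0 means the database's default tablespace):

SELECT pg_filenode_relation(0, 12270);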
Re: [GENERAL] Problem building both Python 2 and 3 into Postgres 9.4.4?
That worked, thank you. The Tip in 43.1 did not explain it in that much detail. I searched but could not find it explained anywhere in the docs. Your paragraph would be a nice enhancement to the tip.

PJ

On Sun, 7/19/15, Tom Lane wrote:

 Subject: Re: [GENERAL] Problem building both Python 2 and 3 into Postgres 9.4.4?
 To: p...@cmicdo.com
 Cc: pgsql-general@postgresql.org
 Date: Sunday, July 19, 2015, 4:54 PM

p...@cmicdo.com writes:
> Has anyone successfully built Python 2 and 3 into the same installation
> of Postgres 9.4.4?

I do not think you can do it like this:

> make distclean
> ./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python
> make
> ./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python3
> make
> make install

You need to configure and make one way, then install, then make distclean, reconfigure and remake the other way, then make install in only the src/pl/plpython directory. The recipe as you have it will end up with only one installed copy of plpython, not two, and I'm not real sure that that copy will be correctly built for either case.

regards, tom lane
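Spelled out as commands, the sequence Tom describes might look like the following sketch; the configure flags and paths are simply copied from the original report, and the last step assumes the usual GNU "make -C" behavior in the source tree:

./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python
make
make install                        # full install, plpython built against Python 2
make distclean
./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python3
make
make -C src/pl/plpython install     # install only the Python 3 plpython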
[GENERAL] Problem building both Python 2 and 3 into Postgres 9.4.4?
Has anyone successfully built Python 2 and 3 into the same installation of Postgres 9.4.4? I tried it today on Ubuntu 10.04 with Python 2.6.5 and Python 3.1.2, and got an error about an undefined symbol: PyString_AsString. The Python docs say that PyString_* have been renamed to PyBytes_*, and I find references to both under ./src/pl/plpython. Is this a bug, something I'm doing wrong, or perhaps my system is too old?

make distclean
./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python
make
./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python3
make
make install
service postgresql restart
psql -U postgres

postgres=# create database py2;
CREATE DATABASE
postgres=# create database py3;
CREATE DATABASE
postgres=# \c py2
You are now connected to database "py2" as user "postgres".
py2=# create extension plpython2u;
CREATE EXTENSION
py2=# \c py3
You are now connected to database "py3" as user "postgres".
py3=# create extension plpython3u;
ERROR:  could not load library "/usr/local/pgsql/lib/plpython3.so": /usr/local/pgsql/lib/plpython3.so: undefined symbol: PyString_AsString
Re: [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
On Thu, Nov 06, 2014 at 02:55:20PM +, Shaun Thomas wrote:
>
> These updates aren't equivalent. It's very important you know this, because
> you're also inflating your table with a lot of extra updated rows.
>
> Take the first UPDATE:
>
>> UPDATE second SET time1 = orig.time1
>> FROM orig
>> WHERE second.key1 = orig.key1;
>
> If you wrote this as a SELECT, it would look like this:
>
> SELECT second.time1, orig.time1
>   FROM second
>   JOIN orig ON (second.key1 = orig.key1)
>
> Since second is a many to one subset of orig, you now have several
> simultaneous updates. Your second UPDATE:
>
>> UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
>>                            WHERE orig.key1 = second.key1 LIMIT 1);
>
> Is equivalent to this SELECT:
>
> SELECT second.time1,
>        (SELECT orig.time1 FROM orig,second
>          WHERE orig.key1 = second.key1 LIMIT 1)
>   FROM second;
>
> Meaning you'd only get as many updates as there are rows in second. The
> difference is your LIMIT 1. However, since you're not using an ORDER BY
> clause, the actual value you get for time1 will be indeterminate. Something
> like this would remove the row inflation and fix the random time1 behavior,
> but I'm not sure it was your intent:
>
> UPDATE second
>    SET time1 = sub.time1
>   FROM (SELECT DISTINCT ON (key1) key1, time1
>           FROM orig
>          ORDER BY key1, time1 DESC) sub
>  WHERE second.key1 = sub.key1;

I see now that I made more than one mistake.

1) I forgot to INCLUDE INDEXES when creating second. I would have seen dup keys when filling it.

   CREATE TABLE second (LIKE orig INCLUDING INDEXES);

2) I should have used something like this to fill second:

   INSERT INTO second (key1)
   SELECT key1 FROM orig ORDER BY random() LIMIT 40;

3) I then incorrectly remembered the query I had written at work. It should have been:

   EXPLAIN ANALYZE
   UPDATE second se SET time1 = (SELECT time1 FROM orig WHERE orig.key1 = se.key1);

Once the second table is filled with unique keys, both UPDATEs should produce the same results, but the UPDATE ... FROM is faster than the UPDATE = (SELECT), which is documented. My original intent was to find out what the performance differences between the two are.

Thanks for pointing these things out!

PJ
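A quick sanity check for the duplicate-key mistake described in 1) and 2) above, offered here only as a sketch and not taken from the original thread:

SELECT key1, count(*) AS copies
  FROM second
 GROUP BY key1
HAVING count(*) > 1;
-- Any rows returned are keys that were inserted more than once, which the
-- PRIMARY KEY carried over by INCLUDING INDEXES would have rejected.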
Re: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
> On Mon, 11/3/14, Igor Neyman wrote:
>
> -----Original Message-----
> From: pgsql-general-ow...@postgresql.org
> [mailto:pgsql-general-ow...@postgresql.org]
> On Behalf Of p...@cmicdo.com
> Sent: Monday, November 03, 2014 11:34 AM
> To: pgsql-general@postgresql.org
> Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
>
> Why does the UPDATE SET = FROM choose a more poorly
> performing plan than the UPDATE SET = (SELECT ...)? It
> seems to me that it is the same join.
>
> I'm using 9.3.5.
>
> CREATE TABLE orig
> (
>     key1    VARCHAR(11) PRIMARY KEY,
>     time1   TIME
> );
>
> INSERT INTO orig (key1, time1)
> SELECT a::TEXT,
>        (((random()*100)::INT % 24)::TEXT || ':' ||
>         ((random()*100)::INT % 60)::TEXT)::TIME
>   FROM generate_series(800, 8000200) a;
>
> CREATE INDEX odx ON orig(key1);
>
> CREATE TABLE second (LIKE orig);
>
> INSERT INTO second (key1)
> SELECT (800+(((random()*100)::INT) % 100))::TEXT
>   FROM generate_series(1,40);
>
> EXPLAIN ANALYZE
> UPDATE second SET time1 = orig.time1
> FROM orig
> WHERE second.key1 = orig.key1;
> [.]
>
> UPDATE second SET time1 = NULL;
>
> EXPLAIN ANALYZE
> UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
>                            WHERE orig.key1 = second.key1 LIMIT 1);
> [.]
>
> These 2 queries are not the same.
>
> The first query updates rows in the "second" table with the
> orig.time1 values based on a key1 column match.
> The second query finds the first possible match (based on the key1
> column) and assigns orig.time1 from the matched row to
> every record in the "second" table.
>
> Regards,
> Igor Neyman

I see that now. I was trying to reproduce something from work from memory and got tripped up on a subtlety of UPDATE ... SELECT. The query I ran at work was like this:

EXPLAIN ANALYZE
UPDATE second se SET time1 = (SELECT time1 FROM orig WHERE orig.key1 = se.key1);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Update on second se  (cost=0.00..3390627.00 rows=40 width=18) (actual time=18698.795..18698.795 rows=0 loops=1)
   ->  Seq Scan on second se  (cost=0.00..3390627.00 rows=40 width=18) (actual time=7.558..16694.600 rows=40 loops=1)
         SubPlan 1
           ->  Index Scan using odx on orig  (cost=0.43..8.45 rows=1 width=8) (actual time=0.033..0.035 rows=1 loops=40)
                 Index Cond: ((key1)::text = (se.key1)::text)
 Total runtime: 18698.865 ms
(6 rows)

This does correctly match and update all of the second-table entries. The plan actually runs longer than the UPDATE ... FROM, which squares with a comment in the fine manual.

Thanks!

PJ
[GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
Why does the UPDATE SET = FROM choose a more poorly performing plan than the UPDATE SET = (SELECT ...)? It seems to me that it is the same join.

I'm using 9.3.5.

CREATE TABLE orig
(
    key1    VARCHAR(11) PRIMARY KEY,
    time1   TIME
);

INSERT INTO orig (key1, time1)
SELECT a::TEXT,
       (((random()*100)::INT % 24)::TEXT || ':' ||
        ((random()*100)::INT % 60)::TEXT)::TIME
  FROM generate_series(800, 8000200) a;

CREATE INDEX odx ON orig(key1);

CREATE TABLE second (LIKE orig);

INSERT INTO second (key1)
SELECT (800+(((random()*100)::INT) % 100))::TEXT
  FROM generate_series(1,40);

EXPLAIN ANALYZE
UPDATE second SET time1 = orig.time1
FROM orig
WHERE second.key1 = orig.key1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Update on second  (cost=69461.02..106082.02 rows=40 width=32) (actual time=16033.023..16033.023 rows=0 loops=1)
   ->  Hash Join  (cost=69461.02..106082.02 rows=40 width=32) (actual time=7698.445..12992.039 rows=40 loops=1)
         Hash Cond: ((second.key1)::text = (orig.key1)::text)
         ->  Seq Scan on second  (cost=0.00..12627.00 rows=40 width=18) (actual time=49.820..791.397 rows=40 loops=1)
         ->  Hash  (cost=31765.01..31765.01 rows=201 width=26) (actual time=7648.540..7648.540 rows=201 loops=1)
               Buckets: 4096  Batches: 128  Memory Usage: 717kB
               ->  Seq Scan on orig  (cost=0.00..31765.01 rows=201 width=26) (actual time=0.014..3655.844 rows=201 loops=1)
 Total runtime: 16033.193 ms
(8 rows)

UPDATE second SET time1 = NULL;

EXPLAIN ANALYZE
UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
                           WHERE orig.key1 = second.key1 LIMIT 1);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Update on second  (cost=3.60..19078.19 rows=1279959 width=18) (actual time=4642.453..4642.453 rows=0 loops=1)
   InitPlan 1 (returns $1)
     ->  Limit  (cost=0.43..3.60 rows=1 width=8) (actual time=2.611..2.613 rows=1 loops=1)
           ->  Nested Loop  (cost=0.43..4056331.83 rows=1279959 width=8) (actual time=2.606..2.606 rows=1 loops=1)
                 ->  Seq Scan on second second_1  (cost=0.00..19074.59 rows=1279959 width=12) (actual time=2.487..2.487 rows=1 loops=1)
                 ->  Index Scan using odx on orig  (cost=0.43..3.14 rows=1 width=20) (actual time=0.098..0.098 rows=1 loops=1)
                       Index Cond: ((key1)::text = (second_1.key1)::text)
   ->  Seq Scan on second  (cost=0.00..19074.59 rows=1279959 width=18) (actual time=6.420..817.739 rows=40 loops=1)
 Total runtime: 4642.561 ms
(9 rows)
Re: [GENERAL] BDR Global Sequences
Hi Andres,

> Hi,
>
> On 2014-09-29 13:52:52 -0700, p...@cmicdo.com wrote:
>> I have a question about BDR Global Sequences.
>> [deleted]
>> Is there way to increase a global sequence's reservation block for each
>> node so that I can tell the nodes, "I'm going to load 100M rows now so
>> you should get ready for that."?
>
> Not yet, but we're planning to add that.

Good to hear. In the meantime, is there something I can hack to force the nodes to make a sequence allocation of my choosing (even if just hardwired)? I was playing with start_elections_sql where it says:

    generate_series(\n"
    "    current_max,\n"
    "    -- 1000 is the chunk size, -1 is to get < instead <= out of generate_series\n"
    "    current_max + 1000 * (5 - open_seq_chunks) - 1,\n"
    "    1000) chunk_start\n"

and was hoping that bumping up the 1000 would help, but apparently not.

PJ

> Greetings,
>
> Andres Freund
>
> --
> Andres Freund                 http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
[GENERAL] BDR Global Sequences
I have a question about BDR Global Sequences. I've been playing with BDR on PG 9.4beta2, built from source from the 2nd Quadrant GIT page (git://git.postgresql.org/git/2ndquadrant_bdr.git).

When trying a 100 row \copy-in, letting PG choose the global sequence values, I get "ERROR: could not find free sequence value for global sequence public.myseq", as documented...no surprise there. However, the number of rows I can load before the error varies wildly with each trial.

Is there a way to increase a global sequence's reservation block for each node so that I can tell the nodes, "I'm going to load 100M rows now so you should get ready for that."?

PJ
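For bulk loads there is also a generic workaround, shown here only as a sketch under the assumption of a fixed two-node cluster, and not as a BDR feature: give each node an ordinary local sequence with interleaved values instead of a global sequence.

-- On node 1:
CREATE SEQUENCE myseq_local INCREMENT BY 2 START WITH 1;   -- 1, 3, 5, ...
-- On node 2:
CREATE SEQUENCE myseq_local INCREMENT BY 2 START WITH 2;   -- 2, 4, 6, ...
-- Each node takes nextval('myseq_local') for its inserts, so the values never collide.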