On Mar 11, 2016 4:40 PM, "Paul Jones" <p...@cmicdo.com> wrote: > > I have been running the EDB benchmark that compares Postgres and MongoDB. > I believe EDB ran it against PG 9.4 and Mongo 2.6. I am running it > against PG 9.5.1 and Mongo 3.2 with WiredTiger storage using 10,000,000 > JSON records generated by the benchmark. It looks like Mongo is winning, > and apparently because of its cache management.
Dmitry was working on the same benchmarks. I think edb benchmark is broken by design. Better, use ycsb benchmarks. I hope, Dmitry will share his results. > > The first queries on both run in ~30 min. And, once PG fills its cache, > it whips Mongo on repeats of the *same* query (vmstat shows no disk > reads for PG). > > However, when different query on the same table is issued to both, > vmstat shows that PG has to read the *entire* table again, and it takes > ~30 min. Mongo does a lot of reads initially but after about 5 minutes, > it stops reading and completes the query, most likely because it is > using its cache very effectively. > > Host: Virtual Machine > 4 CPUs > 16 Gb RAM > 200 Gb Disk > RHEL 6.6 > > PG: 9.5.1 compiled from source > shared_buffers = 7GB > effectve_cache_size = 12GB > > Mongo: 3.2 installed with RPM from Mongo > > In PG, I created the table by: > > CREATE TABLE json_tables > ( > data JSONB > ); > > After loading, it creates the index: > > CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops); > > After a lot of experimentation, I discovered that the benchmark was not > using PG's index, so I modified the four queries to be: > > SELECT data FROM json_tables WHERE data @> '{"brand": "ACME"}'; > SELECT data FROM json_tables WHERE data @> '{"name": "Phone Service Basic Plan"}'; > SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}'; > SELECT data FROM json_tables WHERE data @> '{"type": "service"}'; > > Here are two consecutive explain analyze for PG, for the same query. > No functional difference in the plans that I can tell, but the effect > of PG's cache on the second is dramatic. > > If anyone has ideas on how I can get PG to more effectively use the cache > for subsequent queries, I would love to hear them. > > ------- > > benchmark=# explain analyze select data from json_tables where data @> '{"name": "AC3 Case Red"}'; > > QUERY PLAN > > > ----------------------------------------------------------------------------------------------------------------------------------------- > > Bitmap Heap Scan on json_tables (cost=113.50..37914.64 rows=10000 width=1261) > (actual time=2157.118..1259550.327 rows=909091 loops=1) > Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb) > Rows Removed by Index Recheck: 4360296 > Heap Blocks: exact=37031 lossy=872059 > -> Bitmap Index Scan on json_tables_idx (cost=0.00..111.00 rows=10000 width =0) (actual time=2141.250..2141.250 rows=909091 loops=1) > Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb) > Planning time: 291.932 ms > Execution time: 1259886.920 ms > (8 rows) > > Time: 1261191.844 ms > > benchmark=# explain analyze select data from json_tables where data @> '{"name": "AC3 Case Red"}'; > QUERY PLAN > > --------------------------------------------------------------------------------------------------------------------------------------- > > Bitmap Heap Scan on json_tables (cost=113.50..37914.64 rows=10000 width=1261) (actual time=779.261..29815.262 rows=909091 loops=1) > Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb) > Rows Removed by Index Recheck: 4360296 > Heap Blocks: exact=37031 lossy=872059 > -> Bitmap Index Scan on json_tables_idx (cost=0.00..111.00 rows=10000 width =0) (actual time=769.081..769.081 rows=909091 loops=1) > Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb) > Planning time: 33.967 ms > Execution time: 29869.381 ms > > (8 rows) > > Time: 29987.122 ms > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general