Of course, I do not create GIN index. Maybe the problem is related to checkpoint and WAL. I don't know how to make the comparison with MongoDB fair enough.
(C)Bitnine, Kisung Kim, Ph.D https://sites.google.com/site/kisungresearch/ E-mail : ks...@bitnine.net Office phone : 070-4800-5890, 408-606-8602 US Mobile phone : 408-805-2192 2016-07-19 11:23 GMT+09:00 Sameer Kumar <sameer.ku...@ashnik.com>: > > > On Fri, 11 Mar 2016, 9:39 p.m. Paul Jones, <p...@cmicdo.com> wrote: > >> I have been running the EDB benchmark that compares Postgres and MongoDB. >> I believe EDB ran it against PG 9.4 and Mongo 2.6. I am running it >> against PG 9.5.1 and Mongo 3.2 with WiredTiger storage using 10,000,000 >> JSON records generated by the benchmark. It looks like Mongo is winning, >> and apparently because of its cache management. >> >> The first queries on both run in ~30 min. And, once PG fills its cache, >> it whips Mongo on repeats of the *same* query (vmstat shows no disk >> reads for PG). >> >> However, when different query on the same table is issued to both, >> vmstat shows that PG has to read the *entire* table again, and it takes >> ~30 min. Mongo does a lot of reads initially but after about 5 minutes, >> it stops reading and completes the query, most likely because it is >> using its cache very effectively. >> >> Host: Virtual Machine >> 4 CPUs >> 16 Gb RAM >> 200 Gb Disk >> RHEL 6.6 >> >> PG: 9.5.1 compiled from source >> shared_buffers = 7GB >> effectve_cache_size = 12GB >> >> Mongo: 3.2 installed with RPM from Mongo >> >> In PG, I created the table by: >> >> CREATE TABLE json_tables >> ( >> data JSONB >> ); >> >> After loading, it creates the index: >> >> CREATE INDEX json_tables_idx ON json_tables USING GIN (data >> jsonb_path_ops); >> > > This would create one GIN index which is going to be a bit larger than > usual btree /n-tree index on a specific JSON field. And would be slower > too. I suggest that you create an index on the specific expression using > JSON operators. In my opinion that index would be much more nearer to > mongoDB indexes. > > > >> After a lot of experimentation, I discovered that the benchmark was not >> using PG's index, so I modified the four queries to be: >> >> SELECT data FROM json_tables WHERE data @> '{"brand": "ACME"}'; >> SELECT data FROM json_tables WHERE data @> '{"name": "Phone Service Basic >> Plan"}'; >> SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}'; >> SELECT data FROM json_tables WHERE data @> '{"type": "service"}'; >> >> Here are two consecutive explain analyze for PG, for the same query. >> No functional difference in the plans that I can tell, but the effect >> of PG's cache on the second is dramatic. >> >> If anyone has ideas on how I can get PG to more effectively use the cache >> for subsequent queries, I would love to hear them. >> >> ------- >> >> benchmark=# explain analyze select data from json_tables where data @> >> '{"name": "AC3 Case Red"}'; >> >> QUERY PLAN >> >> >> >> ----------------------------------------------------------------------------------------------------------------------------------------- >> >> Bitmap Heap Scan on json_tables (cost=113.50..37914.64 rows=10000 >> width=1261) >> (actual time=2157.118..1259550.327 rows=909091 loops=1) >> Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb) >> Rows Removed by Index Recheck: 4360296 >> Heap Blocks: exact=37031 lossy=872059 >> -> Bitmap Index Scan on json_tables_idx (cost=0.00..111.00 >> rows=10000 width =0) (actual time=2141.250..2141.250 rows=909091 loops=1) >> Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb) >> Planning time: 291.932 ms >> Execution time: 1259886.920 ms >> (8 rows) >> >> Time: 1261191.844 ms >> >> benchmark=# explain analyze select data from json_tables where data @> >> '{"name": "AC3 Case Red"}'; >> QUERY PLAN >> >> >> --------------------------------------------------------------------------------------------------------------------------------------- >> >> Bitmap Heap Scan on json_tables (cost=113.50..37914.64 rows=10000 >> width=1261) (actual time=779.261..29815.262 rows=909091 loops=1) >> Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb) >> Rows Removed by Index Recheck: 4360296 >> Heap Blocks: exact=37031 lossy=872059 >> -> Bitmap Index Scan on json_tables_idx (cost=0.00..111.00 >> rows=10000 width =0) (actual time=769.081..769.081 rows=909091 loops=1) >> Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb) >> Planning time: 33.967 ms >> Execution time: 29869.381 ms >> >> (8 rows) >> >> Time: 29987.122 ms >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >> > -- > -- > Best Regards > Sameer Kumar | DB Solution Architect > *ASHNIK PTE. LTD.* > > 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533 > > T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com >