Your results are close enough to mine, I think, to prove the point.   And, I 
agree that the EDB benchmark is not necessary reflective of a real-world 
scenario.
However, the cache I'm referring to is PG's shared_buffer cache.   You can see 
the first run of the select causing a lot of disk reads.  The second identical 
run, reads purely from shared_buffers.
What I don't understand is, why does a slightly different select from the 
*same* table during the same session cause shared_buffers to be blown out and 
re-read??
I will see if I can try YCSB next week (I'm in workshops all week...)
Thanks!
 

    On Monday, March 14, 2016 3:34 AM, Dmitry Dolgov <9erthali...@gmail.com> 
wrote:
 

 Hi, Paul
I agree with Oleg, EDB benchmarks are strange sometimes. I did the same 
benchmarks several months ago. I never noticed the cache influence back then, 
so I tried to reproduce your situation now (on a 5*10^6 records although). I 
started to play with db cache (using `echo 3 > /proc/sys/vm/drop_cache`), and I 
see difference in time execution for two subsequent queries, but `explain` info 
are almost identical, e.g. `shared hit & read`:
```benchmark=# explain (buffers, analyze, verbose) select data from json_tables 
where data @> '{"name": "AC3 Case Red"}';                                       
                          QUERY PLAN                                            
                     
--------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.json_tables  (cost=102.74..19001.47 rows=4999 
width=1257) (actual time=740.556..215956.655 rows=454546 loops=1)   Output: 
data   Recheck Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)   
Rows Removed by Index Recheck: 2114606   Heap Blocks: exact=31624 lossy=422922  
 Buffers: shared hit=1371 read=455551   ->  Bitmap Index Scan on 
json_tables_idx  (cost=0.00..101.49 rows=4999 width=0) (actual 
time=731.010..731.010 rows=454547 loops=1)         Index Cond: 
(json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)         Buffers: shared 
hit=1371 read=1005 Planning time: 6.352 ms Execution time: 216075.830 ms(11 
rows)
benchmark=# explain (buffers, analyze, verbose) select data from json_tables 
where data @> '{"name": "AC3 Case Red"}';                                       
                         QUERY PLAN                                             
                    
-------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.json_tables  (cost=102.74..19001.47 rows=4999 
width=1257) (actual time=222.476..10692.703 rows=454546 loops=1)   Output: data 
  Recheck Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)   Rows 
Removed by Index Recheck: 2114606   Heap Blocks: exact=31624 lossy=422922   
Buffers: shared hit=1371 read=455551   ->  Bitmap Index Scan on json_tables_idx 
 (cost=0.00..101.49 rows=4999 width=0) (actual time=214.736..214.736 
rows=454547 loops=1)         Index Cond: (json_tables.data @> '{"name": "AC3 
Case Red"}'::jsonb)         Buffers: shared hit=1371 read=1005 Planning time: 
0.089 ms Execution time: 10767.739 ms(11 rows)```
But I see almost the same execution time from mongodb `explain` (216075ms for 
pg and 177784ms for mongo, which isn't so much I think):
```DBQuery.shellBatchSize = 10000000000; db.json_tables.find({"name": "AC3 Case 
Red"}).explain(true){        "queryPlanner" : {                "plannerVersion" 
: 1,                "namespace" : "benchmark.json_tables",                
"indexFilterSet" : false,                "parsedQuery" : {                      
  "name" : {                                "$eq" : "AC3 Case Red"              
          }                },                "winningPlan" : {                  
      "stage" : "FETCH",                        "inputStage" : {                
                "stage" : "IXSCAN",                                "keyPattern" 
: {                                        "name" : 1                           
     },                                "indexName" : "name_1",                  
              "isMultiKey" : false,                                "isUnique" : 
false,                                "isSparse" : false,                       
         "isPartial" : false,                                "indexVersion" : 
1,                                "direction" : "forward",                      
          "indexBounds" : {                                        "name" : [   
                                             "[\"AC3 Case Red\", \"AC3 Case 
Red\"]"                                        ]                                
}                        }                },                "rejectedPlans" : [ 
]        },       "executionStats" : {                "executionSuccess" : 
true,                "nReturned" : 454546,                "executionTimeMillis" 
: 177784,                "totalKeysExamined" : 454546,                
"totalDocsExamined" : 454546,                "executionStages" : {              
          "stage" : "FETCH",                        "nReturned" : 454546,       
                 "executionTimeMillisEstimate" : 175590,                        
"works" : 454547,                        "advanced" : 454546,                   
     "needTime" : 0,                        "needYield" : 0,                    
    "saveState" : 8638,                        "restoreState" : 8638,           
             "isEOF" : 1,                        "invalidates" : 0,             
           "docsExamined" : 454546,                        "alreadyHasObj" : 0, 
                       "inputStage" : {                                "stage" 
: "IXSCAN",                                "nReturned" : 454546,                
                "executionTimeMillisEstimate" : 700,                            
    "works" : 454547,                                "advanced" : 454546,       
                         "needTime" : 0,                                
"needYield" : 0,                                "saveState" : 8638,             
                   "restoreState" : 8638,                                
"isEOF" : 1,                                "invalidates" : 0,                  
              "keyPattern" : {                                        "name" : 
1                                },                                "indexName" 
: "name_1",                                "isMultiKey" : false,                
                "isUnique" : false,                                "isSparse" : 
false,                                "isPartial" : false,                      
          "indexVersion" : 1,                                "direction" : 
"forward",                                "indexBounds" : {                     
                   "name" : [                                                
"[\"AC3 Case Red\", \"AC3 Case Red\"]"                                        ] 
                               },                                "keysExamined" 
: 454546,                                "dupsTested" : 0,                      
          "dupsDropped" : 0,                                "seenInvalidated" : 
0                        }                },                "allPlansExecution" 
: [ ]        },        "serverInfo" : {                "host" : 
"ip-172-30-0-236",                "port" : 27017,                "version" : 
"3.2.4",                "gitVersion" : 
"e2ee9ffcf9f5a94fad76802e28cc978718bb7a30"        },        "ok" : 1}```
I not missed anything, am I right? Are you sure that it took much more time for 
PostgreSQL?Besides, everything is fine for queries with more small results 
(while the query {"name": "AC3 Case Red"} is almost 1/10 of entire dataset):

```=# insert into json_tables values('{"name": "test name"}'::jsonb);
=# explain (buffers, analyze, verbose) select data from json_tables where data 
@> '{"name": "test name"}';                                                     
                                                        QUERY PLAN              
                                             
-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.json_tables  (cost=62.75..18965.16 rows=5000 
width=1257) (actual time=0.020..0.021 rows=1 loops=1)   Output: data   Recheck 
Cond: (json_tables.data @> '{"name": "test name"}'::jsonb)   Heap Blocks: 
exact=1   Buffers: shared hit=5   ->  Bitmap Index Scan on json_tables_idx  
(cost=0.00..61.50 rows=5000 width=0) (actual time=0.011..0.011 rows=1 loops=1)  
       Index Cond: (json_tables.data @> '{"name": "test name"}'::jsonb)         
Buffers: shared hit=4 Planning time: 1.164 ms Execution time: 0.045 ms(10 
rows)```
As far as I know there isn't much to do about caching. I don't know if it's 
appropriate, but you can manually warm-up the cache (something like `cat 
/var/lib/postgresql/9.5/main/base/*/* > /dev/null`).
On 14 March 2016 at 00:30, Oleg Bartunov <obartu...@gmail.com> wrote:


On Mar 11, 2016 4:40 PM, "Paul Jones" <p...@cmicdo.com> wrote:
>
> I have been running the EDB benchmark that compares Postgres and MongoDB.
> I believe EDB ran it against PG 9.4 and Mongo 2.6.  I am running it
> against PG 9.5.1 and Mongo 3.2 with WiredTiger storage using 10,000,000
> JSON records generated by the benchmark.  It looks like Mongo is winning,
> and apparently because of its cache management.Dmitry was working on the same 
> benchmarks. I think edb benchmark is broken by design. Better,  use ycsb 
> benchmarks. I hope, Dmitry will share his 
results.>
> The first queries on both run in ~30 min.  And, once PG fills its cache,
> it whips Mongo on repeats of the *same* query (vmstat shows no disk
> reads for PG).
>
> However, when different query on the same table is issued to both,
> vmstat shows that PG has to read the *entire* table again, and it takes
> ~30 min.  Mongo does a lot of reads initially but after about 5 minutes,
> it stops reading and completes the query, most likely because it is
> using its cache very effectively.
>
> Host:   Virtual Machine
>         4 CPUs
>         16 Gb RAM
>         200 Gb Disk
>         RHEL 6.6
>
> PG:     9.5.1 compiled from source
>         shared_buffers = 7GB
>         effectve_cache_size = 12GB
>
> Mongo:  3.2 installed with RPM from Mongo
>
> In PG, I created the table by:
>
> CREATE TABLE json_tables
> (
>         data    JSONB
> );
>
> After loading, it creates the index:
>
> CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);
>
> After a lot of experimentation, I discovered that the benchmark was not
> using PG's index, so I modified the four queries to be:
>
> SELECT data FROM json_tables WHERE data @> '{"brand": "ACME"}';
> SELECT data FROM json_tables WHERE data @> '{"name": "Phone Service Basic 
> Plan"}';
> SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}';
> SELECT data FROM json_tables WHERE data @> '{"type": "service"}';
>
> Here are two consecutive explain analyze for PG, for the same query.
> No functional difference in the plans that I can tell, but the effect
> of PG's cache on the second is dramatic.
>
> If anyone has ideas on how I can get PG to more effectively use the cache
> for subsequent queries, I would love to hear them.
>
> -------
>
> benchmark=# explain analyze select data from json_tables where data @> 
> '{"name": "AC3 Case Red"}';
>
>                                                                QUERY PLAN
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=10000 width=1261)
> (actual time=2157.118..1259550.327 rows=909091 loops=1)
>    Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>    Rows Removed by Index Recheck: 4360296
>    Heap Blocks: exact=37031 lossy=872059
>    ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00 rows=10000 
> width =0) (actual time=2141.250..2141.250 rows=909091 loops=1)
>          Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
> Planning time: 291.932 ms
> Execution time: 1259886.920 ms
> (8 rows)
>
> Time: 1261191.844 ms
>
> benchmark=# explain analyze select data from json_tables where data @> 
> '{"name": "AC3 Case Red"}';
>                                                               QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=10000 
> width=1261) (actual time=779.261..29815.262 rows=909091 loops=1)
>    Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>    Rows Removed by Index Recheck: 4360296
>    Heap Blocks: exact=37031 lossy=872059
>    ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00 rows=10000 
> width =0) (actual time=769.081..769.081 rows=909091 loops=1)
>          Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
> Planning time: 33.967 ms
> Execution time: 29869.381 ms
>
> (8 rows)
>
> Time: 29987.122 ms
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general




  

Reply via email to