Hmm, the last index query you attached verified nothing was clustered. Just one more thing to look at. Please send the results of each of the following to a file and attach.
EXPLAIN ANALYZE SELECT * FROM data2011_01 WHERE taxiid::text = 'SZB07P23'::text;
EXPLAIN ANALYZE SELECT * FROM data2013_01w WHERE taxiid::text = 'SZB07P23'::text;

On Tue, Feb 2, 2016 at 10:16 AM, Yu Nie <niey...@gmail.com> wrote:
> One more observation - not sure if it helps with the diagnosis.
>
> If I query based on a time interval rather than on taxiid, the query is
> generally faster - I believe that is due to the way the data were loaded.
> Yet there is still a significant performance difference: the large table
> seems to run faster than the small table. More importantly, this time the
> planner proposes two very different plans for the two tables. Not sure why
> a bitmap index scan was used for the small table...
>
> See results below:
>
> explain (analyze, buffers) select * from data2011_01 where time >=
> '2011-01-16 12:00:00' and time < '2011-01-16 12:05:00' order by time;
> (the small-table plan below is from the analogous query against
> data2013_01w with 2013 timestamps)
>
> For the small table:
> "Sort  (cost=529024.61..529411.73 rows=154848 width=55) (actual time=3761.329..3765.501 rows=149807 loops=1)"
> "  Sort Key: "time""
> "  Sort Method: quicksort  Memory: 27211kB"
> "  Buffers: shared hit=39 read=4304"
> "  ->  Bitmap Heap Scan on data2013_01w  (cost=3287.76..515676.33 rows=154848 width=55) (actual time=29.028..3714.592 rows=149807 loops=1)"
> "        Recheck Cond: (("time" >= '2013-01-16 12:00:00'::timestamp without time zone) AND ("time" < '2013-01-16 12:05:00'::timestamp without time zone))"
> "        Heap Blocks: exact=3929"
> "        Buffers: shared hit=39 read=4304"
> "        ->  Bitmap Index Scan on data2013_01w_ixtime  (cost=0.00..3249.05 rows=154848 width=0) (actual time=17.931..17.931 rows=149807 loops=1)"
> "              Index Cond: (("time" >= '2013-01-16 12:00:00'::timestamp without time zone) AND ("time" < '2013-01-16 12:05:00'::timestamp without time zone))"
> "              Buffers: shared hit=1 read=413"
> "Planning time: 0.203 ms"
> "Execution time: 3779.368 ms"
>
> For the large table:
> "Index Scan using data2011_01_ixtime on data2011_01  (cost=0.57..78012.46 rows=110204 width=55) (actual time=43.765..709.985 rows=104617 loops=1)"
> "  Index Cond: (("time" >= '2011-01-16 12:00:00'::timestamp without time zone) AND ("time" < '2011-01-16 12:05:00'::timestamp without time zone))"
> "  Buffers: shared hit=79619 read=1482"
> "Planning time: 0.131 ms"
> "Execution time: 713.446 ms"
>
> On Mon, Feb 1, 2016 at 8:50 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
>> OK, one more thing to try.
>>
>> Please send the output of the following query to a file and attach so I
>> can review.
>>
>> SELECT c.relname as idxname,
>>        i.indisclustered,
>>        i.indisvalid,
>>        i.indcollation
>>   FROM pg_index i
>>   JOIN pg_class c ON i.indexrelid = c.oid
>>  WHERE i.indrelid IN (SELECT oid
>>                         FROM pg_class
>>                        WHERE relkind = 'r'
>>                          AND (relname = 'data2011_01'
>>                            OR relname = 'data2013_01w')
>>                      )
>>  ORDER BY 1;
>>
>> On Mon, Feb 1, 2016 at 5:51 PM, Yu Nie <niey...@gmail.com> wrote:
>>> Yeah, it is really puzzling. :(
>>>
>>> No, I have never attempted to cluster the large table - as I mentioned
>>> before, the two tables were created almost exactly the same way (through a
>>> C++ API), except with different data files. I have only used them for
>>> queries, and have never done any update/insert/delete since the creation.
>>>
>>> On Mon, Feb 1, 2016 at 4:41 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
>>>> At this point I can see no possible explanation why this would occur.
>>>>
>>>> Just a thought though: is it possible that data2011_01 was clustered on
>>>> the index at some point but data2013_01w has not been clustered?
>>>> If you cluster data2013_01w on the index, does the performance change?
>>>>
>>>> On Mon, Feb 1, 2016 at 4:03 PM, Yu Nie <niey...@gmail.com> wrote:
>>>>> Melvin,
>>>>>
>>>>> Please see attached for the requested results. I ran two queries (each
>>>>> with a different taxiid, adjacent to each other) for each table.
>>>>> Note that for the large table one is much faster than the other, because
>>>>> the shared cache was used for the second query. This does not work,
>>>>> however, for the small table.
>>>>>
>>>>> Many thanks for your willingness to help!
>>>>>
>>>>> Best, Marco
>>>>>
>>>>> On Mon, Feb 1, 2016 at 2:25 PM, melvin6925 <melvin6...@gmail.com> wrote:
>>>>>> Fine. Please rerun both explains and my queries and send ALL output
>>>>>> to a file (F8 in pgAdmin SQL). Then attach the file.
>>>>>>
>>>>>> Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
>>>>>> -------- Original message --------
>>>>>> From: Yu Nie <niey...@gmail.com>
>>>>>> Date: 2/1/2016 15:17 (GMT-05:00)
>>>>>> To: melvin6925 <melvin6...@gmail.com>
>>>>>> Subject: Re: [GENERAL] strange sql behavior
>>>>>>
>>>>>> Yes, absolutely.
>>>>>>
>>>>>> On Mon, Feb 1, 2016 at 2:12 PM, melvin6925 <melvin6...@gmail.com> wrote:
>>>>>>> And were _your_ queries run on the same day at the same time, within
>>>>>>> a few seconds of each other?
>>>>>>>
>>>>>>> -------- Original message --------
>>>>>>> From: Yu Nie <niey...@gmail.com>
>>>>>>> Date: 2/1/2016 15:01 (GMT-05:00)
>>>>>>> To: melvin6925 <melvin6...@gmail.com>
>>>>>>> Cc: Bill Moran <wmo...@potentialtech.com>, pgsql-general@postgresql.org
>>>>>>> Subject: Re: [GENERAL] strange sql behavior
>>>>>>>
>>>>>>> Thanks, Bill and Melvin!
>>>>>>>
>>>>>>> Just some quick notes/answers before I absorb all the information
>>>>>>> provided by Bill.
>>>>>>>
>>>>>>> 1. I don't expect many users running queries against the tables,
>>>>>>> especially the small table - since I just created it this morning,
>>>>>>> nobody knows about it except myself.
>>>>>>>
>>>>>>> 2. The settings in the config:
>>>>>>>
>>>>>>> shared_buffers = 512MB    # min 128kB
>>>>>>> work_mem = 128MB          # min 64kB
>>>>>>>
>>>>>>> 3.
>>>>>>> I am running Windows 7 with 24 GB RAM, and my PostgreSQL is 9.4.
>>>>>>>
>>>>>>> 4. Here is the query I ran:
>>>>>>>
>>>>>>> SELECT n.nspname,
>>>>>>>        s.relname,
>>>>>>>        c.reltuples::bigint,
>>>>>>>        n_tup_ins,
>>>>>>>        n_tup_upd,
>>>>>>>        n_tup_del,
>>>>>>>        date_trunc('second', last_vacuum) as last_vacuum,
>>>>>>>        date_trunc('second', last_autovacuum) as last_autovacuum,
>>>>>>>        date_trunc('second', last_analyze) as last_analyze,
>>>>>>>        date_trunc('second', last_autoanalyze) as last_autoanalyze,
>>>>>>>        round(current_setting('autovacuum_vacuum_threshold')::integer +
>>>>>>>              current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples)
>>>>>>>          AS av_threshold
>>>>>>>   FROM pg_stat_all_tables s
>>>>>>>   JOIN pg_class c ON c.oid = s.relid
>>>>>>>   JOIN pg_namespace n ON (n.oid = c.relnamespace)
>>>>>>>  WHERE s.relname NOT LIKE 'pg_%'
>>>>>>>    AND s.relname NOT LIKE 'sql_%'
>>>>>>>    AND s.relname IN ('data2013_01w', 'data2011_01')
>>>>>>>  ORDER by 1, 2;
>>>>>>>
>>>>>>> I copied the result from pgAdmin directly; here it is again:
>>>>>>>
>>>>>>> public;"data2011_01";784236864;784236885;0;0;"";"";"";"2016-01-19 17:31:08-06";156847423
>>>>>>> public;"data2013_01w";300786432;300786444;0;0;"";"";"2016-02-01 08:57:24-06";"2016-02-01 04:01:04-06";60157336
>>>>>>>
>>>>>>> On Mon, Feb 1, 2016 at 1:45 PM, melvin6925 <melvin6...@gmail.com> wrote:
>>>>>>>> Thanks Bill.
>>>>>>>> Also, it's very important to include the headers with the queries!
>>>>>>>>
>>>>>>>> Marco,
>>>>>>>> There is no top secret information that is requested, so please do
>>>>>>>> not edit the output.
>>>>>>>> -------- Original message --------
>>>>>>>> From: Bill Moran <wmo...@potentialtech.com>
>>>>>>>> Date: 2/1/2016 14:41 (GMT-05:00)
>>>>>>>> To: Yu Nie <niey...@gmail.com>
>>>>>>>> Cc: Melvin Davidson <melvin6...@gmail.com>, pgsql-general@postgresql.org
>>>>>>>> Subject: Re: [GENERAL] strange sql behavior
>>>>>>>>
>>>>>>>> Came a little late to the thread; see many comments inline below.
>>>>>>>>
>>>>>>>> On Mon, 1 Feb 2016 13:16:13 -0600 Yu Nie <niey...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> > Thanks a lot for your reply. I ran the query you suggested and
>>>>>>>> > here are the results.
>>>>>>>> >
>>>>>>>> > Large table: "public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19 17:31:08-06";156847423
>>>>>>>> > Small table: "public";"data2013_01w";300786444;0;0;"";"";"2016-02-01 08:57:24-06";"2016-02-01 04:01:04-06";60157336
>>>>>>>>
>>>>>>>> You didn't run Melvin's query correctly. He specified 11 columns to
>>>>>>>> select, but you only show 10. Since you don't show the query you
>>>>>>>> actually ran, we don't know which of the numeric columns is missing.
>>>>>>>>
>>>>>>>> More information inline below:
>>>>>>>>
>>>>>>>> > On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > > One thing to look at is the last time both tables were vacuumed/analyzed.
>>>>>>>> > >
>>>>>>>> > > SELECT n.nspname,
>>>>>>>> > >        s.relname,
>>>>>>>> > >        c.reltuples::bigint,
>>>>>>>> > >        n_tup_ins,
>>>>>>>> > >        n_tup_upd,
>>>>>>>> > >        n_tup_del,
>>>>>>>> > >        date_trunc('second', last_vacuum) as last_vacuum,
>>>>>>>> > >        date_trunc('second', last_autovacuum) as last_autovacuum,
>>>>>>>> > >        date_trunc('second', last_analyze) as last_analyze,
>>>>>>>> > >        date_trunc('second', last_autoanalyze) as last_autoanalyze,
>>>>>>>> > >        round(current_setting('autovacuum_vacuum_threshold')::integer +
>>>>>>>> > >              current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples)
>>>>>>>> > >          AS av_threshold
>>>>>>>> > >   FROM pg_stat_all_tables s
>>>>>>>> > >   JOIN pg_class c ON c.oid = s.relid
>>>>>>>> > >   JOIN pg_namespace n ON (n.oid = c.relnamespace)
>>>>>>>> > >  WHERE s.relname NOT LIKE 'pg_%'
>>>>>>>> > >    AND s.relname NOT LIKE 'sql_%'
>>>>>>>> > >    AND s.relname IN ('your_small_table', 'your_large_table')
>>>>>>>> > >  ORDER by 1, 2;
>>>>>>>> > >
>>>>>>>> > > Also, please confirm the indexes for both tables are using the
>>>>>>>> > > same method (btree?).
>>>>>>>> > >
>>>>>>>> > > On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <niey...@gmail.com> wrote:
>>>>>>>> > >
>>>>>>>> > >> Hi there,
>>>>>>>> > >>
>>>>>>>> > >> Recently I have been working with a large amount of taxi GIS data
>>>>>>>> > >> and have encountered some weird performance issues. I am hoping
>>>>>>>> > >> someone in this community can help me figure them out.
>>>>>>>> > >>
>>>>>>>> > >> The taxi data were loaded in 5-minute blocks into a table. I have
>>>>>>>> > >> two separate such tables: one stores a month of data with about
>>>>>>>> > >> 700 million rows; the other stores about 10 days of data with about
>>>>>>>> > >> 300 million rows. The two tables have exactly the same schema and
>>>>>>>> > >> indexes.
>>>>>>>> > >> There are two indexes: one on taxiid (text), and the other on the
>>>>>>>> > >> time stamp (datetime). In order to process the data, I need to get
>>>>>>>> > >> all points for a single taxi; to do that, I use something like:
>>>>>>>> > >> select * from table1 where taxiid = 'SZB00S41' order by time;
>>>>>>>> > >> What puzzled me greatly is that this query runs consistently much
>>>>>>>> > >> faster for the large table than for the small table, which seems to
>>>>>>>> > >> contradict intuition. At the end of this message you may find explain
>>>>>>>> > >> (analyze, buffers) results of two particular queries for the same
>>>>>>>> > >> taxiid (one for each table). You can see that it took much longer
>>>>>>>> > >> (more than 20 times) to get 20k rows from the small table than to get
>>>>>>>> > >> 44k rows from the large table. Interestingly, it seems that the
>>>>>>>> > >> planner does expect about 1/3 the work for the small table query -
>>>>>>>> > >> yet for some reason, it took much longer to fetch the rows from the
>>>>>>>> > >> small table. Why is there such a huge performance difference between
>>>>>>>> > >> two seemingly identical queries executed on two different tables?
>>>>>>>> > >>
>>>>>>>> > >> Is it because the data in the second table is on some mysteriously
>>>>>>>> > >> "broken part" of the disk? What else could explain such bizarre
>>>>>>>> > >> behavior? Your help is greatly appreciated.
>>>>>>>> > >>
>>>>>>>> > >> The above behavior is consistent through all queries. Another issue
>>>>>>>> > >> I identified is that for the large table, the query can use the
>>>>>>>> > >> shared buffer more effectively.
>>>>>>>> > >> For example, after you query one taxiid and immediately following
>>>>>>>> > >> that query run the same query for another taxi whose id ranks right
>>>>>>>> > >> behind the first id, then shared hit buffers would be quite high
>>>>>>>> > >> (and the query would run much faster); this, however, never works
>>>>>>>> > >> for the small table.
>>>>>>>> > >>
>>>>>>>> > >> Thanks a lot!
>>>>>>>> > >>
>>>>>>>> > >> Best, Marco
>>>>>>>> > >>
>>>>>>>> > >> Results for the small table: it took 141 seconds to finish. The
>>>>>>>> > >> total estimated cost is 85256.31.
>>>>>>>> > >>
>>>>>>>> > >> "Sort  (cost=85201.05..85256.31 rows=22101 width=55) (actual time=141419.499..141420.025 rows=20288 loops=1)"
>>>>>>>> > >> "  Sort Key: "time""
>>>>>>>> > >> "  Sort Method: quicksort  Memory: 3622kB"
>>>>>>>> > >> "  Buffers: shared hit=92 read=19816"
>>>>>>>> > >> "  ->  Bitmap Heap Scan on data2013_01w  (cost=515.86..83606.27 rows=22101 width=55) (actual time=50.762..141374.777 rows=20288 loops=1)"
>>>>>>>> > >> "        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
>>>>>>>> > >> "        Heap Blocks: exact=19826"
>>>>>>>> > >> "        Buffers: shared hit=92 read=19816"
>>>>>>>>
>>>>>>>> ^^
>>>>>>>> Note that despite this table being smaller, Postgres had to read 19816
>>>>>>>> blocks from disk, which is 2.5x more than the larger table.
>>>>>>>>
>>>>>>>> > >> "        ->  Bitmap Index Scan on data2013_01w_ixtaxiid  (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053 rows=20288 loops=1)"
>>>>>>>> > >> "              Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
>>>>>>>> > >> "              Buffers: shared hit=4 read=78"
>>>>>>>> > >> "Planning time: 0.144 ms"
>>>>>>>> > >> "Execution time: 141421.154 ms"
>>>>>>>> > >>
>>>>>>>> > >> Results for the large table: it took 5 seconds to finish.
>>>>>>>> > >> The total estimated cost is 252077.10.
>>>>>>>> > >>
>>>>>>>> > >> "Sort  (cost=251913.32..252077.10 rows=65512 width=55) (actual time=5038.571..5039.765 rows=44204 loops=1)"
>>>>>>>> > >> "  Sort Key: "time""
>>>>>>>> > >> "  Sort Method: quicksort  Memory: 7753kB"
>>>>>>>> > >> "  Buffers: shared hit=2 read=7543"
>>>>>>>> > >> "  ->  Bitmap Heap Scan on data2011_01  (cost=1520.29..246672.53 rows=65512 width=55) (actual time=36.935..5017.463 rows=44204 loops=1)"
>>>>>>>> > >> "        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
>>>>>>>> > >> "        Heap Blocks: exact=7372"
>>>>>>>> > >> "        Buffers: shared hit=2 read=7543"
>>>>>>>> > >> "        ->  Bitmap Index Scan on data2011_01_ixtaxiid  (cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792 rows=44204 loops=1)"
>>>>>>>>
>>>>>>>> ^^
>>>>>>>> Note that the larger table took LONGER to do the index work than the
>>>>>>>> smaller table, which probably means there's nothing wrong with your
>>>>>>>> disks or anything ... that's the behavior I would expect.
>>>>>>>>
>>>>>>>> > >> "              Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
>>>>>>>> > >> "              Buffers: shared hit=2 read=171"
>>>>>>>> > >> "Planning time: 0.127 ms"
>>>>>>>> > >> "Execution time: 5042.134 ms"
>>>>>>>>
>>>>>>>> So, what I'm seeing is that Postgres is able to figure out _which_ rows
>>>>>>>> to fetch faster on the small table than the large table, which is what
>>>>>>>> you would expect, since a smaller index should be faster than a large one.
>>>>>>>>
>>>>>>>> However, when it goes to actually fetch the row data, it takes
>>>>>>>> significantly longer on the small table, despite the fact that it's
>>>>>>>> only fetching 1/3 as many rows. It is, however, doing 2.5x as many
>>>>>>>> disk reads to get those rows: for the large table, it reads 61MB from
>>>>>>>> disk, but it reads 160MB to get all the data for the smaller table.
>>>>>>>> How the data was inserted into each table could lead to similar data
>>>>>>>> being clustered on common pages in the large table, while it's spread
>>>>>>>> across many more pages in the small table.
>>>>>>>>
>>>>>>>> That still doesn't explain it all, though. 2.5x the disk activity
>>>>>>>> normally wouldn't equate to 28x the time required. Unless your disks
>>>>>>>> are horrifically slow? Does this server have a lot of other activity
>>>>>>>> against the disks? I.e., are other people running queries that you
>>>>>>>> would have to contend with, or is the server a VM sharing storage
>>>>>>>> with other VMs, or even a combined-use server that has to share disk
>>>>>>>> access with (for example) a web or mail server as well? Is the
>>>>>>>> performance difference consistently ~28x?
>>>>>>>>
>>>>>>>> Other things: what is shared_buffers set to? The queries would seem
>>>>>>>> to indicate that this server has less than 1M of those two tables
>>>>>>>> cached in memory at the time you ran those queries, which seems to
>>>>>>>> suggest that either you've got shared_buffers set very low, or that
>>>>>>>> there are a lot of other tables that other queries are accessing at
>>>>>>>> the time you're running these. Perhaps installing pg_buffercache to
>>>>>>>> have a look at what's using your shared_buffers would be helpful.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Bill Moran

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
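The thread keeps circling back to physical row ordering as the likely culprit: rows for one taxiid packed onto few heap pages in the large table versus scattered across many pages in the small one. A direct way to check that hypothesis, not shown in the thread, is the planner's correlation statistic. The following is a sketch only, assuming the table, column, and index names used above and a live PostgreSQL 9.4+ server:

```sql
-- Correlation between a column's values and the physical row order.
-- Values near +1.0 or -1.0 mean the heap is roughly ordered by that column
-- (few pages to read per taxiid); values near 0 mean rows for one taxiid
-- are scattered across many pages, forcing many more disk reads.
SELECT tablename, attname, correlation
  FROM pg_stats
 WHERE tablename IN ('data2011_01', 'data2013_01w')
   AND attname IN ('taxiid', 'time');

-- If data2013_01w shows low correlation on taxiid, physically rewriting it
-- in index order should cut the heap reads dramatically. Note that CLUSTER
-- takes an exclusive lock on the table for the duration:
-- CLUSTER data2013_01w USING data2013_01w_ixtaxiid;
-- ANALYZE data2013_01w;
```

If the large table shows high correlation on taxiid and the small one does not, that alone would account for the 20k-rows/19816-pages versus 44k-rows/7372-pages pattern visible in the two bitmap heap scans above.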