Hmm, the last index query you attached verified nothing was clustered. Just one more thing to look at. Please send the results of each of the following to a file and attach.
EXPLAIN ANALYZE SELECT * FROM data2011_01 WHERE taxiid::text = 'SZB07P23'::text;
EXPLAIN ANALYZE SELECT * FROM data2013_01w WHERE taxiid::text = 'SZB07P23'::text;

On Tue, Feb 2, 2016 at 10:16 AM, Yu Nie <niey...@gmail.com> wrote:
> One more observation - not sure if it helps with the diagnosis.
>
> If I query based on a time interval rather than on taxiid, the query is
> generally faster - I believe that is due to the way the data were loaded.
> Yet there is still a significant performance difference: the large table
> seems to run faster than the small table. More importantly, this time the
> planner proposes two very different plans for the two tables. Not sure why
> a bitmap index scan was used for the small table...
>
> See results below:
>
> explain (analyze, buffers) select * from data2011_01 where time >=
> '2011-01-16 12:00:00' and time < '2011-01-16 12:05:00' order by time;
> (the small-table plan below is from the analogous query against
> data2013_01w with 2013 timestamps)
>
> For the small table:
> "Sort  (cost=529024.61..529411.73 rows=154848 width=55) (actual time=3761.329..3765.501 rows=149807 loops=1)"
> "  Sort Key: "time""
> "  Sort Method: quicksort  Memory: 27211kB"
> "  Buffers: shared hit=39 read=4304"
> "  ->  Bitmap Heap Scan on data2013_01w  (cost=3287.76..515676.33 rows=154848 width=55) (actual time=29.028..3714.592 rows=149807 loops=1)"
> "        Recheck Cond: (("time" >= '2013-01-16 12:00:00'::timestamp without time zone) AND ("time" < '2013-01-16 12:05:00'::timestamp without time zone))"
> "        Heap Blocks: exact=3929"
> "        Buffers: shared hit=39 read=4304"
> "        ->  Bitmap Index Scan on data2013_01w_ixtime  (cost=0.00..3249.05 rows=154848 width=0) (actual time=17.931..17.931 rows=149807 loops=1)"
> "              Index Cond: (("time" >= '2013-01-16 12:00:00'::timestamp without time zone) AND ("time" < '2013-01-16 12:05:00'::timestamp without time zone))"
> "              Buffers: shared hit=1 read=413"
> "Planning time: 0.203 ms"
> "Execution time: 3779.368 ms"
>
> For the large table:
> "Index Scan using data2011_01_ixtime on data2011_01  (cost=0.57..78012.46 rows=110204 width=55) (actual time=43.765..709.985 rows=104617 loops=1)"
> "  Index Cond: (("time" >= '2011-01-16 12:00:00'::timestamp without time zone) AND ("time" < '2011-01-16 12:05:00'::timestamp without time zone))"
> "  Buffers: shared hit=79619 read=1482"
> "Planning time: 0.131 ms"
> "Execution time: 713.446 ms"
>
> On Mon, Feb 1, 2016 at 8:50 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
>> OK, one more thing to try.
>>
>> Please send the output of the following query to a file and attach so I
>> can review.
>>
>> SELECT c.relname as idxname,
>>        i.indisclustered,
>>        i.indisvalid,
>>        i.indcollation
>>   FROM pg_index i
>>   JOIN pg_class c ON i.indexrelid = c.oid
>>  WHERE i.indrelid IN (SELECT oid
>>                         FROM pg_class
>>                        WHERE relkind = 'r'
>>                          AND (relname = 'data2011_01'
>>                            OR relname = 'data2013_01w')
>>                      )
>>  ORDER BY 1;
>>
>> On Mon, Feb 1, 2016 at 5:51 PM, Yu Nie <niey...@gmail.com> wrote:
>>> Yeah, it is really puzzling. :(
>>>
>>> No, I have never attempted to cluster the large table - as I mentioned
>>> before, the two tables were created almost exactly the same way (through a
>>> C++ API), except with different data files. I have only used them for
>>> queries, and have never done any update/insert/delete since the creation.
>>>
>>> On Mon, Feb 1, 2016 at 4:41 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
>>>> At this point I can see no possible explanation why this would occur.
>>>>
>>>> Just a thought though: is it possible that data2011_01 was clustered on
>>>> the index at some point but data2013_01w has not been clustered?
>>>> If you cluster data2013_01w on the index, does the performance change?
>>>>
>>>> On Mon, Feb 1, 2016 at 4:03 PM, Yu Nie <niey...@gmail.com> wrote:
>>>>> Melvin,
>>>>>
>>>>> Please see attached for the requested results. I ran two queries (each
>>>>> with a different taxiid, adjacent to each other) for each table.
>>>>> Note that for the large table one is much faster than the other, because
>>>>> the shared cache was used for the second query. This does not work,
>>>>> however, for the small table.
>>>>>
>>>>> Many thanks for your willingness to help!
>>>>>
>>>>> Best, Marco
>>>>>
>>>>> On Mon, Feb 1, 2016 at 2:25 PM, melvin6925 <melvin6...@gmail.com> wrote:
>>>>>> Fine. Please rerun both explains and my queries and send ALL output
>>>>>> to a file (F8 in pgAdmin SQL). Then attach the file.
>>>>>>
>>>>>> Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
>>>>>> -------- Original message --------
>>>>>> From: Yu Nie <niey...@gmail.com>
>>>>>> Date: 2/1/2016 15:17 (GMT-05:00)
>>>>>> To: melvin6925 <melvin6...@gmail.com>
>>>>>> Subject: Re: [GENERAL] strange sql behavior
>>>>>>
>>>>>> Yes, absolutely.
>>>>>>
>>>>>> On Mon, Feb 1, 2016 at 2:12 PM, melvin6925 <melvin6...@gmail.com> wrote:
>>>>>>> And were _your_ queries run on the same day at the same time, within
>>>>>>> a few seconds of each other?
>>>>>>>
>>>>>>> -------- Original message --------
>>>>>>> From: Yu Nie <niey...@gmail.com>
>>>>>>> Date: 2/1/2016 15:01 (GMT-05:00)
>>>>>>> To: melvin6925 <melvin6...@gmail.com>
>>>>>>> Cc: Bill Moran <wmo...@potentialtech.com>, pgsql-general@postgresql.org
>>>>>>> Subject: Re: [GENERAL] strange sql behavior
>>>>>>>
>>>>>>> Thanks, Bill and Melvin!
>>>>>>>
>>>>>>> Just some quick notes/answers before I absorb all the information
>>>>>>> provided by Bill.
>>>>>>>
>>>>>>> 1. I don't expect many users running queries against the tables,
>>>>>>> especially the small table - since I just created it this morning,
>>>>>>> nobody knows about it except myself.
>>>>>>>
>>>>>>> 2. The settings in the config:
>>>>>>>
>>>>>>> shared_buffers = 512MB    # min 128kB
>>>>>>> work_mem = 128MB          # min 64kB
>>>>>>>
>>>>>>> 3.
>>>>>>> I am running Windows 7 with 24 GB RAM, and my PostgreSQL is 9.4.
>>>>>>>
>>>>>>> 4. Here is the query I ran:
>>>>>>>
>>>>>>> SELECT n.nspname,
>>>>>>>        s.relname,
>>>>>>>        c.reltuples::bigint,
>>>>>>>        n_tup_ins,
>>>>>>>        n_tup_upd,
>>>>>>>        n_tup_del,
>>>>>>>        date_trunc('second', last_vacuum) as last_vacuum,
>>>>>>>        date_trunc('second', last_autovacuum) as last_autovacuum,
>>>>>>>        date_trunc('second', last_analyze) as last_analyze,
>>>>>>>        date_trunc('second', last_autoanalyze) as last_autoanalyze,
>>>>>>>        round(current_setting('autovacuum_vacuum_threshold')::integer +
>>>>>>>              current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples)
>>>>>>>          AS av_threshold
>>>>>>>   FROM pg_stat_all_tables s
>>>>>>>   JOIN pg_class c ON c.oid = s.relid
>>>>>>>   JOIN pg_namespace n ON (n.oid = c.relnamespace)
>>>>>>>  WHERE s.relname NOT LIKE 'pg_%'
>>>>>>>    AND s.relname NOT LIKE 'sql_%'
>>>>>>>    AND s.relname IN ('data2013_01w', 'data2011_01')
>>>>>>>  ORDER by 1, 2;
>>>>>>>
>>>>>>> I copied the result from pgAdmin directly; here it is again:
>>>>>>>
>>>>>>> public;"data2011_01";784236864;784236885;0;0;"";"";"";"2016-01-19 17:31:08-06";156847423
>>>>>>> public;"data2013_01w";300786432;300786444;0;0;"";"";"2016-02-01 08:57:24-06";"2016-02-01 04:01:04-06";60157336
>>>>>>>
>>>>>>> On Mon, Feb 1, 2016 at 1:45 PM, melvin6925 <melvin6...@gmail.com> wrote:
>>>>>>>> Thanks Bill.
>>>>>>>> Also, it's very important to include the headers with the queries!
>>>>>>>>
>>>>>>>> Marco,
>>>>>>>> There is no top secret information that is requested, so please do
>>>>>>>> not edit the output.
>>>>>>>> -------- Original message --------
>>>>>>>> From: Bill Moran <wmo...@potentialtech.com>
>>>>>>>> Date: 2/1/2016 14:41 (GMT-05:00)
>>>>>>>> To: Yu Nie <niey...@gmail.com>
>>>>>>>> Cc: Melvin Davidson <melvin6...@gmail.com>, pgsql-general@postgresql.org
>>>>>>>> Subject: Re: [GENERAL] strange sql behavior
>>>>>>>>
>>>>>>>> Came a little late to the thread; see many comments inline below.
>>>>>>>>
>>>>>>>> On Mon, 1 Feb 2016 13:16:13 -0600 Yu Nie <niey...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> > Thanks a lot for your reply. I ran the query you suggested and
>>>>>>>> > here are the results.
>>>>>>>> >
>>>>>>>> > Large table: "public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19 17:31:08-06";156847423
>>>>>>>> > Small table: "public";"data2013_01w";300786444;0;0;"";"";"2016-02-01 08:57:24-06";"2016-02-01 04:01:04-06";60157336
>>>>>>>>
>>>>>>>> You didn't run Melvin's query correctly. He specified 11 columns to
>>>>>>>> select, but you only show 10. Since you don't show the query you
>>>>>>>> actually ran, we don't know which of the numeric columns is missing.
>>>>>>>>
>>>>>>>> More information inline below:
>>>>>>>>
>>>>>>>> > On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > > One thing to look at is the last time both tables were vacuumed/analyzed.
>>>>>>>> > >
>>>>>>>> > > SELECT n.nspname,
>>>>>>>> > >        s.relname,
>>>>>>>> > >        c.reltuples::bigint,
>>>>>>>> > >        n_tup_ins,
>>>>>>>> > >        n_tup_upd,
>>>>>>>> > >        n_tup_del,
>>>>>>>> > >        date_trunc('second', last_vacuum) as last_vacuum,
>>>>>>>> > >        date_trunc('second', last_autovacuum) as last_autovacuum,
>>>>>>>> > >        date_trunc('second', last_analyze) as last_analyze,
>>>>>>>> > >        date_trunc('second', last_autoanalyze) as last_autoanalyze,
>>>>>>>> > >        round(current_setting('autovacuum_vacuum_threshold')::integer +
>>>>>>>> > >              current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples)
>>>>>>>> > >          AS av_threshold
>>>>>>>> > >   FROM pg_stat_all_tables s
>>>>>>>> > >   JOIN pg_class c ON c.oid = s.relid
>>>>>>>> > >   JOIN pg_namespace n ON (n.oid = c.relnamespace)
>>>>>>>> > >  WHERE s.relname NOT LIKE 'pg_%'
>>>>>>>> > >    AND s.relname NOT LIKE 'sql_%'
>>>>>>>> > >    AND s.relname IN ('your_small_table', 'your_large_table')
>>>>>>>> > >  ORDER by 1, 2;
>>>>>>>> > >
>>>>>>>> > > Also, please confirm the indexes for both tables are using the
>>>>>>>> > > same method (btree?).
>>>>>>>> > >
>>>>>>>> > > On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <niey...@gmail.com> wrote:
>>>>>>>> > >
>>>>>>>> > >> Hi there,
>>>>>>>> > >>
>>>>>>>> > >> Recently I have been working with a large amount of taxi GIS data
>>>>>>>> > >> and have encountered some weird performance issues. I am hoping
>>>>>>>> > >> someone in this community can help me figure them out.
>>>>>>>> > >>
>>>>>>>> > >> The taxi data were loaded in 5-minute blocks into a table. I have
>>>>>>>> > >> two separate such tables: one stores a month of data with about
>>>>>>>> > >> 700 million rows; the other stores about 10 days of data with about
>>>>>>>> > >> 300 million rows. The two tables have exactly the same schema and
>>>>>>>> > >> indexes.
>>>>>>>> > >> There are two indexes: one on taxiid (text), and the other on the
>>>>>>>> > >> time stamp (datetime). In order to process the data, I need to get
>>>>>>>> > >> all points for a single taxi; to do that, I use something like:
>>>>>>>> > >> select * from table1 where taxiid = 'SZB00S41' order by time;
>>>>>>>> > >> What puzzled me greatly is that this query runs consistently much
>>>>>>>> > >> faster for the large table than for the small table, which seems to
>>>>>>>> > >> contradict intuition. At the end of this message you may find explain
>>>>>>>> > >> (analyze, buffers) results of two particular queries for the same
>>>>>>>> > >> taxiid (one for each table). You can see that it took much longer
>>>>>>>> > >> (more than 20 times) to get 20k rows from the small table than to get
>>>>>>>> > >> 44k rows from the large table. Interestingly, it seems that the
>>>>>>>> > >> planner does expect about 1/3 the work for the small table query -
>>>>>>>> > >> yet for some reason, it took much longer to fetch the rows from the
>>>>>>>> > >> small table. Why is there such a huge performance difference between
>>>>>>>> > >> two seemingly identical queries executed on two different tables?
>>>>>>>> > >>
>>>>>>>> > >> Is it because the data in the second table is on some mysteriously
>>>>>>>> > >> "broken part" of the disk? What else could explain such bizarre
>>>>>>>> > >> behavior? Your help is greatly appreciated.
>>>>>>>> > >>
>>>>>>>> > >> The above behavior is consistent through all queries. Another issue
>>>>>>>> > >> I identified is that for the large table, the query can use the
>>>>>>>> > >> shared buffer more effectively.
>>>>>>>> > >> For example, after you query one taxiid and immediately following
>>>>>>>> > >> that query run the same query for another taxi whose id ranks right
>>>>>>>> > >> behind the first id, then shared hit buffers would be quite high
>>>>>>>> > >> (and the query would run much faster); this, however, never works
>>>>>>>> > >> for the small table.
>>>>>>>> > >>
>>>>>>>> > >> Thanks a lot!
>>>>>>>> > >>
>>>>>>>> > >> Best, Marco
>>>>>>>> > >>
>>>>>>>> > >> Results for the small table: it took 141 seconds to finish. The
>>>>>>>> > >> total estimated cost is 85256.31.
>>>>>>>> > >>
>>>>>>>> > >> "Sort  (cost=85201.05..85256.31 rows=22101 width=55) (actual time=141419.499..141420.025 rows=20288 loops=1)"
>>>>>>>> > >> "  Sort Key: "time""
>>>>>>>> > >> "  Sort Method: quicksort  Memory: 3622kB"
>>>>>>>> > >> "  Buffers: shared hit=92 read=19816"
>>>>>>>> > >> "  ->  Bitmap Heap Scan on data2013_01w  (cost=515.86..83606.27 rows=22101 width=55) (actual time=50.762..141374.777 rows=20288 loops=1)"
>>>>>>>> > >> "        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
>>>>>>>> > >> "        Heap Blocks: exact=19826"
>>>>>>>> > >> "        Buffers: shared hit=92 read=19816"
>>>>>>>>
>>>>>>>> ^^
>>>>>>>> Note that despite this table being smaller, Postgres had to read 19816
>>>>>>>> blocks from disk, which is 2.5x more than the larger table.
>>>>>>>>
>>>>>>>> > >> "        ->  Bitmap Index Scan on data2013_01w_ixtaxiid  (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053 rows=20288 loops=1)"
>>>>>>>> > >> "              Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
>>>>>>>> > >> "              Buffers: shared hit=4 read=78"
>>>>>>>> > >> "Planning time: 0.144 ms"
>>>>>>>> > >> "Execution time: 141421.154 ms"
>>>>>>>> > >>
>>>>>>>> > >> Results for the large table: it took 5 seconds to finish.
>>>>>>>> > >> The total estimated cost is 252077.10.
>>>>>>>> > >>
>>>>>>>> > >> "Sort  (cost=251913.32..252077.10 rows=65512 width=55) (actual time=5038.571..5039.765 rows=44204 loops=1)"
>>>>>>>> > >> "  Sort Key: "time""
>>>>>>>> > >> "  Sort Method: quicksort  Memory: 7753kB"
>>>>>>>> > >> "  Buffers: shared hit=2 read=7543"
>>>>>>>> > >> "  ->  Bitmap Heap Scan on data2011_01  (cost=1520.29..246672.53 rows=65512 width=55) (actual time=36.935..5017.463 rows=44204 loops=1)"
>>>>>>>> > >> "        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
>>>>>>>> > >> "        Heap Blocks: exact=7372"
>>>>>>>> > >> "        Buffers: shared hit=2 read=7543"
>>>>>>>> > >> "        ->  Bitmap Index Scan on data2011_01_ixtaxiid  (cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792 rows=44204 loops=1)"
>>>>>>>>
>>>>>>>> ^^
>>>>>>>> Note that the larger table took LONGER to do the index work than the
>>>>>>>> smaller table, which probably means there's nothing wrong with your
>>>>>>>> disks or anything ... that's the behavior I would expect.
>>>>>>>>
>>>>>>>> > >> "              Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
>>>>>>>> > >> "              Buffers: shared hit=2 read=171"
>>>>>>>> > >> "Planning time: 0.127 ms"
>>>>>>>> > >> "Execution time: 5042.134 ms"
>>>>>>>>
>>>>>>>> So, what I'm seeing is that Postgres is able to figure out _which_ rows
>>>>>>>> to fetch faster on the small table than the large table, which is what
>>>>>>>> you would expect, since a smaller index should be faster than a large one.
>>>>>>>>
>>>>>>>> However, when it goes to actually fetch the row data, it takes
>>>>>>>> significantly longer on the small table, despite the fact that it's
>>>>>>>> only fetching 1/3 as many rows. It is, however, doing 2.5x as many
>>>>>>>> disk reads to get those rows: for the large table, it reads 61MB from
>>>>>>>> disk, but it reads 160MB to get all the data for the smaller table.
>>>>>>>> How the data was inserted into each table could lead to similar data
>>>>>>>> being clustered on common pages in the large table, while it's spread
>>>>>>>> across many more pages in the small table.
>>>>>>>>
>>>>>>>> That still doesn't explain it all, though. 2.5x the disk activity
>>>>>>>> normally wouldn't equate to 28x the time required. Unless your disks
>>>>>>>> are horrifically slow? Does this server have a lot of other activity
>>>>>>>> against the disks? I.e., are other people running queries that you
>>>>>>>> would have to contend with, or is the server a VM sharing storage
>>>>>>>> with other VMs, or even a combined-use server that has to share disk
>>>>>>>> access with (for example) a web or mail server as well? Is the
>>>>>>>> performance difference consistently ~28x?
>>>>>>>>
>>>>>>>> Other things: what is shared_buffers set to? The queries would seem
>>>>>>>> to indicate that this server has less than 1M of those two tables
>>>>>>>> cached in memory at the time you ran those queries, which seems to
>>>>>>>> suggest that either you've got shared_buffers set very low, or that
>>>>>>>> there are a lot of other tables that other queries are accessing at
>>>>>>>> the time you're running these. Perhaps installing pg_buffercache to
>>>>>>>> have a look at what's using your shared_buffers would be helpful.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Bill Moran

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
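The thread keeps circling back to physical row ordering as the likely culprit: rows for one taxiid packed onto few heap pages in the large table versus scattered across many pages in the small one. A direct way to check that hypothesis, not shown in the thread, is the planner's correlation statistic. The following is a sketch only, assuming the table, column, and index names used above and a live PostgreSQL 9.4+ server:

```sql
-- Correlation between a column's values and the physical row order.
-- Values near +1.0 or -1.0 mean the heap is roughly ordered by that column
-- (few pages to read per taxiid); values near 0 mean rows for one taxiid
-- are scattered across many pages, forcing many more disk reads.
SELECT tablename, attname, correlation
  FROM pg_stats
 WHERE tablename IN ('data2011_01', 'data2013_01w')
   AND attname IN ('taxiid', 'time');

-- If data2013_01w shows low correlation on taxiid, physically rewriting it
-- in index order should cut the heap reads dramatically. Note that CLUSTER
-- takes an exclusive lock on the table for the duration:
-- CLUSTER data2013_01w USING data2013_01w_ixtaxiid;
-- ANALYZE data2013_01w;
```

If the large table shows high correlation on taxiid and the small one does not, that alone would account for the 20k-rows/19816-pages versus 44k-rows/7372-pages pattern visible in the two bitmap heap scans above.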