Re: [GENERAL] Processor usage/tuning question

Israel Brewster Mon, 13 Oct 2014 16:41:45 -0700

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD

On Oct 10, 2014, at 1:04 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> On 10/8/14, 3:17 PM, Israel Brewster wrote:
>> Except that the last data point received is still valid as the aircraft's 
>> current location, even if it came in several hours ago - which we may well 
>> care about. That is, some users may need to see where a given aircraft (or 
>> the entire fleet) is, even if an aircraft hasn't updated in a while. That 
>> said, I did discuss this with my higher-ups, and got the ok to take it down 
>> to four hours.
> 
> Note that in your explain output nothing is filtering by time at all; are you 
> sure you posted the right explain?
> 
>>> I don't think PostgreSQL is going to be able to reason very effectively 
>>> about a ROW_NUMBER() in a inner table and then a row<=5 in the outer one 
>>> being equivalent to a LIMIT query for which it could walk an index and then 
>>> stopping once it finds 5 of them.
>>> 
>>> Does this need to issued as a single query?  Why not issue 55 different 
>>> queries?  It seems like the client is likely going to need to pick the 
>>> returned list back out by tail number anyway, so both the client and the 
>>> server might be happier with separate queries.
>> 
>> Good point. Doing that brought the entire execution time down to around 
>> 60ms. Just ever so slightly better than the ~1200ms I was getting before. 
>> :-) I just have an unreasonable aversion to making multiple queries. I guess 
>> this is a prime example of why I need to overcome that :-)
> 
> Do you actually need the last 5 points? If you could get away with just the 
> most recent point, SELECT DISTINCT ON might do a better job of this in a 
> single query.

At the moment, unfortunately yes - I have to do some calculations based on the 
past few data points. At some point I should be able to re-work the system such 
that said calculations are done when the points are saved, rather than when 
they are retrieved, which would be beneficial for a number of reasons. However, 
until I can get that done I need multiple points here.

> 
> As for the concern about issuing multiple queries, if you code this into a 
> database function it should still be quite fast because there won't be any 
> round-trip between your application and the database.

I've had phenomenally bad luck with coding queries into database functions. I 
had a number of functions written at one point that allowed me to do things 
like select <table>.function, <other_column> FROM table - until I noticed that 
said queries ran significantly slower than just doing the query I had encoded 
in the function as a sub-query instead. I was doing these same sub-queries in a 
bunch of different places, so I figured it would clarify things if I could just 
code them into a DB function that I called just like a column. It's been a 
while since I looked at those, however, so I can't say why they were slow. This 
usage may not suffer from the same problem.

> 
> Something else to consider is having a second table that only keeps the last 
> X aircraft positions. I would do this by duplicating every insert into that 
> table via a trigger, and then have a separate process that ran once a minute 
> to delete any records other than the newest X. Because that table would 
> always be tiny queries against it should be blazing fast. Do note that you'll 
> want to vacuum that table frequently, like right after each delete.

Indeed. I think I'm happy with the performance of the multiple queries, but 
this would doubtless be the "best" option (from a performance standpoint), as 
the table would be small and my select would be essentially SELECT * FROM 
TABLE, with a potential WHERE ... IN... clause.

Thanks for all the help!

> -- 
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Processor usage/tuning question

Reply via email to