On 01/06/2014 12:41 AM, Ann Harrison wrote:
Alex,
> Furthermore, despite the everyone's instinct, it's a good deal
faster in
> the general case to read a table in an optimal order and sort
the data
> in memory that to read the data in index order - random order
relative
> to storage.
>
Ann, from server POV you are definitely right. But when talking about
ways to get first record from server as fast as possible we care about
not server-only performance, but about performance of overall
client-server system.
When the server is bogged down running queries inefficiently, the overall
system performance does suffer. However, if all you want is the first few
records, reading hundreds of thousands and sorting them is wasteful - and
hard on the client. That's why Firebird's optimizer handles a
restriction on
the number of rows to be returned as an optimizer hint to navigate through
the index, rather than doing a natural read and sorting the results.
It's very bad idea to rely on the current optimizer behavior. I can
easily imagine that in a case of constant for number of returned rows
future optimizer will take it into an account and if (for example) it's
greater than total number of records in MOVIES it can use optimal from
server POV plan - SORT instead ORDER. It's like using "tbl.fld + 0 =
constant" to avoid use of index, and this is not what may be called
stable and portable way of providing hints to optimizer.
We have very simple table:
create table MOVIES
(NAME varchar(255) primary key,
COMMENTS varchar(8192),
ISO_IMAGE blob);
(A good example of the problems with natural primary keys.)
Yes certainly - I just did not plan to provide ideal style. Certainly
here we can have explicit index instead PK.
And want to run such a query:
select * from MOVIES where COMMENTS like '%yacht%' order by NAME;
Two plans are possible:
PLAN SORT (MOVIES NATURAL)
PLAN (MOVIES ORDER RDB$PRIMARY1)
Right. And if you want to give the optimizer a hint that it should choose
the second plan, change the query like this:
select first 1000000 * from MOVIES where COMMENTS like '%yacht%
order by NAME;
If you think your client may want more than a million rows, increase
that number.
As far as I know, Firebird's optimizer doesn't do anything clever like
trying to guess
what part of the table the "first <n>" represents - ask it for the
first 1 or the first
ten million, it will still choose the navigational path. So ask it
for ten times as many
records as you expect to find. Probably a bad idea to exceed the size
of an int64.
Returning to my sample we can provide a number of available empty
writable DVDs in DVD changer :)
But this does not change the fact that using SQL operators, originally
designed for some other purpose, is not a best way to provide optimizer
hints.
Cause we need to scan all the table first plan appears to be
better - it
requires less disk operations even taking into an account sort after
scanning all the table. But overall throughput depends upon what
are we
going to do with ISO_IMAGE at client side. Imagine we want to burn
DVDs
with all images (and have the set of DVDs sorted after burning).
Overall throughput also depends on what else is going on with the system -
if you've got lots of read queries that are hitting the disk hard,
making your
burn program wait a few seconds for its first results could improve the
performance of the system overall.
Certainly. This is what I call a detail which can't be predicted by any
optimizer taking into an account server-only aspects.
But that's moot. You have the ability
to ask for navigational access through the index, if that's what you think
you need.
If I understand you correctly this is FIRST clause? But this is what I
want to avoid...
Certainly this does not mean that for all queries, containing blobs,
natural scan should not be used :-)
And again, for the naive, the blobs aren't accessed until the records are
read, qualified by the comment containing 'yacht' - in this case, the non-
standard CONTAINING would be better because it is not case sensitive.
Certainly - but this is not important fact for a problem we discuss.
Ideally here should be used comparison which takes into an account sound
produced by word "yacht".
Therefore I agree with Dmitry - such a hint to
optimizer is required part of SELECT statement if we want to have
optimal performance for all the system, not for server only.
And I guess I agree also, but think that the hint is already part of
Firebird - as either FIRST or LIMIT or ROWS or whatever other silly
syntax exists for restricting the number of records returned. Remember,
the hint is needed only when the query is sorted and there is an index
with the same keys in the same order and direction as the sort and
that index can be used.
If this is the only question I want to repeat - in the future the way
how first/limit affects optimizer behavior may change, and therefore in
a case when we are not going to just limit a number of returned rows but
make first row be returned faster it's better to have explicit way to do
it instead of using very big values in first.
------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT
organizations don't have a clear picture of how application performance
affects their revenue. With AppDynamics, you get 100% visibility into your
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel