On Fri, 2001-11-09 at 07:20, Linas Vepstas wrote: > Hi Dave, > > May I hassle you some more?
Sure :)
> On Fri, Nov 09, 2001 at 08:04:24AM -0600, Dave Peticolas was heard to remark:
> > Index: src/backend/postgres/gncquery.c
> > ===================================================================
> > RCS file: /home/cvs/cvsroot/gnucash/src/backend/postgres/gncquery.c,v
> > retrieving revision 1.6
> > retrieving revision 1.7
> > diff -u -r1.6 -r1.7
> > --- src/backend/postgres/gncquery.c 2001/11/06 23:02:26 1.6
> > +++ src/backend/postgres/gncquery.c 2001/11/09 11:50:23 1.7
>
> > - /* determine whther the query will need to reference the account
> > - * or commodity tables. If it doesn't need them, then we can gain
> > - * a significant performance improvement by not specifying them.
> > - * The exact reason why this affects performance is a bit of a
> > - * mystery to me ... */
> > + /* determine whether the query will need to reference certain
> > + * tables. */
>
> I was somewhat disheartened to see this change; and would really like
> to get you to put it back. This routine would not exist at all, if I
> hadn't spent a week performance tuning the sql backend. Originally,
> I specified *all* of the tables that might be needed for a query; the
> code for this was very simple. But then on a hunch, I noticed that
> including a table that wasn't actually referenced could slow performance
> by a factor of 10 (!!!!).
Yes, I know, I was the one who originally pointed out to you
that removing unneeded tables would improve performance (you remember
the 10-minute startup times I saw?). However, upon reading a Postgres
book I found that in Postgres you don't actually need to use the FROM
clause at all, i.e., Postgres can figure out the tables it needs from
the rest of the query. So you only need to use FROM if you want to do
a self-join and need to specify a new name for a table. So the new code
only puts in FROM clauses for the self-joined commodity tables if
needed, the others are left out.
> > + /* add needed explicit tables. postgres can figure out the rest. */
>
> Bzzzt. I think this note says it all. Postgres can figure out the
> rest, but it will crater performance as it does so. You really, really
> have got to restore the missing tables.
Have you actually tested it to see if it craters performance? The
problem with the original queries was putting *unreferenced* tables
in the FROM clause, i.e., explicity telling Postgres to search a
table that didn't need to be. Postgres isn't going to search tables
that aren't referenced in the query at all.
> > - switch (xaccGUIDType (&pd->guid.guid, session))
> > +
> > + sq->pq = stpcpy (sq->pq, " (");
> > +
> > + if (guid_equal (&pd->guid.guid, xaccGUIDNULL ()))
> > {
> > - case GNC_ID_NONE:
> > - case GNC_ID_NULL:
> > - default:
> > - sq->pq = stpcpy(sq->pq, "FALSE ");
> > - break;
> > -
> > - case GNC_ID_ACCOUNT:
> > - sq->pq = stpcpy(sq->pq, "gncAccount.accountGuid = '");
> > - sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> > - sq->pq = stpcpy(sq->pq, "' ");
> > - break;
> > -
> > - case GNC_ID_TRANS:
> > - sq->pq = stpcpy(sq->pq, "gncTransaction.transGuid = '");
> > - sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> > - sq->pq = stpcpy(sq->pq, "' ");
> > - break;
> > -
> > - case GNC_ID_SPLIT:
> > - sq->pq = stpcpy(sq->pq, "gncEntry.entryGuid = '");
> > - sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> > - sq->pq = stpcpy(sq->pq, "' ");
> > - break;
> > + sq->pq = stpcpy(sq->pq, "FALSE ");
> > }
> > -
> > - if (0 == pd->guid.sense)
> > + else
> > {
> > - sq->pq = stpcpy (sq->pq, ") ");
> > + sq->pq = stpcpy(sq->pq, "gncAccount.accountGuid = '");
> > + sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> > + sq->pq = stpcpy(sq->pq, "' ");
> > + sq->pq = stpcpy(sq->pq, " OR ");
> > +
> > + sq->pq = stpcpy(sq->pq, "gncTransaction.transGuid = '");
> > + sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> > + sq->pq = stpcpy(sq->pq, "' ");
> > + sq->pq = stpcpy(sq->pq, " OR ");
> > +
> > + sq->pq = stpcpy(sq->pq, "gncEntry.entryGuid = '");
> > + sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> > + sq->pq = stpcpy(sq->pq, "' ");
>
> The above looks like another performance-cratering change. Why search
> all tables, when you can search the one that you need? I presume you
> made this change because you don't know which table to search ... but
> how can you possibly not know which table to search? Who would be
> stupid enough to ask 'get me this GUId, but I don't know if its a
> transaction or an account guid' ????
I am simply implementing the semantics of the Query predicate, which is
to match a split whose guid, transaction guid, or account guid matches
the given guid. That is what the C code implements. Don't be fooled by
the fact that the engine Query code uses xaccGUIDType -- that function
only works if the entity has already been loaded into the engine. The
sql backend can't assume that.
> > case PR_SHRS: {
> > PINFO("term is PR_SHRS");
> > sq->pq = stpcpy(sq->pq,
> > - "gncEntry.accountGuid = gncAccount.accountGuid AND "
> > "gncAccount.commodity = account_com.commodity AND ");
> > AMOUNT_TERM ("gncEntry.amount","account_com");
>
> ???? Don't you need to know which account is to be matched ??
Yes, but that is now added further up if need_account is true.
You also need that line if you want to sort on account fields,
so I figured to do it the same as need_entry.
dave
msg01367/pgp00000.pgp
Description: PGP signature
