Re: [PERFORM] Q on views and performance

Kynn Jones Sat, 23 Feb 2008 08:23:25 -0800

Hi, Dean.  The system I'm working with is very similar "in spirit" to a
large multilingual dictionary covering 100 languages.  Using this analogy,
the "type" column would correspond to the language, and the zipk column
would correspond to some language-independent key associated with a concept
("concept key" for short).  So, if it were indeed a multilingual dictionary,
records in T would look like
  word   | zipk | language
---------+------+-----------
 house   | 1234 | <english>
 casa    | 1234 | <spanish>
 haus    | 1234 | <german>
 piano   | 2345 | <english>
 piano   | 2345 | <spanish>
 cat     | 3456 | <english>
 chat    | 3456 | <french>
 chat    | 4567 | <english>
 plausch | 4567 | <german>


...where I used the notation <lang> to denote "the integer id assigned to
language lang".  Therefore typically there are about 100 records in T for
any given zipk, one for each language.  But the correspondence is not
perfect, since, for example, some languages have, proverbially, more than
one word for snow, and some (maybe from some tropical island in the South
Pacific) have none.  (This last case, BTW, is what accounts for the use of
left joins, as will become clear in a minute.)

The table S can be thought of a table consisting of a collection of words to
be translated to some target language.  In the first type of query (Q1), all
the words in S are effectively declared to belong to the same source
language, whereas in the second type of query (Q2) the source language for
the words in S is left unspecified (in this case S may contain words from
various languages, or words--like "piano" or "chat" in the example
above--that belong simultaneously to different languages, and which may (e.g.
piano) or may not (e.g. chat) have the same zipk [concept key] for each of
these languages).

So, regarding your question about (Q1**) and (Q2**):

(Q1**)  SELECT a1.word, sq.word FROM
               S      JOIN T a1 USING ( word )
                 LEFT JOIN ( SELECT * FROM T a2
                             WHERE a2.type = <int2> ) sq USING ( zipk )
         WHERE a1.type = <int1>;

(Q2**)  SELECT a1.word, sq.word FROM
               S      JOIN T a1 USING ( word )
                 LEFT JOIN ( SELECT * FROM T a2
                             WHERE a2.type = <int2> ) sq USING ( zipk )

...the inner join with S is intended to pick out all the records in the
source table (either T<int1> in Q1** or T in Q2**) corresponding to words in
S, while the second (left) join, is there to find all the "translations" in
the target language.  I use a left join so that even those words in S for
which no translations exist will show up in the query results.

3. Why not write:
>
> CREATE VIEW txt AS
>   SELECT a1.word AS word1, a1.type AS type1, a2.word AS word2, a2.type AS
> type2
>     FROM T a1 [LEFT] JOIN T a2 USING( zipk );  -- Use "LEFT" if
> appropriate
> SELECT word1, word1
>   FROM S JOIN txt ON word = word1
>   WHERE type1 = <int1> AND type2 = <int2>;
>

This is would indeed produce the same results as Q1, but this approach would
require defining about 10,000 views, one for each possible pair of int1 and
int2 (or pair of languages, to continue the multilingual dictionary
analogy), which freaks me out for some reason.  (Actually, the number of
such views would be many more than that, because in the actual application
there is not just one T but several dozen, similar to what would happen to
the schema in the multilingual dictionary analogy if we wanted to
pre-segregate the words according to some categories, say a T for animals, a
T for fruits, a T for verbs, a T for professions, etc.)

(I need to do a bit more work before I can post the EXPLAIN results.)

kynn

Re: [PERFORM] Q on views and performance

Reply via email to