So I'm jumping back into database stuff after a long absence.
(I guess you could say that I was never really *in* databases
to begin with. Anyway...) I'm working with various databases
and have settled on MySQL but am encountering a variety of
problems that I hope the MySQL community can help me solve.

First, a comment. In dealing with a variety of databases, I
was surprised to find the apparent lack of direct support for
international characters. So I've basically resolved to insert 
all of my international text as UTF-8 encoded BLOBs. Is this 
the general approach people use for this problem?

Second, I'm programming in Java and want to use JDBC. So I
downloaded mm.MySQL but was dismayed to find that it doesn't
support prepared statements. Is there any particular reason
why? 

I wanted to modify the source for mm.MySQL to convert to and
from UTF-8 streams when calling PreparedStatement#setString 
and ResultSet#getString so that I can write my program as if
the underlying database *did* support international chars. 
But without support for prepared statements, I don't know if 
I want to go through the trouble.

Next, I was wondering why multiple selects don't work in
MySQL. For example: "SELECT * FROM a WHERE value IN (SELECT
column FROM b);".

Last, I have a particular problem related to SQL that I
want to solve. This is not particularly related to MySQL 
but I am using MySQL as the database so any solution would 
have to be within the limits of what MySQL supports.

Here's the idea: I have a table for text content that may
be translated into various languages. Then I have a table
for user preferences that specify which languages the user
wants (and in what order). Lastly I have a mapping table
from language codes to their default language code. Make
sense so far? If not, here is the description of the three
tables:

mysql> describe content;
+-------+------------+------+-----+---------+----------------+
| Field | Type       | Null | Key | Default | Extra          |
+-------+------------+------+-----+---------+----------------+
| id    | int(11)    |      | PRI | NULL    | auto_increment |
| resid | int(11)    |      |     | 0       |                |
| lang  | varchar(5) |      |     |         |                |
| text  | blob       |      |     |         |                |
+-------+------------+------+-----+---------+----------------+

mysql> describe prefs;
+----------+-------------+------+-----+---------+-------+
| Field    | Type        | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| user     | varchar(50) |      |     |         |       |
| lang     | varchar(5)  |      |     |         |       |
| priority | tinyint(4)  |      |     | 1       |       |
+----------+-------------+------+-----+---------+-------+

mysql> describe langs;
+---------+---------+------+-----+---------+-------+
| Field   | Type    | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+-------+
| code    | char(5) |      | PRI |         |       |
| defcode | char(5) |      |     |         |       |
+---------+---------+------+-----+---------+-------+

The last table, "langs", is used so that if a user's pref
is to read US English (en-us) and there is no content in
that language *but* there is one in Default English (en),
then they are returned that resource.

Here's a sampling of the data in these three tables:

mysql> select * from content;
+----+-------+-------+----------+
| id | resid | lang  | text     |
+----+-------+-------+----------+
|  1 |     1 | en-us | Color    |
|  2 |     1 | en-gb | Colour   |
|  3 |     1 | ja    | IRO      |
|  4 |     2 | en    | Mountain |
|  5 |     2 | ja    | YAMA     |
+----+-------+-------+----------+

mysql> select * from prefs;
+-------+-------+----------+
| user  | lang  | priority |
+-------+-------+----------+
| andyc | en-us |        1 |
| andyc | ja    |        2 |
+-------+-------+----------+

mysql> select * from langs;
+-------+---------+
| code  | defcode |
+-------+---------+
| en    | en-us   |
| en-us | en-us   |
| en-gb | en-us   |
| ja    | ja      |
+-------+---------+

As can be seen from the "prefs" table, I (being "andyc")
prefer to retrieve content in US English but can also read 
Japanese. Now if I only wanted the content that matched my
number one priority, then it would be easy. However... I
want to retrieve the content from my number one preference
*and* also know that the resource is available in my other
language preference(s).

The following select statement returns the information that
I want but there are a few problems. First, it returns all
of the information which is wasteful since I only want to
display the content from the first hit. Second, I'm sure
that crossing three tables is *not* a good idea in general.
But like I said, I'm kinda new to SQL so I don't know too
much about JOINs, etc. Anyway, here's the query and the
result:

mysql> select c.resid,c.lang,c.text,p.priority 
    -> from content as c,prefs as p,langs as l 
    -> where p.user='andyc' and 
    ->       (c.lang=p.lang or (c.lang=l.code and p.lang=l.defcode)) 
    -> group by c.resid,p.lang 
    -> order by resid,p.priority;
+-------+-------+----------+----------+
| resid | lang  | text     | priority |
+-------+-------+----------+----------+
|     1 | en-us | Color    |        1 |
|     1 | ja    | IRO      |        2 |
|     2 | en    | Mountain |        1 |
|     2 | ja    | YAMA     |        2 |
+-------+-------+----------+----------+

The first thing I'd like to know is what is the "right" way
of setting up my tables and then performing this kind of
query? The next problem is more a matter of programming...

Since returning all of the content seems pretty wasteful,
I'm thinking of using two select statements: one to return
the list of matches with priority 1 and another to return
the entire list of matched resources. But if I do this then
I don't know how many langauges match each resource. It
would be a lot easier if the original query could return
the number of matched languages as well.

I was thinking that the count() method could work but I
don't think I'm using it right. If I try to add a column
with "count(*)" or "count(c.resid)" or "count(p.lang)" I
get the following result, for example:

mysql> select c.resid,c.lang,c.text,p.priority,count(*) 
    -> from content as c,prefs as p,langs as l 
    -> where p.user='andyc' and p.priority=1 and 
    ->       (c.lang=p.lang or (c.lang=l.code and p.lang=l.defcode)) 
    -> group by c.resid,p.lang 
    -> order by resid,p.priority;
+-------+-------+----------+----------+----------+
| resid | lang  | text     | priority | count(*) |
+-------+-------+----------+----------+----------+
|     1 | en-us | Color    |        1 |        8 |
|     2 | en    | Mountain |        1 |        1 |
+-------+-------+----------+----------+----------+

I would like to have the count of the matched languages
per resid to be in the last column. Is this possible
with MySQL? with SQL in general? Do I need to restructure
my tables to be able to do this?

I'm sorry for the long-winded mail but I figured it was
best to try to solve as many problems as possible with
the fewest posts. :)

-- 
Andy Clark * [EMAIL PROTECTED]

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to