Hi All,

The temporary table didn't turn out to be a better idea;
the problem is that if the search returns 1500+
rows (on a 20 000-record table), populating the
temporary table is still slow, so it makes no significant difference.

Well, I'm in a strange situation at the moment.

I've created helper tables; e.g., to perform searches ordered by end_time,
I've created a separate table:

DROP TABLE IF EXISTS end_time;
CREATE TABLE end_time (
   lot_id        int unsigned not null,

   category_id   int unsigned not null,
   end_time      int unsigned not null,
   title         char(50)     not null,
   description   text         not null,

   INDEX    end_time   (end_time),

   FULLTEXT ttitle     (title),
   FULLTEXT ttitle_description (title, description)
) TYPE = MyISAM COMMENT = "Search helper";

and filled each one with 20 000 sample records (instead of 50 000),
which seems to be enough to estimate the speed.

It helped a lot: search times dropped by half, but 5-10
seconds was still too much.

Then I set key_cache_block_size to 2048 and ran
`REPAIR TABLE end_time` (hmm, now I understand that
key_cache_block_size affects the in-memory key cache rather than
the key indexes on disk, so it looks like the repair wasn't needed).
Miracle! Without joining to `category_map` and `lot`,
all the searches dropped to 0.01-0.02 seconds (WOW!), and with an INNER JOIN
like

SELECT end_time.lot_id, end_time.description
FROM end_time
INNER JOIN category_map
ON category_map.child_id = end_time.category_id AND category_map.parent_id = 10
INNER JOIN lot
ON lot.id = end_time.lot_id
WHERE
MATCH(end_time.title, end_time.description) AGAINST ('+keyword' IN BOOLEAN MODE)
ORDER BY
end_time.end_time;


the searches never exceed 0.50 seconds; most of them take 0.05-0.06 seconds.

This is mysql-4.1.6-gamma.
Here are the exact steps to reproduce; can someone check this out?
1. fill in the data with the default config.
2. put key_cache_block_size = 2048 in my.cnf
3. restart the server
4. check key_cache_block_size via SHOW VARIABLES LIKE '%block%'; for me it reports 1536 (?!)
5. run "SET GLOBAL key_cache_block_size = 2048";
6. run "REPAIR TABLE end_time" (the table with the fulltext index you are querying)
7. here we are: all the MATCH() queries with ORDER BY take 0.01 seconds, only some of them 0.02,
not more
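
For convenience, steps 5 and 6 boil down to the following statements, run against the end_time table defined above (the SHOW VARIABLES line just confirms the new value took effect):

SET GLOBAL key_cache_block_size = 2048;
SHOW VARIABLES LIKE '%block%';
REPAIR TABLE end_time;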


Sincerely,
Aleksandr Guidrevitch



[EMAIL PROTECTED] wrote:


I found your query hard to understand; however, it seems the optimizer could read it just fine. I VERY MUCH dislike using the comma-separated list of table names to declare INNER JOINs. It gives me too much opportunity to accidentally create a Cartesian product by forgetting a term in my WHERE clause. I firmly believe that a missing ON clause is much easier to spot.


SELECT lot.id, search.title
FROM search
LEFT JOIN lot
        ON lot.id = search.lot_id
INNER JOIN category_map
        ON category_map.child_id = lot.category_id
        AND category_map.parent_id = 10
INNER JOIN exchange_rate
        ON exchange_rate.currency_id = lot.currency_id
WHERE MATCH(search.title, search.description)
        AGAINST ('some query' IN BOOLEAN MODE)
ORDER BY lot.current_price * exchange_rate.exchange_rate
LIMIT 0, 50;

If I just describe how each table contributes to each query I see that you are going to scan every row of the "search" table and optionally match those to records in the "lot" table but only if those "lot" records also match up with both a "category_map" record and an "exchange_rate" record. Are you sure those are the records you want to query?

What I think you should do is run the full-text portion of the search first. Then join those results to "lot", "category_map", and "exchange_rate". If you have the same number of records in "search" as you have in "lot" (I think you said you are testing with 50000) then you reduce the JOIN overhead for your query by quite a bit. If you have 50000 search records and your FT search only returns 200 you just eliminated 49800 records from an additional table join.

CREATE TEMPORARY TABLE tmpLots (KEY (lot_id))
SELECT DISTINCT lot_id, title
FROM search
WHERE MATCH(search.title, search.description)
        AGAINST ('some query' IN BOOLEAN MODE);

SELECT l.id, tl.title
FROM lot l
INNER JOIN tmpLots tl
        ON tl.lot_id = l.id
INNER JOIN category_map
        ON category_map.child_id = l.category_id
        AND category_map.parent_id = 10
INNER JOIN exchange_rate
        ON exchange_rate.currency_id = l.currency_id
ORDER BY l.current_price * exchange_rate.exchange_rate;

DROP TEMPORARY TABLE tmpLots;

Now, to address the speed of your ORDER BY. This is a quote from http://dev.mysql.com/doc/mysql/en/ORDER_BY_optimization.html
>>>
If you want to increase ORDER BY speed, first see whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:


* Increase the size of the sort_buffer_size variable.
* Increase the size of the read_rnd_buffer_size variable.
* Change tmpdir to point to a dedicated filesystem with lots of empty space. If you use MySQL 4.1 or later, this option accepts several paths that are used in round-robin fashion. Paths should be separated by colon characters (`:') on Unix and semicolon characters (`;') on Windows, NetWare, and OS/2. You can use this feature to spread the load across several directories. Note: The paths should be for directories in filesystems that are located on different physical disks, not different partitions of the same disk.
<<<
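
For reference, the first two of those suggestions translate into my.cnf settings along these lines (the values below are only placeholders to illustrate the syntax, not tuned recommendations):

[mysqld]
sort_buffer_size     = 2M
read_rnd_buffer_size = 1M
# round-robin temp dirs on separate physical disks (4.1+), e.g.:
# tmpdir = /disk1/mysqltmp:/disk2/mysqltmp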


Since you are computing the value you are ordering by, you cannot possibly use an index, so I guess that leaves you with just the other three suggestions. You could possibly try creating another temporary table to sort your calculated values and apply a BTREE index to that value (HASH indexes are not useful in ORDER BY optimization) so that your top-50 query returns faster. But you have to compare the performance of the current query against the cost of creating the new temporary table, populating it, and querying against it. I am not sure the extra overhead of creating another indexed table will help (as the number of records involved is much smaller), but it could.
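
Here is a rough, untested sketch of that second temporary table against the schema from the original message (tmpSorted is just a name I made up; a plain KEY on a MyISAM temporary table is a BTREE index by default):

CREATE TEMPORARY TABLE tmpSorted (KEY (sort_price))
SELECT l.id AS lot_id, s.title,
       l.current_price * er.exchange_rate AS sort_price
FROM search s
INNER JOIN lot l
        ON l.id = s.lot_id
INNER JOIN category_map
        ON category_map.child_id = l.category_id
        AND category_map.parent_id = 10
INNER JOIN exchange_rate er
        ON er.currency_id = l.currency_id
WHERE MATCH(s.title, s.description)
        AGAINST ('some query' IN BOOLEAN MODE);

SELECT lot_id, title
FROM tmpSorted
ORDER BY sort_price
LIMIT 0, 50;

DROP TEMPORARY TABLE tmpSorted;

The final SELECT can then potentially use the index on sort_price instead of filesorting the joined result.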

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Aleksandr Guidrevitch <[EMAIL PROTECTED]> wrote on 11/05/2004 08:25:00 AM:

> Hi All, this was already posted on the mysql forum "performance",
> but forums are really slow, so sorry for the crosspost :)
>
>
> My objective is to implement a quick, really quick, complex fulltext search
> with 'order by' (< 2 seconds).
> The actual table I'd like to search is `lot`. I've created 2 helper
> tables: 1. `search`, which contains normalized (or stemmed) title and
> description with a fulltext index; 2. `category_map`, which covers
> all relations of categories and is especially useful when I need to
> perform a search on all child categories of the current category.
>
> The total count of records in `lot`, and correspondingly in the `search`
> table, is expected to be 50 000, each record around 2048 (stemmed
> description) + 50 (stemmed title) bytes.
>
> I need 1) fulltext search within all records, 2) category listing, 3)
> fulltext search within a category.
> Also, the result set should *always* be ordered by either lot.end_time,
> lot.current_price * exchange_rate, lot.title, or lot.bid_count. The
> fulltext search relevance is not important; I'm ready to ignore it in
> favor of performance.
>
> Some time ago I tried to put the fulltext index right inside the `lot`
> table, but it seemed to produce much longer-lasting queries than the
> current setup; plus, in that case I was unable to LOAD INDEX INTO CACHE
> because of the different key lengths of FULLTEXT and the table's other
> indexes. That's why I separated the fields I'd like to index with
> fulltext into the `search` table. But now I'm stuck with the performance
> degradation of ORDER BY. I've tried to play with sort_buffer_size,
> myisam_sort_buffer_size and the other parameters described in the docs
> on ORDER BY and filesort, but with no significant effect on performance.
> I don't mind creating as many helper tables as needed, but I can't
> figure out what else I can improve. Another disappointing problem is
> that it is rather slow to generate 50 000 records for different table
> structures (around 2 hours) just to play with them for 10-20 mins.
>
> Here is what I came up with at the moment:
>
> SELECT lot.id, search.title FROM search LEFT JOIN lot ON lot.id =
> search.lot_id, category_map, exchange_rate WHERE MATCH(search.title,
> search.description) AGAINST ('some query' IN BOOLEAN MODE) AND
> category_map.child_id = lot.category_id AND category_map.parent_id = 10
> AND exchange_rate.currency_id = lot.currency_id ORDER BY
> lot.current_price * exchange_rate.exchange_rate LIMIT 0, 50
>
> This produces almost acceptable results (which I'd like to improve
> further) if I omit the ORDER BY clause. I understand that the slowdown
> occurs mostly during the filesort, and partially due to the temporary table.
>
> The questions are:
> 1) What is a desirable table structure that will allow performing
> fulltext search + ordering?
> 2) Which is preferable, LEFT JOIN .. ON or WHERE? I haven't noticed
> any significant difference trying both of them.
>
> EXPLAIN:
>
> +----+-------------+---------------+----------+---------------------+-------------------+---------+--------------------------+------+----------------------------------------------+
> | id | select_type | table         | type     | possible_keys       | key               | key_len | ref                      | rows | Extra                                        |
> +----+-------------+---------------+----------+---------------------+-------------------+---------+--------------------------+------+----------------------------------------------+
> |  1 | SIMPLE      | search        | fulltext | title_description   | title_description |       0 |                          |    1 | Using where; Using temporary; Using filesort |
> |  1 | SIMPLE      | lot           | eq_ref   | PRIMARY,category_id | PRIMARY           |       4 | tauction.search.lot_id   |    1 | Using where                                  |
> |  1 | SIMPLE      | exchange_rate | ref      | currency_id         | currency_id       |       4 | tauction.lot.currency_id |    2 | Using index                                  |
> |  1 | SIMPLE      | category_map  | ref      | parent_id,child_id  | child_id          |       4 | tauction.lot.category_id |   12 | Using where                                  |
> +----+-------------+---------------+----------+---------------------+-------------------+---------+--------------------------+------+----------------------------------------------+
>
> 4 rows in set (0.00 sec)
>
>
> Table structure as follows.
>
> DROP TABLE IF EXISTS lot;
> CREATE TABLE lot (
> id int unsigned not null primary key auto_increment,
> owner_id int unsigned not null default 0,
>
> title char(50) not null,
> description text not null,
>
> current_price decimal(16,2) unsigned not null,
> quantity int unsigned not null,
> bid_count int unsigned not null,
>
> start_time int unsigned not null,
> end_time int unsigned not null,
>
> city char(35) not null,
>
> category_id int unsigned not null,
> currency_id int unsigned not null,
> country_id int unsigned not null,
> state_id int unsigned not null,
> delivery_id int unsigned not null,
>
> enabled bool default 1 not null,
>
> INDEX title (title),
>
> INDEX current_price (current_price),
> INDEX bid_count (bid_count),
>
> INDEX end_time (end_time),
>
> INDEX category_id (category_id)
>
> ) TYPE = MyISAM COMMENT = "Lots";
>
> DROP TABLE IF EXISTS search;
> CREATE TABLE search (
> lot_id int unsigned not null,
>
> title char(50) not null,
> description text not null,
>
> FULLTEXT title (title),
> FULLTEXT title_description (title, description)
> ) TYPE = MyISAM COMMENT = "Search helper";
>
> DROP TABLE IF EXISTS category_map;
> CREATE TABLE category_map (
> parent_id int unsigned not null,
> child_id int unsigned not null,
> INDEX parent_id (parent_id),
> INDEX child_id (child_id)
> ) TYPE = MyISAM COMMENT = "Categories Map";
>
>
>
> Sincerely,
> Aleks
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
>



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]


