Hello Johan,
 Thanks for your comprehensive reply. I'll try to answer each of your
questions below.
-Hank

> > But if seeing some SQL will make you happy, here is just one example:
> >
> > UPDATE dest d straight_join source s set d.seq=s.seq WHERE d.key=s.key;
>
> See, this is why people ask to see your queries. You never mentioned you were 
> doing a join in the update :-)
>
> I'll ignore the join condition in the where clause, as it makes little 
> difference in this case; but I do note that
> you're using a straight join. Is the optimizer really reading the tables in 
> the wrong order, or is it just
> something you picked up off a blog without realising the impact?

Yes, I'm using a straight join intentionally.  I have 144 million
unindexed rows in "dest". I want Mysql to start with those rows
sequentially,  then join them to the matching record in "source" using
its index (244 million rows).  If I don't do that, mysql tries to use
the indexed table first, causing a full table scans on dest .  So with
straight_join, it does it in proper order.  During experimentation
with different joins, a regular join would run for days.  A straight
join runs for 3-4 hours.

> >  Source is indexed by key+seq (key is primary key, but seq is  included as 
> > a covering index).
>
> Good practice, that should prevent source from being read from disk, if your 
> index is fully in the cache.

With 244 million records in the "source" table, I'm not sure that
would fit in the cache.

> Do you have an index on dest.key, too? That might help performance as well if 
> it fits in memory, too,
> because you'll only need disk access for flushing writes, then.

I do not have an index on dest.key, also intentionally,  for two
reasons. First, updating 144 million records in place is slow enough,
but trying to update 144 million records AND the index on that field
would absolutely kill the performance of the update. Once the update
is complete, I re-create the index with a sort using myisamchk.
Second, the starting value of "dest.key" for all 144 million records
is "0" so an index on that field wouldn't really help, I think.

> > This query takes about 3.5 hours when I don't use LOCK TABLES, and over 4 
> > hours when I do
> > use LOCK TABLES.

> Most peculiar. Is the difference in performance consistent in repeated 
> executions?

yes, I've done these tests about a dozen times now, and while not
exactly scientific, the results are that LOCK TABLES always results in
longer running times. Not just for this query, but other full table
update/select/delete/insert queries.  Not more than twice as long, but
easily a 10% to 25% increase.

> > And before testing each run, I do restart the server so there is no
> > query caching and I also use FLUSH TABLES between each test run.
>
> That's good, as it will give you the worst-case scenario. However, since such 
> an update is going to wreck
>your index cache anyway, you may just as well preload the appropriate indices 
>into it beforehand, if the
>cache is sized big enough to hold them. That might give a minor performance 
>boost, too, as the server >won't have to go to disk every so often to fetch 
>index blocks.
> See http://dev.mysql.com/doc/refman/5.0/en/index-preloading.html for that.

I'll look into that, but the index is huge.  Ok, I just preloaded the
"source" index using "LOAD INDEX INTO CACHE source IGNORE LEAVES;"..
it took two minutes/15 seconds to pre-load the index. I then ran  two
small tests on smaller tables using the same update statement, and
they both yielded a massive increase in update speed. I'll have to
rebuild the large "dest" table again to try it on the biggest UPDATE,
but wow, if this is any indication, this was a great suggestion. I'll
report back on the results later today. Thank you!

> > | key_buffer_size          | 402653184 |
>
> 400MB of key buffer, that's not bad at all. Do a /show table status/ in your 
> database, and sum the index sizes. If your key buffer is larger than this 
> (and why not scale it for growth a bit?) all your indices will fit, which 
> will save on disk access for index lookups *and* for index-covered queries.

The index length for "source" is 5,889,037,312.

Thanks again for your assistance.

-Hank

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to