I'd expect it to be quicker, especially in your situation where you
are updating potentially hundreds of records at a time.  If you have
250 records to update, today you're performing 500 queries - first a
select and then an update for each parentid.  This is one query for
all 250 records.

I haven't ever used INSERT - SELECT - ON DUPLICATE KEY UPDATE myself
for anything in production, but I have used its cousin, REPLACE, on a
couple of projects which have been in production for more than 5
years, and it's very fast.

Dan

On 9/26/06, Robert DiFalco <[EMAIL PROTECTED]> wrote:
Hmmmm....do you think this would perform better than simply using the
two queries? I wonder if the overhead associated with the ON DUPLICATE
KEY exception and handler might not outweigh the benefits of a single
query.

-----Original Message-----
From: Dan Buettner [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 26, 2006 7:15 AM
To: Robert DiFalco
Cc: mysql@lists.mysql.com
Subject: Re: Updating two fields from an aggregate query

Robert, you might give "insert ... select ... on duplicate key update" a
try:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html

something like this (untested):

INSERT INTO parent (id, maxChildAge, childCount) SELECT parentid,
MAX(age) as maxAge, COUNT(*) as ct FROM child WHERE parentid IN (1, 2,
3, 4) ON DUPLICATE KEY UPDATE maxChildAge=maxAge, childCount=ct

That seems like it ought to work as long as the id column in the parent
table is a unique key.

One consideration is that if you are writing this query programmatically
and using a long list of parentid values in the IN clause, the SQL could
get pretty long, perhaps too long for the default value of
MAX_ALLOWED_PACKET.  You can adjust that up pretty easily, but bear in
mind you need to adjust it for both client and server.
http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html

HTH,
Dan

On 9/26/06, Robert DiFalco <[EMAIL PROTECTED]> wrote:
> I have two tables that are related:
>
> Parent
>    LONG id
>    LONG childCount
>    LONG maxChildAge
>    ...
>
> Child
>    LONG parentId
>    LONG age
>    ...
>
> There can be thousands of parents and millions of children, that is
> why I have denormalized "childCount" and "maxChildAge". The values are

> too expensive to calculate each time the data is viewed so I update
> these values each time a Child is added, removed, or modified.
>
> I currently have to update the Parent table with two queries like so:
>
>    SELECT MAX( Child.age ), COUNT(*)
>    FROM Child
>    WHERE parentID = <x>;
>
>    UPDATE Parent
>    SET maxChildAge = MAX, childCount = COUNT
>    WHERE id = <x>;
>
> Worse yet I might be updating the stats for several hundred Parents at

> a time, so I have to loop through the above where <x> is the current
> Parent.id in the batch. What I would like to do is something like the
> following (made up syntax):
>
>    UPDATE Parent
>    SET maxChildAge AND childCount = COUNT
>    WITH ( SELECT MAX( Child.age ), COUNT(*)
>       FROM Child
>         WHERE parentID = Parent.id )
>    WHERE id IN ( <set_of_parents_to_update> );
>
> Any suggestions?
>
> TIA
>
> R.
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]
>
>



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to