Any SQL rewriting gurus know how I might be able to optimize this query? The schema:
mysql> show columns from transactionlog; +---------------+-------------------------------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------------+-------------------------------------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | transactionid | varchar(10) | NO | MUL | NULL | | | queryid | tinyint(4) | NO | | NULL | | | tableid | varchar(30) | NO | MUL | NULL | | | tupleid | int(11) | NO | | NULL | | | querytype | enum('select','insert','delete','update') | NO | | NULL | | | schemaname | varchar(20) | YES | | NULL | | | partition | tinyint(3) unsigned | YES | | NULL | | +---------------+-------------------------------------------+------+-----+---------+----------------+ 8 rows in set (0.04 sec) The query: select concat(weight, ' ', ids, '\n') from ( select tableid, tupleid, group_concat(id separator ' ') as ids, ( select count(distinct transactionid) from transactionlog where transactionid in ( select transactionid from transactionlog where (tableid, tupleid, querytype) = (t.tableid, t.tupleid, 'update') group by transactionid having count(*) > 0 ) ) weight from transactionlog t group by tableid, tupleid having weight > 0 and count(*) > 1 ) u; This is the output of EXPLAIN and mk-visual-explain: +----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+------------------------------ ----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 13 | | | 2 | DERIVED | t | ALL | NULL | NULL | NULL | NULL | 68 | Using filesort | | 3 | DEPENDENT SUBQUERY | transactionlog | index | NULL | transactionid | 12 | NULL | 68 | Using where; Using index | | 4 | DEPENDENT SUBQUERY | transactionlog | ref | tableid | tableid | 36 | func,func | 2 | Using where; Using temporary; Using filesort | +----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+ Table scan rows 13 +- DERIVED table derived(t,transactionlog,temporary(transactionlog)) +- DEPENDENT SUBQUERY +- DEPENDENT SUBQUERY | +- Filesort | | +- TEMPORARY | | table temporary(transactionlog) | | +- Filter with WHERE | | +- Bookmark lookup | | +- Table | | | table transactionlog | | | possible_keys tableid | | +- Index lookup | | key transactionlog->tableid | | possible_keys tableid | | key_len 36 | | ref func,func | | rows 2 | +- Filter with WHERE | +- Index scan | key transactionlog->transactionid | key_len 12 | rows 68 +- Filesort +- Table scan rows 68 +- Table table t That is a lot of work. I can write the equivalent logic in Python while making a single pass: results = query(""" select tableid, tupleid, transactionid, id, querytype from transactionlog_2warehouse """) _tab, _tup = None ids = [] weight = 0 saw_upd = False for tab, tup, txn, id, qt in results: if (_tab, _tup) != (tab, tup): if len(ids) > 1 and weight > 0: print weight, ids weight = 0 ids = [] _txn = None if _txn != txn: saw_upd = False if qt == 'update' and not saw_upd: weight += 1 saw_upd = True ids += [id] Is it possible to achieve the Python single-pass performance using pure SQL? Thanks in advance! -- Yang Zhang http://www.mit.edu/~y_z/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org