Do any SQL rewriting gurus know how I might be able to optimize this
query? The schema:

mysql> show columns from transactionlog;
+---------------+-------------------------------------------+------+-----+---------+----------------+
| Field         | Type                                      | Null | Key | Default | Extra          |
+---------------+-------------------------------------------+------+-----+---------+----------------+
| id            | int(11)                                   | NO   | PRI | NULL    | auto_increment |
| transactionid | varchar(10)                               | NO   | MUL | NULL    |                |
| queryid       | tinyint(4)                                | NO   |     | NULL    |                |
| tableid       | varchar(30)                               | NO   | MUL | NULL    |                |
| tupleid       | int(11)                                   | NO   |     | NULL    |                |
| querytype     | enum('select','insert','delete','update') | NO   |     | NULL    |                |
| schemaname    | varchar(20)                               | YES  |     | NULL    |                |
| partition     | tinyint(3) unsigned                       | YES  |     | NULL    |                |
+---------------+-------------------------------------------+------+-----+---------+----------------+
8 rows in set (0.04 sec)

The query:

select concat(weight, ' ', ids, '\n')
from (
  select
    tableid,
    tupleid,
    group_concat(id separator ' ') as ids,
    (
      select count(distinct transactionid)
      from transactionlog
      where transactionid in (
        select transactionid
        from transactionlog
        where (tableid, tupleid, querytype) =
              (t.tableid, t.tupleid, 'update')
        group by transactionid
        having count(*) > 0
      )
    ) weight
  from transactionlog t
  group by tableid, tupleid
  having weight > 0 and count(*) > 1
) u;
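
(For reference: since the inner IN subquery only ever returns
transactionids that already appear in transactionlog, I believe the
weight boils down to the number of distinct transactions that update a
given (tableid, tupleid), i.e. the correlated subquery could presumably
be collapsed to something like this -- untested:

select count(distinct transactionid)
from transactionlog
where tableid = t.tableid
  and tupleid = t.tupleid
  and querytype = 'update'
)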

This is the output of EXPLAIN and mk-visual-explain:

+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
| id | select_type        | table          | type  | possible_keys | key           | key_len | ref       | rows | Extra                                        |
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
|  1 | PRIMARY            | <derived2>     | ALL   | NULL          | NULL          | NULL    | NULL      |   13 |                                              |
|  2 | DERIVED            | t              | ALL   | NULL          | NULL          | NULL    | NULL      |   68 | Using filesort                               |
|  3 | DEPENDENT SUBQUERY | transactionlog | index | NULL          | transactionid | 12      | NULL      |   68 | Using where; Using index                     |
|  4 | DEPENDENT SUBQUERY | transactionlog | ref   | tableid       | tableid       | 36      | func,func |    2 | Using where; Using temporary; Using filesort |
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
Table scan
rows           13
+- DERIVED
   table          derived(t,transactionlog,temporary(transactionlog))
   +- DEPENDENT SUBQUERY
      +- DEPENDENT SUBQUERY
      |  +- Filesort
      |  |  +- TEMPORARY
      |  |     table          temporary(transactionlog)
      |  |     +- Filter with WHERE
      |  |        +- Bookmark lookup
      |  |           +- Table
      |  |           |  table          transactionlog
      |  |           |  possible_keys  tableid
      |  |           +- Index lookup
      |  |              key            transactionlog->tableid
      |  |              possible_keys  tableid
      |  |              key_len        36
      |  |              ref            func,func
      |  |              rows           2
      |  +- Filter with WHERE
      |     +- Index scan
      |        key            transactionlog->transactionid
      |        key_len        12
      |        rows           68
      +- Filesort
         +- Table scan
            rows           68
            +- Table
               table          t

That is a lot of work. I can write the equivalent logic in Python that
makes only a single pass over the data:

  # Rows must come back grouped by (tableid, tupleid) and then by
  # transactionid for the single pass below to work, hence the ORDER BY.
  results = query("""
    select tableid, tupleid, transactionid, id, querytype
    from transactionlog_2warehouse
    order by tableid, tupleid, transactionid
  """)
  _tab = _tup = _txn = None
  ids = []
  weight = 0
  saw_upd = False
  for tab, tup, txn, id, qt in results:
    if (_tab, _tup) != (tab, tup):
      # flush the previous (tableid, tupleid) group
      if len(ids) > 1 and weight > 0:
        print weight, ids
      weight = 0
      ids = []
      _tab, _tup, _txn = tab, tup, None
    if _txn != txn:
      saw_upd = False
      _txn = txn
    if qt == 'update' and not saw_upd:
      # count each transaction at most once per group
      weight += 1
      saw_upd = True
    ids += [id]
  # flush the final group
  if len(ids) > 1 and weight > 0:
    print weight, ids

Is it possible to achieve the Python single-pass performance using
pure SQL? Thanks in advance!
-- 
Yang Zhang
http://www.mit.edu/~y_z/
