how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Mike Spreitzer
Suppose I have a table T with two column, S holding strings (say, 
VARCHAR(200)) and I holding integers.  No row appears twice.  A given 
string appears many times, on average about 100 times.  Suppose I have 
millions of rows.  I want to make a table U holding those same columns 
plus one more, J holding the next integer that T has for S (U having no 
row for the last integer of each string).  I could index T on (S,I) and 
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I  t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I 
and t12.I  t2.I)

but the query planner says this is quite expensive to run: it will 
enumerate all of T as t1, do a nested enumeration of all t2's entries for 
S=t1.S, and inside that do a further nested enumeration of t12's entries 
for S=t1.S --- costing about 10,000 times the size of T.  There has to be 
a better way!

Thanks,
Mike Spreitzer


Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Peter Brawley

Mike

J holding the next integer that T has for S

You mean for each i, the next value of i with that s?

(U having no row for the last integer of each string).

I do not understand that at all.

PB


Mike Spreitzer wrote:
Suppose I have a table T with two column, S holding strings (say, 
VARCHAR(200)) and I holding integers.  No row appears twice.  A given 
string appears many times, on average about 100 times.  Suppose I have 
millions of rows.  I want to make a table U holding those same columns 
plus one more, J holding the next integer that T has for S (U having no 
row for the last integer of each string).  I could index T on (S,I) and 
write this query as


select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I  t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I 
and t12.I  t2.I)


but the query planner says this is quite expensive to run: it will 
enumerate all of T as t1, do a nested enumeration of all t2's entries for 
S=t1.S, and inside that do a further nested enumeration of t12's entries 
for S=t1.S --- costing about 10,000 times the size of T.  There has to be 
a better way!


Thanks,
Mike Spreitzer

  




No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09 06:53:00


  


Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Johnny Withers
Huh???

On Saturday, June 20, 2009, Peter Brawley peter.braw...@earthlink.net wrote:
 Mike

J holding the next integer that T has for S

 You mean for each i, the next value of i with that s?

(U having no row for the last integer of each string).

 I do not understand that at all.

 PB


 Mike Spreitzer wrote:

 Suppose I have a table T with two column, S holding strings (say, 
 VARCHAR(200)) and I holding integers.  No row appears twice.  A given string 
 appears many times, on average about 100 times.  Suppose I have millions of 
 rows.  I want to make a table U holding those same columns plus one more, J 
 holding the next integer that T has for S (U having no row for the last 
 integer of each string).  I could index T on (S,I) and write this query as

 select t1.*, t2.I as J from T as t1, T as t2
 where t1.S=t2.S and t1.I  t2.I
 and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I and 
 t12.I  t2.I)

 but the query planner says this is quite expensive to run: it will enumerate 
 all of T as t1, do a nested enumeration of all t2's entries for S=t1.S, and 
 inside that do a further nested enumeration of t12's entries for S=t1.S --- 
 costing about 10,000 times the size of T.  There has to be a better way!

 Thanks,
 Mike Spreitzer

   


 No virus found in this incoming message.
 Checked by AVG - www.avg.com http://www.avg.com Version: 8.5.364 / Virus 
 Database: 270.12.80/2187 - Release Date: 06/19/09 06:53:00





-- 
-
Johnny Withers
601.209.4985
joh...@pixelated.net

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Mike Spreitzer
Yes, for each (S, I) pair the goal is to efficiently find the next largest 
integer associated with S in T.  For the highest integer I associated with 
S in T, there is no next larger.

Thanks,
Mike Spreitzer




Peter Brawley peter.braw...@earthlink.net 
06/20/09 08:56 AM
Please respond to
peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?






Mike

J holding the next integer that T has for S 

You mean for each i, the next value of i with that s?

(U having no row for the last integer of each string).

I do not understand that at all.

PB


Mike Spreitzer wrote: 
Suppose I have a table T with two column, S holding strings (say, 
VARCHAR(200)) and I holding integers.  No row appears twice.  A given 
string appears many times, on average about 100 times.  Suppose I have 
millions of rows.  I want to make a table U holding those same columns 
plus one more, J holding the next integer that T has for S (U having no 
row for the last integer of each string).  I could index T on (S,I) and 
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I  t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I 
and t12.I  t2.I)

but the query planner says this is quite expensive to run: it will 
enumerate all of T as t1, do a nested enumeration of all t2's entries for 
S=t1.S, and inside that do a further nested enumeration of t12's entries 
for S=t1.S --- costing about 10,000 times the size of T.  There has to be 
a better way!

Thanks,
Mike Spreitzer

 



No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09 
06:53:00

 


Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Peter Brawley

Mike,

Yes, for each (S, I) pair the goal is to efficiently find the next largest 
integer associated with S in T.  For the highest integer I associated with 
S in T, there is no next larger.


Here's a more efficient query for the next i values with matching s values:

SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
 SELECT a.i, MIN(b.i) AS j
 FROM t AS a
 JOIN t AS b ON b.i  a.i AND a.s = b.s
 GROUP BY  a.i
) AS n JOIN t USING (i);

PB

-

Mike Spreitzer wrote:
Yes, for each (S, I) pair the goal is to efficiently find the next largest 
integer associated with S in T.  For the highest integer I associated with 
S in T, there is no next larger.


Thanks,
Mike Spreitzer




Peter Brawley peter.braw...@earthlink.net 
06/20/09 08:56 AM

Please respond to
peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?







Mike

  
J holding the next integer that T has for S 



You mean for each i, the next value of i with that s?

  

(U having no row for the last integer of each string).



I do not understand that at all.

PB


Mike Spreitzer wrote: 
Suppose I have a table T with two column, S holding strings (say, 
VARCHAR(200)) and I holding integers.  No row appears twice.  A given 
string appears many times, on average about 100 times.  Suppose I have 
millions of rows.  I want to make a table U holding those same columns 
plus one more, J holding the next integer that T has for S (U having no 
row for the last integer of each string).  I could index T on (S,I) and 
write this query as


select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I  t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I 
and t12.I  t2.I)


but the query planner says this is quite expensive to run: it will 
enumerate all of T as t1, do a nested enumeration of all t2's entries for 
S=t1.S, and inside that do a further nested enumeration of t12's entries 
for S=t1.S --- costing about 10,000 times the size of T.  There has to be 
a better way!


Thanks,
Mike Spreitzer

 




No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09 
06:53:00


 

  




No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09 06:15:00


  


Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Mike Spreitzer
Ah, yes, the MIN should be very helpful.  Can I expect that ordering the 
storage by (S, I) or having an (S, I) index will make that MIN take O(1) 
time, for both MyISAM and InnoDB?

I do not follow why you suggested a join to get the associated S, that can 
be done in the original query (and I did NOT say a given integer I is 
associated with only one string S):

SELECT a.s, a.i, MIN(b.i) AS j 
FROM t AS a 
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i

Thanks,
Mike Spreitzer




Peter Brawley peter.braw...@earthlink.net 
06/20/09 12:39 PM
Please respond to
peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?






Mike,
Yes, for each (S, I) pair the goal is to efficiently find the next 
largest 
integer associated with S in T.  For the highest integer I associated 
with 
S in T, there is no next larger.
Here's a more efficient query for the next i values with matching s 
values:

SELECT a.i, MIN(b.i) AS j 
FROM t AS a 
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
  SELECT a.i, MIN(b.i) AS j 
  FROM t AS a 
  JOIN t AS b ON b.i  a.i AND a.s = b.s
  GROUP BY  a.i
) AS n JOIN t USING (i);

PB

-

Mike Spreitzer wrote: 
Yes, for each (S, I) pair the goal is to efficiently find the next largest 

integer associated with S in T.  For the highest integer I associated with 

S in T, there is no next larger.

Thanks,
Mike Spreitzer




Peter Brawley peter.braw...@earthlink.net 
06/20/09 08:56 AM
Please respond to
peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?






Mike

 
J holding the next integer that T has for S 
 

You mean for each i, the next value of i with that s?

 
(U having no row for the last integer of each string).
 

I do not understand that at all.

PB


Mike Spreitzer wrote: 
Suppose I have a table T with two column, S holding strings (say, 
VARCHAR(200)) and I holding integers.  No row appears twice.  A given 
string appears many times, on average about 100 times.  Suppose I have 
millions of rows.  I want to make a table U holding those same columns 
plus one more, J holding the next integer that T has for S (U having no 
row for the last integer of each string).  I could index T on (S,I) and 
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I  t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I 
and t12.I  t2.I)

but the query planner says this is quite expensive to run: it will 
enumerate all of T as t1, do a nested enumeration of all t2's entries for 
S=t1.S, and inside that do a further nested enumeration of t12's entries 
for S=t1.S --- costing about 10,000 times the size of T.  There has to be 
a better way!

Thanks,
Mike Spreitzer

 



No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09 

06:53:00

 

 



No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09 
06:15:00

 


Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Peter Brawley

I do not follow why you suggested a join to get the associated S,
that can be done in the original query (and I did NOT say a given
integer I is associated with only one string S):

A Group By query returns arbitrary values for a column which (i) does 
not Group By, (ii) does not aggregate, and (iii) does not have a 1:1 
relationship with the grouping expression.


PB

-

Mike Spreitzer wrote:


Ah, yes, the MIN should be very helpful.  Can I expect that ordering 
the storage by (S, I) or having an (S, I) index will make that MIN 
take O(1) time, for both MyISAM and InnoDB?


I do not follow why you suggested a join to get the associated S, that 
can be done in the original query (and I did NOT say a given integer I 
is associated with only one string S):


SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i

Thanks,
Mike Spreitzer



*Peter Brawley peter.braw...@earthlink.net*

06/20/09 12:39 PM
Please respond to
peter.braw...@earthlink.net



To
Mike Spreitzer/Watson/i...@ibmus
cc
mysql@lists.mysql.com
Subject
	Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?










Mike,
Yes, for each (S, I) pair the goal is to efficiently find the next 
largest
integer associated with S in T.  For the highest integer I associated 
with

S in T, there is no next larger.
Here's a more efficient query for the next i values with matching s 
values:


SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
 SELECT a.i, MIN(b.i) AS j
 FROM t AS a
 JOIN t AS b ON b.i  a.i AND a.s = b.s
 GROUP BY  a.i
) AS n JOIN t USING (i);

PB

-

Mike Spreitzer wrote:
Yes, for each (S, I) pair the goal is to efficiently find the next 
largest
integer associated with S in T.  For the highest integer I associated 
with

S in T, there is no next larger.

Thanks,
Mike Spreitzer




Peter Brawley _peter.braw...@earthlink.net_ 
mailto:peter.braw...@earthlink.net

06/20/09 08:56 AM
Please respond to
_peter.braw...@earthlink.net_ mailto:peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc
_my...@lists.mysql.com_ mailto:mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?






Mike

 
J holding the next integer that T has for S
   


You mean for each i, the next value of i with that s?

 
(U having no row for the last integer of each string).
   


I do not understand that at all.

PB


Mike Spreitzer wrote:
Suppose I have a table T with two column, S holding strings (say,
VARCHAR(200)) and I holding integers.  No row appears twice.  A given
string appears many times, on average about 100 times.  Suppose I have
millions of rows.  I want to make a table U holding those same columns
plus one more, J holding the next integer that T has for S (U having no
row for the last integer of each string).  I could index T on (S,I) and
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I  t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I
and t12.I  t2.I)

but the query planner says this is quite expensive to run: it will
enumerate all of T as t1, do a nested enumeration of all t2's entries for
S=t1.S, and inside that do a further nested enumeration of t12's entries
for S=t1.S --- costing about 10,000 times the size of T.  There has to be
a better way!

Thanks,
Mike Spreitzer





No virus found in this incoming message.
Checked by AVG - _www.avg.com_ http://www.avg.com/
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 
06/19/09

06:53:00



 





No virus found in this incoming message.
Checked by AVG - _www.avg.com_ http://www.avg.com/
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 
06/20/09 06:15:00


 




No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09 06:15:00


  


Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Mike Spreitzer
Oops, I did not read your original query closely enough.  You actually 
meant to group by S, not I, right?  I can get S, I, and J with


SELECT a.s, a.i, MIN(b.i) AS j 
FROM t AS a 
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.s

Right?

My integers are not unique; a given integer can be paired with several 
strings.

Thanks,
Mike Spreitzer
SMTP: mspre...@us.ibm.com, Lotus Notes: Mike Spreitzer/Watson/IBM
Office phone: +1-914-784-6424 (IBM T/L 863-)
AOL Instant Messaging: M1k3Sprtzr



Peter Brawley peter.braw...@earthlink.net 
06/20/09 03:59 PM
Please respond to
peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?






I do not follow why you suggested a join to get the associated S, 
that can be done in the original query (and I did NOT say a given 
integer I is associated with only one string S): 

A Group By query returns arbitrary values for a column which (i) does not 
Group By, (ii) does not aggregate, and (iii) does not have a 1:1 
relationship with the grouping expression.

PB

-

Mike Spreitzer wrote: 

Ah, yes, the MIN should be very helpful.  Can I expect that ordering the 
storage by (S, I) or having an (S, I) index will make that MIN take O(1) 
time, for both MyISAM and InnoDB? 

I do not follow why you suggested a join to get the associated S, that can 
be done in the original query (and I did NOT say a given integer I is 
associated with only one string S): 

SELECT a.s, a.i, MIN(b.i) AS j 
FROM t AS a 
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i 

Thanks, 
Mike Spreitzer



Peter Brawley peter.braw...@earthlink.net 
06/20/09 12:39 PM 

Please respond to
peter.braw...@earthlink.net



To
Mike Spreitzer/Watson/i...@ibmus 
cc
mysql@lists.mysql.com 
Subject
Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?








Mike, 
Yes, for each (S, I) pair the goal is to efficiently find the next 
largest 
integer associated with S in T.  For the highest integer I associated 
with 
S in T, there is no next larger. 
Here's a more efficient query for the next i values with matching s 
values:

SELECT a.i, MIN(b.i) AS j 
FROM t AS a 
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
 SELECT a.i, MIN(b.i) AS j 
 FROM t AS a 
 JOIN t AS b ON b.i  a.i AND a.s = b.s
 GROUP BY  a.i
) AS n JOIN t USING (i);

PB

-

Mike Spreitzer wrote: 
Yes, for each (S, I) pair the goal is to efficiently find the next largest 

integer associated with S in T.  For the highest integer I associated with 

S in T, there is no next larger.

Thanks,
Mike Spreitzer




Peter Brawley peter.braw...@earthlink.net 
06/20/09 08:56 AM
Please respond to
peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?






Mike

  
J holding the next integer that T has for S 


You mean for each i, the next value of i with that s?

  
(U having no row for the last integer of each string).


I do not understand that at all.

PB


Mike Spreitzer wrote: 
Suppose I have a table T with two column, S holding strings (say, 
VARCHAR(200)) and I holding integers.  No row appears twice.  A given 
string appears many times, on average about 100 times.  Suppose I have 
millions of rows.  I want to make a table U holding those same columns 
plus one more, J holding the next integer that T has for S (U having no 
row for the last integer of each string).  I could index T on (S,I) and 
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I  t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I 
and t12.I  t2.I)

but the query planner says this is quite expensive to run: it will 
enumerate all of T as t1, do a nested enumeration of all t2's entries for 
S=t1.S, and inside that do a further nested enumeration of t12's entries 
for S=t1.S --- costing about 10,000 times the size of T.  There has to be 
a better way!

Thanks,
Mike Spreitzer





No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09 

06:53:00



  



No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09 
06:15:00

  



No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09 
06:15:00

 


Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Peter Brawley

Mike,

Oops, I did not read your original query closely enough. 
You actually meant to group by S, not I, right? 

No, it's a query for next i values with matching s values, so it groups 
by i.


I can get S, I, and J with

SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.s

For this dataset ...

drop table if exists t;
create table t(i int,s char(1));
insert into t 
values(1,'a'),(2,'b'),(3,'c'),(4,'a'),(5,'a'),(6,'d'),(7,'e'),(8,'d');


are these the correct next values of i?

+--+--+
| i| j|
+--+--+
|1 |4 |
|4 |5 |
|6 |8 |
+--+--+

Your query doesn't return that.

PB

-

Mike Spreitzer wrote:


Oops, I did not read your original query closely enough.  You actually 
meant to group by S, not I, right?  I can get S, I, and J with



SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.s

Right?

My integers are not unique; a given integer can be paired with several 
strings.


Thanks,
Mike Spreitzer
SMTP: mspre...@us.ibm.com, Lotus Notes: Mike Spreitzer/Watson/IBM
Office phone: +1-914-784-6424 (IBM T/L 863-)
AOL Instant Messaging: M1k3Sprtzr


*Peter Brawley peter.braw...@earthlink.net*

06/20/09 03:59 PM
Please respond to
peter.braw...@earthlink.net



To
Mike Spreitzer/Watson/i...@ibmus
cc
mysql@lists.mysql.com
Subject
	Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?










I do not follow why you suggested a join to get the associated S,
that can be done in the original query (and I did NOT say a given
integer I is associated with only one string S):

A Group By query returns arbitrary values for a column which (i) does 
not Group By, (ii) does not aggregate, and (iii) does not have a 1:1 
relationship with the grouping expression.


PB

-

Mike Spreitzer wrote:

Ah, yes, the MIN should be very helpful.  Can I expect that ordering 
the storage by (S, I) or having an (S, I) index will make that MIN 
take O(1) time, for both MyISAM and InnoDB?


I do not follow why you suggested a join to get the associated S, that 
can be done in the original query (and I did NOT say a given integer I 
is associated with only one string S):


SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i

Thanks,
Mike Spreitzer


*Peter Brawley **_peter.braw...@earthlink.net_* 
mailto:peter.braw...@earthlink.net


06/20/09 12:39 PM
Please respond to_
__peter.braw...@earthlink.net_ mailto:peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc
_my...@lists.mysql.com_ mailto:mysql@lists.mysql.com
Subject
	Re: how to efficiently query for the next in MySQL Community Edition 
5.1.34?












Mike,
Yes, for each (S, I) pair the goal is to efficiently find the next 
largest
integer associated with S in T.  For the highest integer I associated 
with

S in T, there is no next larger.
Here's a more efficient query for the next i values with matching s 
values:


SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i  a.i AND a.s = b.s
GROUP BY  a.i
) AS n JOIN t USING (i);

PB

-

Mike Spreitzer wrote:
Yes, for each (S, I) pair the goal is to efficiently find the next 
largest
integer associated with S in T.  For the highest integer I associated 
with

S in T, there is no next larger.

Thanks,
Mike Spreitzer




Peter Brawley _peter.braw...@earthlink.net_ 
mailto:peter.braw...@earthlink.net

06/20/09 08:56 AM
Please respond to_
__peter.braw...@earthlink.net_ mailto:peter.braw...@earthlink.net


To
Mike Spreitzer/Watson/i...@ibmus
cc_
__my...@lists.mysql.com_ mailto:mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?






Mike

 
J holding the next integer that T has for S
   


You mean for each i, the next value of i with that s?

 
(U having no row for the last integer of each string).
   


I do not understand that at all.

PB


Mike Spreitzer wrote:
Suppose I have a table T with two column, S holding strings (say,
VARCHAR(200)) and I holding integers.  No row appears twice.  A given
string appears many times, on average about 100 times.  Suppose I have
millions of rows.  I want to make a table U holding those same columns
plus one more, J holding the next integer that T has for S (U having no
row for the last integer of each string).  I could index T on (S,I) and
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I  t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I  t12.I
and t12.I  t2.I)

but the query planner says this is quite expensive to run: it will
enumerate all of T as t1, do a nested enumeration of all t2's entries for

Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

2009-06-20 Thread Mike Spreitzer
Sorry I have not been careful enough.  Following is a very concrete, 
worked example -- so I think I have finally gotten the bugs out.  After 
the example I resume with unanswered questions.

Remember I did not say each integer appears only once, and consider this 
dataset:

create table t (s char(1), i int);
insert into t values ('a', 1), ('b', 1), ('b', 3), ('c', 3), ('b', 4), 
('a', 5), ('c', 5);
mysql select * from t;
+--+--+
| s| i|
+--+--+
| a|1 | 
| b|1 | 
| b|3 | 
| c|3 | 
| b|4 | 
| a|5 | 
| c|5 | 
+--+--+
7 rows in set (0.00 sec)

Here is the ineffecient way to compute the desired answer:

mysql select t1.*, t2.i as j from t as t1, t as t2 where t1.s=t2.s and 
t1.i  t2.i and not exists (select * from t as t12 where t12.s=t1.s and 
t1.i  t12.i  and t12.i  t2.i);
+--+--+--+
| s| i| j|
+--+--+--+
| b|1 |3 | 
| b|3 |4 | 
| a|1 |5 | 
| c|3 |5 | 
+--+--+--+
4 rows in set (0.01 sec)

Here is the better way, using min() (the order of the rows is unimportant 
here):

mysql select t1.*, min(t2.i) as j from t as t1, t as t2 where t1.s=t2.s 
and t1.it2.i group by t1.s, t1.i;
+--+--+--+
| s| i| j|
+--+--+--+
| a|1 |5 | 
| b|1 |3 | 
| b|3 |4 | 
| c|3 |5 | 
+--+--+--+
4 rows in set (0.00 sec)

Code that assumes uniqueness of the integers does not work:

mysql SELECT a.i, MIN(b.i) AS j 
- FROM t AS a 
- JOIN t AS b ON b.i  a.i AND a.s = b.s
- GROUP BY  a.i;
+--+--+
| i| j|
+--+--+
|1 |3 | 
|3 |4 | 
+--+--+
2 rows in set (0.00 sec)


What remains unclear to me is how fast the correct min-based query (the 
better way above) will run.  You and I know that we could walk a BTREE 
on (s, i) and compute the answer in linear time.  As I read the MySQL 
documentation, however, this query does not fit the constraints for fast 
range-based indexing into t2 because the t1.i  t2.i comparison does not 
compare t2's value with something that meets the criteria for a 
constant.  It looks like the query planner plans to do a nested 
enumeration of the integers associated with s, for each (s, i) row in the 
outer enumeration:

mysql alter table t add primary key (s, i);
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql explain select t1.*, min(t2.i) as j from t as t1, t as t2 where 
t1.s=t2.s and t1.it2.i group by t1.s, t1.i;
++-+---+---+---+-+-+-+--+--+
| id | select_type | table | type  | possible_keys | key | key_len | 
ref | rows | Extra|
++-+---+---+---+-+-+-+--+--+
|  1 | SIMPLE  | t1| index | PRIMARY   | PRIMARY | 5   | 
NULL|7 | Using index  | 
|  1 | SIMPLE  | t2| ref   | PRIMARY   | PRIMARY | 1   | 
mjs090605a.t1.s |3 | Using where; Using index | 
++-+---+---+---+-+-+-+--+--+

That is an improvement over my original formulation:

mysql explain select t1.*, t2.i as j from t as t1, t as t2 where 
t1.s=t2.s and t1.i  t2.i and not exists (select * from t as t12 where 
t12.s=t1.s and t1.i  t12.i  and t12.i  t2.i);
+++---+---+---+-+-+-+--+--+
| id | select_type| table | type  | possible_keys | key | 
key_len | ref | rows | Extra|
+++---+---+---+-+-+-+--+--+
|  1 | PRIMARY| t1| index | PRIMARY   | PRIMARY | 5  | 
NULL|7 | Using index  | 
|  1 | PRIMARY| t2| ref   | PRIMARY   | PRIMARY | 1  | 
mjs090605a.t1.s |3 | Using where; Using index | 
|  2 | DEPENDENT SUBQUERY | t12   | ref   | PRIMARY   | PRIMARY | 1  | 
mjs090605a.t1.s |3 | Using where; Using index | 
+++---+---+---+-+-+-+--+--+

We have reduced the time complexity from O( (size of T) * (avg num 
integers per string)^2 ) to O( (size of T) * (avg num integers per 
string)^1 ).  That's great.  We have saved about a factor of 100 in my 
real application (a given string is paired with something on the order of 
100 different integers).  But we could save another factor of 100.  How do 
I save (even a portion of) that?

Thanks,
Mike Spreitzer




Peter Brawley peter.braw...@earthlink.net 
06/20/09